Extracting/Parsing HTML information



I'm currently extracting and manipulating data from HTML files, which
means lots of searching for various tags and looping through tables
to collect the raw data into Dylan objects and collections. So far
I've been using code like:

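  let page = [...html page as a string...];
  let look-for = "<tr><td align=center><font size=-1><b>";
  let found = #t;
  while (found)
    // subsequence-position returns the index of the match, or #f.
    found := subsequence-position(page, look-for);
    if (found)
      // Parse the integer right after the marker; pos is the index
      // just past the digits.
      let (num, pos) =
        string-to-integer(page, start: found + look-for.size);
      [...etc...]
      // Drop the text already scanned, otherwise the next search
      // finds the same row again.
      page := copy-sequence(page, start: pos);
    end if;
  end while;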

I have lots of code like that, each with a different string to search
for. I'm interested in any other approaches I could take that would
be easier to code and maintain. Is anyone else doing similar tasks?
Are there any interesting Dylan libraries, or uses of Dylan
constructs, for this sort of thing?

Chris.
-- 
http://www.double.co.nz/dylan