Extracting/Parsing HTML information

To: info-dylan@ai.mit.edu
Subject: Extracting/Parsing HTML information
From: Chris Double <chris@double.co.nz>
Date: Sat, 14 Oct 2000 20:15:02 -0400 (EDT)
Organization: None.
Sender: unknown@double.mit.edu
User-Agent: Gnus/5.070099 (Pterodactyl Gnus v0.99) Emacs/20.6
Xref: traf.lcs.mit.edu comp.lang.dylan:12742

I'm currently extracting and manipulating data from HTML files, which
means lots of searching for various tags and looping through tables to
collect the raw data into Dylan objects and collections. So far I've
been using code like:

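  let page = [...html page as a string...];
  let look-for = "<tr><td align=center><font size=-1><b>";
  let found = #t;
  while (found)
    // Index of the marker in page, or #f when it is not present.
    found := subsequence-position(page, look-for);
    if (found)
      // Parse the integer right after the marker; pos is the index
      // just past its digits.
      let (num, pos) =
        string-to-integer(page, start: found + look-for.size);
      [...etc...]
    end if;
  end while;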

There is lots of code like that, each piece with a different string to
search for. I'm interested in any other approaches I could take that
would be easier to code and maintain. Is anyone else doing similar
tasks? Are there any interesting Dylan libraries, or uses of Dylan
constructs, for this sort of thing?
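
To make the repetition concrete, the search-then-parse pattern could be
wrapped in a pair of helpers along these lines. This is only a rough
sketch: position-after and collect-integers-after are made-up names,
and it assumes an integer always directly follows the marker and that
string-to-integer returns the value plus the index just past it, as in
the code above.

  // Hypothetical helper: index just past the next occurrence of
  // marker at or after start, or #f if there is none.
  define method position-after
      (page :: <string>, marker :: <string>, start :: <integer>)
   => (pos :: false-or(<integer>))
    // subsequence-position takes no start: keyword, so search a copy
    // of the remaining text and add the offset back on.
    let suffix = copy-sequence(page, start: start);
    let found = subsequence-position(suffix, marker);
    found & (start + found + marker.size)
  end method;

  // Hypothetical helper: collect every integer that directly follows
  // an occurrence of marker. Assumes digits are always present there.
  define method collect-integers-after
      (page :: <string>, marker :: <string>)
   => (numbers :: <sequence>)
    let numbers = make(<stretchy-vector>);
    let pos = position-after(page, marker, 0);
    while (pos)
      let (num, next) = string-to-integer(page, start: pos);
      add!(numbers, num);
      // Resume the search just past the integer that was parsed.
      pos := position-after(page, marker, next);
    end while;
    numbers
  end method;

Each table would then need one call such as
collect-integers-after(page, look-for). Copying the remaining text on
every step is wasteful for large pages, so this trades speed for
brevity.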

Chris.
-- 
http://www.double.co.nz/dylan
