
Re: Extracting/Parsing HTML information



Chris and I are working on releasing my HTML and XML parsers.
Both can be used as event-based or DOM-based tools, and both
can also take a DOM tree and "print" it back out as markup.
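
To illustrate the event-based versus DOM-based distinction, here is a
minimal sketch in Python using the standard library's html.parser. It is
not the Dylan code discussed in this message; the names Node, TreeBuilder,
parse, and to_html are made up for the example, and the set of void
elements is deliberately incomplete.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A minimal element node: tag name, attributes, and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # Node instances or plain strings


class TreeBuilder(HTMLParser):
    """Event-based use: the parser calls handle_* while scanning the input.
    These handlers assemble a tree, which gives the DOM-based view."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text
            self.stack[-1].children.append(data)


def parse(text):
    """Run the event-based parser and return the root of the tree."""
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def to_html(node):
    """'Print' a tree back out as markup."""
    if isinstance(node, str):
        return escape(node, quote=False)
    inner = "".join(to_html(child) for child in node.children)
    if node.tag == "#document":
        return inner
    attrs = "".join(
        f" {name}" if value is None else f' {name}="{escape(value)}"'
        for name, value in node.attrs.items()
    )
    if node.tag in VOID:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"
```

Because the parser lowercases tag and attribute names and the printer
always quotes attribute values, round-tripping input through parse and
to_html also normalizes case and quoting.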

I've just sent my code along to Chris to see if it meets his needs.
If it does, we will release it either through Fun-O or through another
web site I am currently working with.

James wrote in message <39EA243E.D6041DBF@james.com>...
>Hi Chris,
>   First, I just wanted to say thanks for all your hard work. I think
>everyone who frequents this group appreciates it. As for what you are
>doing, I have always had an interest in working with HTML and Dylan and
>have tried a few different approaches. I think what you are doing works
>well for certain formatted HTML pages, but my guess would be that it
>starts to break down when a person adds whitespace for formatting HTML
>pages or uses quotes or case differently than what you are searching
>for. ...
>efficient means of producing a DOM tree would be nice. If you can
>produce this tree, then I think finding matches to what you are looking
>for would just be a matter of traversing through the DOM looking for the
>first html object then checking to see it contains the second, and so
>on. I think this would work well, but my attempts didn't do the theory
>justice.
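
The traversal described above (find the first element, then check that it
contains the second, and so on) can be sketched as follows. This is an
illustration in Python, not Dylan, and not code from either parser
mentioned in this thread; Element, descendants, find_nested, and text_of
are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A bare-bones DOM element; children are Elements or text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every element below node, in document order."""
    for child in node.children:
        if isinstance(child, Element):
            yield child
            yield from descendants(child)


def find_nested(node, tags):
    """Yield elements matching the last name in tags, where each name in
    the chain is found somewhere inside the match for the previous name.
    An element can be yielded more than once if the outer matches nest."""
    if not tags:
        yield node
        return
    first, rest = tags[0], tags[1:]
    for element in descendants(node):
        if element.tag == first:
            yield from find_nested(element, rest)


def text_of(node):
    """Concatenate all text found inside node."""
    return "".join(
        child if isinstance(child, str) else text_of(child)
        for child in node.children
    )


# Usage: collect the text of every <td> that sits inside a <table>.
doc = Element("html", children=[
    Element("body", children=[
        Element("table", children=[
            Element("tr", children=[
                Element("td", children=["A"]),
                Element("td", children=["B"]),
            ]),
        ]),
        Element("td", children=["stray"]),
    ]),
])
cells = [text_of(td) for td in find_nested(doc, ["table", "td"])]
```

Since the match runs over the tree rather than the raw text, extra
whitespace, attribute quoting, and tag case in the source page no longer
matter, provided the parser normalizes them while building the tree.
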
>    Does this seem like a good approach to you? I have seen some other
>people express interest in using an XML parser for creating DOM trees
>for HTML pages. That seems to be the best way to give Dylan and





