
4.9 Parsing HTML

(require 'html-for-each)

Function: html-for-each file word-proc markup-proc white-proc newline-proc

file is an input port or a string naming an existing file containing HTML text. word-proc, markup-proc, and white-proc are each a procedure of one argument or #f. newline-proc is a procedure of no arguments or #f.

html-for-each opens and reads characters from port file or the file named by string file. Sequential groups of characters are assembled into strings which are either

  • enclosed by ‘<’ and ‘>’ (hypertext markups or comments);
  • end-of-line;
  • whitespace; or
  • none of the above (words).

Procedures are called according to these distinctions in order of the string’s occurrence in file.

newline-proc is called with no arguments for end-of-line not within a markup or comment.

white-proc is called with strings of non-newline whitespace.

markup-proc is called with hypertext markup strings (including ‘<’ and ‘>’).

word-proc is called with the remaining strings.

html-for-each returns an unspecified value.
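As a rough illustration of the four callbacks, the following sketch collects every string html-for-each reports into a tagged list. The helper name html-tokens is hypothetical and not part of SLIB; the sketch assumes the string-port package is available so that a string can stand in for a file.

```scheme
(require 'html-for-each)
(require 'string-port)          ; provides call-with-input-string

;; html-tokens is a hypothetical helper, not part of SLIB.
;; It returns a list of (kind . string) pairs, plus (newline)
;; entries, in the order html-for-each reports them.
(define (html-tokens html-string)
  (let ((tokens '()))
    ;; Build a one-argument callback that records its string
    ;; under the given kind tag.
    (define (push! kind)
      (lambda (str)
        (set! tokens (cons (cons kind str) tokens))))
    (call-with-input-string html-string
      (lambda (port)
        (html-for-each port
                       (push! 'word)      ; word-proc
                       (push! 'markup)    ; markup-proc
                       (push! 'white)     ; white-proc
                       ;; newline-proc takes no arguments
                       (lambda ()
                         (set! tokens (cons '(newline) tokens))))))
    (reverse tokens)))
```

Because html-for-each returns an unspecified value, the result is accumulated by side effect and reversed at the end to restore document order.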

Function: html:read-title file limit
Function: html:read-title file

file is an input port or a string naming an existing file containing HTML text. If supplied, limit must be an integer. limit defaults to 1000.

html:read-title opens and reads HTML from port file or the file named by string file, until reaching the (mandatory) ‘TITLE’ field. html:read-title returns the title string with each run of adjacent whitespace collapsed to a single space. html:read-title returns #f if the title field is empty or absent, if the first character read from file is not ‘#\<’, or if the end of the title is not found within (approximately) the first limit words.
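Since html:read-title signals every failure case with #f, a caller will usually want a fallback. A minimal sketch follows; the helper name page-title, the fallback string, and the limit of 200 are all illustrative assumptions.

```scheme
(require 'html-for-each)

;; page-title is a hypothetical helper, not part of SLIB.
;; filename must name an existing file containing HTML text.
(define (page-title filename)
  ;; Give up after roughly 200 words instead of the default 1000;
  ;; fall back to a placeholder when html:read-title returns #f.
  (or (html:read-title filename 200)
      "(untitled)"))
```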

Function: htm-fields htm

htm is a hypertext markup string.

If htm is a (hypertext) comment or DTD, then htm-fields returns #f. Otherwise htm-fields returns the hypertext element string consed onto an association list of the attribute name-symbols and values. If the tag ends with "/>", then "/" is appended to the hypertext element string. The name-symbols are created by string-ci->symbol. Each value is a string; or #t if the name had no value assigned within the markup.
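htm-fields is typically applied to the strings passed to markup-proc. The sketch below gathers the HREF values of anchor tags. The helper name html-links is hypothetical, and the code assumes that each association list entry is a dotted pair (name . value) and that the string-port package is available.

```scheme
(require 'html-for-each)
(require 'string-port)          ; provides call-with-input-string

;; html-links is a hypothetical helper, not part of SLIB.
(define (html-links html-string)
  (let ((links '()))
    (call-with-input-string html-string
      (lambda (port)
        (html-for-each
         port
         #f                               ; no word-proc
         (lambda (markup)
           (let ((fields (htm-fields markup)))
             ;; fields is #f for comments and DTDs.
             (if (and fields (string-ci=? "a" (car fields)))
                 ;; Assumes entries are (name . value) pairs.
                 (let ((href (assq 'href (cdr fields))))
                   ;; The value is #t when the attribute had no
                   ;; value assigned, so keep only strings.
                   (if (and href (string? (cdr href)))
                       (set! links (cons (cdr href) links)))))))
         #f                               ; no white-proc
         #f)))                            ; no newline-proc
    (reverse links)))
```

The element string is compared with string-ci=? because the markup may spell the tag as ‘A’ or ‘a’; the attribute name is looked up as a symbol because htm-fields creates the names with string-ci->symbol.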

