4.9 Parsing HTML

(require 'html-for-each)

— Function: html-for-each file word-proc markup-proc white-proc newline-proc

file is an input port or a string naming an existing file containing HTML text. word-proc is a procedure of one argument or #f. markup-proc is a procedure of one argument or #f. white-proc is a procedure of one argument or #f. newline-proc is a procedure of no arguments or #f.

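html-for-each opens and reads characters from port file or the file named by string file. Sequential groups of characters are assembled into strings, each of which is one of: a hypertext markup or comment (enclosed by ‘<’ and ‘>’); an end-of-line; a run of non-newline whitespace; or a word (none of the above).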

Procedures are called according to these distinctions in order of the string's occurrence in file.

newline-proc is called with no arguments for end-of-line not within a markup or comment.

white-proc is called with strings of non-newline whitespace.

markup-proc is called with hypertext markup strings (including ‘<’ and ‘>’).

word-proc is called with the remaining strings.

html-for-each returns an unspecified value.
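As an illustration of the calling convention, the following sketch collects the word strings and markup strings of a document into two lists. The name html-summary is hypothetical, and the sample input is supplied through call-with-input-string from SLIB's string-port package so that no file is needed. Passing #f for white-proc and newline-proc simply ignores those strings.

```scheme
(require 'html-for-each)
(require 'string-port)          ; for call-with-input-string

;; html-summary is a hypothetical helper, not part of SLIB.
;; It returns a list of two lists: the words and the markup strings,
;; each in order of occurrence in the input.
(define (html-summary port)
  (let ((words '())
        (tags '()))
    (html-for-each port
                   (lambda (word) (set! words (cons word words)))
                   (lambda (markup) (set! tags (cons markup tags)))
                   #f            ; white-proc: ignore whitespace
                   #f)           ; newline-proc: ignore line ends
    (list (reverse words) (reverse tags))))

(define summary
  (call-with-input-string "<p>Hello <b>world</b></p>" html-summary))
```

Because html-for-each returns an unspecified value, results have to be accumulated by side effect in the procedures, as done here with set!.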

— Function: html:read-title file limit
— Function: html:read-title file

file is an input port or a string naming an existing file containing HTML text. If supplied, limit must be an integer. limit defaults to 1000.

html:read-title opens and reads HTML from port file or the file named by string file, until reaching the (mandatory) ‘TITLE’ field. html:read-title returns the title string with each run of adjacent whitespace collapsed to one space. html:read-title returns #f if the title field is empty or absent, if the first character read from file is not ‘#\<’, or if the end of the title is not found within (approximately) the first limit words.
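A minimal sketch of both call forms, reading from string ports rather than files. The sample documents and the limit of 50 are arbitrary choices for illustration. Both inputs begin with ‘<’, as the description requires.

```scheme
(require 'html-for-each)
(require 'string-port)          ; for call-with-input-string

;; Sample documents, made up for this example.
(define with-title
  "<html><head><title>  My   Page </title></head><body>text</body></html>")
(define without-title
  "<html><body>no title here</body></html>")

;; One-argument form: limit defaults to 1000.
(define title-1
  (call-with-input-string with-title
    (lambda (port) (html:read-title port))))

;; Two-argument form: give up after roughly 50 words.
(define title-2
  (call-with-input-string without-title
    (lambda (port) (html:read-title port 50))))
```

Since the result may be #f, callers should test it before treating it as a string.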

— Function: htm-fields htm

htm is a hypertext markup string.

If htm is a (hypertext) comment or DTD, then htm-fields returns #f. Otherwise htm-fields returns the hypertext element string consed onto an association list of the attribute name-symbols and values. If the tag ends with "/>", then "/" is appended to the hypertext element string. The name-symbols are created by string-ci->symbol. Each value is a string, or #t if the name had no value assigned within the markup.
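htm-fields is a natural companion to the markup-proc argument of html-for-each. The sketch below gathers the href attribute values found in a document. The names markup-href and collect-hrefs are hypothetical, and the code assumes that each association-list entry is a pair of the form (name . value), which is how the description above reads.

```scheme
(require 'html-for-each)
(require 'string-port)          ; for call-with-input-string

;; markup-href and collect-hrefs are hypothetical helpers, not part of SLIB.
;; Return the href value of a markup string, or #f if there is none.
(define (markup-href markup)
  (let ((fields (htm-fields markup)))      ; #f for comments and DTDs
    (and fields
         (let ((entry (assq 'href (cdr fields))))
           (and entry
                (string? (cdr entry))      ; the value is #t when none was assigned
                (cdr entry))))))

;; Return the list of href strings in the HTML read from port,
;; in order of occurrence.
(define (collect-hrefs port)
  (let ((hrefs '()))
    (html-for-each port
                   #f                       ; word-proc: ignore words
                   (lambda (markup)
                     (let ((href (markup-href markup)))
                       (if href (set! hrefs (cons href hrefs)))))
                   #f
                   #f)
    (reverse hrefs)))

(define links
  (call-with-input-string
      "<p><!-- nav --><a href=\"a.html\">A</a> <a href=\"b.html\">B</a></p>"
    collect-hrefs))
```

Looking attributes up with assq works because the name-symbols come from string-ci->symbol, so a quoted symbol such as 'href is meant to match regardless of the letter case used in the markup.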