4.1.4 Token definition

— Function: tok:char-group group chars chars-proc

The argument chars may be a single character, a list of characters, or a string. Each character in chars is treated as though tok:char-group were called with that character alone.

The argument chars-proc must be a procedure of one argument, a list of characters. After tokenize has finished accumulating the characters for a token, it calls chars-proc with the list of characters. The value returned is the token which tokenize returns.
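As a rough illustration of the shape chars-proc is expected to have, the sketch below shows two procedures that accept a list of characters. The standard procedure list->string already fits; chars->number is a hypothetical helper introduced here for illustration and is not part of the library.

```scheme
;; A chars-proc receives the accumulated characters as a list.
;; The standard procedure list->string already has that shape:
(list->string (list #\f #\o #\o))        ; => "foo"

;; Hypothetical helper (an assumption, not a library procedure):
;; turn accumulated digit characters into a number token.
(define (chars->number chars)
  (string->number (list->string chars)))

(chars->number (list #\4 #\2))           ; => 42
```

Whatever such a procedure returns becomes the token, so a numeric token can be delivered as a number rather than as a string.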

The argument group may be an exact integer or a procedure of one character argument. The following discussion concerns the treatment which the tokenizing routine, tokenize, will accord to characters on the basis of their groups.

When group is a non-zero integer, characters whose group number is equal to or exactly one less than group will continue to accumulate. Any other character causes the accumulation to stop (until a new token is to be read).

Group 0 is special. Characters in this group are ignored while tokenize is waiting for a token to begin, and they stop the accumulation of token characters once accumulation has begun. Whitespace characters are usually put in group 0.
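The integer group rules can be pictured with a small stand-alone model. This is only a sketch, not the library's tokenize: the names model-tokens and example-group are hypothetical, tokens are simply returned as strings, the group numbers 70, 69 and 40 are arbitrary, and the model assumes that "group" in the rule above means the group of the character that started the token. It also assumes that group 0 always stops accumulation.

```scheme
;; Illustrative model only (assumptions as stated in the text above).
;; GROUP-OF maps a character to its integer group; CHARS is a list of
;; characters. Returns the list of accumulated tokens as strings.
(define (model-tokens group-of chars)
  (let loop ((chars chars) (g #f) (acc '()) (tokens '()))
    (cond
     ((null? chars)
      ;; End of input: emit any token still being accumulated.
      (reverse (if (null? acc)
                   tokens
                   (cons (list->string (reverse acc)) tokens))))
     ((not g)
      ;; No token pending: skip group 0, otherwise start accumulating.
      (let ((k (group-of (car chars))))
        (if (= k 0)
            (loop (cdr chars) #f '() tokens)
            (loop (cdr chars) k (list (car chars)) tokens))))
     (else
      ;; Accumulating: continue for group g or g - 1, otherwise stop.
      (let ((k (group-of (car chars))))
        (if (and (not (= k 0))
                 (or (= k g) (= k (- g 1))))
            (loop (cdr chars) g (cons (car chars) acc) tokens)
            ;; Emit the token and re-examine this character afresh.
            (loop chars #f '()
                  (cons (list->string (reverse acc)) tokens))))))))

;; Hypothetical grouping: letters 70, digits 69, whitespace 0, rest 40.
(define (example-group c)
  (cond ((char-whitespace? c) 0)
        ((char-alphabetic? c) 70)
        ((char-numeric? c) 69)
        (else 40)))

(model-tokens example-group (string->list "  x1 + 42"))
;; => ("x1" "+" "42")
```

Placing digits one group below letters lets a token that starts with a letter absorb digits, while a token that starts with a digit does not absorb letters.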

If group is a procedure, then, when triggered by the occurrence of a chars character at the start of a token (that is, with no accumulation in progress), this procedure is called repeatedly with each successive character from the input stream until it returns a non-false value.

The following convenient constants are provided for use with tok:char-group.

— Constant: tok:decimal-digits

Is the string "0123456789".

— Constant: tok:upper-case

Is the string consisting of all upper-case letters ("ABCDEFGHIJKLMNOPQRSTUVWXYZ").

— Constant: tok:lower-case

Is the string consisting of all lower-case letters ("abcdefghijklmnopqrstuvwxyz").

— Constant: tok:whitespaces

Is the string consisting of all characters with character codes between 0 and 255 for which char-whitespace? returns true.
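A string of this kind could be computed along the following lines. This is a sketch under the assumption that integer->char accepts every code from 0 to 255 in the host implementation; the name my-whitespaces is hypothetical, and the exact contents depend on how the implementation defines char-whitespace?.

```scheme
;; Sketch: collect every character with code 0..255 that satisfies
;; char-whitespace?. Counting down while consing onto the front leaves
;; the characters in ascending code order.
(define my-whitespaces
  (let loop ((i 255) (acc '()))
    (if (< i 0)
        (list->string acc)
        (loop (- i 1)
              (let ((c (integer->char i)))
                (if (char-whitespace? c) (cons c acc) acc))))))
```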