(require 'precedence-parse) or (require 'parse)
This package implements:
This package offers improvements over previous parsers.
? is substituted for
missing input.
The notion of binding power may be unfamiliar to those accustomed to BNF grammars.
When two consecutive objects are parsed, the first might be the prefix to the second, or the second might be a suffix of the first. Comparing the left and right binding powers of the two objects decides which way to interpret them.
Objects at each level of syntactic grouping have binding powers.
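The comparison of binding powers can be sketched in a few lines of portable Scheme. This is only an illustration of the general technique, not the code in precedence-parse: the table `infix-table`, its numeric binding powers, and the procedures `parse-expr` and `parse` are all hypothetical names chosen for this example. Unlike the package, the sketch builds a syntax tree explicitly.

```scheme
;; Hypothetical binding-power table (not SLIB's): (token lbp rbp).
(define infix-table
  '((+ 10 10) (- 10 10) (* 20 20) (^ 31 30)))   ; ^ is right-associative

;; Parse TOKENS, stopping at any operator whose left binding power
;; does not exceed RBP.  Returns (tree . remaining-tokens).
(define (parse-expr tokens rbp)
  (let loop ((left (car tokens))
             (rest (cdr tokens)))
    (let ((entry (and (pair? rest) (assq (car rest) infix-table))))
      (if (and entry (> (cadr entry) rbp))
          ;; The operator claims LEFT more strongly than our caller does.
          (let ((right (parse-expr (cdr rest) (caddr entry))))
            (loop (list (car rest) left (car right))
                  (cdr right)))
          (cons left rest)))))

;; Parse a whole token list and return only the tree.
(define (parse tokens) (car (parse-expr tokens 0)))
```

With this table, `(parse '(1 + 2 * 3))` groups the multiplication first, `(parse '(1 - 2 - 3))` groups to the left, and `(parse '(2 ^ 3 ^ 4))` groups to the right because the right binding power of `^` is lower than its left binding power.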
A syntax tree is not built unless the rules explicitly do so. The call graph of grammar rules effectively instantiates the syntax tree.
The JACAL symbolic math system (http://people.csail.mit.edu/jaffer/JACAL) uses precedence-parse. Its grammar definitions in the file `jacal/English.scm' can serve as examples of use.
Here are the higher-level syntax types and an example of each. Precedence considerations are omitted for clarity. See section Grammar Rule Definition for full details.
bye
Calls the function exit with no arguments.
- 42
Calls the function negate with the argument 42.
x - y
Calls the function difference with arguments x and y.
x + y + z
Calls the function sum with arguments x, y, and
z.
5 !
Calls the function factorial with the argument 5.
set foo bar
Calls the function set! with the arguments foo and
bar.
/* almost any text here */
Ignores the comment delimited by /* and */.
{0, 1, 2}
Calls the function list with the arguments 0, 1,
and 2.
f(x, y)
Calls the function funcall with the arguments f, x,
and y.
set foo bar;
delimits the extent of the restfix operator set.
prec:define-grammar.
The rules are appended to *syn-defs*. The value of
*syn-defs* is the grammar suitable for passing as an argument to
prec:parse.
*syn-ignore-whitespace*
In order to start defining a grammar, either
(set! *syn-defs* '())
or
(set! *syn-defs* *syn-ignore-whitespace*)
prec:define-grammar is used to define both the character classes
and rules for tokens.
Once your grammar is defined, save the value of *syn-defs* in a
variable (for use when calling prec:parse).
(define my-ruleset *syn-defs*)
prec:define-grammar and extracted from *syn-defs*.
The token delim may be a character, symbol, or string. A character delim argument will match only a character token; i.e. a character for which no token-group is assigned. A symbol or string will match only a token string; i.e. a token resulting from a token group.
prec:parse reads a ruleset grammar expression delimited
by delim from the given input port. prec:parse
returns the next object parsable from the given input port,
updating port to point to the first character past the end of the
external representation of the object.
For the purpose of reporting problems in error messages, this package
keeps track of the current column. Its initial value is passed
as the third argument to prec:parse.
If an end of file is encountered in the input before any characters are
found that can begin an object, then an end of file object is returned.
If a delimiter (such as delim) is found before any characters are
found that can begin an object, then #f is returned.
The port argument may be omitted, in which case it defaults to the
value returned by current-input-port. It is an error to parse
from a closed port.
tok:char-group was called with that character alone.
The argument chars-proc must be a procedure of one argument, a
list of characters. After tokenize has finished
accumulating the characters for a token, it calls chars-proc with
the list of characters. The value returned is the token which
tokenize returns.
The argument group may be an exact integer or a procedure of one
character argument. The following discussion concerns the treatment
which the tokenizing routine, tokenize, will accord to characters
on the basis of their groups.
When group is a non-zero integer, characters whose group number is equal to or exactly one less than group will continue to accumulate. Any other character causes the accumulation to stop (until a new token is to be read).
The group of zero is special. These characters are ignored when parsed pending a token, and stop the accumulation of token characters when the accumulation has already begun. Whitespace characters are usually put in group 0.
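The integer group rules above can be sketched as a small stand-alone tokenizer. This is an illustration under stated assumptions, not the package's tokenize: the procedures `char->group` and `tokenize-string` and the particular group numbers are hypothetical, and `list->string` stands in for a chars-proc.

```scheme
;; Hypothetical group assignment (not part of SLIB): whitespace is
;; group 0, digits are group 1, letters are group 2; any other
;; character has no group and becomes a single-character token.
(define (char->group c)
  (cond ((char-whitespace? c) 0)
        ((char-numeric? c) 1)
        ((char-alphabetic? c) 2)
        (else #f)))

;; Split STR into a list of tokens following the rules described above.
(define (tokenize-string str)
  (let loop ((chars (string->list str)) (tokens '()))
    (cond ((null? chars) (reverse tokens))
          ((eqv? 0 (char->group (car chars)))     ; group 0 is skipped
           (loop (cdr chars) tokens))
          ((not (char->group (car chars)))        ; ungrouped character
           (loop (cdr chars) (cons (car chars) tokens)))
          (else
           (let ((group (char->group (car chars))))
             (let accumulate ((cs (cdr chars)) (acc (list (car chars))))
               (let ((g (and (pair? cs) (char->group (car cs)))))
                 ;; Continue only for the same group or exactly one less;
                 ;; group 0 always stops the accumulation.
                 (if (and g (not (zero? g))
                          (or (= g group) (= g (- group 1))))
                     (accumulate (cdr cs) (cons (car cs) acc))
                     (loop cs (cons (list->string (reverse acc))
                                    tokens))))))))))
```

Under these assumed groups an identifier that starts with a letter may contain digits (group 2 accepts group 1), while a number that starts with a digit stops at the first letter.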
If group is a procedure, then, when triggered by the occurrence of an initial (no accumulation) chars character, this procedure will be repeatedly called with each successive character from the input stream until the group procedure returns a non-false value.
The following convenient constants are provided for use with
tok:char-group.
"0123456789".
char-whitespace? returns true.
This section describes advanced features. You can skip this section on first reading.
The Null Denotation (or nud) of a token is the procedure and arguments applying for that token when Left, an unclaimed parsed expression, is not extant.
The Left Denotation (or led) of a token is the procedure, arguments, and lbp applying for that token when there is a Left, an unclaimed parsed expression.
In his paper,
Pratt, V. R. Top Down Operator Precedence. SIGACT/SIGPLAN Symposium on Principles of Programming Languages, Boston, 1973, pages 41-51
the left binding power (or lbp) was an independent property of tokens. I think this was done in order to allow tokens with NUDs but not LEDs to also be used as delimiters, which was a problem for statically defined syntaxes. It turns out that dynamically binding NUDs and LEDs allows them independence.
For the rule-defining procedures that follow, the variable tk may be a character, string, or symbol, or a list composed of characters, strings, and symbols. Each element of tk is treated as though the procedure were called for each element.
Character tk arguments will match only character tokens; i.e. characters for which no token-group is assigned. Symbols and strings will both match token strings; i.e. tokens resulting from token groups.
(list sop
arg1 ...) is incorporated.
If no NUD has been defined for a token, then if that token is a string, it is converted to a symbol and returned; otherwise the token itself is returned.
If no LED has been defined for a token, and left is set, the parser issues a warning.
Here are procedures for defining rules for the syntax types introduced in section Precedence Parsing Overview.
For the rule-defining procedures that follow, the variable tk may be a character, string, or symbol, or a list composed of characters, strings, and symbols. Each element of tk is treated as though the procedure were called for each element.
For procedures prec:delim, ..., prec:prestfix, if the sop
argument is #f, then the token which triggered this rule is
converted to a symbol and returned. A false sop argument to the
procedures prec:commentfix, prec:matchfix, or prec:inmatchfix has a
different meaning.
Character tk arguments will match only character tokens; i.e. characters for which no token-group is assigned. Symbols and strings will both match token strings; i.e. tokens resulting from token groups.
prec:parse1 is called with binding-power bp.
prec:parse1; the resulting value is incorporated into the
expression being built. Otherwise, the list of sop and the
expression returned from prec:parse1 is incorporated.
Parsing of commentfix syntax differs from the others in several ways. It reads directly from input without tokenizing, and it calls stp but returns neither its value nor any other value. I added the stp argument so that comment text could be echoed.
0 until the token
match is reached. If the token sep does not appear between
each pair of expressions parsed, a warning is issued.
0 until the token
match is reached. If the token sep does not appear between
each pair of expressions parsed, a warning is issued.
(require 'format) or (require 'srfi-28)
Returns #t, #f or a string; has side effect of printing
according to format-string. If destination is #t,
the output is to the current output port and #t is returned. If
destination is #f, a formatted string is returned as the
result of the call. NEW: If destination is a string,
destination is regarded as the format string; format-string is
then the first argument and the output is returned as a string. If
destination is a number, the output is to the current error port
if available by the implementation. Otherwise destination must be
an output port and #t is returned.
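The destination conventions described above can be tried with a few calls. This is a minimal usage sketch that assumes SLIB and its format module are loaded; the variable names `greeting` and `shorthand` are chosen only for this example.

```scheme
(require 'format)

;; Destination #f: the formatted text is returned as a string.
(define greeting (format #f "Hello, ~A!" "world"))

;; Destination #t: the text goes to the current output port.
(format #t "~A has ~S~%" 'list '(1 "two"))

;; A string in the destination position is itself the format string,
;; and the output is returned as a string.
(define shorthand (format "~A-~A" 1 2))
```

Here `greeting` should hold "Hello, world!" and `shorthand` should hold "1-2"; how the symbol in the second call is cased depends on the implementation.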
format-string must be a string. In case of a formatting error
format returns #f and prints a message on the current output or
error port. Characters are output as if the string were output by the
display function with the exception of those prefixed by a tilde
(~). For a detailed description of the format-string syntax
please consult a Common LISP format reference manual. For a test suite
to verify this format implementation load `formatst.scm'.
Please consult a Common LISP format reference manual for a detailed description of the format string syntax. For a demonstration of the implemented directives see `formatst.scm'.
This implementation supports directive parameters and modifiers
(: and @ characters). Multiple parameters must be
separated by a comma (,). Parameters can be numerical parameters
(positive or negative), character parameters (prefixed by a quote
character ('), variable parameters (v), number of rest
arguments parameter (#), empty and default parameters. Directive
characters are case independent. The general form of a directive
is:
directive ::= ~{directive-parameter,}[:][@]directive-character
directive-parameter ::= [ [-|+]{0-9}+ | 'character | v | # ]
Documentation syntax: Uppercase characters represent the corresponding control directive characters. Lowercase characters represent control directive parameter descriptions.
~A
display does).
~@A
~mincol,colinc,minpad,padcharA
~S
write does).
~@S
~mincol,colinc,minpad,padcharS
~D
~@D
~:D
~mincol,padchar,commacharD
~X
~@X
~:X
~mincol,padchar,commacharX
~O
~@O
~:O
~mincol,padchar,commacharO
~B
~@B
~:B
~mincol,padchar,commacharB
~nR
~n,mincol,padchar,commacharR
~@R
~:@R
~:R
~R
~P
~@P
y and ies.
~:P
~P but jumps 1 argument backward.
~:@P
~@P but jumps 1 argument backward.
~C
~@C
#\ prefixing).
~:C
^C for ASCII 03).
~F
~width,digits,scale,overflowchar,padcharF
~@F
~E
Eee).
~width,digits,exponentdigits,scale,overflowchar,padchar,exponentcharE
~@E
~G
~width,digits,exponentdigits,scale,overflowchar,padchar,exponentcharG
~@G
~$
~digits,scale,width,padchar$
~@$
~:@$
~:$
~%
~n%
~&
~n&
~& and then n-1 newlines.
~|
~n|
~~
~n~
~<newline>
~:<newline>
~@<newline>
~T
~@T
~colnum,colincT
~?
~@?
~(str~)
string-downcase).
~:(str~)
string-capitalize.
~@(str~)
string-capitalize-first.
~:@(str~)
string-upcase.
~*
~n*
~:*
~n:*
~@*
~n@*
~[str0~;str1~;...~;strn~]
~n[
~@[
~:[
~;
~:;
~{str~}
~n{
~:{
~@{
~:@{
~^
~n^
~n,m^
~n,m,k^
~:A
#f as an empty list (see below).
~:S
#f as an empty list (see below).
~<~>
~:^
~mincol,padchar,commachar,commawidthD
~mincol,padchar,commachar,commawidthX
~mincol,padchar,commachar,commawidthO
~mincol,padchar,commachar,commawidthB
~n,mincol,padchar,commachar,commawidthR
~I
~F~@Fi with passed parameters for
~F.
~Y
~K
~?.
~!
~_
#\space character
~n_
#\space characters.
~/
#\tab character
~n/
#\tab characters.
~nC
integer->char. n must be a positive decimal number.
~:S
#<...> as strings "#<...>" so that the format output can always
be processed by read.
~:A
#<...> as strings "#<...>" so that the format output can always
be processed by read.
~Q
~:Q
~F, ~E, ~G, ~$
Format has some configuration variables at the beginning of `format.scm' to suit the system's and user's needs. No modification should be necessary for the configuration that comes with SLIB. If modification is desired, the variable should be set after the format code is loaded. Format automatically detects whether the running Scheme system implements floating point numbers and complex numbers.
symbol->string so the case type of the
printed symbols is implementation dependent.
format:symbol-case-conv is a one arg closure which is either
#f (no conversion), string-upcase, string-downcase
or string-capitalize. (default #f)
#f)
~E printing. (default
#\E)
#t, a ~{...~} control will iterate no more than the
number of times specified by format:max-iterations regardless of
the number of iterations implied by modifiers and arguments.
When #f, a ~{...~} control will iterate the number of
times implied by modifiers and arguments, unless termination is forced
by language or system limitations. (default #t)
~{...~} control.
Has effect only when format:iteration-bounded is #t.
(default 100)
~A, ~S,
~P, ~X uppercase printing. SLIB format 1.4 uses C-style
printf padding support which is completely replaced by the CL
format padding style.
~, which is not documented
(ignores all characters inside the format string up to a newline
character). (7.1 implements ~a, ~s,
~newline, ~~, ~%, numerical and variable
parameters and :/@ modifiers in the CL sense).
~A and ~S which print in
uppercase. (Elk implements ~a, ~s, ~~, and
~% (no directive parameters or modifiers)).
~a, ~s, ~c, ~%, and ~~ (no directive
parameters or modifiers)).
This implementation of format is solely useful in the SLIB context because it requires other components provided by SLIB.
requires printf and scanf and additionally defines
the symbols:
(current-input-port).
(current-output-port).
(current-error-port).
Each function converts, formats, and outputs its arg1 ... arguments according to the control string format argument and returns the number of characters output.
printf sends its output to the port (current-output-port).
fprintf sends its output to the port port. sprintf
string-set!s locations of the non-constant string argument
str to the output characters.
Two extensions of sprintf return new strings. If the first
argument is #f, then the returned string's length is as many
characters as specified by the format and data; if the first
argument is a non-negative integer k, then the length of the
returned string is also bounded by k.
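The two string-returning forms of sprintf can be illustrated as follows. This is a usage sketch only: it assumes SLIB is loaded and that the feature is named printf, and the variable names `s1` and `s2` are hypothetical.

```scheme
(require 'printf)

;; First argument #f: a new string, as long as the output requires.
(define s1 (sprintf #f "%d item(s) in %s" 3 "cart"))

;; First argument a non-negative integer k: the length of the
;; returned string is also bounded by k.
(define s2 (sprintf 5 "%s" "truncated"))
```

Going by the description above, `s1` would be the full formatted text and `s2` would be cut off after five characters.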
The string format contains plain characters which are copied to the output stream, and conversion specifications, each of which results in fetching zero or more of the arguments arg1 .... The results are undefined if there are an insufficient number of arguments for the format. If format is exhausted while some of the arg1 ... arguments remain unused, the excess arg1 ... arguments are ignored.
The conversion specifications in a format string have the form:
% [ flags ] [ width ] [ . precision ] [ type ] conversion
An output conversion specification consists of an initial `%' character followed in sequence by:
scanf functions with the `%i' conversion
(see section Standard Formatted Input).
6. If the precision is explicitly 0,
the decimal point character is suppressed.
For the `%g' and `%G' conversions, the precision specifies how
many significant digits to print. Significant digits are the first
digit before the decimal point, and all the digits after it. If the
precision is 0 or not specified for `%g' or `%G', it is
treated like a value of 1. If the value being printed cannot be
expressed accurately in the specified number of digits, the value is
rounded to the nearest number that fits.
For exact conversions, if a precision is supplied it specifies the
minimum number of digits to appear; leading zeros are produced if
necessary. If a precision is not supplied, the number is printed with
as many digits as necessary. Converting an exact `0' with an
explicit precision of zero produces no characters.
scanf
for input (see section Standard Formatted Input).
write (which can be read using read); otherwise,
output is as display prints. A precision specifies the maximum
number of characters to output; otherwise as many characters as needed
are output.
Note: `%a' and `%A' are SLIB extensions.
Each function reads characters, interpreting them according to the control string format argument.
scanf-read-list returns a list of the items specified as far as
the input matches format. scanf, fscanf, and
sscanf return the number of items successfully matched and
stored. scanf, fscanf, and sscanf also set the
location corresponding to arg1 ... using the methods:
set!
set-car!
set-cdr!
vector-set!
substring-move-left!
The argument to a substring expression in arg1 ... must
be a non-constant string. Characters will be stored starting at the
position specified by the second argument to substring. The
number of characters stored will be limited by either the position
specified by the third argument to substring or the length of the
matched string, whichever is less.
The control string, format, contains conversion specifications and other characters used to direct interpretation of input sequences. The control string contains:
Unless the specification contains the `n' conversion character (described below), a conversion specification directs the conversion of the next input field. The result of a conversion specification is returned in the position to which the corresponding argument points, unless `*' indicates assignment suppression. Assignment suppression provides a way to describe an input field to be skipped. An input field is defined as a string of characters; it extends to the next inappropriate character or until the field width, if specified, is exhausted.
Note: This specification of format strings differs from the ANSI C and POSIX specifications. In SLIB, white space before an input field is not skipped unless white space appears before the conversion specification in the format string. In order to write format strings which work identically with ANSI C and SLIB, prepend whitespace to all conversion specifications except `[' and `c'.
The conversion code indicates the interpretation of the input field. For a suppressed field, no value is returned. The following conversion codes are legal:
scanf. No input is consumed by %n.
scanf cannot read a null string.
The scanf functions terminate their conversions at end-of-file,
at the end of the control string, or when an input character conflicts
with the control string. In the latter case, the offending character is
left unread in the input stream.
This routine implements Posix command line argument parsing. Notice
that returning values through global variables means that getopt
is not reentrant.
Obedience to Posix format for the getopt calls sows confusion.
Passing argc and argv as arguments while referencing
optind as a global variable leads to strange behavior,
especially when the calls to getopt are buried in other
procedures.
Even in C, argc can be derived from argv; what purpose
does it serve beyond providing an opportunity for
argv/argc mismatch? Just such a mismatch existed for
years in a SLIB getopt-- example.
I have removed the argc and argv arguments to getopt procedures; and replaced them with a global variable:
(vector-ref argv *optind*)) that matches a letter in
optstring. *argv* is a vector or list of strings, the 0th
of which getopt usually ignores. optstring is a string of
recognized option characters; if a character is followed by a colon,
the option takes an argument which may be immediately following it in
the string or in the next element of *argv*.
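How an optstring is read can be sketched with a small stand-alone procedure. The name `option-spec` and its three result symbols are hypothetical and are not part of the getopt package; the sketch only classifies one option character against an optstring.

```scheme
;; Classify the option character OPT against OPTSTRING.
;; Returns 'argument if OPT is followed by a colon in OPTSTRING,
;; 'flag if it appears without a colon, and 'unknown otherwise.
(define (option-spec optstring opt)
  (let ((len (string-length optstring)))
    (let loop ((i 0))
      (cond ((>= i len) 'unknown)
            ((and (not (char=? opt #\:))          ; a colon is never an option
                  (char=? opt (string-ref optstring i)))
             (if (and (< (+ i 1) len)
                      (char=? #\: (string-ref optstring (+ i 1))))
                 'argument
                 'flag))
            (else (loop (+ i 1)))))))
```

For the optstring ":a:b:cd", the characters a and b take arguments, c and d are plain flags, and anything else is unrecognized.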
*optind* is the index of the next element of the *argv* vector
to be processed. It is initialized to 1 by `getopt.scm', and
getopt updates it when it finishes with each element of
*argv*.
getopt returns the next option character from *argv* that
matches a character in optstring, if there is one that matches.
If the option takes an argument, getopt sets the variable
*optarg* to the option-argument as follows:
(length *argv*), this indicates a missing option
argument, and getopt returns an error indication.
If, when getopt is called, the string (vector-ref argv
*optind*) either does not begin with the character #\- or is
just "-", getopt returns #f without changing
*optind*. If (vector-ref argv *optind*) is the string
"--", getopt returns #f after incrementing
*optind*.
If getopt encounters an option character that is not contained in
optstring, it returns the question-mark #\? character. If
it detects a missing option argument, it returns the colon character
#\: if the first character of optstring was a colon, or a
question-mark character otherwise. In either case, getopt sets
the variable getopt:opt to the option character that caused the
error.
The special option "--" can be used to delimit the end of the
options; #f is returned, and "--" is skipped.
RETURN VALUE
getopt returns the next option character specified on the command
line. A colon #\: is returned if getopt detects a missing
argument and the first character of optstring was a colon
#\:.
A question-mark #\? is returned if getopt encounters an
option character not in optstring or detects a missing argument
and the first character of optstring was not a colon #\:.
Otherwise, getopt returns #f when all command line options
have been parsed.
Example:
#! /usr/local/bin/scm
(require 'program-arguments)
(require 'getopt)
;; getopt reads the global *argv*; only the option string is passed.
(set! *argv* (program-arguments))
(define opts ":a:b:cd")
(let loop ((opt (getopt opts)))
  (case opt
    ((#\a) (print "option a: " *optarg*))
    ((#\b) (print "option b: " *optarg*))
    ((#\c) (print "option c"))
    ((#\d) (print "option d"))
    ((#\?) (print "error" getopt:opt))
    ((#\:) (print "missing arg" getopt:opt))
    ((#f) (if (< *optind* (length *argv*))
              (print "argv[" *optind* "]="
                     (list-ref *argv* *optind*)))
          (set! *optind* (+ *optind* 1))))
  (if (< *optind* (length *argv*))
      (loop (getopt opts))))
(slib:exit)
getopt-- optstring
getopt-- is an extended version of getopt
which parses long option names of the form
`--hold-the-onions' and `--verbosity-level=extreme'.
Getopt-- behaves as getopt except for non-empty
options beginning with `--'.
Options beginning with `--' are returned as strings rather than
characters. If a value is assigned (using `=') to a long option,
*optarg* is set to the value. The `=' and value are
not returned as part of the option string.
No information is passed to getopt-- concerning which long
options should be accepted or whether such options can take arguments.
If a long option did not have an argument, *optarg* will be set
to #f. The caller is responsible for detecting and reporting
errors.
(define opts ":-:b:")
(define *argv* '("foo" "-b9" "--f1" "--2=" "--g3=35234.342" "--"))
(define *optind* 1)
(define *optarg* #f)
(require 'qp)
(do ((i 5 (+ -1 i)))
((zero? i))
(let ((opt (getopt-- opts)))
(print *optind* opt *optarg*)))
-|
2 #\b "9"
3 "f1" #f
4 "2" ""
5 "g3" "35234.342"
5 #f "35234.342"
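The way a long option is divided into a name and an optional value can be sketched without SLIB. The procedure `split-long-option` is a hypothetical helper written for this illustration, not part of getopt--; it assumes its argument begins with `--'.

```scheme
;; Split a long option such as "--verbosity-level=extreme" into a pair
;; (name . value).  The value is #f when the option contains no `='.
(define (split-long-option str)
  (let ((len (string-length str)))
    (let loop ((i 2))                     ; skip the leading "--"
      (cond ((>= i len) (cons (substring str 2 len) #f))
            ((char=? #\= (string-ref str i))
             (cons (substring str 2 i)
                   (substring str (+ i 1) len)))
            (else (loop (+ i 1)))))))
```

Applied to the long options in the example above, the sketch yields the same names and values that getopt-- reports: "f1" with #f, "2" with an empty string, and "g3" with "35234.342".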
read-command converts a command line into a list of strings
suitable for parsing by getopt. The syntax of command lines
supported resembles that of popular shells. read-command
updates port to point to the first character past the command
delimiter.
If an end of file is encountered in the input before any characters are found that can begin an object or comment, then an end of file object is returned.
The port argument may be omitted, in which case it defaults to the
value returned by current-input-port.
The fields into which the command line is split are delimited by
whitespace as defined by char-whitespace?. The end of a command
is delimited by end-of-file or unescaped semicolon (;) or
newline. Any character can be literally included in a field by
escaping it with a backslash (\).
The initial character and types of fields recognized are:
read starting with this character. The
read expression is evaluated, converted to a string
(using display), and replaces the expression in the returned
field.
The comment field differs from the previous fields in that it must be
the first character of a command or appear after whitespace in order to
be recognized. # can be part of fields if these conditions are
not met. For instance, ab#c is just the field ab#c.
read-dommand-line and backslashes before newlines in
comments are also ignored.
read-options-file converts an options file into a list of
strings suitable for parsing by getopt. The syntax of options
files is the same as the syntax for command
lines, except that newlines do not terminate reading (only ;
or end of file).
If an end of file is encountered before any characters are found that can begin an object or comment, then an end of file object is returned.
Arguments to procedures in Scheme are distinguished from each other by their position in the procedure call. This can be confusing when a procedure takes many arguments, many of which are not often used.
A parameter-list is a way of passing named information to a procedure. Procedures are also defined to set unused parameters to default values, check parameters, and combine parameter lists.
A parameter has the form (parameter-name value1
...). This format allows for more than one value per
parameter-name.
A parameter-list is a list of parameters, each with a different parameter-name.
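Because a parameter-list is an ordinary list, its shape can be shown with a literal. In the sketch below, the data in `parms` and the helper `parameter-values` are hypothetical and serve only to illustrate the structure; the helper is not SLIB's parameter-list-ref.

```scheme
;; A parameter-list: each element is (parameter-name value1 ...).
(define parms
  '((output "a.out")
    (include "lib" "src")          ; more than one value is allowed
    (verbose #t)))

;; Hypothetical helper: return the list of values stored under NAME,
;; or #f when PARAMETER-LIST has no such parameter.
(define (parameter-values name parameter-list)
  (let ((entry (assq name parameter-list)))
    (and entry (cdr entry))))
```

For example, `(parameter-values 'include parms)` returns the list of both include values, and looking up a name that is absent returns #f.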
parameter-list-ref returns the value of parameter
parameter-name of parameter-list.
remove-parameter does not alter the argument
parameter-list.
If there is more than one parameter-name parameter, an error is signaled.
make-parameter-list
which created parameter-list. For each non-false element of
expanders that procedure is mapped over the corresponding
parameter value and the returned parameter lists are merged into
parameter-list.
This process is repeated until parameter-list stops growing. The
value returned from parameter-list-expand is unspecified.
make-parameter-list
which created parameter-list. fill-empty-parameters
returns a new parameter-list with each empty parameter replaced with the
list returned by calling the corresponding defaulter with
parameter-list as its argument.
make-parameter-list
which created parameter-list.
check-parameters returns parameter-list if each check
of the corresponding parameter-list returns non-false. If some
check returns #f a warning is signaled.
In the following procedures arities is a list of symbols. The
elements of arities can be:
single
optional
boolean
nary
nary1
single and boolean are converted to
the single value associated with them. The other arity types are
converted to lists of the value(s).
positions is a list of positive integers whose order matches the
order of the parameter-names in the call to
make-parameter-list which created parameter-list. The
integers specify in which argument position the corresponding parameter
should appear.
Returns *argv* converted to a parameter-list. optnames are the parameter-names. arities and types are lists of symbols corresponding to optnames.
aliases is a list of lists of strings or integers paired with
elements of optnames. Each one-character string will be treated
as a single `-' option by getopt. Longer strings will be
treated as long-named options (see section Getopt).
If the aliases association list has only strings as its
cars, then all the option-arguments after an option (and before
the next option) are adjoined to that option.
If the aliases association list has integers, then each (string) option will take at most one option-argument. Unoptioned arguments are collected in a list. A `-1' alias will take the last argument in this list; `+1' will take the first argument in the list. The aliases -2 then +2; -3 then +3; ... are tried so long as a positive or negative consecutive alias is found and arguments remain in the list. Finally a `0' alias, if found, absorbs any remaining arguments.
In all cases, if unclaimed arguments remain after processing, a warning is signaled and #f is returned.
Like getopt->parameter-list, but converts *argv* to an
argument-list as specified by optnames, positions,
arities, types, defaulters, checks, and
aliases. If the options supplied violate the arities or
checks constraints, then a warning is signaled and #f is returned.
These getopt functions can be used with SLIB relational
databases. For an example, see section Using Databases.
If errors are encountered while processing options, directions for using
the options (and argument strings desc ...) are printed to
current-error-port.
(begin
(set! *optind* 1)
(set! *argv* '("cmd" "-?"))
(getopt->parameter-list
'(flag number symbols symbols string flag2 flag3 num2 num3)
'(boolean optional nary1 nary single boolean boolean nary nary)
'(boolean integer symbol symbol string boolean boolean integer integer)
'(("flag" flag)
("f" flag)
("Flag" flag2)
("B" flag3)
("optional" number)
("o" number)
("nary1" symbols)
("N" symbols)
("nary" symbols)
("n" symbols)
("single" string)
("s" string)
("a" num2)
("Abs" num3))))
-|
Usage: cmd [OPTION ARGUMENT ...] ...
-f, --flag
-o, --optional=<number>
-n, --nary=<symbols> ...
-N, --nary1=<symbols> ...
-s, --single=<string>
--Flag
-B
-a <num2> ...
--Abs=<num3> ...
ERROR: getopt->parameter-list "unrecognized option" "-?"
Returns a predicate which returns a non-false value if its string argument matches (the string) pattern, false otherwise. Filename matching is like glob expansion described in the bash manpage, except that names beginning with `.' are matched and `/' characters are not treated specially.
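The matching behavior described here can be sketched for the two simplest wildcards. This is a hypothetical stand-alone matcher named `glob-match?`, not the package's filename:match??; it handles only `*' (any run of characters) and `?' (any single character) and, as described above, gives no special treatment to a leading `.' or to `/'.

```scheme
;; Return #t if STR matches PATTERN, where `*' matches any run of
;; characters (including none) and `?' matches exactly one character.
(define (glob-match? pattern str)
  (let match ((p (string->list pattern)) (s (string->list str)))
    (cond ((null? p) (null? s))
          ((char=? #\* (car p))
           ;; `*' matches nothing, or one character followed by more.
           (or (match (cdr p) s)
               (and (pair? s) (match p (cdr s)))))
          ((null? s) #f)
          ((or (char=? #\? (car p)) (char=? (car p) (car s)))
           (match (cdr p) (cdr s)))
          (else #f))))
```

With this sketch, `(glob-match? "*.scm" ".scm")` is true because a leading dot is matched like any other character, and `(glob-match? "*/b" "a/b")` is true because the slash is an ordinary character.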
These functions interpret the following characters specially in pattern strings:
Returns a function transforming a single string argument according to
glob patterns pattern and template. pattern and
template must have the same number of wildcard specifications,
which need not be identical. pattern and template may have
a different number of literal sections. If an argument to the function
matches pattern in the sense of filename:match?? then it
returns a copy of template in which each wildcard specification is
replaced by the part of the argument matched by the corresponding
wildcard specification in pattern. A * wildcard matches
the longest leftmost string possible. If the argument does not match
pattern then false is returned.
template may be a function accepting the same number of string
arguments as there are wildcard specifications in pattern. In
the case of a match the result of applying template to a list
of the substrings matched by wildcard specifications will be returned,
otherwise template will not be called and #f will be returned.
((filename:substitute?? "scm_[0-9]*.html" "scm5c4_??.htm") "scm_10.html")
=> "scm5c4_10.htm"
((filename:substitute?? "??" "beg?mid?end") "AZ")
=> "begAmidZend"
((filename:substitute?? "*na*" "?NA?") "banana")
=> "banaNA"
((filename:substitute?? "?*?" (lambda (s1 s2 s3) (string-append s3 s1))) "ABZ")
=> "ZA"
str can be a string or a list of strings. Returns a new string
(or strings) similar to str but with the suffix string old
removed and the suffix string new appended. If the end of
str does not match old, an error is signaled.
(replace-suffix "/usr/local/lib/slib/batch.scm" ".scm" ".c") => "/usr/local/lib/slib/batch.c"
tmpnam.
If proc returns, then any files named by the arguments to proc are
deleted automatically and the value(s) yielded by the proc is(are)
returned. k may be omitted, in which case it defaults to 1.
tmpnam,
each with the corresponding suffix string appended.
If proc returns, then any files named by the arguments to proc are
deleted automatically and the value(s) yielded by the proc is(are)
returned.
The batch procedures provide a way to write and execute portable scripts
for a variety of operating systems. Each batch: procedure takes
as its first argument a parameter-list (see section Parameter lists). This
parameter-list argument parms contains named associations. Batch
currently uses 2 of these:
batch-port
batch-dialect
The `batch' module uses 2 enhanced relational tables
(see section Using Databases) to store information linking the names of
operating-systems to batch-dialects.
operating-system and batch-dialect tables and adds
the domain operating-system to the enhanced relational database
database.
*operating-system* is set to (software-type)
(see section Configuration) unless (software-type) is unix,
in which case finer distinctions are made.
If file is an output-port, batch:call-with-output-script writes an
appropriate header to file and then calls proc with file as the
only argument. If file is a string,
batch:call-with-output-script opens an output file named
file, writes an appropriate header to file, and then calls
proc with the newly opened port as the only argument. Otherwise,
batch:call-with-output-script acts as if it were called with the
result of (current-output-port) as its third argument.
The rest of the batch: procedures write (or execute if
batch-dialect is system) commands to the batch port which
has been added to parms or (copy-tree parms) by the
code:
(adjoin-parameters! parms (list 'batch-port port))
batch:try-command (below) with arguments, but signals an
error if batch:try-command returns #f.
These functions return a non-false value if the command was successfully
translated into the batch dialect and #f if not. In the case of
the system dialect, the value is non-false if the operation
succeeded.
batch-port in parms which executes
the program named string1 with arguments string2 ....
arg1 arg2 ... chunk
fits within the platform's maximum command-line length.
batch:try-chopped-command calls batch:try-command with the
command and returns non-false only if the commands all fit and
batch:try-command of each command line returned non-false.
batch-port in parms which executes
the batch script named string1 with arguments string2
....
Note: batch:run-script and batch:try-command are not the
same for some operating systems (VMS).
batch-port in
parms.
batch-port in parms which creates a
file named file with contents line1 ....
batch-port in parms which deletes
the file named file.
batch-port in parms which renames
the file old-name to new-name.
In addition, batch provides some small utilities very useful for writing scripts:
(truncate-up-to "/usr/local/lib/slib/batch.scm" "/") => "batch.scm"
equal? to elements of
list2, then those elements will appear first and in the order of
list1.
equal? to elements of
list1, then those elements will appear last and in the order of
list2.
batch-dialect to be used for the
operating-system named osname. os->batch-dialect uses the
tables added to database by batch:initialize!.
Here is an example of the use of most of batch's procedures:
(require 'databases)
(require 'parameters)
(require 'batch)
(require 'filename)
(define batch (create-database #f 'alist-table))
(batch:initialize! batch)
(define my-parameters
  (list (list 'batch-dialect (os->batch-dialect *operating-system*))
        (list 'operating-system *operating-system*)
        (list 'batch-port (current-output-port)))) ;gets filled in later
(batch:call-with-output-script
 my-parameters
 "my-batch"
 (lambda (batch-port)
   (adjoin-parameters! my-parameters (list 'batch-port batch-port))
   (and
    (batch:comment my-parameters
                   "================ Write file with C program.")
    (batch:rename-file my-parameters "hello.c" "hello.c~")
    (batch:lines->file my-parameters "hello.c"
                       "#include <stdio.h>"
                       "int main(int argc, char **argv)"
                       "{"
                       " printf(\"hello world\\n\");"
                       " return 0;"
                       "}" )
    (batch:command my-parameters "cc" "-c" "hello.c")
    (batch:command my-parameters "cc" "-o" "hello"
                   (replace-suffix "hello.c" ".c" ".o"))
    (batch:command my-parameters "hello")
    (batch:delete-file my-parameters "hello")
    (batch:delete-file my-parameters "hello.c")
    (batch:delete-file my-parameters "hello.o")
    (batch:delete-file my-parameters "my-batch")
    )))
Produces the file `my-batch':
#! /bin/sh
# "my-batch" script created by SLIB/batch Sun Oct 31 18:24:10 1999
# ================ Write file with C program.
mv -f hello.c hello.c~
rm -f hello.c
echo '#include <stdio.h>'>>hello.c
echo 'int main(int argc, char **argv)'>>hello.c
echo '{'>>hello.c
echo ' printf("hello world\n");'>>hello.c
echo ' return 0;'>>hello.c
echo '}'>>hello.c
cc -c hello.c
cc -o hello hello.o
hello
rm -f hello
rm -f hello.c
rm -f hello.o
rm -f my-batch
When run, `my-batch' prints:
bash$ my-batch
mv: hello.c: No such file or directory
hello world
html:head. The tag produced is `<META
NAME="name" CONTENT="content">'. The string or symbol name can be
`author', `copyright', `keywords', `description',
`date', `robots', ....
html:head. The tag produced is `<META
HTTP-EQUIV="name" CONTENT="content">'. The string or symbol name can be
`Expires', `PICS-Label', `Content-Type',
`Refresh', ....
Returns a tag suitable for passing as the third argument to
html:head. If uri argument is supplied, then delay seconds after
displaying the page with this tag, Netscape or IE browsers will fetch
and display uri. Otherwise, delay seconds after displaying the page with
this tag, Netscape or IE browsers will fetch and redisplay this page.
Returns header string for an HTML page named title. If backlink is a string, it is used verbatim between the `H1' tags; otherwise title is used. If string arguments tags ... are supplied, then they are included verbatim within the <HEAD> section.
get, head, post,
put, or delete. The strings body form the body of the
form. html:form returns the HTML form.
The string or symbol submit-label appears on the button which submits the form.
If the optional second argument command is given, then *command*=command
and *button*=submit-label are set in the query. Otherwise,
*command*=submit-label is set in the query.
single
optional
nary
nary1
If the foreign-key table has a field named `visible-name', then the contents of that field are the names visible to the user for those choices. Otherwise, the foreign-key itself is visible.
For other types of domains:
single
optional
boolean
nary
nary1
Returns an HTML string for a form element embedded in a line of a
delimited list. Apply map form:delimited to the list returned by
command->p-specs.
The symbol command-table names a command table in the rdb relational database. The symbol command names a key in command-table.
command->p-specs returns a list of lists of pname, doc, aliat,
arity, default-list, and foreign-values. The
returned list has one element for each parameter of command command.
This example demonstrates how to create an HTML form for the `build' command.
(require (in-vicinity (implementation-vicinity) "build.scm"))
(call-with-output-file "buildscm.html"
  (lambda (port)
    (display
     (string-append
      (html:head 'commands)
      (html:body
       (sprintf #f "<H2>%s:</H2><BLOCKQUOTE>%s</BLOCKQUOTE>\\n"
                (html:plain 'build)
                (html:plain ((comtab 'get 'documentation) 'build)))
       (html:form
        'post
        (or "http://localhost:8081/buildscm" "/cgi-bin/build.cgi")
        (apply html:delimited-list
               (apply map form:delimited
                      (command->p-specs build '*commands* 'build)))
        (form:submit 'build)
        (form:reset))))
     port)))
The positive integer k is the primary-key-limit (number of primary-keys) of the table. foreigns is a list of the filenames of foreign-key field pages and #f for non foreign-key fields.
html:linked-row-converter returns a procedure taking a row for its single argument. This
returned procedure returns the html string for that table row.
Returns the symbol table-name converted to a filename.
Returns HTML string for db table table-name chopped into 50-row HTML tables. Every foreign-key value is linked to the page (of the table) defining that key.
The optional match-key1 ... arguments restrict actions to a subset of the table. See section Table Operations.
Returns a complete HTML page. The string index-filename names the page which refers to this one.
The optional args ... arguments restrict actions to a subset of the table. See section Table Operations.
Returns HTML string for the catalog table of db.
A client can modify one row of an editable table at a time. For any change submitted, these routines check if that row has been modified during the time the user has been editing the form. If so, an error page results.
The behavior of edited rows is:
After any change to the table, a sync-database of the
database is performed.
Returns procedure (of db) which returns procedure to modify
row of table-name. null-keys is the list of null keys indicating the row is
to be deleted when any matches its corresponding primary key.
Optional arguments update, delete, and retrieve default to the row:update,
row:delete, and row:retrieve of table-name in db.
*command* tables
for editing one row of table-name at a time. command:make-editable-table returns a procedure taking a
row argument which returns the HTML string for editing that row.
Optional args are expressions (lists) added to the call to
command:modify-table.
The domain name of a column determines the expected arity of the data stored in that column. Domain names ending in:
The positive integer k is the primary-key-limit (number of primary-keys) of the table. names is a list of the field-names. edit-point is the list of primary-keys denoting the row to edit (or #f). edit-converter is the procedure called with k, names, and the row to edit.
html:editable-row-converter returns a procedure taking a row for its single argument. This
returned procedure returns the html string for that table row.
Each HTML table constructed using html:editable-row-converter has first k fields (typically
the primary key fields) of each row linked to a text encoding of these
fields (the result of calling row->anchor). The page so
referenced typically allows the user to edit fields of that row.
db->html-files creates an html page for each table in the database db in the
sub-directory named dir, or the current directory if dir is #f. The
top level page with the catalog of tables (captioned caption) is written
to a file named index-filename.
db->html-directory creates sub-directory dir if necessary, and calls
(db->html-files db dir index-filename dir). The `file:' URI of index-filename is
returned.
db->netscape is just like db->html-directory, but calls
browse-url with the uri for the top page after the
pages are created.
(require 'http) or (require 'cgi)
car of which is followed by `: ', then the cdr.
(http:header alist) and the `Content-Length' prepended.
http:forwarding-page returns an HTML string for a page which automatically forwards to
uri after dly seconds. The returned page (string) contains any html-strings
... followed by a manual link to uri, in case the browser does not
forward automatically.
http:serve-query calls
serve-proc with three arguments, the request-line, query-string,
and header-alist. Otherwise, http:serve-query calls serve-proc with the
request-line, #f, and header-alist.
If serve-proc returns a string, it is sent to output-port. If serve-proc returns a list, then an error page with number 525 and strings from the list is sent to output-port. If serve-proc returns #f, then a `Bad Request' (400) page is sent to output-port.
Otherwise, http:serve-query replies (to output-port) with appropriate HTML describing the
problem.
This example services HTTP queries from port-number:
(define socket (make-stream-socket AF_INET 0))
(and (socket:bind socket port-number) ; AF_INET INADDR_ANY
     (socket:listen socket 10)        ; Queue up to 10 requests.
     (dynamic-wind
         (lambda () #f)
         (lambda ()
           (do ((port (socket:accept socket) (socket:accept socket)))
               (#f)
             (let ((iport (duplicate-port port "r"))
                   (oport (duplicate-port port "w")))
               (http:serve-query build:serve iport oport)
               (close-port iport)
               (close-port oport))
             (close-port port)))
         (lambda () (close-port socket))))
(current-input-port). If the query is a valid `"POST"'
or `"GET"' query, then cgi:serve-query calls serve-proc with three arguments, the
request-line, query-string, and header-alist.
Otherwise, cgi:serve-query calls serve-proc with the request-line, #f, and
header-alist.
If serve-proc returns a string, it is sent to (current-output-port).
If serve-proc returns a list, then an error page with number 525 and strings
from the list is sent to (current-output-port). If serve-proc returns #f,
then a `Bad Request' (400) page is sent to (current-output-port).
Otherwise, cgi:serve-query replies (to (current-output-port)) with
appropriate HTML describing the problem.
Returns a procedure of one argument. When that procedure is called
with a query-alist (as returned by uri:decode-query), the
value of the `*command*' association will be the command invoked
in command-table. If `*command*' is not in the query-alist then the
value of `*suggest*' is tried. If neither name is in the
query-alist, then the literal value `*default*' is tried in
command-table.
If optional third argument is non-false, then the command is called with just the parameter-list; otherwise, command is called with the arguments described in its table.
file is an input port or a string naming an existing file containing HTML text. word-proc is a procedure of one argument or #f. markup-proc is a procedure of one argument or #f. white-proc is a procedure of one argument or #f. newline-proc is a procedure of no arguments or #f.
html-for-each opens and reads characters from port file or the file named by
string file. Sequential groups of characters are assembled into
strings which are either
Procedures are called according to these distinctions in order of the string's occurrence in file.
newline-proc is called with no arguments for end-of-line not within a markup or comment.
white-proc is called with strings of non-newline whitespace.
markup-proc is called with hypertext markup strings (including `<' and `>').
word-proc is called with the remaining strings.
html-for-each returns an unspecified value.
html:read-title opens and reads HTML from port file or the file named by string file,
until reaching the (mandatory) `TITLE' field. html:read-title returns the
title string with adjacent whitespaces collapsed to one space. html:read-title
returns #f if the title field is empty, absent, if the first
character read from file is not `#\<', or if the end of title is
not found within the first (approximately) limit words.
htm is a hypertext markup string.
If htm is a (hypertext) comment or DTD, then htm-fields returns #f.
Otherwise htm-fields returns the hypertext element string consed onto an
association list of the attribute name-symbols and values. If the
tag ends with "/>", then "/" is appended to the hypertext element
string. The name-symbols are created by string-ci->symbol.
Each value is a string; or #t if the name had no value
assigned within the markup.
Implements Uniform Resource Identifiers (URI) as described in RFC 2396.
Returns a Uniform Resource Identifier string from component arguments.
Returns a URI string combining the components of list path.
(html:anchor "(section 7)") => "<A NAME=\"(section%207)\"></A>"
(html:link (make-uri "(section 7)") "section 7") => "<A HREF=\"#(section%207)\">section 7</A>"
Returns a list of 5 elements corresponding to the parts (scheme authority path query fragment) of string uri-reference. Elements corresponding to absent parts are #f.
The path is a list of strings. If the first string is empty,
then the path is absolute; otherwise relative. The optional base-tree is a
tree as returned by uri->tree; and is used as the base address for relative
URIs.
If the authority component is a Server-based Naming Authority, then it is a list of the userinfo, host, and port strings (or #f). For other types of authority components the authority will be a string.
(uri->tree "http://www.ics.uci.edu/pub/ietf/uri/#Related")
=>
(http "www.ics.uci.edu" ("" "pub" "ietf" "uri" "") #f "Related")
Returns a list of txt split at each occurrence of chr. chr does not appear in the returned list of strings.
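The splitting behaviour just described can be sketched in portable Scheme. The name split-fields-sketch is hypothetical and this is not SLIB's code. Keeping empty fields is an assumption, chosen because the path in the uri->tree example above begins and ends with an empty string.

```scheme
;; Hypothetical sketch: split txt at every occurrence of chr.
;; Scans from the right so the result list is built in order.
(define (split-fields-sketch txt chr)
  (let loop ((idx (- (string-length txt) 1))
             (end (string-length txt))
             (fields '()))
    (cond ((< idx 0)
           ;; Whatever remains before the first separator is the first field.
           (cons (substring txt 0 end) fields))
          ((char=? chr (string-ref txt idx))
           ;; chr itself is dropped; the text after it becomes a field.
           (loop (- idx 1) idx
                 (cons (substring txt (+ idx 1) end) fields)))
          (else (loop (- idx 1) end fields)))))

(split-fields-sketch "/pub/ietf/uri/" #\/)
;; => ("" "pub" "ietf" "uri" "")
```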
uric: prefixes indicate procedures dealing with
URI-components.
uric:decode decodes strings encoded by uric:encode.
uri:split-fields. uri:path->keys
returns a list of items returned by uri:decode-path, coerced
to types ptypes.
Before RFC 2396, the File Transfer Protocol (FTP) served a similar purpose.
Returns a list of the decoded FTP uri; or #f if indecipherable. FTP Uniform Resource Locator, ange-ftp, and getit formats are handled. The returned list has four elements which are strings or #f:
(require 'xml-parse) or (require 'ssax)
The XML standard document referred to in this module is
http://www.w3.org/TR/1998/REC-xml-19980210.html.
The present framework fully supports the XML Namespaces
Recommendation
http://www.w3.org/TR/REC-xml-names.
Given the list of fragments (some of which are text strings),
reverse the list and concatenate adjacent text strings. If
LIST-OF-FRAGS has zero or one element, the result of the procedure
is equal? to its argument.
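A minimal sketch of this reverse-and-concatenate step, in portable Scheme, may make the ordering clearer. The name reverse-collect-str-sketch is hypothetical; it illustrates the description above and is not the library's source.

```scheme
;; Hypothetical sketch: reverse a fragment list while merging runs of
;; adjacent strings.  Non-string fragments (e.g. element nodes) are kept.
(define (reverse-collect-str-sketch frags)
  (let loop ((frags frags) (result '()) (strs '()))
    (cond ((null? frags)
           ;; Flush any pending run of strings.
           (if (null? strs)
               result
               (cons (apply string-append strs) result)))
          ((string? (car frags))
           ;; Consing onto strs reverses the run, which restores the
           ;; original document order of the text.
           (loop (cdr frags) result (cons (car frags) strs)))
          (else
           (loop (cdr frags)
                 (cons (car frags)
                       (if (null? strs)
                           result
                           (cons (apply string-append strs) result)))
                 '())))))

(reverse-collect-str-sketch '("c" "b" x "a"))
;; => ("a" x "bc")
```

Fragments are usually accumulated by consing, so the input list is in reverse document order; the sketch restores the order in a single pass.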
Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings while dropping "insignificant" whitespace, that is, whitespace in front, behind and between elements. The whitespace that is included in character data is not affected.
Use this procedure to "intelligently" drop "insignificant"
whitespace in the parsed SXML. If the strict compliance with the
XML Recommendation regarding the whitespace is desired, use the
ssax:reverse-collect-str procedure instead.
The following functions either skip, or build and return tokens, according to inclusion or delimiting semantics. The list of characters to expect, include, or to break at may vary from one invocation of a function to another. This allows the functions to easily parse even context-sensitive languages.
Exceptions are mentioned specifically. The list of expected characters (characters to skip until, or break-characters) may include an EOF "character", which is coded as the symbol *eof*.
The input stream to parse is specified as a PORT, which is the last argument.
Reads a character from the port and looks it up in the char-list of expected characters. If the read character was found among expected, it is returned. Otherwise, the procedure writes a message using string as a comment and quits.
Reads characters from the port and disregards them, as long as they are mentioned in the char-list. The first character (which may be EOF) peeked from the stream that is not a member of the char-list is returned.
Returns an initial buffer for ssax:next-token* procedures.
ssax:init-buffer may allocate a new buffer at each invocation.
Skips any number of the prefix characters (members of the prefix-char-list), if any, and reads the sequence of characters up to (but not including) a break character, one of the break-char-list.
The string of characters thus read is returned. The break character
is left on the input stream. break-char-list may include the symbol *eof*;
otherwise, EOF is fatal, generating an error message including a
specified comment-string.
ssax:next-token-of is similar to ssax:next-token
except that it implements an inclusion rather than delimiting
semantics.
Reads characters from the port that belong to the list of characters inc-charset. The reading stops at the first character which is not a member of the set. This character is left on the stream. All the read characters are returned in a string.
Reads characters from the port for which pred (a procedure of one argument) returns non-#f. The reading stops at the first character for which pred returns #f. That character is left on the stream. All the results of evaluating pred up to #f are returned in a string.
pred is a procedure that takes one argument (a character or the EOF object) and returns a character or #f. The returned character does not have to be the same as the input argument to the pred. For example,
(ssax:next-token-of (lambda (c)
                      (cond ((eof-object? c) #f)
                            ((char-alphabetic? c) (char-downcase c))
                            (else #f)))
                    (current-input-port))
will try to read an alphabetic token from the current input port, and return it in lower case.
Reads len characters from the port, and returns them in a string. If EOF is encountered before len characters are read, a shorter string will be returned.
TAG-KIND
UNRES-NAME
(PREFIX . LOCALPART).
RES-NAME
(URI-SYMB . LOCALPART).
Otherwise, it's a single symbol.
ELEM-CONTENT-MODEL
URI-SYMB
URI-SYMB is
created by %-quoting of bad URI characters and converting the
resulting string into a symbol.
NAMESPACES
(prefix uri-symb . uri-symb) or
(prefix user-prefix . uri-symb)
(#f user-prefix . uri-symb)
(*DEFAULT* user-prefix . uri-symb)
(*DEFAULT* #f . #f)
ATTLIST
STR-HANDLER
ENTITIES
(named-entity-name . named-entity-body)
where named-entity-name is a symbol under which the entity was declared, named-entity-body is either a string, or (for an external entity) a thunk that will return an input port (from which the entity can be read). named-entity-body may also be #f. This is an indication that a named-entity-name is currently being expanded. A reference to this named-entity-name will be an error: violation of the WFC nonrecursion.
XML-TOKEN
<P> => kind=START, head=P
</P> => kind=END, head=P
<BR/> => kind=EMPTY-EL, head=BR
<!DOCTYPE OMF ...> => kind=DECL, head=DOCTYPE
<?xml version="1.0"?> => kind=PI, head=xml
&my-ent; => kind=ENTITY-REF, head=my-ent
Character references are not represented by xml-tokens as these references are transparently resolved into the corresponding characters.
XML-DECL
(elem-name elem-content decl-attrs)
elem-name is an UNRES-NAME for the element.
elem-content is an ELEM-CONTENT-MODEL.
decl-attrs is an ATTLIST, of
(attr-name . value) associations.
This element can declare a user procedure to handle parsing of an
element (e.g., to do a custom validation, or to build a hash of IDs
as they're encountered).
ATTLIST, declaration of one attribute:
(attr-name content-type use-type default-value)
attr-name is an UNRES-NAME for the declared attribute.
content-type is a symbol: CDATA, NMTOKEN,
NMTOKENS, ... or a list of strings for the enumerated
type.
use-type is a symbol: REQUIRED, IMPLIED, or
FIXED.
default-value is a string for the default value, or #f if not
given.
These procedures deal with primitive lexical units (Names,
whitespaces, tags) and with pieces of more generic productions.
Most of these parsers must be called in appropriate context. For
example, ssax:complete-start-tag must be called only when the
start-tag has been detected and its GI has been read.
Skip the S (whitespace) production as defined by
[3] S ::= (#x20 | #x09 | #x0D | #x0A)
ssax:skip-s returns the first non-whitespace character it encounters while
scanning the port. This character is left on the input stream.
Read an NCName starting from the current position in the port and return it as a symbol.
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':'
| CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
This code supports the XML Namespace Recommendation REC-xml-names, which modifies the above productions as follows:
[4] NCNameChar ::= Letter | Digit | '.' | '-' | '_'
| CombiningChar | Extender
[5] NCName ::= (Letter | '_') (NCNameChar)*
As the Rec-xml-names says,
"An XML document conforms to this specification if all other tokens [other than element types and attribute names] in the document which are required, for XML conformance, to match the XML production for Name, match this specification's production for NCName."
Element types and attribute names must match the production QName, defined below.
Read a (namespace-) Qualified Name, QName, from the current position in port; and return an UNRES-NAME.
From REC-xml-names:
[6] QName ::= (Prefix ':')? LocalPart
[7] Prefix ::= NCName
[8] LocalPart ::= NCName
This procedure starts parsing of a markup token. The current position in the stream must be `<'. This procedure scans enough of the input stream to figure out what kind of a markup token it is seeing. The procedure returns an XML-TOKEN structure describing the token. Note, generally reading of the current markup is not finished! In particular, no attributes of the start-tag token are scanned.
Here's a detailed break out of the return values and the position in the PORT when that particular value is returned:
ssax:skip-pi. ssax:read-attributes
may be useful as well (for PIs whose content is attribute-value
pairs).
ssax:read-cdata-body to read the rest.
ssax:complete-start-tag to finish parsing of the token.
The current position is inside a PI. Skip the rest of the PI.
The current position is right after reading the PITarget. We read the body of the PI and return it as a string. The port will point to the character right after the `?>' combination that terminates the PI.
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
The current position in the port is inside an internal DTD subset (e.g., after reading `#\[' that begins an internal DTD subset). Skip until the `]>' combination that terminates this DTD.
This procedure must be called after we have read a string `<![CDATA[' that begins a CDATA section. The current position must be the first position of the CDATA body. This function reads lines of the CDATA body and passes them to a str-handler, a character data consumer.
str-handler is a procedure taking arguments: string1, string2,
and seed. The first string1 argument to str-handler never
contains a newline; the second string2 argument often will.
On the first invocation of str-handler, seed is the one passed to ssax:read-cdata-body as the
third argument. The result of this first invocation will be passed
as the seed argument to the second invocation of the line
consumer, and so on. The result of the last invocation of the str-handler is
returned by ssax:read-cdata-body. Note the similarity to the fundamental fold
iterator.
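As an illustration of the seed-passing protocol, here is one possible str-handler that accumulates fragments in a list. The name collect-str-handler and the choice of a list as the seed are assumptions made for this sketch; any value can serve as a seed.

```scheme
;; Hypothetical str-handler: conses the received fragments onto the seed.
;; string2 is skipped when empty so the list holds no empty strings from it.
(define (collect-str-handler string1 string2 seed)
  (if (string=? string2 "")
      (cons string1 seed)
      (cons string2 (cons string1 seed))))

;; Simulating two successive invocations by hand: the result of the
;; first call becomes the seed of the second, as in a fold.
(define newline-string (string #\newline))
(define collected
  (collect-str-handler "second line" ""
    (collect-str-handler "first line" newline-string '())))
```

After these two calls, collected holds the three fragments in reverse order of arrival, so a caller using this style of handler would reverse the list once reading is finished.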
Within a CDATA section all characters are taken at their face value, with three exceptions:
[66] CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';'
This procedure must be called after we have read `&#' that introduces a char reference. The procedure reads this reference and returns the corresponding char. The current position in PORT will be after the `;' that terminates the char reference.
Faults detected:
WFC: XML-Spec.html#wf-Legalchar
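The decoding implied by production [66] can be sketched on a string instead of a port. Everything named here (all-chars?, hex-digit?, char-ref->char-sketch) is hypothetical. The sketch assumes that integer->char accepts the code point and that error is available, and it does not perform the Legalchar check listed above.

```scheme
;; Hypothetical helpers for the sketch below.
(define (all-chars? pred str)
  (let loop ((i 0))
    (or (= i (string-length str))
        (and (pred (string-ref str i)) (loop (+ i 1))))))

(define (hex-digit? c)
  (or (char-numeric? c)
      (memv (char-downcase c) '(#\a #\b #\c #\d #\e #\f))))

;; ref-text is the text that follows `&#', including the closing `;',
;; for example "65;" or "x41;".
(define (char-ref->char-sketch ref-text)
  (let* ((len (string-length ref-text))
         (hex? (and (> len 0) (char=? #\x (string-ref ref-text 0))))
         (start (if hex? 1 0))
         (digits (if (and (> len start)
                          (char=? #\; (string-ref ref-text (- len 1))))
                     (substring ref-text start (- len 1))
                     "")))
    (let ((code (and (> (string-length digits) 0)
                     (all-chars? (if hex? hex-digit? char-numeric?) digits)
                     (string->number digits (if hex? 16 10)))))
      (if code
          (integer->char code)
          (error "malformed character reference:" ref-text)))))

(char-ref->char-sketch "65;")   ; => #\A
(char-ref->char-sketch "x41;")  ; => #\A
```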
According to Section 4.1 Character and Entity References of the XML Recommendation:
"[Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.]"
Expands and handles a parsed-entity reference.
name is a symbol, the name of the parsed entity to expand. content-handler is a procedure of arguments port, entities, and seed that returns a seed. str-handler is called if the entity in question is a pre-declared entity.
ssax:handle-parsed-entity returns the result returned by content-handler or str-handler.
Faults detected:
WFC: XML-Spec.html#wf-entdeclared
WFC: XML-Spec.html#norecursion
Add a name-value pair to the existing attlist, preserving its sorted ascending order; and return the new list. Return #f if a pair with the same name already exists in attlist.
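The sorted insertion can be sketched as follows. To keep the example short it assumes every attribute name is a plain symbol, ordered by its printed name; real attribute names are UNRES-NAMEs, which may also be pairs, so the library's comparison is necessarily more general. The names name<? and attlist-add-sketch are hypothetical.

```scheme
;; Hypothetical ordering on plain-symbol names (an assumption of this sketch).
(define (name<? a b)
  (string<? (symbol->string a) (symbol->string b)))

;; Insert name-value into a sorted attlist; #f signals a duplicate name.
(define (attlist-add-sketch attlist name-value)
  (cond ((null? attlist) (list name-value))
        ((eq? (car name-value) (caar attlist)) #f)
        ((name<? (car name-value) (caar attlist))
         (cons name-value attlist))
        (else
         (let ((rest (attlist-add-sketch (cdr attlist) name-value)))
           ;; Propagate #f if a duplicate was found further down the list.
           (and rest (cons (car attlist) rest))))))

(attlist-add-sketch '((a . "1") (c . "3")) '(b . "2"))
;; => ((a . "1") (b . "2") (c . "3"))
(attlist-add-sketch '((a . "1") (c . "3")) '(c . "9"))
;; => #f
```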
Given a non-null attlist, return a pair of values: the top and the rest.
This procedure reads and parses a production Attribute.
[41] Attribute ::= Name Eq AttValue
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
[25] Eq ::= S? '=' S?
The procedure returns an ATTLIST, of Name (as UNRES-NAME), Value (as string) pairs. The current character on the port is a non-whitespace character that is not an NCName-starting character.
Note the following rules to keep in mind when reading an AttValue:
Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:
- A character reference is processed by appending the referenced character to the attribute value.
- An entity reference is processed by recursively processing the replacement text of the entity. The named entities `amp', `lt', `gt', `quot', and `apos' are pre-declared.
- A whitespace character (#x20, #x0D, #x0A, #x09) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#x0D#x0A" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity.
- Other characters are processed by appending them to the normalized value.
Faults detected:
WFC: XML-Spec.html#CleanAttrVals
WFC: XML-Spec.html#uniqattspec
Convert an unres-name to a RES-NAME, given the appropriate namespaces declarations. The last parameter, apply-default-ns?, determines if the default namespace applies (for instance, it does not for attribute names).
Per REC-xml-names/#nsc-NSDeclared, the "xml" prefix is considered pre-declared and bound to the namespace name "http://www.w3.org/XML/1998/namespace".
ssax:resolve-name tests for the namespace constraints:
http://www.w3.org/TR/REC-xml-names/#nsc-NSDeclared
Complete parsing of a start-tag markup. ssax:complete-start-tag must be called after the
start tag token has been read. tag is an UNRES-NAME. elems is an
instance of the ELEMS slot of XML-DECL; it can be #f to tell the
function to do no validation of elements and their
attributes.
ssax:complete-start-tag returns several values:
On exit, the current position in port will be the first character after `>' that terminates the start-tag markup.
Faults detected:
VC: XML-Spec.html#enum
VC: XML-Spec.html#RequiredAttr
VC: XML-Spec.html#FixedAttr
VC: XML-Spec.html#ValueType
WFC: XML-Spec.html#uniqattspec (after namespaces prefixes are resolved)
VC: XML-Spec.html#elementvalid
WFC: REC-xml-names/#dt-NSName
Note: although XML Recommendation does not explicitly say it, xmlns and xmlns: attributes don't have to be declared (although they can be declared, to specify their default value).
Parses an ExternalID production:
[75] ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"'
| "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #x0D | #x0A | [a-zA-Z0-9]
| [-'()+,./:=?;!*#@$_%]
Call ssax:read-external-id when an ExternalID is expected; that is, the current
character must be either #\S or #\P that starts correspondingly a
SYSTEM or PUBLIC token. ssax:read-external-id returns the SystemLiteral as a
string. A PubidLiteral is disregarded if present.
These procedures parse productions corresponding to the whole (document) entity or its higher-level pieces (prolog, root element, etc).
Scan the Misc production in the context:
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S
Call ssax:scan-misc in the prolog or epilog contexts. In these contexts,
whitespaces are completely ignored. The return value from ssax:scan-misc is
either a PI-token, a DECL-token, a START token, or *EOF*. Comments
are ignored and not reported.
Read the character content of an XML document or an XML element.
[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
To be more precise, ssax:read-char-data reads CharData, expands CDSect and character
entities, and skips comments. ssax:read-char-data stops at a named reference, EOF,
at the beginning of a PI, or a start/end tag.
expect-eof? is a boolean indicating if EOF is normal; i.e., the character data may be terminated by the EOF. EOF is normal while processing a parsed entity.
iseed is an argument passed to the first invocation of str-handler.
ssax:read-char-data returns two results: seed and token. The seed
is the result of the last invocation of str-handler, or the original iseed if str-handler
was never called.
token can be either an eof-object (this can happen only if expect-eof? was #t), or:
CDATA sections and character references are expanded inline and never returned. Comments are silently disregarded.
As the XML Recommendation requires, all whitespace in character data must be preserved. However, a CR character (#x0D) must be disregarded if it appears before a LF character (#x0A), or replaced by a #x0A character otherwise. See Secs. 2.10 and 2.11 of the XML Recommendation. See also the canonical XML Recommendation.
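The end-of-line rule in the paragraph above can be sketched on a string. The name normalize-eol-sketch is hypothetical, and the parser itself applies the rule while reading from a port rather than to a finished string.

```scheme
;; Hypothetical sketch of the CR/LF rule: a CR followed by LF is dropped,
;; any other CR becomes LF, and every other character is preserved.
(define (normalize-eol-sketch str)
  (let ((cr (integer->char 13))
        (lf (integer->char 10))
        (len (string-length str)))
    (let loop ((i 0) (chars '()))
      (cond ((= i len)
             (list->string (reverse chars)))
            ((not (char=? cr (string-ref str i)))
             (loop (+ i 1) (cons (string-ref str i) chars)))
            ((and (< (+ i 1) len)
                  (char=? lf (string-ref str (+ i 1))))
             ;; CR before LF: drop the CR; the LF is copied on the next step.
             (loop (+ i 1) chars))
            (else
             ;; Lone CR: replace it with LF.
             (loop (+ i 1) (cons lf chars)))))))
```

For instance, the six characters a, CR, LF, b, CR, c come out as the five characters a, LF, b, LF, c.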
Make sure that token is of anticipated kind and has anticipated gi. Note that the gi argument may actually be a pair of two symbols, Namespace-URI or the prefix, and of the localname. If the assertion fails, error-cont is evaluated by passing it three arguments: token kind gi. The result of error-cont is returned.
These procedures instantiate an SSAX parser. A user can instantiate the parser to do full validation, no validation, or any particular validation. The user specifies which PIs the parser should report, and what to do with the parsed character and element data. The latter handlers determine whether the parsing follows a SAX or a DOM model.
Create a parser to parse and process one Processing Element (PI).
my-pi-handlers is an association list of pairs
(pi-tag . pi-handler) where pi-tag is an
NCName symbol, the PI target; and pi-handler is a procedure
taking arguments port, pi-tag, and seed.
pi-handler should read the rest of the PI up to and including
the combination `?>' that terminates the PI. The handler
should return a new seed. One of the pi-tags may be the
symbol *DEFAULT*. The corresponding handler will handle PIs
that no other handler will. If the *DEFAULT* pi-tag is not
specified, ssax:make-pi-parser will assume the default handler that skips the body of
the PI.
ssax:make-pi-parser returns a procedure of arguments port, pi-tag, and
seed; that will parse the current PI according to my-pi-handlers.
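As a sketch of what a my-pi-handlers association list might look like, the following defines one handler with the port, pi-tag, seed signature described above. The handler name collect-pi, the choice of the xml-stylesheet target, and the use of a list as the seed are assumptions made for illustration only.

```scheme
;; Hypothetical PI handler: reads the PI body up to and including the
;; terminating "?>" and pushes (pi-tag . body) onto the seed, which is
;; assumed here to be a list.
(define (collect-pi port pi-tag seed)
  (let loop ((acc '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (error "unterminated PI" pi-tag))
            ((and (char=? c #\?)
                  (not (eof-object? (peek-char port)))
                  (char=? (peek-char port) #\>))
             (read-char port)                 ; consume the closing #\>
             (cons (cons pi-tag (list->string (reverse acc))) seed))
            (else
             (loop (cons c acc)))))))

;; Association list of (pi-tag . pi-handler) pairs.
(define my-pi-handlers
  (list (cons 'xml-stylesheet collect-pi)    ; one specific PI target
        (cons '*DEFAULT* collect-pi)))       ; every other PI target
```

An alist of this shape would be passed to ssax:make-pi-parser, and the procedure it returns would then be applied to a port, a pi-tag, and a seed.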
Create a parser to parse and process one element, including its character content or child elements. The parser is typically applied to the root element of a document.
ssax:make-pi-parser above.
The generated parser is a procedure taking arguments:
start-tag-head port elems entities namespaces preserve-ws? seed
The procedure must be called after the start tag token has been read. start-tag-head is an UNRES-NAME from the start-element tag. elems is an instance of the ELEMS slot of XML-DECL.
Faults detected:
VC: XML-Spec.html#elementvalid
WFC: XML-Spec.html#GIMatch
Create an XML parser, an instance of the XML parsing framework. This will be a SAX, a DOM, or a specialized parser depending on the supplied user-handlers.
ssax:make-parser takes an even number of arguments; user-handler-tag is a symbol that identifies
a procedure (or association list for PROCESSING-INSTRUCTIONS)
(user-handler) that follows the tag. Given below are tags and signatures of
the corresponding procedures. Not all tags have to be specified.
If some are omitted, reasonable defaults will apply.
skip-internal-dtd if we aren't interested in
reading it). port at exit must be at the first symbol after
the whole DOCTYPE declaration.
The handler-procedure must generate four values:
elems entities namespaces seed
elems is as defined for the ELEMS slot of XML-DECL. It may be #f to switch off validation. namespaces will typically contain user-prefixes for selected uri-symbs. The default handler-procedure skips the internal subset, if any, and returns
(values #f '() '() seed).
The default handler-procedure returns (values #f '() '() seed), that is, values for elems entities namespaces seed.
ssax:make-pi-parser.
The default value is '()
The generated parser is a procedure of arguments port and seed.
This procedure parses the document prolog and then exits to an
element parser (created by ssax:make-elem-parser) to handle
the rest.
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[27] Misc ::= Comment | PI | S
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
('[' (markupdecl | PEReference | S)* ']' S?)? '>'
[29] markupdecl ::= elementdecl | AttlistDecl
| EntityDecl
| NotationDecl | PI
| Comment
This is an instance of the SSAX parser that returns an SXML
representation of the XML document to be read from port. namespace-prefix-assig is a list
of (user-prefix . uri-string) that assigns
user-prefixes to certain namespaces identified by particular
uri-strings. It may be an empty list. ssax:xml->sxml returns an SXML
tree. On return, port is positioned at the first character after the root
element.
generic-write is a procedure that transforms a Scheme data value
(or Scheme program expression) into its textual representation and
prints it. The interface to the procedure is sufficiently general to
easily implement other useful formatting procedures such as pretty
printing, output to a string and truncated output.
#f to stop the transformation.
The value returned by generic-write is undefined.
Examples:
(write obj) == (generic-write obj #f #f display-string)
(display obj) == (generic-write obj #t #f display-string)
where
display-string == (lambda (s) (for-each write-char (string->list s)) #t)
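The output procedure is what makes formatting variants such as truncated output possible: it receives each piece of text as a string and returns #f to stop the transformation. The following is a minimal sketch of such a procedure in plain Scheme. The name make-truncating-output and the pair it returns are assumptions for illustration and are not part of the package.

```scheme
;; Hypothetical helper: returns a pair (output . result).
;; output has the shape of the last argument to generic-write: it takes a
;; string, keeps at most `limit' characters in total, and returns #f once
;; the limit is reached.  result returns the text collected so far.
(define (make-truncating-output limit)
  (let ((pieces '())
        (room limit))
    (define (output str)
      (let ((n (min room (string-length str))))
        (set! pieces (cons (substring str 0 n) pieces))
        (set! room (- room n))
        (> room 0)))                 ; #f means "stop the transformation"
    (define (result)
      (apply string-append (reverse pieces)))
    (cons output result)))

;; Driving the output procedure by hand, as generic-write would:
(define trunc (make-truncating-output 10))
((car trunc) "(1 2 3")               ; => #t, room is left
((car trunc) " 4 5 6)")              ; => #f, limit reached
((cdr trunc))                        ; => "(1 2 3 4 5"
```

Passing (car trunc) as the output argument of generic-write, for example (generic-write obj #f #f (car trunc)), would be expected to stop printing after ten characters.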
Pretty-prints obj on port. If port is not
specified, current-output-port is used.
Example:
(pretty-print '((1 2 3 4 5) (6 7 8 9 10) (11 12 13 14 15)
                (16 17 18 19 20) (21 22 23 24 25)))
   -| ((1 2 3 4 5)
   -|  (6 7 8 9 10)
   -|  (11 12 13 14 15)
   -|  (16 17 18 19 20)
   -|  (21 22 23 24 25))
Returns the string of obj pretty-printed in width
columns. If width is not specified, (output-port-width) is
used.
Example:
(pretty-print->string '((1 2 3 4 5) (6 7 8 9 10) (11 12 13 14 15)
                        (16 17 18 19 20) (21 22 23 24 25)))
=>
"((1 2 3 4 5)
 (6 7 8 9 10)
 (11 12 13 14 15)
 (16 17 18 19 20)
 (21 22 23 24 25))
"
(pretty-print->string '((1 2 3 4 5) (6 7 8 9 10) (11 12 13 14 15)
                        (16 17 18 19 20) (21 22 23 24 25))
                      16)
=>
"((1 2 3 4 5)
 (6 7 8 9 10)
 (11
  12
  13
  14
  15)
 (16
  17
  18
  19
  20)
 (21
  22
  23
  24
  25))
"
(current-output-port).
outfile is a port or a string. If no outfile is specified
then current-output-port is assumed. These expanded expressions
are then pretty-printed to this port.
Whitespace and comments (introduced by ;) which are not part of
Scheme expressions are reproduced in the output. This procedure does
not affect the values returned by current-input-port,
current-error-port, and current-output-port.
pprint-filter-file can be used to pre-compile macro-expansion and
thus can reduce loading time. The following will write into
`exp-code.scm' the result of expanding all defmacros in
`code.scm'.
(require 'pprint-file)
(require 'defmacroexpand)
(defmacro:load "my-macros.scm")
(pprint-filter-file "code.scm" defmacro:expand* "exp-code.scm")
If (provided? 'current-time):
The procedures current-time, difftime, and
offset-time deal with a calendar time datatype
which may or may not be disjoint from other Scheme datatypes.
get-universal-time in section Common-Lisp Time.
(+ caltime offset).
(require 'time-zone)
POSIX standards specify several formats for encoding time-zone rules.
-4:30.
The non-tzfile formats can optionally be followed by transition times specifying the day and time when a zone changes from standard to daylight-savings and back again.
time-zone cannot interpret TZ-string,
#f is returned.
tz:params returns a list of
three items:
tz:params is unaffected by the default timezone; inquiries can be
made of any timezone at any calendar time.
tz:std-offset returns the
number of seconds that timezone tz is west of the Prime Meridian.
The rest of these procedures and variables are provided for POSIX compatibility. Because of shared state they are not thread-safe.
tzset also sets the variables *timezone*, daylight?,
and tzname. This function is automatically called by the time
conversion procedures which depend on the time zone
(see section Time and Date).
*timezone* is initialized by tzset.
#t if the default timezone has rules for Daylight Savings
Time. Note: daylight? does not tell you when Daylight
Savings Time is in effect, just that the default zone sometimes has
Daylight Savings Time.
(require 'posix-time)
decode-universal-time.
decode-universal-time.
localtime sets the
variable *timezone* with the difference between Coordinated
Universal Time (UTC) and local standard time in seconds
(see section Time Zone).
"Wed Jun 30 21:49:08 1993".
(asctime (gmtime caltime)),
(asctime (localtime caltime)), and
(asctime (localtime caltime tz)), respectively.
(decode-universal-time (get-universal-time)).
current-time.
gmtime and localtime.
gmtime and localtime.
Notice that the values returned by decode-universal-time do not
match the arguments to encode-universal-time.
(require 'time-core)
(require 'tzfile)
Reads the NCBI-format DNA sequence following the word `ORIGIN' from port.
Reads the NCBI-format DNA sequence following the word `ORIGIN' from file.
Replaces `T' with `U' in str.
Returns a list of three-letter symbol codons comprising the protein sequence encoded by cdna starting with its first occurrence of `atg'.
Returns a list of three-letter symbols for the protein sequence encoded by cdna starting with its first occurrence of `atg'.
Returns a string of one-letter amino acid codes for the protein sequence encoded by cdna starting with its first occurrence of `atg'.
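The framing step shared by these three procedures, locating the first `atg' and reading triplets from there, can be sketched in plain Scheme. The names find-atg and codons-from-atg are hypothetical, cdna is assumed to be a lowercase string, and the sketch neither stops at a stop codon nor translates codons to amino acids, so it is not a substitute for the procedures described above.

```scheme
;; Hypothetical helper: index of the first "atg" in cdna, or #f if none.
(define (find-atg cdna)
  (let loop ((i 0))
    (cond ((> (+ i 3) (string-length cdna)) #f)
          ((string=? (substring cdna i (+ i 3)) "atg") i)
          (else (loop (+ i 1))))))

;; Hypothetical helper: list of three-letter symbols read in frame from
;; the first "atg"; trailing bases that do not fill a triplet are ignored.
(define (codons-from-atg cdna)
  (let ((start (find-atg cdna)))
    (if (not start)
        '()
        (let loop ((i start) (acc '()))
          (if (> (+ i 3) (string-length cdna))
              (reverse acc)
              (loop (+ i 3)
                    (cons (string->symbol (substring cdna i (+ i 3)))
                          acc)))))))

(codons-from-atg "ccatggcatag")      ; => (atg gca tag)
```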
These cDNA count routines provide a means to check the nucleotide sequence with the `BASE COUNT' line preceding the sequence from NCBI.
Returns a list of counts of `a', `c', `g', and `t' occurring in cdna.
Prints the counts of `a', `c', `g', and `t' occurring in cdna.
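To make the check against the `BASE COUNT' line concrete, here is a minimal stand-in written in plain Scheme. The name base-counts is hypothetical, and cdna is assumed to be a string; characters other than the four bases are simply skipped.

```scheme
;; Hypothetical stand-in for the count routine: returns a list of the
;; numbers of a, c, g, and t (in that order) found in the string cdna.
(define (base-counts cdna)
  (let loop ((i 0) (a 0) (c 0) (g 0) (t 0))
    (if (= i (string-length cdna))
        (list a c g t)
        (case (char-downcase (string-ref cdna i))
          ((#\a) (loop (+ i 1) (+ a 1) c g t))
          ((#\c) (loop (+ i 1) a (+ c 1) g t))
          ((#\g) (loop (+ i 1) a c (+ g 1) t))
          ((#\t) (loop (+ i 1) a c g (+ t 1)))
          (else  (loop (+ i 1) a c g t))))))   ; skip anything else

(base-counts "atggcatt")             ; => (2 1 2 3)
```

The resulting list can be compared element by element with the numbers on the `BASE COUNT' line of the NCBI record.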
Schmooz is a simple, lightweight markup language for interspersing Texinfo documentation with Scheme source code. Schmooz does not create the top level Texinfo file; it creates `txi' files which can be imported into the documentation using the Texinfo command `@include'.
(require 'schmooz) defines the function schmooz, which is
used to process files. Files containing schmooz documentation should
not contain (require 'schmooz).
schmooz extracts
top-level comments containing schmooz commands from filename.scm
and writes the converted Texinfo source to a file named
filename.txi.
schmooz calls itself with
the argument `filename.scm'.
Schmooz comments are distinguished (from non-schmooz comments) by their first line, which must start with an at-sign (@) preceded by one or more semicolons (;). A schmooz comment ends at the first subsequent line which does not start with a semicolon. Currently schmooz comments are recognized only at top level.
Schmooz comments are copied to the Texinfo output file with the leading contiguous semicolons removed. Certain character sequences starting with at-sign are treated specially. Others are copied unchanged.
A schmooz comment starting with `@body' must be followed by a Scheme definition. All comments between the `@body' line and the definition will be included in a Texinfo definition, either a `@defun' or a `@defvar', depending on whether a procedure or a variable is being defined.
Within the text of that schmooz comment, at-sign
followed by `0' will be replaced by @code{procedure-name}
if the following definition is of a procedure; or
@var{variable} if defining a variable.
An at-sign followed by a non-zero digit will expand to the variable citation of that numbered argument: `@var{argument-name}'.
If more than one definition follows a `@body' comment line without an intervening blank or comment line, then those definitions will be included in the same Texinfo definition using `@defvarx' or `@defunx', depending on whether the first definition is of a variable or of a procedure.
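A small illustration of a `@body' comment, written as it might appear in a source file. The procedure rect-area and its argument names are made up for this example.

```scheme
;;@body
;;@0 returns the area of a rectangle with sides @1 and @2.
(define (rect-area width height)
  (* width height))
```

By the rules above, schmooz would be expected to place this text in a `@defun' for rect-area, with `@0' replaced by @code{rect-area}, `@1' by @var{width}, and `@2' by @var{height}.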
Schmooz can figure out whether a definition is of a procedure if it is of the form:
`(define (<identifier> <arg> ...) <expression>)'
or if the left hand side of the definition is some form ending in a lambda expression. Obviously, it can be fooled. In order to force recognition of a procedure definition, start the documentation with `@args' instead of `@body'. `@args' should be followed by the argument list of the function being defined, which may be enclosed in parentheses and delimited by whitespace (as in Scheme), enclosed in braces and separated by commas (as in Texinfo), or consist of the remainder of the line, separated by whitespace.
For example:
;;@args arg1 args ...
;;@0 takes argument @1 and any number of @2
(define myfun (some-function-returning-magic))
Will result in:
@defun myfun arg1 args @dots{}
@code{myfun} takes argument @var{arg1} and any number of @var{args}
@end defun
`@args' may also be useful for indicating optional arguments by name. If `@args' occurs inside a schmooz comment section, rather than at the beginning, then it will generate a `@defunx' line with the arguments supplied.
If the first at-sign in a schmooz comment is immediately followed by whitespace, then the comment will be expanded to whatever follows that whitespace. If the at-sign is followed by a non-whitespace character then the at-sign will be included as the first character of the expansion. This feature is intended to make it easy to include Texinfo directives in schmooz comments.