Next: , Previous: , Up: The Implementation   [Contents][Index]

6.1 Data Types

In the descriptions below it is assumed that long ints are 32 bits in length. Actually, SCM is written to work with any long int size larger than 31 bits. With some modification, SCM could work with word sizes as small as 24 bits.

All SCM objects are represented by type SCM. Objects of type SCM come in 2 basic flavors, Immediates and Cells.



6.1.1 Immediates

An immediate is a data type contained in type SCM (long int). The type codes distinguishing immediate types from each other vary in length, but reside in the low order bits.

Macro: IMP x
Macro: NIMP x

Return non-zero if the SCM object x is an immediate or non-immediate type, respectively.

Immediate: inum

immediate 30 bit signed integer. An INUM is flagged by a 1 in the second to low order bit position. The high order 30 bits are used for the integer’s value.

Macro: INUMP x
Macro: NINUMP x

Return non-zero if the SCM x is an immediate integer or not an immediate integer, respectively.

Macro: INUM x

Returns the C long integer corresponding to SCM x.

Macro: MAKINUM x

Returns the SCM inum corresponding to C long integer x.

Immediate Constant: INUM0

is equivalent to MAKINUM(0).

Computations on INUMs are performed by converting the arguments to C integers (by a shift), operating on the integers, and converting the result to an inum. The result is checked for overflow by converting back to integer and checking the reverse operation.

The shifts used for conversion need to be signed shifts. If the C implementation does not support signed right shift this fact is detected in a #if statement in scmfig.h and a signed right shift, SRS, is constructed in terms of unsigned right shift.
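The conversion and overflow check described above can be sketched as follows. This is a minimal illustration rather than SCM's own code: every sketch_ name is hypothetical, a 32-bit word (int32_t) is assumed to match the description at the start of this section, and the signed right shift is built from a shift of a non-negative value in the spirit of SRS.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not the SCM macros. */

/* Signed right shift by 2 that only ever shifts non-negative values. */
int32_t sketch_srs2(int32_t x)
{
  return x >= 0 ? (x >> 2) : ~((~x) >> 2);
}

/* Tag a C integer as an inum: value in the high 30 bits, low bits 10. */
int32_t sketch_makinum(int32_t v)
{
  return (int32_t)(((uint32_t)v << 2) | 2u);
}

/* Recover the C integer from an inum. */
int32_t sketch_inum(int32_t x)
{
  return sketch_srs2(x);
}

/* Non-zero if the second-to-low-order bit is set. */
int sketch_inump(int32_t x)
{
  return (x & 2) != 0;
}

/* Add two inums.  Returns 1 and stores the tagged sum on success;
   returns 0 if the sum does not survive the round trip (overflow). */
int sketch_inum_add(int32_t a, int32_t b, int32_t *result)
{
  int32_t sum = sketch_inum(a) + sketch_inum(b); /* at most 31 bits */
  int32_t tagged = sketch_makinum(sum);
  if (sketch_inum(tagged) != sum)
    return 0;
  *result = tagged;
  return 1;
}
```

When the round trip fails, a real implementation would presumably fall back to another number representation instead of returning a failure code; that part is outside this sketch.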

Immediate: ichr

characters.

Macro: ICHRP x

Return non-zero if the SCM object x is a character.

Macro: ICHR x

Returns corresponding unsigned char.

Macro: MAKICHR x

Given char x, returns SCM character.
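A character immediate can be pictured in the same way. The sketch below assumes the ichr layout drawn in Data Type Representations (character code above the low-order bits 11110100) and a 32-bit word; the sketch_ names are hypothetical and are not the definitions of ICHRP, ICHR, or MAKICHR.

```c
#include <stdint.h>

/* Low-order 8 bits of an ichr, taken from the layout diagram. */
#define SKETCH_ICHR_TAG 0xF4u

/* Build a character immediate from a C character. */
int32_t sketch_makichr(unsigned char c)
{
  return (int32_t)(((uint32_t)c << 8) | SKETCH_ICHR_TAG);
}

/* Non-zero if the word carries the ichr tag. */
int sketch_ichrp(int32_t x)
{
  return ((uint32_t)x & 0xFFu) == SKETCH_ICHR_TAG;
}

/* Recover the unsigned char stored above the tag. */
unsigned char sketch_ichr(int32_t x)
{
  return (unsigned char)((uint32_t)x >> 8);
}
```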

Immediate: iflags

These are frequently used immediate constants.

Immediate Constant: SCM BOOL_T

#t

Immediate Constant: SCM BOOL_F

#f

Immediate Constant: SCM EOL

(). If SICP is #defined, EOL is #defined to be identical with BOOL_F. In this case, both print as #f.

Immediate Constant: SCM EOF_VAL

end of file token, #<eof>.

Immediate Constant: SCM UNDEFINED

#<undefined> used for variables which have not been defined and absent optional arguments.

Immediate Constant: SCM UNSPECIFIED

#<unspecified> is returned for those procedures whose return values are not specified.

Macro: IFLAGP n

Returns non-zero if n is an ispcsym, isym or iflag.

Macro: ISYMP n

Returns non-zero if n is an ispcsym or isym.

Macro: ISYMNUM n

Given ispcsym, isym, or iflag n, returns its index in the C array isymnames[].

Macro: ISYMCHARS n

Given ispcsym, isym, or iflag n, returns its char * representation (from isymnames[]).

Macro: MAKSPCSYM n

Returns SCM ispcsym n.

Macro: MAKISYM n

Returns SCM isym n.

Macro: MAKIFLAG n

Returns SCM iflag n.

Variable: isymnames

An array of strings containing the external representations of all the ispcsym, isym, and iflag immediates. Defined in repl.c.

Constant: NUM_ISPCSYM
Constant: NUM_ISYMS

The number of ispcsyms and ispcsyms+isyms, respectively. Defined in scm.h.

Immediate: isym

and, begin, case, cond, define, do, if, lambda, let, let*, letrec, or, quote, set!, #f, #t, #<undefined>, #<eof>, (), and #<unspecified>.

CAR Immediate: ispcsym

special symbols: syntax-checked versions of first 14 isyms

CAR Immediate: iloc

indexes to a variable’s location in environment

CAR Immediate: gloc

pointer to a symbol’s value cell

Immediate: CELLPTR

pointer to a cell (not really an immediate type, but here for completeness). Since cells are always 8 byte aligned, a pointer to a cell has the low order 3 bits 0.

There is one exception to this rule, CAR Immediates, described next.

A CAR Immediate is an Immediate which can only occur in the CARs of evaluated code (as a result of ceval’s memoization process).
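To make the role of the low-order bits concrete, the following sketch sorts a 32-bit word into broad categories using only the bit patterns drawn in Data Type Representations. It is an illustration under those assumptions: the enum and function names are hypothetical, and the tests actually used by IMP, NIMP, and related macros may differ.

```c
#include <stdint.h>

/* Hypothetical categories for illustration only. */
enum sketch_kind {
  SK_CELLPTR,          /* low bits 000: pointer to an 8 byte aligned cell */
  SK_GLOC,             /* low bits 001: only in CARs of evaluated code    */
  SK_INUM,             /* second-to-low-order bit set                     */
  SK_OTHER_IMMEDIATE   /* ichr, iflag, isym, ispcsym, iloc, ...           */
};

enum sketch_kind sketch_classify(uint32_t w)
{
  if (w & 2u)
    return SK_INUM;
  switch (w & 7u) {
  case 0:
    return SK_CELLPTR;
  case 1:
    return SK_GLOC;
  default:
    return SK_OTHER_IMMEDIATE;
  }
}
```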



6.1.2 Cells

Cells represent all SCM objects other than immediates. A cell has a CAR and a CDR. Low-order bits in CAR identify the type of object. The rest of CAR and CDR hold object data. The number after tc specifies how many bits are in the type code. For instance, tc7 indicates that the type code is 7 bits.

Macro: NEWCELL x

Allocates a new cell and stores a pointer to it in SCM local variable x.

Care needs to be taken that stores into the new cell pointed to by x do not create an inconsistent object. See Signals.

All of the C macros described in this section assume that their argument is of type SCM and points to a cell (CELLPTR).

Macro: CAR x
Macro: CDR x

Returns the car and cdr of cell x, respectively.

Macro: TYP3 x
Macro: TYP7 x
Macro: TYP16 x

Returns the 3, 7, or 16 bit type code of cell x, respectively.
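As an illustration of how type codes of different widths can be read from the same word, the sketch below masks off the low-order 3, 7, or 16 bits of a CAR. It is not the definition of TYP3, TYP7, or TYP16: the sketch_ names are hypothetical, a 32-bit word is assumed, and the example header layout (7-bit code 0001111 for a vector, with the length stored above the low 8 bits) is read off the diagram in Data Type Representations.

```c
#include <stdint.h>

/* Hypothetical helpers: the type code is taken to be the low n bits. */
unsigned sketch_typ3(uint32_t car)  { return car & 0x7u; }
unsigned sketch_typ7(uint32_t car)  { return car & 0x7Fu; }
unsigned sketch_typ16(uint32_t car) { return car & 0xFFFFu; }

/* Length field of a header cell, assumed to sit above the low 8 bits
   (7 type bits plus the GC mark bit). */
uint32_t sketch_header_length(uint32_t car)
{
  return car >> 8;
}

/* Build the CAR of an unmarked vector header of the given length. */
uint32_t sketch_vector_header(uint32_t length)
{
  return (length << 8) | 0x0Fu;  /* 0001111 is the vector code */
}
```

Because the 7-bit code 0001111 ends in 111, the 3-bit view of the same word is 7, which shows how the longer codes nest inside the shorter ones.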

Cell: tc3_cons

scheme cons-cell returned by (cons arg1 arg2).

Macro: CONSP x
Macro: NCONSP x

Returns non-zero if x is a tc3_cons or isn’t, respectively.

Cell: tc3_closure

applicable object returned by (lambda (args) …). tc3_closures have a pointer to the body of the procedure in the CAR and a pointer to the environment in the CDR. Bits 1 and 2 (zero-based) in the CDR indicate a lower bound on the number of required arguments to the closure, which is used to avoid allocating rest argument lists in the environment cache. This encoding precludes an immediate value for the CDR: In the case of an empty environment all bits above 2 in the CDR are zero.

Macro: CLOSUREP x

Returns non-zero if x is a tc3_closure.

Macro: CODE x
Macro: ENV x

Returns the code body or environment of closure x, respectively.

Macro: ARGC x

Returns a lower bound on the number of required arguments to closure x. The value cannot exceed 3.
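The CDR encoding described for tc3_closure can be sketched as below. The helper names are hypothetical, a 32-bit word is assumed, and the environment is assumed to be either 0 (empty) or an 8 byte aligned cell pointer, so that the low 3 bits are free for the GC bit and the 2-bit argument count. This is not the definition of ENV or ARGC.

```c
#include <stdint.h>

/* Pack an environment pointer and a required-argument count into a
   closure CDR.  Counts above 3 are clamped, since 2 bits can only
   record a lower bound of at most 3. */
uint32_t sketch_closure_cdr(uint32_t env, unsigned required)
{
  unsigned bound = required > 3u ? 3u : required;
  return (env & ~(uint32_t)7u) | ((uint32_t)bound << 1);
}

/* Bits 1 and 2 (zero-based) hold the lower bound. */
unsigned sketch_argc(uint32_t cdr)
{
  return (cdr >> 1) & 3u;
}

/* Everything above bit 2 is the environment; all zero means empty. */
uint32_t sketch_env(uint32_t cdr)
{
  return cdr & ~(uint32_t)7u;
}
```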



6.1.3 Header Cells

Headers are Cells whose CDRs point elsewhere in memory, such as to memory allocated by malloc.

Header: spare

spare tc7 type code

Header: tc7_vector

scheme vector.

Macro: VECTORP x
Macro: NVECTORP x

Returns non-zero if x is a tc7_vector or if not, respectively.

Macro: VELTS x
Macro: LENGTH x

Returns the C array of SCMs holding the elements of vector x or its length, respectively.

Header: tc7_ssymbol

static scheme symbol (part of initial system)

Header: tc7_msymbol

malloced scheme symbol (can be GCed)

Macro: SYMBOLP x

Returns non-zero if x is a tc7_ssymbol or tc7_msymbol.

Macro: CHARS x
Macro: UCHARS x
Macro: LENGTH x

Returns the elements of symbol x as a C array of chars, the same elements as a C array of unsigned chars, or the length of x, respectively.

Header: tc7_string

scheme string

Macro: STRINGP x
Macro: NSTRINGP x

Returns non-zero if x is a tc7_string or isn’t, respectively.

Macro: CHARS x
Macro: UCHARS x
Macro: LENGTH x

Returns the elements of string x as a C array of chars, the same elements as a C array of unsigned chars, or the length of x, respectively.

Header: tc7_Vbool

uniform vector of booleans (bit-vector)

Header: tc7_VfixZ32

uniform vector of integers

Header: tc7_VfixN32

uniform vector of non-negative integers

Header: tc7_VfixN16

uniform vector of non-negative short integers

Header: tc7_VfixZ16

uniform vector of short integers

Header: tc7_VfixN8

uniform vector of non-negative bytes

Header: tc7_VfixZ8

uniform vector of signed bytes

Header: tc7_VfloR32

uniform vector of short inexact real numbers

Header: tc7_VfloR64

uniform vector of double precision inexact real numbers

Header: tc7_VfloC64

uniform vector of double precision inexact complex numbers

Header: tc7_contin

applicable object produced by call-with-current-continuation

Header: tc7_specfun

subr that is treated specially within the evaluator

apply and call-with-current-continuation are denoted by these objects. Their behavior as functions is built into the evaluator; they are not directly associated with C functions. This is necessary in order to make them properly tail recursive.

tc16_cclo is a subtype of tc7_specfun. A cclo is similar to a vector (and is GCed like one), but can be applied as a function:

  1. the cclo itself is consed onto the head of the argument list
  2. the first element of the cclo is applied to that list. Cclo invocation is currently not tail recursive when given 2 or more arguments.

Function: makcclo proc len

makes a closure from the subr proc with len-1 extra locations for SCM data. Elements of a cclo are referenced using VELTS(cclo)[n] just as for vectors.

Macro: CCLO_LENGTH cclo

Expands to the length of cclo.



6.1.4 Subr Cells

A Subr is a header whose CDR points to a C code procedure. Scheme primitive procedures are subrs. Except for the arithmetic tc7_cxrs, the C code procedures will be passed arguments (and return results) of type SCM.

Subr: tc7_asubr

associative C function of 2 arguments. Examples are +, -, *, /, max, and min.

Subr: tc7_subr_0

C function of no arguments.

Subr: tc7_subr_1

C function of one argument.

Subr: tc7_cxr

These subrs are handled specially. If inexact numbers are enabled, the CDR should be a function which takes and returns type double. Conversions are handled in the interpreter.

floor, ceiling, truncate, round, real-sqrt, real-exp, real-ln, real-sin, real-cos, real-tan, real-asin, real-acos, real-atan, real-sinh, real-cosh, real-tanh, real-asinh, real-acosh, real-atanh, and exact->inexact are defined this way.

If the CDR is 0 (NULL), the name string of the procedure is used to control traversal of its list structure argument.

car, cdr, caar, cadr, cdar, cddr, caaar, caadr, cadar, caddr, cdaar, cdadr, cddar, cdddr, caaaar, caaadr, caadar, caaddr, cadaar, cadadr, caddar, cadddr, cdaaar, cdaadr, cdadar, cdaddr, cddaar, cddadr, cdddar, and cddddr are defined this way.

Subr: tc7_subr_3

C function of 3 arguments.

Subr: tc7_subr_2

C function of 2 arguments.

Subr: tc7_rpsubr

transitive relational predicate C function of 2 arguments. The C function should return either BOOL_T or BOOL_F.

Subr: tc7_subr_1o

C function of one optional argument. If the optional argument is not present, UNDEFINED is passed in its place.

Subr: tc7_subr_2o

C function of 1 required and 1 optional argument. If the optional argument is not present, UNDEFINED is passed in its place.

Subr: tc7_lsubr_2

C function of 2 arguments and a list of (rest of) SCM arguments.

Subr: tc7_lsubr

C function of list of SCM arguments.



6.1.5 Defining Subrs

If CCLO is #defined when compiling, the compiled closure feature will be enabled. It is automatically enabled if dynamic linking is enabled.

The SCM interpreter directly recognizes subrs taking small numbers of arguments. In order to create subrs taking larger numbers of arguments use:

Function: make_gsubr name req opt rest fcn

returns a cclo (compiled closure) object of name char * name which takes int req required arguments, int opt optional arguments, and a list of rest arguments if int rest is 1 (0 for not).

SCM (*fcn)() is a pointer to a C function to do the work.

The C function will always be called with req + opt + rest arguments; optional arguments not supplied will be passed UNDEFINED. An error will be signaled if the subr is called with too many or too few arguments. Currently a total of 10 arguments may be specified, but increasing this limit should not be difficult.

/* A silly example, taking 2 required args,
   1 optional, and a list of rest args */

#include <scm.h>

SCM gsubr_21l(req1,req2,opt,rst)
     SCM req1,req2,opt,rst;
{
  lputs("gsubr-2-1-l:\n req1: ", cur_outp);
  display(req1,cur_outp);
  lputs("\n req2: ", cur_outp);
  display(req2,cur_outp);
  lputs("\n opt: ", cur_outp);
  display(opt,cur_outp);
  lputs("\n rest: ", cur_outp);
  display(rst,cur_outp);
  newline(cur_outp);
  return UNSPECIFIED;
}

void init_gsubr211()
{
  make_gsubr("gsubr-2-1-l", 2, 1, 1, gsubr_21l);
}


6.1.6 Ptob Cells

A ptob is a port object, capable of delivering or accepting characters. See Ports in Revised(5) Report on the Algorithmic Language Scheme. Unlike the types described so far, new varieties of ptobs can be defined dynamically (see Defining Ptobs). These are the initial ptobs:

ptob: tc16_inport

input port.

ptob: tc16_outport

output port.

ptob: tc16_ioport

input-output port.

ptob: tc16_inpipe

input pipe created by popen().

ptob: tc16_outpipe

output pipe created by popen().

ptob: tc16_strport

String port created by cwos() or cwis().

ptob: tc16_sfport

Software (virtual) port created by mksfpt() (see Soft Ports).

Macro: PORTP x
Macro: OPPORTP x
Macro: OPINPORTP x
Macro: OPOUTPORTP x
Macro: INPORTP x
Macro: OUTPORTP x

Returns non-zero if x is a port, open port, open input-port, open output-port, input-port, or output-port, respectively.

Macro: OPENP x
Macro: CLOSEDP x

Returns non-zero if port x is open or closed, respectively.

Macro: STREAM x

Returns the FILE * stream for port x.

Ports which are particularly well behaved are called fports. Advanced operations like file-position and reopen-file only work for fports.

Macro: FPORTP x
Macro: OPFPORTP x
Macro: OPINFPORTP x
Macro: OPOUTFPORTP x

Returns non-zero if x is an fport, open fport, open input fport, or open output fport, respectively.



6.1.7 Defining Ptobs

ptobs are similar to smobs but define new types of port to which SCM procedures can read or write. The following functions are defined in the ptobfuns:

typedef struct {
  SCM   (*mark)P((SCM ptr));
  int   (*free)P((FILE *p));
  int   (*print)P((SCM exp, SCM port, int writing));
  SCM   (*equalp)P((SCM, SCM));
  int   (*fputc)P((int c, FILE *p));
  int   (*fputs)P((char *s, FILE *p));
  sizet (*fwrite)P((char *s, sizet siz, sizet num, FILE *p));
  int   (*fflush)P((FILE *stream));
  int   (*fgetc)P((FILE *p));
  int   (*fclose)P((FILE *p));
} ptobfuns;

The .free component of the structure takes a FILE * or other C construct as its argument, unlike .free in a smob, which takes the whole smob cell. Often, .free and .fclose can be the same function. See fptob and pipob in sys.c for examples of how to define ptobs. Ptobs that must allocate blocks of memory should use, for example, must_malloc rather than malloc. See Allocating memory.



6.1.8 Smob Cells

A smob is a miscellaneous datatype. The type code and GCMARK bit occupy the lower order 16 bits of the CAR half of the cell. The rest of the CAR can be used for sub-type or other information. The CDR contains data of size long and is often a pointer to allocated memory.

Like ptobs, new varieties of smobs can be defined dynamically (see Defining Smobs). These are the initial smobs:

smob: tc_free_cell

unused cell on the freelist.

smob: tc16_flo

single-precision float.

Inexact number data types are subtypes of type tc16_flo. If the sub-type is:

  0. a single precision float is contained in the CDR.
  1. CDR is a pointer to a malloced double.
  3. CDR is a pointer to a malloced pair of doubles.

smob: tc_dblr

double-precision float.

smob: tc_dblc

double-precision complex.

smob: tc16_bigpos
smob: tc16_bigneg

positive and negative bignums, respectively.

SCM has large precision integers called bignums. They are stored in sign-magnitude form with the sign occurring in the type code of the SMOBs bigpos and bigneg. The magnitude is stored as a malloced array of type BIGDIG which must be an unsigned integral type with size smaller than long. BIGRAD is the radix associated with BIGDIG.

NUMDIGS_MAX (defined in scmfig.h) limits the number of digits of a bignum to 1000. These digits are base BIGRAD, which is typically 65536, giving 4816 decimal digits.

Why only 4800 digits? The simple multiplication algorithm SCM uses is O(n^2); this means the number of processor instructions required to perform a multiplication is some multiple of the product of the number of digits of the two multiplicands.

digits * digits  ==> operations
 5                    x
 50                   100 * x
 500                  10000 * x
 5000                 1000000 * x
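The quadratic growth in the table comes from pairing every digit of one operand with every digit of the other. The following is a minimal sketch of such a schoolbook multiplication, assuming 16-bit digits (radix 65536) stored least significant digit first; the function name and the step counter are hypothetical, and this is not SCM's bignum code.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply a (na digits) by b (nb digits), radix 65536, least
   significant digit first.  out must have room for na + nb digits.
   Returns the number of digit-by-digit products, which is na * nb. */
size_t sketch_big_mul(const uint16_t *a, size_t na,
                      const uint16_t *b, size_t nb,
                      uint16_t *out)
{
  size_t i, j, steps = 0;

  for (i = 0; i < na + nb; i++)
    out[i] = 0;

  for (i = 0; i < na; i++) {
    uint32_t carry = 0;
    for (j = 0; j < nb; j++) {
      /* 65535*65535 + 65535 + 65535 still fits in 32 bits. */
      uint32_t t = (uint32_t)a[i] * b[j] + out[i + j] + carry;
      out[i + j] = (uint16_t)(t & 0xFFFFu);
      carry = t >> 16;
      steps++;
    }
    out[i + nb] = (uint16_t)carry;
  }
  return steps;
}
```

Multiplying two 1000-digit operands this way takes 1000000 inner steps, which is the cost the digit limit keeps bounded.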

To calculate numbers larger than this, FFT multiplication [O(n*log(n))] and other specialized algorithms are required. You should obtain a package which specializes in number-theoretical calculations:

ftp://megrez.math.u-bordeaux.fr/pub/pari/

smob: tc16_promise

made by DELAY. See Control features in Revised(5) Scheme.

smob: tc16_arbiter

synchronization object. See Process Synchronization.

smob: tc16_macro

macro expanding function. See Macro Primitives.

smob: tc16_array

multi-dimensional array. See Arrays.

This type implements both conventional arrays (those with arbitrary data as elements; see Conventional Arrays) and uniform arrays (those with elements of a uniform type; see Uniform Array).

Conventional Arrays have a pointer to a vector for their CDR. Uniform Arrays have a pointer to a Uniform Vector type (string, Vbool, VfixZ32, VfixN32, VfloR32, VfloR64, or VfloC64) in their CDR.


6.1.9 Defining Smobs

Here is an example of how to add a new type named foo to SCM. The following lines need to be added to your code:

long tc16_foo;

The type code which will be used to identify the new type.

static smobfuns foosmob = {markfoo,freefoo,printfoo,equalpfoo};

smobfuns is a structure composed of 4 functions:

typedef struct {
  SCM   (*mark)P((SCM));
  sizet (*free)P((CELLPTR));
  int   (*print)P((SCM exp, SCM port, int writing));
  SCM   (*equalp)P((SCM, SCM));
} smobfuns;

smob.mark

is a function of one argument of type SCM (the cell to mark) and returns type SCM which will then be marked. If no further objects need to be marked then return an immediate object such as BOOL_F. The smob cell itself will already have been marked. Note: this is different from SCM versions prior to 5c5. Only additional data specific to a smob type need be marked by smob.mark.

2 functions are provided:

markcdr(ptr)

returns CDR(ptr).

mark0(ptr)

is a no-op used for smobs containing no additional SCM data. 0 may also be used in this case.

smob.free

is a function of one argument of type CELLPTR (the cell to be collected) and returns type sizet which is the number of malloced bytes which were freed. smob.free should free any malloced storage associated with this object. The function free0(ptr) is provided which does not free any storage and returns 0.

smob.print

is 0 or a function of 3 arguments. The first, of type SCM, is the smob object. The second, of type SCM, is the stream on which to write the result. The third, of type int, is 1 if the object should be written, 0 if it should be displayed, and 2 if it should be written for an error report. This function should return non-zero if it printed, and zero otherwise (in which case a hexadecimal number will be printed).

smob.equalp

is 0 or a function of 2 SCM arguments. Both of these arguments will be of type tc16_foo. This function should return BOOL_T if the smobs are equal, BOOL_F if they are not. If smob.equalp is 0, equal? will return BOOL_F if they are not eq?.

tc16_foo = newsmob(&foosmob);

Allocates the new type with the functions from foosmob. This line goes in an init_ routine.

Promises and macros in eval.c and arbiters in repl.c provide examples of SMOBs. There are a maximum of 256 SMOBs. Smobs that must allocate blocks of memory should use, for example, must_malloc rather than malloc. See Allocating memory.



6.1.10 Data Type Representations

IMMEDIATE:      B,D,E,F=data bit, C=flag code, P=pointer address bit
        ................................
inum    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB10
ichr    BBBBBBBBBBBBBBBBBBBBBBBB11110100
iflag                   CCCCCCC101110100
isym                    CCCCCCC001110100
        IMCAR:  only in car of evaluated code, cdr has cell’s GC bit
ispcsym                 000CCCC00CCCC100
iloc    0DDDDDDDDDDDEFFFFFFFFFFF11111100
pointer PPPPPPPPPPPPPPPPPPPPPPPPPPPPP000
gloc    PPPPPPPPPPPPPPPPPPPPPPPPPPPPP001

   HEAP CELL:   G=gc_mark; 1 during mark, 0 other times.
        1s and 0s here indicate type.     G missing means sys (not GC’d)
        SIMPLE
cons    ..........SCM car..............0  ...........SCM cdr.............G
closure ..........SCM code...........011  ...........SCM env...........CCG
        HEADERs:
ssymbol .........long length....G0000101  ..........char *chars...........
msymbol .........long length....G0000111  ..........char *chars...........
string  .........long length....G0001101  ..........char *chars...........
vector  .........long length....G0001111  ...........SCM **elts...........
VfixN8  .........long length....G0010101  ......unsigned char *words......
VfixZ8  .........long length....G0010111  ..........char *words...........
VfixN16 .........long length....G0011101  ......unsigned short *words.....
VfixZ16 .........long length....G0011111  ........ short *words...........
VfixN32 .........long length....G0100101  ......unsigned medium *words....
VfixZ32 .........long length....G0100111  ........medium *words...........
VfixN64 .........long length....G0101101  ......unsigned long *words......
VfixZ64 .........long length....G0101111  ..........long *words...........
VfloR32 .........long length....G0110101  .........float *words...........
VfloC32 .........long length....G0110111  .........float *words...........
VfloR64 .........long length....G0111101  ........double *words...........
VfloC64 .........long length....G0111111  ........double *words...........

Vbool   .........long length....G1000101  ..........long *words...........
contin  .........long length....G1001101  .............*regs..............
specfun ................xxxxxxxxG1001111  ...........SCM name.............
cclo    ..short length..xxxxxx10G1001111  ...........SCM **elts...........
                        PTOBs
   port int portnum.CwroxxxxxxxxG1000111  ..........FILE *stream..........
 socket int portnum.C001xxxxxxxxG1000111  ..........FILE *stream..........
 inport int portnum.C011xxxxxxxxG1000111  ..........FILE *stream..........
outport int portnum.0101xxxxxxxxG1000111  ..........FILE *stream..........
 ioport int portnum.C111xxxxxxxxG1000111  ..........FILE *stream..........
fport   int portnum.C   00000000G1000111  ..........FILE *stream..........
pipe    int portnum.C   00000001G1000111  ..........FILE *stream..........
strport 00000000000.0   00000010G1000111  ..........FILE *stream..........
sfport  int portnum.C   00000011G1000111  ..........FILE *stream..........
        SUBRs
subr_0  ..........int hpoff.....01010101  ...........SCM (*f)()...........
subr_1  ..........int hpoff.....01010111  ...........SCM (*f)()...........
cxr     ..........int hpoff.....01011101  .........double (*f)()..........
subr_3  ..........int hpoff.....01011111  ...........SCM (*f)()...........
subr_2  ..........int hpoff.....01100101  ...........SCM (*f)()...........
asubr   ..........int hpoff.....01100111  ...........SCM (*f)()...........
subr_1o ..........int hpoff.....01101101  ...........SCM (*f)()...........
subr_2o ..........int hpoff.....01101111  ...........SCM (*f)()...........
lsubr_2 ..........int hpoff.....01110101  ...........SCM (*f)()...........
lsubr   ..........int hpoff.....01110111  ...........SCM (*f)()...........
rpsubr  ..........int hpoff.....01111101  ...........SCM (*f)()...........
                        SMOBs
free_cell
        000000000000000000000000G1111111  ...........*free_cell........000
flo     000000000000000000000001G1111111  ...........float num............
dblr    000000000000000100000001G1111111  ..........double *real..........
dblc    000000000000001100000001G1111111  .........complex *cmpx..........
bignum  ...int length...0000001 G1111111  .........short *digits..........
bigpos  ...int length...00000010G1111111  .........short *digits..........
bigneg  ...int length...00000011G1111111  .........short *digits..........
                        xxxxxxxx = code assigned by newsmob();
promise 000000000000000fxxxxxxxxG1111111  ...........SCM val..............
arbiter 000000000000000lxxxxxxxxG1111111  ...........SCM name.............
macro   000000000000000mxxxxxxxxG1111111  ...........SCM name.............
array   ...short rank..cxxxxxxxxG1111111  ............*array..............
