
6.2 Operations


6.2.1 Garbage Collection

The garbage collector is in the latter half of sys.c. The primary goal of garbage collection (or GC) is to recycle those cells no longer in use. Immediates always appear as parts of other objects, so they are not subject to explicit garbage collection.

All cells reside in the heap (composed of heap segments). Note that this is different from what Computer Science usually defines as a heap.


6.2.1.1 Marking Cells

The first step in garbage collection is to mark all heap objects in use. Each heap cell has a bit reserved for this purpose. For pairs (cons cells) the lowest order bit (0) of the CDR is used. For other types, bit 8 of the CAR is used. The GC bits are never set except during garbage collection. Special C macros are defined in scm.h to allow easy manipulation when GC bits are possibly set. CAR, TYP3, and TYP7 can be used on GC marked cells as they are.

Macro: GCCDR x

Returns the CDR of a cons cell, even if that cell has been GC marked.

Macro: GCTYP16 x

Returns the 16 bit type code of a cell.

We need to (recursively) mark only a few objects in order to assure that all accessible objects are marked. Those objects are sys_protects[] (for example, dynwinds), the current C-stack and the hash table for symbols, symhash.

Function: void gc_mark (SCM obj)

The function gc_mark() is used for marking SCM cells. If obj is marked, gc_mark() returns. If obj is unmarked, gc_mark sets the mark bit in obj, then calls gc_mark() on any SCM components of obj. The last call to gc_mark() is tail-called (looped).
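
The recursion-plus-loop shape described above can be sketched on a toy cell type. This is only an illustration under stated assumptions, not the code in sys.c: the struct mark_cell, its separate marked field, and the function names are hypothetical, and a real SCM cell keeps its mark bit inside the CAR or CDR word.

```c
#include <stddef.h>

/* Hypothetical toy cell; NULL stands in for an immediate value. */
typedef struct mark_cell {
    struct mark_cell *car;
    struct mark_cell *cdr;
    int marked;               /* assumption: a separate field, not a packed bit */
} mark_cell;

/* Mark everything reachable from obj.  The first component is marked by
   a recursive call; the last component is handled by looping, so long
   cdr chains do not deepen the C stack. */
static void toy_mark(mark_cell *obj)
{
    while (obj != NULL && !obj->marked) {
        obj->marked = 1;
        toy_mark(obj->car);
        obj = obj->cdr;
    }
}

/* Count the marked cells in an array, for inspection. */
static int toy_count_marked(const mark_cell *cells, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (cells[i].marked)
            count++;
    return count;
}
```

Because an already marked cell ends the loop at once, cyclic structures terminate.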

Function: void mark_locations (STACKITEM x[], sizet len)

The function mark_locations is used for marking segments of C-stack or saved segments of C-stack (marked continuations). The argument len is the size of the stack in units of sizeof (STACKITEM).

Each longword in the stack is tested to see whether it is a valid cell pointer into the heap. If it is, the object itself and any objects it points to are marked using gc_mark. If the stack is word rather than longword aligned (#define WORD_ALIGN), both alignments are tried. This arrangement will occasionally mark an object which is no longer used. This has not been a problem in practice, and the advantage of using the C-stack far outweighs it.
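
The conservative test on each stack word can be sketched as follows. This is a minimal model, assuming a single fixed heap array and a side table of mark flags. All names (scan_cell, scan_heap, toy_mark_locations) are hypothetical. Unlike mark_locations, the sketch marks only the cell itself rather than calling gc_mark on it.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t car, cdr; } scan_cell;

#define SCAN_HEAP_CELLS 8
static scan_cell scan_heap[SCAN_HEAP_CELLS];          /* one toy heap segment */
static unsigned char scan_marked[SCAN_HEAP_CELLS];    /* side table of marks  */

/* Return the heap index if word looks like a pointer to the start of a
   cell in the toy heap, or -1 if it cannot be one. */
static int toy_cell_index(uintptr_t word)
{
    uintptr_t base = (uintptr_t)scan_heap;
    uintptr_t end  = (uintptr_t)(scan_heap + SCAN_HEAP_CELLS);
    if (word < base || word >= end)
        return -1;                        /* outside the heap segment */
    if ((word - base) % sizeof(scan_cell) != 0)
        return -1;                        /* not on a cell boundary   */
    return (int)((word - base) / sizeof(scan_cell));
}

/* Treat each of the len words in x as a possible cell pointer. */
static void toy_mark_locations(const uintptr_t x[], size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int idx = toy_cell_index(x[i]);
        if (idx >= 0)
            scan_marked[idx] = 1;
    }
}
```

An integer that happens to equal a cell address is marked as well, which is the occasional over-retention mentioned above.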


6.2.1.2 Sweeping the Heap

After all found objects have been marked, the heap is swept.

The storage for strings, vectors, continuations, doubles, complexes, and bignums is managed by malloc. There is only one pointer to each malloc object from its type-header cell in the heap. This allows malloc objects to be freed when the associated heap object is garbage collected.

Function: static void gc_sweep ()

The function gc_sweep scans through all heap segments. The mark bit is cleared from marked cells. Unmarked cells are spliced into freelist, where they can again be returned by invocations of NEWCELL.

If a type-header cell pointing to malloc space is unmarked, the malloc object is freed. If the type header of a smob is collected, the smob’s free procedure is called to free its storage.
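
A sweep over one segment can be modeled in a few lines. The sketch below is hypothetical throughout (sweep_cell, its in_use and payload fields, toy_sweep). It uses explicit flags where SCM uses bits in the cell, and it does not model smob free procedures.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct sweep_cell {
    int in_use;                    /* assumption: distinguishes live cells from free ones */
    int marked;                    /* set by the mark phase */
    void *payload;                 /* malloc'd storage owned by this cell, or NULL */
    struct sweep_cell *next_free;  /* link used while the cell is on the freelist */
} sweep_cell;

static sweep_cell *toy_freelist = NULL;

/* Sweep one segment: clear marks on survivors, free the malloc storage
   of unmarked cells and splice those cells into the freelist.
   Returns the number of cells reclaimed. */
static size_t toy_sweep(sweep_cell *seg, size_t n)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < n; i++) {
        sweep_cell *c = &seg[i];
        if (!c->in_use)
            continue;              /* already on the freelist */
        if (c->marked) {
            c->marked = 0;         /* survivor: reset for the next collection */
        } else {
            free(c->payload);      /* free(NULL) is a no-op */
            c->payload = NULL;
            c->in_use = 0;
            c->next_free = toy_freelist;
            toy_freelist = c;
            reclaimed++;
        }
    }
    return reclaimed;
}
```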


6.2.2 Memory Management for Environments

The memory management component of SCM contains special features which optimize the allocation and garbage collection of environments.

The optimizations are based on certain facts and assumptions:

The SCM evaluator creates many environments with short lifetimes and these account for a large portion of the total number of objects allocated.

The general purpose allocator allocates objects from a freelist, and collects using a mark/sweep algorithm. Research into garbage collection suggests that such an allocator is sub-optimal for object populations containing a large portion of short-lived members and that allocation strategies involving a copying collector are more appropriate.

It is a property of SCM, reflected throughout the source code, that a simple copying collector can not be used as the general purpose memory manager: much code assumes that the run-time stack can be treated as a garbage collection root set using conservative garbage collection techniques, which are incompatible with objects that change location.

Nevertheless, it is possible to use a mostly-separate copying-collector, just for environments. Roughly speaking, cons pairs making up environments are initially allocated from a small heap that is collected by a precise copying collector. These objects must be handled specially for the collector to work. The (presumably) small number of these objects that survive one collection of the copying heap are copied to the general purpose heap, where they will later be collected by the mark/sweep collector. The remaining pairs are more rapidly collected than they would otherwise be and all of this collection is accomplished without having to mark or sweep any other segment of the heap.

Allocating cons pairs for environments from this special heap is a heuristic that approximates the (unachievable) goal:

allocate all short-lived objects from the copying-heap, at no extra cost in allocation time.

Implementation Details

A separate heap (ecache_v) is maintained for the copying collector. Pairs are allocated from this heap in a stack-like fashion. Objects in this heap may be protected from garbage collection by:

  1. Pushing a reference to the object on a stack specially maintained for that purpose. This stack (scm_estk) is used in place of the C run-time stack by the SCM evaluator to hold local variables which refer to the copying heap.
  2. Saving a reference to every object in the mark/sweep heap which directly references the copying heap in a root set that is specially maintained for that purpose (scm_egc_roots). If no object in the mark/sweep heap directly references an object from the copying heap, that object can be preserved by storing a direct reference to it in the copying-collector root set.
  3. Keeping no other references to these objects, except references between the objects themselves, during copying collection.

When the copying heap or root-set becomes full, the copying collector is invoked. All protected objects are copied to the mark-sweep heap. All references to those objects are updated. The copying collector root-set and heap are emptied.
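
The collection step can be illustrated with a toy model. It is a sketch under strong simplifying assumptions, not the ecache code. All identifiers (env_pair, env_cons, env_protect, env_collect and the array names) are hypothetical. The roots are locations of pointers registered by the caller. A forwarding field in each pair records where it was copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct env_pair {
    struct env_pair *car, *cdr;
    struct env_pair *forward;     /* set once the pair has been copied out */
} env_pair;

#define ENV_CACHE_SIZE 4
#define ENV_MAIN_SIZE  16
#define ENV_MAX_ROOTS  8

static env_pair env_cache[ENV_CACHE_SIZE];   /* small copying heap          */
static size_t env_cache_used;
static env_pair env_main[ENV_MAIN_SIZE];     /* stand-in for the main heap  */
static size_t env_main_used;
static env_pair **env_roots[ENV_MAX_ROOTS];  /* locations holding references */
static size_t env_n_roots;

static int env_in_cache(const env_pair *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)env_cache;
    return a >= lo && a < lo + sizeof env_cache;
}

/* Register a location whose referent must survive; 0 if the root set is full. */
static int env_protect(env_pair **loc)
{
    if (env_n_roots == ENV_MAX_ROOTS) return 0;
    env_roots[env_n_roots++] = loc;
    return 1;
}

/* Stack-like allocation; NULL means the caller must collect first. */
static env_pair *env_cons(env_pair *car, env_pair *cdr)
{
    if (env_cache_used == ENV_CACHE_SIZE) return NULL;
    env_pair *p = &env_cache[env_cache_used++];
    p->car = car; p->cdr = cdr; p->forward = NULL;
    return p;
}

/* Copy p (and what it references in the cache) to the main heap. */
static env_pair *env_evacuate(env_pair *p)
{
    if (p == NULL || !env_in_cache(p)) return p;
    if (p->forward != NULL) return p->forward;
    assert(env_main_used < ENV_MAIN_SIZE);   /* toy: main heap assumed big enough */
    env_pair *copy = &env_main[env_main_used++];
    p->forward = copy;
    copy->forward = NULL;
    copy->car = env_evacuate(p->car);
    copy->cdr = env_evacuate(p->cdr);
    return copy;
}

/* Copy all protected pairs out, update the roots, empty cache and root set. */
static void env_collect(void)
{
    for (size_t i = 0; i < env_n_roots; i++)
        *env_roots[i] = env_evacuate(*env_roots[i]);
    env_cache_used = 0;
    env_n_roots = 0;
}
```

Unprotected pairs cost nothing to discard: resetting env_cache_used reclaims them without visiting them.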

References to pairs allocated specifically for environments are inaccessible to the Scheme procedures evaluated by SCM. These pairs are manipulated by only a small number of code fragments in the interpreter. To support copying collection, those code fragments (mostly in eval.c) have been modified to protect environments from garbage collection using the three rules listed above.

During a mark-sweep collection, the copying collector heap is marked and swept almost like any ordinary segment of the general purpose heap. The only difference is that pairs from the copying heap that become free during a sweep phase are not added to the freelist.

The environment cache is disabled by adding #define NO_ENV_CACHE to eval.c; all environment cells are then allocated from the regular heap.

Relation to Other Work

This work seems to build upon a considerable body of previous work on garbage collection techniques, about which much literature is available.


6.2.3 Dynamic Linking Support

Dynamic linking has not been ported to all platforms. Operating systems in the BSD family (a.out binary format) can usually be ported to DLD. The dl library (#define SUN_DL for SCM) was a proposed POSIX standard and may be available on other machines with COFF binary format. For notes about porting to MS-Windows and finishing the port to VMS, see VMS Dynamic Linking.

DLD is a library package of C functions that performs dynamic link editing on GNU/Linux, VAX (Ultrix), Sun 3 (SunOS 3.4 and 4.0), SPARCstation (SunOS 4.0), Sequent Symmetry (Dynix), and Atari ST. It is available from:

These notes about using libdl on SunOS are from gcc.info:

On a Sun, linking using GNU CC fails to find a shared library and reports that the library doesn’t exist at all.

This happens if you are using the GNU linker, because it does only static linking and looks only for unshared libraries. If you have a shared library with no unshared counterpart, the GNU linker won’t find anything.

We hope to make a linker which supports Sun shared libraries, but please don’t ask when it will be finished–we don’t know.

Sun forgot to include a static version of libdl.a with some versions of SunOS (mainly 4.1). This results in undefined symbols when linking static binaries (that is, if you use ‘-static’). If you see undefined symbols ‘_dlclose’, ‘_dlsym’ or ‘_dlopen’ when linking, compile and link against the file mit/util/misc/dlsym.c from the MIT version of X windows.


6.2.4 Configure Module Catalog

The SLIB module catalog can be extended to define other require-able packages by adding calls to the Scheme source file mkimpcat.scm. Within mkimpcat.scm, the following procedures are defined.

Function: add-link feature object-file lib1 …

feature should be a symbol. object-file should be a string naming a file containing compiled object-code. Each libn argument should be either a string naming a library file or #f.

If object-file exists, the add-link procedure registers symbol feature so that the first time require is called with the symbol feature as its argument, object-file and the lib1 … are dynamically linked into the executing SCM session.

If object-file exists, add-link returns #t, otherwise it returns #f.

For example, to install a compiled dll foo, add these lines to mkimpcat.scm:

        (add-link 'foo
                  (in-vicinity (implementation-vicinity) "foo"
                               link:able-suffix))

Function: add-alias alias feature

alias and feature are symbols. The procedure add-alias registers alias as an alias for feature. An unspecified value is returned.

add-alias causes (require 'alias) to behave like (require 'feature).

Function: add-source feature filename

feature is a symbol. filename is a string naming a file containing Scheme source code. The procedure add-source registers feature so that the first time require is called with the symbol feature as its argument, the file filename will be loaded. An unspecified value is returned.

Remember to delete the file slibcat after modifying the file mkimpcat.scm in order to force SLIB to rebuild its cache.


6.2.5 Automatic C Preprocessor Definitions

These ‘#defines’ are automatically provided by the preprocessors of various C compilers. SCM uses the presence or absence of these definitions to configure include file locations and aliases for library functions. If the definition corresponding to your system type is missing on your system as configured, add -Dflag to the compilation command lines or add a #define flag line at the beginning of scmfig.h.

#define         Platforms:
-------         ----------
ARM_ULIB        Huw Rogers free unix library for acorn archimedes
AZTEC_C         Aztec_C 5.2a
__CYGWIN__      Cygwin
__CYGWIN32__    Cygwin
_DCC            Dice C on AMIGA
__GNUC__        Gnu CC (and DJGPP)
__EMX__         Gnu C port (gcc/emx 0.8e) to OS/2 2.0
__HIGHC__       MetaWare High C
__IBMC__        C-Set++ on OS/2 2.1
_MSC_VER        MS VisualC++ 4.2
MWC             Mark Williams C on COHERENT
__MWERKS__      Metrowerks Compiler; Macintosh and WIN32 (?)
_POSIX_SOURCE   ??
_QC             Microsoft QuickC
__STDC__        ANSI C compliant
__TURBOC__      Turbo C and Borland C
__USE_POSIX     ??
__WATCOMC__     Watcom C on MS-DOS
__ZTC__         Zortech C

_AIX            AIX operating system
__APPLE__       Apple Darwin
AMIGA           SAS/C 5.10 or Dice C on AMIGA
__amigaos__     Gnu CC on AMIGA
atarist         ATARI-ST under Gnu CC
__DragonflyBSD__ DragonflyBSD
__FreeBSD__     FreeBSD
GNUDOS          DJGPP (obsolete in version 1.08)
__GO32__        DJGPP (future?)
hpux            HP-UX
linux           GNU/Linux
macintosh       Macintosh (THINK_C and __MWERKS__ define)
MCH_AMIGA       Aztec_c 5.2a on AMIGA
__MACH__        Apple Darwin
__MINGW32__     MinGW - Minimalist GNU for Windows
MSDOS           Microsoft C 5.10 and 6.00A
_MSDOS          Microsoft CLARM and CLTHUMB compilers.
__MSDOS__       Turbo C, Borland C, and DJGPP
__NetBSD__      NetBSD
nosve           Control Data NOS/VE
__OpenBSD__     OpenBSD
SVR2            System V Revision 2.
sun             SunOS
__SVR4          SunOS
THINK_C         development environment for the Macintosh
ultrix          VAX with ULTRIX operating system.
unix            most Unix and similar systems and DJGPP (!?)
__unix__        Gnu CC and DJGPP
_UNICOS         Cray operating system
vaxc            VAX C compiler
VAXC            VAX C compiler
vax11c          VAX C compiler
VAX11           VAX C compiler
_Windows        Borland C 3.1 compiling for Windows
_WIN32          MS VisualC++ 4.2 and Cygwin (Win32 API)
_WIN32_WCE      MS Windows CE
vms             (and VMS) VAX-11 C under VMS.

__alpha         DEC Alpha processor
__alpha__       DEC Alpha processor
__hppa__        HP RISC processor
hp9000s800      HP RISC processor
__ia64          GCC on IA64
__ia64__        GCC on IA64
_LONGLONG       GCC on IA64
__i386__        DJGPP
i386            DJGPP
_M_ARM          Microsoft CLARM compiler defines as 4 for ARM.
_M_ARMT         Microsoft CLTHUMB compiler defines as 4 for Thumb.
MULTIMAX        Encore computer
ppc             PowerPC
__ppc__         PowerPC
pyr             Pyramid 9810 processor
__sgi__         Silicon Graphics Inc.
sparc           SPARC processor
sequent         Sequent computer
tahoe           CCI Tahoe processor
vax             VAX processor
__x86_64        AMD Opteron
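
As a small illustration of how such predefined macros are typically tested, the sketch below selects a platform name at compile time. The function toy_platform_name and the strings it returns are hypothetical and are not part of SCM. __linux__ is an addition not listed in the table, included because compilers in strict ISO mode may not define the unprefixed linux.

```c
/* Hypothetical helper: report a platform family chosen by the
   preprocessor from compiler-predefined macros. */
static const char *toy_platform_name(void)
{
#if defined(_WIN32) || defined(__CYGWIN__)
    return "windows";
#elif defined(__APPLE__) && defined(__MACH__)
    return "darwin";
#elif defined(linux) || defined(__linux__)
    return "linux";
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
    return "bsd";
#elif defined(unix) || defined(__unix__)
    return "unix";
#else
    return "unknown";
#endif
}
```

A macro missing on a given system can be supplied on the command line, for example by compiling with -Dunix.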

6.2.6 Signals

Function: init_signals

(in scm.c) initializes handlers for SIGINT and SIGALRM if they are supported by the C implementation. All of the signal handlers immediately reestablish themselves by a call to signal().

Function: int_signal sig
Function: alrm_signal sig

The low level handlers for SIGINT and SIGALRM.

If an interrupt handler is defined when the interrupt is received, the code is interpreted. If the code returns, execution resumes from where the interrupt happened. Call-with-current-continuation allows the stack to be saved and restored.

SCM does not use any signal masking system calls. These are not a portable feature. However, code can run uninterrupted by use of the C macros DEFER_INTS and ALLOW_INTS.

Macro: DEFER_INTS

sets the global variable ints_disabled to 1. If an interrupt occurs during a time when ints_disabled is 1, then deferred_proc is set to non-zero, one of the global variables SIGINT_deferred or SIGALRM_deferred is set to 1, and the handler returns.

Macro: ALLOW_INTS

Checks the deferred variables and, if any is set, calls the appropriate handler.

Calls to DEFER_INTS cannot be nested. An ALLOW_INTS must happen before another DEFER_INTS can be done. To check that this constraint is satisfied, #define CAREFUL_INTS in scmfig.h.
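
The deferral protocol can be modeled without real signals. The following is a sketch, assuming a single interrupt kind. Every name carries a toy_ or TOY_ prefix to make clear it is hypothetical. That the allow step clears the disabled flag before running a pending handler is an assumption of the sketch.

```c
#include <signal.h>

static volatile sig_atomic_t toy_ints_disabled = 0;
static volatile sig_atomic_t toy_sigint_deferred = 0;
static int toy_handled = 0;          /* counts how often the handler body ran */

/* The work a handler would do once it is allowed to run. */
static void toy_handle_sigint(void) { toy_handled++; }

/* Called at the point where a low-level signal handler would be entered. */
static void toy_int_signal(void)
{
    if (toy_ints_disabled) {         /* inside a critical section: */
        toy_sigint_deferred = 1;     /* remember the interrupt and return */
        return;
    }
    toy_handle_sigint();
}

#define TOY_DEFER_INTS (toy_ints_disabled = 1)

#define TOY_ALLOW_INTS                      \
    do {                                    \
        toy_ints_disabled = 0;              \
        if (toy_sigint_deferred) {          \
            toy_sigint_deferred = 0;        \
            toy_handle_sigint();            \
        }                                   \
    } while (0)
```

Code placed between TOY_DEFER_INTS and TOY_ALLOW_INTS sees no handler activity; a pending interrupt is delivered when the section ends.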



6.2.7 C Macros

Macro: ASRTER cond arg pos subr

signals an error if the expression (cond) is 0. arg is the offending object, subr is the string naming the subr, and pos indicates the position or type of error. pos can be one of

  • ARGn (> 5 or unknown ARG number)
  • ARG1
  • ARG2
  • ARG3
  • ARG4
  • ARG5
  • WNA (wrong number of args)
  • OVFLOW
  • OUTOFRANGE
  • NALLOC
  • EXIT
  • HUP_SIGNAL
  • INT_SIGNAL
  • FPE_SIGNAL
  • BUS_SIGNAL
  • SEGV_SIGNAL
  • ALRM_SIGNAL
  • a C string (char *)

Error checking is not done by ASRTER if the flag RECKLESS is defined. An error condition can still be signaled in this case with a call to wta(arg, pos, subr).

Macro: ASRTGO cond label

goto label if the expression (cond) is 0. Like ASRTER, ASRTGO is not active if the flag RECKLESS is defined.
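
The way such macros vanish under a compile-time flag can be sketched as follows. These are not the definitions in scm.h. TOY_ASRTER, TOY_ASRTGO, TOY_RECKLESS and toy_wta are hypothetical stand-ins; pos is a plain string here, and toy_wta merely records the error and returns, whereas a real error would transfer control elsewhere.

```c
#include <stddef.h>

/* Record of the most recent error, for inspection. */
static long toy_err_arg = 0;
static const char *toy_err_pos = NULL;
static const char *toy_err_subr = NULL;

static void toy_wta(long arg, const char *pos, const char *subr)
{
    toy_err_arg = arg;
    toy_err_pos = pos;
    toy_err_subr = subr;
}

#ifndef TOY_RECKLESS
# define TOY_ASRTER(cond, arg, pos, subr) \
    do { if (!(cond)) toy_wta((arg), (pos), (subr)); } while (0)
# define TOY_ASRTGO(cond, label) \
    do { if (!(cond)) goto label; } while (0)
#else   /* checks compiled out */
# define TOY_ASRTER(cond, arg, pos, subr) ((void)0)
# define TOY_ASRTGO(cond, label) ((void)0)
#endif

/* Halve an even number; odd input jumps to shared error handling. */
static long toy_half(long n)
{
    TOY_ASRTGO(n % 2 == 0, odd);
    return n / 2;
odd:
    toy_wta(n, "ARG1", "toy-half");
    return -1;
}

/* Return i after checking it against len. */
static long toy_checked_index(long i, long len)
{
    TOY_ASRTER(0 <= i && i < len, i, "OUTOFRANGE", "toy-checked-index");
    return i;
}
```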



6.2.8 Changing Scm

When writing C-code for SCM, a precaution is recommended. If your routine allocates a non-cons cell which will not be incorporated into a SCM object which is returned, you need to make sure that a SCM variable in your routine points to that cell as long as part of it might be referenced by your code.

In order to make sure this SCM variable does not get optimized out you can put this assignment after its last possible use:

SCM_dummy1 = foo;

or put this assignment somewhere in your routine:

SCM_dummy1 = (SCM) &foo;

SCM_dummy variables are not currently defined. Passing the address of the local SCM variable to any procedure also protects it. The procedure scm_protect_temp is provided for this purpose.

Function: void scm_protect_temp (SCM *ptr)

Forces the SCM object ptr to be saved on the C-stack, where it will be traced for GC.

Also, if you maintain a static pointer to some (non-immediate) SCM object, you must either make your pointer be the value cell of a symbol (see errobj for an example) or (permanently) add your pointer to sys_protects using:

Function: SCM scm_gc_protect (SCM obj)

Permanently adds obj to a table of objects protected from garbage collection. scm_gc_protect returns obj.

To add a C routine to scm:

  1. choose the appropriate subr type from the type list.
  2. write the code and put into scm.c.
  3. add a make_subr or make_gsubr call to init_scm. Or put an entry into the appropriate iproc structure.

To add a package of new procedures to scm (see crs.c for example):

  1. create a new C file (foo.c).
  2. at the front of foo.c put declarations for strings for your procedure names.
    static char s_twiddle_bits[]="twiddle-bits!";
    static char s_bitsp[]="bits?";
    
  3. choose the appropriate subr types from the type list in code.doc.
  4. write the code for the procedures and put into foo.c
  5. create one iproc structure for each subr type used in foo.c
    static iproc subr3s[]= {
            {s_twiddle_bits,twiddle_bits},
            {s_bitsp,bitsp},
            {0,0} };
    
  6. create an init_<name of file> routine at the end of the file which calls init_iprocs with the correct type for each of the iprocs created in step 5.
    void init_foo()
    {
      init_iprocs(subr1s, tc7_subr_1);
      init_iprocs(subr3s, tc7_subr_3);
    }
    

    If your package needs to have a finalization routine called to free up storage, close files, etc, then also have a line in init_foo like:

    add_final(final_foo);
    

    final_foo should be a (void) procedure of no arguments. The finals will be called in opposite order from their definition.

    The line:

    add_feature("foo");
    

    will append a symbol 'foo to the (list) value of slib:features.

  7. put any scheme code which needs to be run as part of your package into Ifoo.scm.
  8. put an if into Init5f4.scm which loads Ifoo.scm if your package is included:
    (if (defined? twiddle-bits!)
        (load (in-vicinity (implementation-vicinity)
                           "Ifoo"
                           (scheme-file-suffix))))
    

    or use (provided? 'foo) instead of (defined? twiddle-bits!) if you have added the feature.

  9. put documentation of the new procedures into foo.doc
  10. add lines to your Makefile to compile and link SCM with your object file. Add an init_foo\(\)\; to the INITS=… line at the beginning of the makefile.

These steps should allow your package to be linked into SCM with a minimum of difficulty. Your package should also work with dynamic linking if your SCM has this capability.
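
The table-driven registration used in steps 5 and 6 can be modeled in isolation. This is a sketch with hypothetical types and names (toy_iproc, toy_init_iprocs, toy_lookup). The real init_iprocs also takes a subr type code and creates Scheme procedure objects, neither of which is modeled here.

```c
#include <stddef.h>
#include <string.h>

typedef long (*toy_fn)(long);
typedef struct { const char *name; toy_fn fn; } toy_iproc;

static long toy_twiddle_bits(long x) { return ~x; }
static long toy_bitsp(long x) { return x != 0; }

/* One table per procedure type, terminated by a {0, 0} entry. */
static const toy_iproc toy_subr1s[] = {
    {"twiddle-bits!", toy_twiddle_bits},
    {"bits?", toy_bitsp},
    {0, 0}
};

#define TOY_MAX_PROCS 16
static toy_iproc toy_procs[TOY_MAX_PROCS];   /* stand-in for the symbol table */
static int toy_nprocs = 0;

/* Register every entry up to the terminator. */
static void toy_init_iprocs(const toy_iproc *ip)
{
    for (; ip->name != NULL; ip++)
        if (toy_nprocs < TOY_MAX_PROCS)
            toy_procs[toy_nprocs++] = *ip;
}

/* Find a registered procedure by name, or NULL. */
static toy_fn toy_lookup(const char *name)
{
    for (int i = 0; i < toy_nprocs; i++)
        if (strcmp(toy_procs[i].name, name) == 0)
            return toy_procs[i].fn;
    return NULL;
}

/* The package's single entry point, in the style of init_foo. */
static void toy_init_foo(void)
{
    toy_init_iprocs(toy_subr1s);
}
```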

Special forms (new syntax) can be added to scm.

  1. define a new MAKISYM in scm.h and increment NUM_ISYMS.
  2. add a string with the new name in the corresponding place in isymnames in repl.c.
  3. add case clause to ceval() near i_quasiquote (in eval.c).

New syntax can now be added without recompiling SCM by the use of the procedure->syntax, procedure->macro, procedure->memoizing-macro, and defmacro. For details, see Syntax.



6.2.9 Allocating memory

SCM maintains a count of bytes allocated using malloc, and calls the garbage collector when that number exceeds a dynamically managed limit. In order for this to work properly, malloc and free should not be called directly to manage memory freeable by garbage collection. The following functions are provided for that purpose:

Function: SCM must_malloc_cell (long len, SCM c, char *what)
Function: char * must_malloc (long len, char *what)

len is the number of bytes that should be allocated, what is a string to be used in error or gc messages. must_malloc returns a pointer to newly allocated memory. must_malloc_cell returns a newly allocated cell whose car is c and whose cdr is a pointer to newly allocated memory.

Function: void must_realloc_cell (SCM z, long olen, long len, char *what)
Function: char * must_realloc (char *where, long olen, long len, char *what)

must_realloc_cell takes as argument z a cell whose cdr should be a pointer to a block of memory of length olen allocated with must_malloc_cell and modifies the cdr to point to a block of memory of length len. must_realloc takes as argument where the address of a block of memory of length olen allocated by must_malloc and returns the address of a block of length len.

The contents of the reallocated block will be unchanged up to the minimum of the old and new sizes.

what is a pointer to a string used for error and gc messages.

must_malloc, must_malloc_cell, must_realloc, and must_realloc_cell must be called with interrupts deferred (see Signals). must_realloc and must_realloc_cell must not be called during initialization (non-zero errjmp_bad); the initial allocations must be large enough.

Function: void must_free (char *ptr, sizet len)

must_free is used to free a block of memory allocated by the above functions and pointed to by ptr. len is the length of the block in bytes, but this value is used only for debugging purposes. If it is difficult or expensive to calculate then zero may be used instead.
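
The idea of counting malloc'd bytes and collecting when a limit is crossed can be sketched as below. None of this is SCM's code: the names, the starting limit of 64 bytes, and the doubling policy are assumptions made for illustration, and toy_gc only counts its invocations. Unlike must_free as described above, the toy free function uses len to keep its byte count current.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t toy_mallocated = 0;   /* bytes currently allocated via the wrappers */
static size_t toy_mtrigger = 64;    /* assumed starting limit */
static int toy_gc_runs = 0;

/* Stand-in for a collection; a real one would free unreachable storage. */
static void toy_gc(void) { toy_gc_runs++; }

/* Allocate len bytes, collecting first if the limit would be exceeded. */
static char *toy_counted_malloc(size_t len)
{
    if (toy_mallocated + len > toy_mtrigger) {
        toy_gc();
        /* If collecting did not make room, raise the limit (assumed policy). */
        if (toy_mallocated + len > toy_mtrigger)
            toy_mtrigger = 2 * (toy_mallocated + len);
    }
    char *p = malloc(len);
    assert(p != NULL);              /* toy: treat exhaustion as fatal */
    toy_mallocated += len;
    return p;
}

/* Release a block and subtract its length from the running count. */
static void toy_counted_free(char *ptr, size_t len)
{
    free(ptr);
    toy_mallocated -= len;
}
```

Calling malloc or free directly would bypass the count, so the limit check would fire at the wrong times.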



6.2.10 Embedding SCM

The file scmmain.c contains the definition of main(). When SCM is compiled as a library scmmain.c is not included in the library; a copy of scmmain.c can be modified to use SCM as an embedded library module.

Function: int main (int argc, char **argv)

This is the top level C routine. The value of the argc argument is the number of command line arguments. The argv argument is a vector of C strings; its elements are the individual command line argument strings. A null pointer always follows the last element: argv[argc] is this null pointer.

Variable: char *execpath

This string is the pathname of the executable file being run. This variable can be examined and set from Scheme (see Internal State). execpath must be set to the executable’s path in order to use DUMP (see Dump) or DLD.

Rename main() and arrange your code to call it with an argv which sets up SCM as you want it.

If you need more control than is possible through argv, here are descriptions of the functions which main() calls.

Function: void init_sbrk (void)

Call this before SCM calls malloc(). Value returned from sbrk() is used to gauge how much storage SCM uses.

Function: char * scm_find_execpath (int argc, char **argv, char *script_arg)

argc and argv are as described in main(). script_arg is the pathname of the SCSH-style script (see Scripting) being invoked; 0 otherwise. scm_find_execpath returns the pathname of the executable being run; if scm_find_execpath cannot determine the pathname, then it returns 0.

scm_find_implpath is defined in scmmain.c. Preceding this are definitions of GENERIC_NAME and INIT_GETENV. These, along with IMPLINIT and dirsep, control scm_find_implpath()’s operation.

If your application has an easier way to locate initialization code for SCM, then you can replace scm_find_implpath.

Function: char * scm_find_implpath (char *execpath)

Returns the full pathname of the Scheme initialization file or 0 if it cannot find it.

The string value of the preprocessor variable INIT_GETENV names an environment variable (default ‘"SCM_INIT_PATH"’). If this environment variable is defined, its value will be returned from scm_find_implpath. Otherwise find_impl_file() is called with the arguments execpath, GENERIC_NAME (default "scm"), INIT_FILE_NAME (default "Init5f4_scm"), and the directory separator string dirsep. If find_impl_file() returns 0 and IMPLINIT is defined, then a copy of the string IMPLINIT is returned.

Function: int init_buf0 (FILE *inport)

Tries to determine whether inport (usually stdin) is an interactive input port which should be used in an unbuffered mode. If so, inport is set to unbuffered and non-zero is returned. Otherwise, 0 is returned.

init_buf0 should be called before any input is read from inport. Its value can be used as the last argument to scm_init_from_argv().

Function: void scm_init_from_argv (int argc, char **argv, char *script_arg, int iverbose, int buf0stdin)

Initializes SCM storage and creates a list of the argument strings program-arguments from argv. argc and argv must already be processed to accommodate Scheme Scripts (if desired). The scheme variable *script* is set to the string script_arg, or #f if script_arg is 0. iverbose is the initial prolixity level. If buf0stdin is non-zero, stdin is treated as an unbuffered port.

Call init_signals and restore_signals only if you want SCM to handle interrupts and signals.

Function: void init_signals (void)

Initializes handlers for SIGINT and SIGALRM if they are supported by the C implementation. All of the signal handlers immediately reestablish themselves by a call to signal().

Function: void restore_signals (void)

Restores the handlers in effect when init_signals was called.

Function: SCM scm_top_level (char *initpath, SCM (*toplvl_fun)())

This is SCM’s top-level. Errors longjmp here. toplvl_fun is a callback function of zero arguments that is called by scm_top_level to do useful work – if zero, then repl, which implements a read-eval-print loop, is called.

If toplvl_fun returns, then scm_top_level will return as well. If the return value of toplvl_fun is an immediate integer then it will be used as the return value of scm_top_level. In the main function supplied with SCM, this return value is the exit status of the process.

If the first character of string initpath is ‘;’, ‘(’ or whitespace, then scm_ldstr() is called with initpath to initialize SCM; otherwise initpath names a file of Scheme code to be loaded to initialize SCM.

When a Scheme error is signaled, control will pass into scm_top_level by longjmp, error messages will be printed to current-error-port, and then toplvl_fun will be called again. toplvl_fun must maintain enough state to prevent errors from being resignalled. If toplvl_fun cannot recover from an error situation, it may simply return.
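
The control flow between the top level and its callback can be modeled with setjmp and longjmp. This is a sketch with hypothetical names (toy_top_level, toy_signal_error, toy_user_main). It omits initialization and error printing, and returns a plain int.

```c
#include <setjmp.h>

static jmp_buf toy_errjmp;
static int toy_error_count;

/* What an error does: unwind to the top level. */
static void toy_signal_error(void)
{
    longjmp(toy_errjmp, 1);
}

/* Establish the jump target, then run the callback.  After an error,
   control resumes here and the callback is simply called again. */
static int toy_top_level(int (*toplvl_fun)(void))
{
    toy_error_count = 0;
    if (setjmp(toy_errjmp) != 0)
        toy_error_count++;       /* arrived via longjmp */
    return toplvl_fun();
}

/* A callback that keeps state so it does not repeat the failing step. */
static int toy_user_main(void)
{
    static int calls = 0;
    if (++calls == 1)
        toy_signal_error();      /* the first attempt fails */
    return calls;
}
```

Without the static counter, toy_user_main would signal the same error on every re-entry and never return.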

Function: void final_scm (int freeall)

Calls all finalization routines registered with add_final(). If freeall is non-zero, then all memory which SCM allocated with malloc() will be freed.

You can call individual Scheme procedures from C code in the toplvl_fun argument passed to scm_top_level(), or from module subrs (registered by an init_ function, see Changing Scm).

Use apply to call Scheme procedures from your C code. For example:

/* If this apply fails, SCM will catch the error */
apply(CDR(intern("srv:startup",sizeof("srv:startup")-1)),
      mksproc(srvproc),
      listofnull);

func = CDR(intern(rpcname,strlen(rpcname)));
retval = apply(func, cons(mksproc(srvproc), args), EOL);

Functions for loading Scheme files and evaluating Scheme code given as C strings are described in the next section (see Callbacks).

Here is a minimal embedding program libtest.c:

/* gcc -o libtest libtest.c libscm.a -ldl -lm -lc */
#include "scm.h"
/* include patchlvl.h for SCM's INIT_FILE_NAME. */
#include "patchlvl.h"

void libtest_init_user_scm()
{
  fputs("This is libtest_init_user_scm\n", stderr); fflush(stderr);
  sysintern("*the-string*", makfrom0str("hello world\n"));
}

SCM user_main()
{
  static int done = 0;
  if (done++) return MAKINUM(EXIT_FAILURE);
  scm_ldstr("(display *the-string*)");
  return MAKINUM(EXIT_SUCCESS);
}

int main(argc, argv)
     int argc;
     const char **argv;
{
  SCM retval;
  char *implpath, *execpath;

  init_user_scm = libtest_init_user_scm;
  execpath = dld_find_executable(argv[0]);
  fprintf(stderr, "dld_find_executable(%s): %s\n", argv[0], execpath);
  implpath = find_impl_file(execpath, "scm", INIT_FILE_NAME, dirsep);
  fprintf(stderr, "implpath: %s\n", implpath);
  scm_init_from_argv(argc, argv, 0L, 0, 0);

  retval = scm_top_level(implpath, user_main);

  final_scm(!0);
  return (int)INUM(retval);
}

-|
dld_find_executable(./libtest): /home/jaffer/scm/libtest
implpath: /home/jaffer/scm/Init5f4.scm
This is libtest_init_user_scm
hello world

6.2.11 Callbacks

SCM now has routines to make calling back to Scheme procedures easier. The source code for these routines is found in rope.c.

Function: int scm_ldfile (char *file)

Loads the Scheme source file file. Returns 0 if successful, non-0 if not. This function is used to load SCM’s initialization file Init5f4.scm.

Function: int scm_ldprog (char *file)

Loads the Scheme source file (in-vicinity (program-vicinity) file). Returns 0 if successful, non-0 if not.

This function is useful for compiled code init_ functions to load non-compiled Scheme (source) files. program-vicinity is the directory from which the calling code was loaded (see Vicinity in SLIB).

Function: SCM scm_evstr (char *str)

Returns the result of reading an expression from str and evaluating it.

Function: void scm_ldstr (char *str)

Reads and evaluates all the expressions from str.

If you wish to catch errors during execution of Scheme code, then you can use a wrapper like this for your Scheme procedures:

(define (srv:protect proc)
  (lambda args
    (define result #f)                  ; put default value here
    (call-with-current-continuation
     (lambda (cont)
       (dynamic-wind (lambda () #t)
                     (lambda ()
                       (set! result (apply proc args))
                       (set! cont #f))
                     (lambda ()
                       (if cont (cont #f))))))
    result))

Calls to procedures so wrapped will return even if an error occurs.



6.2.12 Type Conversions

These type conversion functions are very useful for connecting SCM and C code. Most are defined in rope.c.

Function: SCM long2num (long n)
Function: SCM ulong2num (unsigned long n)

Return an object of type SCM corresponding to the long or unsigned long argument n. If n cannot be converted, BOOL_F is returned. Which numbers can be converted depends on whether SCM was compiled with the BIGDIG or FLOATS flags.

To convert integer numbers of smaller types (short or char), use the macro MAKINUM(n).

Function: long num2long (SCM num, char *pos, char *s_caller)
Function: unsigned long num2ulong (SCM num, char *pos, char *s_caller)
Function: short num2short (SCM num, char *pos, char *s_caller)
Function: unsigned short num2ushort (SCM num, char *pos, char *s_caller)
Function: unsigned char num2uchar (SCM num, char *pos, char *s_caller)
Function: double num2dbl (SCM num, char *pos, char *s_caller)

These functions are used to check and convert SCM arguments to the named C type. The first argument num is checked to see if it is within the range of the destination type. If so, the converted number is returned. If not, the ASRTER macro calls wta with num and strings pos and s_caller. For a listing of useful predefined pos macros, see C Macros.

Note: Inexact numbers are accepted only by num2dbl, num2long, and num2ulong (for when SCM is compiled without bignums). To convert inexact numbers to exact numbers, see inexact->exact in Revised(5) Scheme.

Function: unsigned long scm_addr (SCM args, char *s_name)

Returns a pointer (cast to an unsigned long) to the storage corresponding to the location accessed by aref(CAR(args),CDR(args)). The string s_name is used in any messages from error calls by scm_addr.

scm_addr is useful for performing C operations on strings or other uniform arrays (see Uniform Array).

Function: unsigned long scm_base_addr(SCM ra, char *s_name)

Returns a pointer (cast to an unsigned long) to the beginning of storage of array ra. Note that if ra is a shared-array, the storage accessed this way may be much larger than ra.

Note While you use a pointer returned from scm_addr or scm_base_addr you must keep a pointer to the associated SCM object in a stack allocated variable or GC-protected location in order to assure that SCM does not reuse that storage before you are done with it. See scm_gc_protect.

Function: SCM makfrom0str (char *src)
Function: SCM makfromstr (char *src, sizet len)

Return a newly allocated string SCM object copy of the null-terminated string src or the string src of length len, respectively.

Function: SCM makfromstrs (int argc, char **argv)

Returns a newly allocated SCM list of strings corresponding to the argc length array of null-terminated strings argv. If argc is less than 0, argv is assumed to be NULL terminated. makfromstrs is used by scm_init_from_argv to convert the arguments SCM was called with to a SCM list which is the value of SCM procedure calls to program-arguments (see program-arguments).

Function: char ** makargvfrmstrs (SCM args, char *s_name)

Returns a NULL terminated list of null-terminated strings copied from the SCM list of strings args. The string s_name is used in messages from error calls by makargvfrmstrs.

makargvfrmstrs is useful for constructing argument lists suitable for passing to main functions.

Function: void must_free_argv (char **argv)

Frees the storage allocated to create argv by a call to makargvfrmstrs.
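
The ownership pattern behind makargvfrmstrs and must_free_argv (a NULL terminated vector of freshly copied strings, released by one matching call) can be sketched in standalone C. The names demo_make_argv and demo_free_argv are hypothetical, and the input is a C array of strings rather than a SCM list; this is an analog, not the code in rope.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analog of must_free_argv(): frees every copied string,
   then the vector itself. */
static void demo_free_argv(char **argv)
{
    if (argv == NULL)
        return;
    for (char **p = argv; *p != NULL; p++)
        free(*p);
    free(argv);
}

/* Hypothetical analog of makargvfrmstrs(): copies n strings into a
   NULL terminated vector. Returns NULL if allocation fails. */
static char **demo_make_argv(const char *const *strs, size_t n)
{
    assert(strs != NULL || n == 0);
    char **argv = malloc((n + 1) * sizeof *argv);
    if (argv == NULL)
        return NULL;
    argv[0] = NULL;                     /* vector is always terminated */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            demo_free_argv(argv);       /* releases the copies made so far */
            return NULL;
        }
        memcpy(copy, strs[i], len);
        argv[i] = copy;
        argv[i + 1] = NULL;
    }
    return argv;
}
```

Because every element is a private copy, the caller may modify or outlive the source strings, but must release the vector exactly once.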



6.2.13 Continuations

The source files continue.h and continue.c are designed to function as an independent resource for programs wishing to use continuations, but without all the rest of the SCM machinery. The concept of continuations is explained in call-with-current-continuation in Revised(5) Scheme.

The C constructs jmp_buf, setjmp, and longjmp implement escape continuations. On VAX and Cray platforms, the setjmp provided does not save all the registers. The source files setjump.mar, setjump.s, and ugsetjump.s provide implementations which do meet this criterion.

SCM uses the names jump_buf, setjump, and longjump in lieu of jmp_buf, setjmp, and longjmp to prevent name and declaration conflicts.
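
As a minimal sketch of an escape continuation, the following uses the standard jmp_buf, setjmp, and longjmp rather than SCM's jump_buf, setjump, and longjump. The names demo_escape, demo_thrown, demo_search, and demo_find are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf demo_escape;        /* the saved escape point */
static volatile long demo_thrown;  /* carries a value across the jump */

/* Scans arr for target; on a hit, escapes straight to the setjmp point
   instead of returning through the normal call chain. */
static void demo_search(const int *arr, int n, int target)
{
    for (int i = 0; i < n; i++) {
        if (arr[i] == target) {
            demo_thrown = i;
            longjmp(demo_escape, 1);   /* does not return */
        }
    }
}

/* Returns the index of target in arr, or -1 if it is absent. */
static long demo_find(const int *arr, int n, int target)
{
    assert(arr != NULL && n >= 0);
    if (setjmp(demo_escape) == 0) {
        demo_search(arr, n, target);   /* normal path */
        return -1;                     /* search returned: not found */
    }
    return demo_thrown;                /* arrived here via longjmp */
}
```

An escape of this kind is only valid while the function that called setjmp is still active, which is why it can jump outward but cannot re-enter a frame that has already returned.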

Data type: CONTINUATION jmpbuf length stkbse other parent

is a typedefed structure holding all the information needed to represent a continuation. The other slot can be used to hold any data the user wishes to put there by defining the macro CONTINUATION_OTHER.

Macro: SHORT_ALIGN

If SHORT_ALIGN is #defined (in scmfig.h), then it is assumed that pointers in the stack can be aligned on short int boundaries.

Data type: STACKITEM

is a pointer to objects of the size specified by SHORT_ALIGN being #defined or not.

Macro: CHEAP_CONTINUATIONS

If CHEAP_CONTINUATIONS is #defined (in scmfig.h), each CONTINUATION has size sizeof(CONTINUATION). Otherwise, all but root CONTINUATIONs have additional storage (immediately following) to contain a copy of part of the stack.

Note On systems with nonlinear stack disciplines (multiple stacks or non-contiguous stack frames) copying the stack will not work properly. These systems need to #define CHEAP_CONTINUATIONS in scmfig.h.

Macro: STACK_GROWS_UP

Expresses which way the stack grows by its being #defined or not.

Variable: long thrown_value

Gets set to the value passed to throw_to_continuation.

Function: long stack_size (STACKITEM *start)

Returns the number of units of size STACKITEM which fit between start and the current top of stack. No check is done in this routine to ensure that start is actually in the current stack segment.

Function: CONTINUATION * make_root_continuation (STACKITEM *stack_base)

Allocates (malloc) storage for a CONTINUATION of the current extent of stack. This newly allocated CONTINUATION is returned if successful, 0 if not. After make_root_continuation returns, the calling routine still needs to setjump(new_continuation->jmpbuf) in order to complete the capture of this continuation.

Function: CONTINUATION * make_continuation (CONTINUATION *parent_cont)

Allocates storage for the current CONTINUATION, copying (or encapsulating) the stack state from parent_cont->stkbse to the current top of stack. The newly allocated CONTINUATION is returned if successful, 0 if not. After make_continuation returns, the calling routine still needs to setjump(new_continuation->jmpbuf) in order to complete the capture of this continuation.

Function: void free_continuation (CONTINUATION *cont)

Frees the storage pointed to by cont. Remember to free storage pointed to by cont->other.

Function: void throw_to_continuation (CONTINUATION *cont, long value, CONTINUATION *root_cont)

Sets thrown_value to value and returns from the continuation cont.

If CHEAP_CONTINUATIONS is #defined, then throw_to_continuation does longjump(cont->jmpbuf, value).

If CHEAP_CONTINUATIONS is not #defined, the CONTINUATION cont contains a copy of a portion of the C stack (whose bound must be CONT(root_cont)->stkbse). Then:

  • the stack is grown larger than the saved stack, if necessary.
  • the saved stack is copied back into its original position.
  • longjump(cont->jmpbuf, value);


6.2.14 Evaluation

SCM uses its type representations to speed evaluation. All of the subr types (see Subr Cells) are tc7 types. Since the tc7 field is in the low order bit position of the CAR it can be retrieved and dispatched on quickly by dereferencing the SCM pointer pointing to it and masking the result.

All the SCM Special Forms get translated to immediate symbols (isym) the first time they are encountered by the interpreter (ceval). The representation of these immediate symbols is engineered to occupy the same bits as tc7. All the isyms occur only in the CAR of lists.

If the CAR of an expression to evaluate is not immediate, then it may be a symbol. If so, the first time it is encountered it will be converted to an immediate type ILOC or GLOC (see Immediates). The lower 7 bits of the codes for ILOC and GLOC distinguish them from all the other types we have discussed.

Once it has determined that the expression to evaluate is not immediate, ceval need only retrieve and dispatch on the low order 7 bits of the CAR of that cell, regardless of whether that cell is a closure, header, or subr, or a cons containing ILOC or GLOC.
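
The mask-and-dispatch step can be sketched as follows. The type codes here are arbitrary values chosen for the illustration (the real tc7 codes are defined in scm.h), and demo_cell, DEMO_TYP7, and demo_dispatch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary 7-bit codes for this sketch only; not SCM's actual values. */
enum { DEMO_TC7_CLOSURE = 0x11, DEMO_TC7_SUBR = 0x22, DEMO_TC7_HEADER = 0x33 };

enum demo_kind { DEMO_OTHER, DEMO_CLOSURE, DEMO_SUBR, DEMO_HEADER };

typedef struct demo_cell {
    uintptr_t car;   /* low 7 bits hold the type code */
    uintptr_t cdr;
} demo_cell;

/* One dereference and one mask recover the type code. */
#define DEMO_TYP7(c) ((c)->car & 0x7f)

/* A single switch covers every kind of cell the evaluator may meet. */
static enum demo_kind demo_dispatch(const demo_cell *c)
{
    assert(c != NULL);
    switch (DEMO_TYP7(c)) {
    case DEMO_TC7_CLOSURE: return DEMO_CLOSURE;
    case DEMO_TC7_SUBR:    return DEMO_SUBR;
    case DEMO_TC7_HEADER:  return DEMO_HEADER;
    default:               return DEMO_OTHER;
    }
}
```

Because the upper bits of the CAR are ignored by the mask, the same switch works no matter what payload the rest of the word carries.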

In order to be able to convert a SCM symbol pointer to an immediate ILOC or GLOC, the evaluator must be holding the pointer to the list in which that symbol pointer occurs. Turning this requirement to an advantage, ceval does not recursively call itself to evaluate symbols in lists; it instead calls the macro EVALCAR. EVALCAR does symbol lookup and memoization for symbols, retrieval of values for ILOCs and GLOCs, returns other immediates, and otherwise recursively calls itself with the CAR of the list.

ceval inlines evaluation (using EVALCAR) of almost all procedure call arguments. When ceval needs to evaluate a list of length greater than 3, the procedure eval_args is called. So ceval can be said to have one level lookahead. The avoidance of recursive invocations of ceval for the most common cases (special forms and procedure calls) results in faster execution. The speed of the interpreter is currently limited on most machines by interpreter size, probably having to do with its cache footprint. In order to keep the size down, certain EVALCAR calls which don’t need to be fast (because they rarely occur or because they are part of expensive operations) are instead calls to the C function evalcar.

Variable: symhash

Top level symbol values are stored in the symhash table. symhash is an array of lists of ISYMs and pairs of symbols and values.

Immediate: ILOC

Whenever a symbol’s value is found in the local environment the pointer to the symbol in the code is replaced with an immediate object (ILOC) which specifies how many environment frames down and how far in to go for the value. When this immediate object is subsequently encountered, the value can be retrieved quickly.

ILOCs work up to a maximum depth of 4096 frames or 4096 identifiers in a frame. Radey Shouman added FARLOC to handle cases exceeding these limits. A FARLOC consists of a pair whose CAR is the immediate type IM_FARLOC_CAR or IM_FARLOC_CDR, and whose CDR is a pair of INUMs specifying the frame and distance with a larger range than ILOCs span.

Adding #define TEST_FARLOC to eval.c causes FARLOCs to be generated for all local identifiers; this is useful only for testing memoization.
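
The idea of replacing a symbol with precomputed coordinates can be sketched as follows. The bit layout (two 12-bit fields, matching the 4096 limits mentioned above) and the array-based frames are assumptions made for this illustration; SCM's actual ILOC encoding and environment structure differ, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIELD_BITS 12                        /* 2^12 = 4096 */
#define DEMO_FIELD_MAX  (1UL << DEMO_FIELD_BITS)

typedef struct demo_frame {
    const long *values;               /* bindings held by this frame */
    size_t count;
    const struct demo_frame *outer;   /* enclosing frame, or NULL */
} demo_frame;

/* Packs (frames down, distance in) into one word. Hypothetical layout. */
static unsigned long demo_make_iloc(unsigned long frame, unsigned long dist)
{
    assert(frame < DEMO_FIELD_MAX && dist < DEMO_FIELD_MAX);
    return (frame << DEMO_FIELD_BITS) | dist;
}

/* Follows the packed coordinates; no symbol comparison is performed. */
static long demo_iloc_ref(unsigned long iloc, const demo_frame *env)
{
    unsigned long frame = iloc >> DEMO_FIELD_BITS;
    unsigned long dist  = iloc & (DEMO_FIELD_MAX - 1);
    while (frame-- > 0) {
        assert(env != NULL);
        env = env->outer;             /* step out one frame */
    }
    assert(env != NULL && dist < env->count);
    return env->values[dist];
}
```

The fixed field width is what imposes the range limit: coordinates that do not fit in the fields need a wider representation, which is the role FARLOC plays.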

Immediate: GLOC

Pointers to symbols not defined in local environments are changed to one plus the value cell address in symhash. This incremented pointer is called a GLOC. The low order bit is normally reserved for the GC mark; but, since references to variables in the code always occur in the CAR position and the GC mark is in the CDR, there is no conflict.
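
The "address plus one" trick can be sketched in standalone C. This assumes, as is true on common platforms, that suitably aligned object addresses have a zero low order bit when converted to an integer; the demo_ names are hypothetical and a plain long stands in for the value cell.

```c
#include <assert.h>
#include <stdint.h>

/* Tags the address of a value cell by adding one. Assumes the cell is
   at least 2-byte aligned, so bit 0 of its address is clear. */
static uintptr_t demo_make_gloc(long *value_cell)
{
    uintptr_t addr = (uintptr_t)(void *)value_cell;
    assert((addr & 1u) == 0);
    return addr + 1;
}

/* A set low bit distinguishes the tagged word from an ordinary pointer. */
static int demo_is_gloc(uintptr_t word)
{
    return (word & 1u) != 0;
}

/* Removes the tag to recover the value cell. */
static long *demo_gloc_cell(uintptr_t gloc)
{
    assert(demo_is_gloc(gloc));
    return (long *)(void *)(gloc - 1);
}
```

Fetching a global through such a word costs one subtraction and one load, with no hash table lookup after the first evaluation.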

If the compile FLAG CAUTIOUS is #defined then the number of arguments is always checked for application of closures. If the compile FLAG RECKLESS is #defined then they are not checked. Otherwise, argument-count checks for closures are made only when the function position (whose value is the closure) of a combination is not an ILOC or GLOC. When the function position of a combination is a symbol it will be checked only the first time it is evaluated, because it will then be replaced with an ILOC or GLOC.

Macro: EVAL expression env
Macro: SIDEVAL expression env

EVAL returns the result of evaluating expression in env. SIDEVAL evaluates expression in env when the value of the expression is not used.

Both of these macros alter the list structure of expression as it is memoized and hence should be used only when it is known that expression will not be referenced again. The C function eval is safe from this problem.

Function: SCM eval (SCM expression)

Returns the result of evaluating expression in the top-level environment. eval copies expression so that memoization does not modify expression.

