http://people.csail.mit.edu/jaffer/Schlep

Schlep Toolchains

scm2c     Translate Scheme programs to C
scm2cs    Translate Scheme programs to C#
scm2java  Translate Scheme programs to Java

toolchain, n 1. a set of computer programs (tools) that are used to create a product (typically another computer program or system of programs).

schlep, n 1. an arduous journey.
schlep, vt 1. to drag or haul (an object); to make a tedious journey (from Yiddish שלעפּן shlepn; cf. German schleppen) (OED, MW)

Motivations

Even among cross-platform languages, the choice of programming language for library projects or applications usually involves competing aspects such that no single choice is optimal. It also makes every project a gamble that the chosen language will win (or at least be supported) in the future.

To aid market penetration or to hedge bets about language choice, it can be desirable to have the same library or program run on multiple (language) platforms. For the dynamic object-oriented programming language Water to run both on servers and in browsers, ClearMethods targeted the Java Virtual Machine, C, and the Common Language Runtime (via C#).

Water is a persistent "databased" language, using WB B-Tree Databases for its persistent store. The original Schlep, then solely a Scheme-to-C translator, was developed for WB. Working at ClearMethods, Ravi kiran Gorrepati and I adapted Schlep to also do Scheme-to-Java and Scheme-to-C# translations.

The Target Languages

Water being dynamic foiled a strategy of translating Water programs into equivalent object-oriented code in C++, C#, and Java. So the Schlep translations are largely object-free; Java and C# classes are defined only to provide a place to define functions and to implement (Water) primitive datatypes.

There are 4 target languages:

Java
Write once, run anywhere.

Java provides garbage collection natively; calls to `free!' are ignored.

C#
C# is also known as ECMA-334 and ISO/IEC 23270. It provides garbage collection natively; calls to `free!' are ignored.

C for WB
For WB, all storage allocation and deallocation is explicit in the source code. Calls to the WB APIs which pass byte-vectors also pass their lengths.

C for Water
For Water, byte-vectors and strings (which are distinct) are dynamically allocated and have manifest lengths. Water-C uses the Hans Boehm GC.

There are small differences between the WB and Water versions of scm2c due to their different representations for byte-vectors. The WB version is provided here. The Water version will also be supplied in the future.

The Source Language

The source language of the Schlep translators is a subset of the Algorithmic Language Scheme (R4RS), with SRFI-60 (Integers as Bits), the SCM implementation extensions qase and vector-set-length!, and (from Common-Lisp) #+, #-, defmacro, defvar, and defconst. The Schlep Dialect is described in a separate document.

One might assume that in order for Schlep-dialect code to map to these three languages, the Schlep-dialect must manifest the worst limitations of each language; but translation can actually rectify some of these limitations.

If I had to pick one practical feature of Scheme which elevates it above other languages, it would be internal definitions. The Algorithmic Language Scheme allows procedure definitions (using define, letrec, and named-let) inside of other procedures, which none of C, C#, or Java does. Internal definitions allow calls of internal procedures with a small number of arguments to replace the common alternatives:

C and C# have a `goto' statement, enabling Schlep to emulate calling of internal-procedures in the tail position using some variable assignments (sometimes including temporary variable binding to emulate simultaneous assignment) followed by a goto statement. The restriction to the tail-position does not allow internal recursion other than tail-recursion; but this facilitates use of internal procedures in many situations which would otherwise force less desirable practices.

Java lacks a `goto' statement. Tail-called internal procedures are instead implemented using while (true), continue, and break with labels. The resulting Java code is not as readable as the original Scheme-dialect; but that loss in clarity is balanced by greater expressive power.
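As an illustration of the shape described above, here is a minimal hand-written sketch. It is not actual scm2java output; the procedure gcd-loop, the class name, and the temporary variable name are hypothetical. A named-let whose only self-call is in the tail position becomes a labeled while (true) loop, with a temporary emulating simultaneous assignment.

```java
// Hand-written sketch of the labeled-loop technique; not scm2java output.
class TailLoopSketch {
    // Hypothetical Scheme original:
    //   (define (gcd-loop a b)
    //     (let loop ((x a) (y b))
    //       (if (= y 0)
    //           x
    //           (loop y (remainder x y)))))
    static int gcdLoop(int a, int b) {
        int x = a, y = b;   // initial bindings of the named-let
        int result;
        loop:
        while (true) {
            if (y == 0) {
                result = x;
                break loop;         // leaving the internal procedure
            }
            // Tail call (loop y (remainder x y)): both arguments are
            // computed from the OLD x and y, so a temporary is needed.
            int newX = y;
            y = x % y;
            x = newX;
            continue loop;          // the tail call itself
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(gcdLoop(48, 18));
    }
}
```

Without the temporary, assigning x first would make the computation of the new y use the wrong x; this is the simultaneous-assignment concern mentioned for the C and C# goto translation as well.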

Schlep-example gives an example of a procedure with an internal procedure and how Schlep translates this procedure into C, C#, and Java.

The Translation Process

Scheme identifiers can contain characters which the target languages do not allow. Each translator defines a procedure named schlep-name which maps Scheme identifiers to identifiers in the target language. This mapping is described on the individual translator home pages: scm2java.html, scm2cs.html, and scm2c.html.
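To make the idea of such a mapping concrete, here is a small Java sketch. The substitutions it uses ('-' to '_', '?' to "_P", '!' to "_X", hex escapes for anything else) are assumptions chosen for illustration; the real rules are those documented on the translator home pages, and the class and method names here are hypothetical.

```java
// Illustrative only: these substitutions are assumed, not Schlep's actual rules.
class NameMappingSketch {
    // Map a Scheme identifier to a string that is a legal Java identifier.
    static String toTargetName(String schemeName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < schemeName.length(); i++) {
            char c = schemeName.charAt(i);
            if (Character.isLetterOrDigit(c) || c == '_') {
                out.append(c);                    // already legal
            } else if (c == '-') {
                out.append('_');
            } else if (c == '?') {
                out.append("_P");                 // predicate marker
            } else if (c == '!') {
                out.append("_X");                 // mutator marker
            } else {
                out.append("_x").append(Integer.toHexString(c));
            }
        }
        // Java identifiers may not be empty or start with a digit.
        if (out.length() == 0 || Character.isDigit(out.charAt(0))) {
            out.insert(0, '_');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTargetName("vector-set-length!"));
    }
}
```

Note that this sketch is not collision-free (a-b and a_b map to the same name), and it does not avoid Java reserved words; a production mapping would need to address both.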

Each of the target languages is statically typed, but the Schlep-Dialect is manifestly typed. Each of the translator home pages describes the multiple ways of declaring types for Scheme identifiers based on glob-matching the identifier names. Two cases are handled without declaration: identifiers ending in `!' or `?' are typed void and the native Boolean type, respectively. The type of an identifier bound to a procedure is the type of the return value of that procedure.

One could individually declare the type of every identifier used; but I recommend adopting matchable conventions; this makes for less work and more readable code. Examples of declaration files are scm2c.typ, scm2cs.typ, and scm2java.typ (text from semicolon to end-of-line is a comment).
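The following Java sketch shows how type assignment by name convention could work in principle. The rule table ("*-len" is int, "*-str" is String), the fallback type Object, and all class and method names are hypothetical; they are not the syntax or content of the .typ files, and the order in which the real translators apply their rules may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; not the format or contents of scm2java.typ.
class TypeByNameSketch {
    // Hypothetical glob declarations, checked in insertion order.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("*-len", "int");
        RULES.put("*-str", "String");
    }

    // Glob match in which '*' stands for any run of characters;
    // every other character must match literally.
    static boolean globMatches(String glob, String name) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') regex.append(".*");
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.matches(regex.toString(), name);
    }

    static String typeOf(String name) {
        // The two cases handled without declaration.
        if (name.endsWith("!")) return "void";
        if (name.endsWith("?")) return "boolean";
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (globMatches(rule.getKey(), name)) return rule.getValue();
        }
        return "Object";   // assumed fallback for this sketch
    }

    public static void main(String[] args) {
        System.out.println(typeOf("key-len"));
    }
}
```

The point of the convention is visible here: two short patterns cover every identifier that follows the naming scheme, so no per-identifier declarations are needed.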

For arithmetic and basic data operations accessing or setting variables, vectors, and strings, the translators emit the corresponding statement or expression in the target language. For utility and other procedures not handled, the translators emit procedure calls with the names translated appropriately for the target language. Utility procedures not intrinsic to the target language or its libraries must be supplied by target language files to be compiled with the translated code. Accessor-routines written in the target language allow composite data types to be operated on by translated code.

NULL versus False

Code to be translated to C must be written with the realization that NULL and false are conflated. They are separate in Java and C#; their static typing allows NULL, but not false, to be a placeholder for missing object data. False (#f) is typically used for missing data in Scheme so that the logical operators work on them. Our solution for this is to have scm2java and scm2cs wrap test expressions which are not obviously boolean with the function (method) `a2b' in generated Java and C# code. These definitions are in SchlepRT.java and SchlepRT.cs, respectively:

Java

public static boolean a2b(boolean b) {return b;}
public static boolean a2b(Object i) {return (i != null);}

C#

public static bool a2b(bool b) {return b;}
public static bool a2b(Object i) {return (i != null);}

This makes the semantics of Java and C# conditionals close to C. One must still not depend on distinguishing false from NULL.
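A short Java sketch of how such a wrapped test behaves follows. The two a2b overloads are copied from the definitions above; the lookup and describe methods and the class name are hypothetical, written by hand to show the pattern rather than taken from translator output.

```java
// Sketch of a2b-wrapped conditionals; only the two a2b overloads come
// from SchlepRT.java as quoted above, the rest is hypothetical.
class A2bSketch {
    public static boolean a2b(boolean b) {return b;}
    public static boolean a2b(Object i) {return (i != null);}

    // Hypothetical accessor: null plays the role that #f plays in Scheme.
    static String lookup(String key) {
        return "known".equals(key) ? "value" : null;
    }

    // Scheme shape: (if (lookup key) "found" "missing")
    static String describe(String key) {
        if (a2b(lookup(key))) {       // resolves to a2b(Object)
            return "found";
        }
        return "missing";
    }

    public static void main(String[] args) {
        System.out.println(describe("known"));
        System.out.println(describe("other"));
        System.out.println(a2b(3 < 4));   // resolves to a2b(boolean)
    }
}
```

Java's overload resolution selects the variant from the static type of the test expression, so the same wrapper serves both genuine boolean tests and object-valued tests.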

The Translation Programs

The translation programs, scm2java.scm, scm2cs.scm, and scm2c.scm, are written in (full) Scheme. The translations from Schlep-Dialect source files to target language files can be done by invoking the translation programs as SCM scripts, or by loading and calling a translator from a Scheme session. The first two lines of each program are written so that the SCM Scheme implementation can execute them as scripts.

#! /usr/local/bin/scm \
- !#

If your SCM binary is located in a different place, change `/usr/local/bin/scm' to the absolute path to the SCM executable on your computer. To try loading these files into another implementation, you may need to remove the first two lines.

scm2java
produces one file.java file for each file.scm file passed on the command line.
scm2cs
creates one file concatenating the translations of all the input files; the inputs can be a combination of Scheme files and C# files (with the .cs extension). The one-file approach was adopted so that methods could call methods in other classes without explicit class prefixes. If you know how to do this with multiple C# files, please let me know.
scm2c
when called with file.c or file.scm, scm2c produces files file.h and file.c. When called with file.h, scm2c produces just file.h (from file.scm).

Performance

A coder experienced with Schlep can use it to generate C code (using scm2c) which is nearly as tight as can be written directly. Testing and debugging the source in SCM speeds development and eliminates range errors which are difficult to find in compiled C.

For Water running a spectralnorm benchmark on Linux, Sun's Java-1.6 HotSpot Virtual Machine runs Water-J nearly as fast (within a few percent) as Water-C compiled by GCC-4.3. The C# implementation of Water, run under the Mono JIT compiler version 2.4.2.3, is more than 2 times slower.

Generating Documentation

The scm2java, scm2cs, and scm2c programs generate Texinfo files with a .txi extension if the Scheme source file has Schmooz format comments. Nearly any documentation format can be generated from Texinfo files. Schmooz was written by Radey Shouman.

Each translator generates documentation for its target language API. So that .txi files generated for different languages don't overwrite each other, the translated sources should be directed to distinct directories.

schmooz   Literate Scheme Markup Language
pas2scm   Translate Pascal programs to Scheme

Not part of the Schlep technology, Pas2scm is a Pascal-to-Scheme translator I wrote to revive some nifty graphics programs I wrote for Apollo Computer workstations. Pas2scm demonstrates that programming language translation can have Scheme as the target language. That being said, Wirth's Pascal language is small, easily parsed, and not object-oriented. Translating from C++ would prove more challenging.

Translators Written in Other Schemes

App Inventor for Android
The compiler that translates the visual blocks language for implementation on Android uses the Kawa Language Framework and Kawa's dialect of the Scheme programming language, developed by Per Bothner and distributed as part of the GNU Operating System by the Free Software Foundation.

Copyright © 2001, 2002, 2003, 2007, 2009 Aubrey Jaffer

I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory.  My actions and comments do not reflect in any way on MIT.
agj @ alum.mit.edu