http://people.csail.mit.edu/jaffer/Schlep
Schlep Toolchains
toolchain, n 1. a set of computer programs (tools) that are used to create a product (typically another computer program or system of programs).
schlep, n 1. an arduous journey.
schlep, vt 1. to drag or haul (an object); to make a tedious journey (from Yiddish שלעפּן shlepn; cf. German schleppen) (OED, MW)
Even among cross-platform languages, the choice of programming language for library projects or applications usually involves competing aspects such that no single choice is optimal. It also makes every project a gamble that the chosen language will win (or at least be supported) in the future.
In order to aid market penetration or hedge bets about language choice, it can be desirable to have the same library or program run on multiple (language) platforms. For the dynamic object-oriented programming language Water to run both on servers and in browsers, ClearMethods targeted the Java Virtual Machine, C, and the Common Language Runtime (via C#).
Water is a persistent "databased" language, using WB B-Tree Databases for its persistent store. The original Schlep, then solely a Scheme-to-C translator, was developed for WB. Working at ClearMethods, Ravi kiran Gorrepati and I adapted Schlep to also do Scheme-to-Java and Scheme-to-C# translations.
Water being dynamic foiled a strategy of translating Water programs into equivalent object-oriented code in C++, C#, and Java. So the Schlep translations are largely object-free; Java and C# classes are defined only to provide a place to define functions and to implement (Water) primitive datatypes.
There are 4 target languages:
Java provides garbage collection natively; calls to `free!' are ignored.
`free!' are ignored.
There are small differences between the WB and Water versions of scm2c due to their different representations for byte-vectors. The WB version is provided here. The Water version will also be supplied in the future.
The source language which the Schlep translators translate from is a subset of the Algorithmic Language Scheme (R4RS), with SRFI-60 (Integers as Bits), the SCM implementation extensions `qase' and `vector-set-length!', and (from Common-Lisp) `#+', `#-', `defmacro', `defvar', and `defconst'. The Schlep Dialect is described in a separate document:
One might assume that in order for Schlep-dialect code to map to these three languages, the Schlep-dialect must manifest the worst limitations of each language; but translation can actually rectify some of these limitations.
If I had to pick one practical feature of Scheme which elevates it above other languages, it would be internal definitions. The Algorithmic Language Scheme allows procedure definitions (using `define', `letrec', and `named-let') inside of other procedures, which none of C, C#, or Java does. Internal definitions allow calls of internal procedures with a small number of arguments to replace the common alternatives:
`goto' statement. The restriction to the tail-position does not allow internal recursion other than tail-recursion; but this facilitates use of internal procedures in many situations which would otherwise force less desirable practices.
Java lacks a `goto' statement. Tail-called internal procedures are instead implemented using `while (true)', `continue', and `break' with labels. The resulting Java code is not as readable as the original Scheme-dialect; but that loss in clarity is balanced by greater expressive power.
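The labeled-loop pattern can be sketched by hand as follows. This is an illustration only, not output produced by scm2java; the class name, method name, and label are assumptions chosen for the example. The tail call to the internal procedure becomes a rebinding of its parameters followed by `continue', and returning from it becomes `break'.

```java
public class TailLoopSketch {
    // Hand-written illustration; NOT actual scm2java output.
    // Scheme source being imitated:
    //   (define (count-digits n)
    //     (let loop ((n n) (count 1))
    //       (if (< n 10)
    //           count
    //           (loop (quotient n 10) (+ count 1)))))
    public static int countDigits(int n) {
        int count = 1;
        int result;
        loop:
        while (true) {
            if (n < 10) {
                result = count;
                break loop;        // the internal procedure returns
            }
            // Tail call (loop (quotient n 10) (+ count 1)):
            // compute the new arguments, rebind, and jump to the label.
            int nextN = n / 10;
            int nextCount = count + 1;
            n = nextN;
            count = nextCount;
            continue loop;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countDigits(12345));
    }
}
```

Computing both new arguments before assigning either keeps the rebinding equivalent to a procedure call even when one argument expression mentions the other parameter.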
Schlep-example gives an example of a procedure with an internal procedure and how Schlep translates this procedure into C, C#, and Java.
Scheme identifiers can contain characters which the target languages do not allow. Each translator defines a procedure named `schlep-name' which maps Scheme identifiers to identifiers in the target language. This mapping is described on the individual translator home pages: scm2java.html, scm2cs.html, and scm2c.html.
Each of the target languages is statically typed, but the Schlep-Dialect is manifestly typed. Each of the translator home pages describes the multiple ways of declaring types for Scheme identifiers based on glob-matching the identifier names. Two cases are handled without declaration: identifiers ending in `!' are typed void, and identifiers ending in `?' are typed as the native Boolean type. The type of an identifier bound to a procedure is the type of the return value of that procedure.
One could individually declare the type of every identifier used; but I recommend adopting matchable conventions; this makes for less work and more readable code. Examples of declaration files are scm2c.typ, scm2cs.typ, and scm2java.typ (text from semicolon to end-of-line is a comment).
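The idea of typing identifiers by name can be sketched as below. This is a rough model, not the translators' implementation: the class, its methods, the example patterns, the glob syntax (only `*' as a wildcard), and the choice to check the `!' and `?' suffixes before declared patterns are all assumptions. The real declaration syntax is the one described on the translator home pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class NameTypeSketch {
    // Hypothetical declaration table: glob pattern -> target-language type.
    // Insertion order is kept so that earlier declarations win.
    private final Map<String, String> declarations = new LinkedHashMap<>();

    public void declare(String glob, String type) {
        declarations.put(glob, type);
    }

    // Assumed glob syntax: '*' matches any run of characters;
    // every other character (including '?' and '!') is literal.
    private static boolean globMatches(String glob, String name) {
        String[] pieces = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(pieces[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Returns the type for a Scheme identifier, or null if nothing matches.
    public String typeOf(String name) {
        if (name.endsWith("!")) return "void";     // no declaration needed
        if (name.endsWith("?")) return "boolean";  // no declaration needed
        for (Map.Entry<String, String> e : declarations.entrySet()) {
            if (globMatches(e.getKey(), name)) return e.getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        NameTypeSketch types = new NameTypeSketch();
        types.declare("*-count", "int");      // hypothetical convention
        types.declare("str*", "String");      // hypothetical convention
        System.out.println(types.typeOf("word-count"));
        System.out.println(types.typeOf("empty?"));
    }
}
```

With conventions like these, a handful of patterns covers most identifiers, which is the reduction in work that matchable naming is meant to buy.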
For arithmetic and basic data operations accessing or setting variables, vectors, and strings, the translators emit the corresponding statement or expression in the target language. For utility and other procedures not handled, the translators emit procedure calls with the names translated appropriately for the target language. Utility procedures not intrinsic to the target language or its libraries must be supplied by target language files to be compiled with the translated code. Accessor-routines written in the target language allow composite data types to be operated on by translated code.
Code to be translated to C must be written with the realization that NULL and false are conflated. They are separate in Java and C#; their static typing allows NULL, but not false, to be a placeholder for missing object data. False (`#f') is typically used for missing data in Scheme so that the logical operators work on them. Our solution for this is to have scm2java and scm2cs wrap test expressions which are not obviously boolean with the function (method) `a2b' in generated Java and C# code. These definitions are in SchlepRT.java and SchlepRT.cs, respectively:
public static boolean a2b(boolean b) {return b;}
public static boolean a2b(Object i) {return (i != null);}

public static bool a2b(bool b) {return b;}
public static bool a2b(Object i) {return (i != null);}
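How the two overloads behave at a call site can be seen in a small hand-written sketch. The `a2b' definitions are the Java ones quoted above; the class name and the `lookup', `describe', and `size' methods are hypothetical and are not taken from generated code.

```java
public class A2bSketch {
    // The two Java overloads quoted from SchlepRT.java.
    public static boolean a2b(boolean b) { return b; }
    public static boolean a2b(Object i) { return (i != null); }

    // Hypothetical procedure that yields null where the Scheme
    // original would yield #f for missing data.
    static String lookup(String key) {
        return key.equals("known") ? "value" : null;
    }

    // Scheme: (if (lookup key) "found" "missing")
    // The test is not obviously boolean, so it is wrapped; the
    // Object overload is selected and tests for non-null.
    static String describe(String key) {
        if (a2b(lookup(key))) return "found";
        return "missing";
    }

    // A test that is already boolean selects the boolean overload,
    // which passes the value through unchanged.
    static String size(int x) {
        if (a2b(x < 10)) return "small";
        return "large";
    }

    public static void main(String[] args) {
        System.out.println(describe("known"));
        System.out.println(describe("other"));
        System.out.println(size(3));
    }
}
```

Because overload resolution happens at compile time, the wrapper costs nothing more than a trivial call when the expression turns out to be boolean.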
The translation programs, scm2java.scm, scm2cs.scm, and scm2c.scm, are written in (full) Scheme. The translations from Schlep-Dialect source files to target language files can be done by invoking the translation programs as SCM scripts, or by loading and calling a translator from a Scheme session. The first two lines of each program are written so that the SCM Scheme implementation can execute them as scripts.
#! /usr/local/bin/scm \
- !#
If your SCM binary is located in a different place, change `/usr/local/bin/scm' to the absolute path to the SCM executable on your computer. To try loading these files into another implementation, you may need to remove the first two lines.
scm2java
produces a .java file for each file.scm file passed on the command line.
scm2cs
.cs extension. The one-file approach was adopted so that methods could call methods in other classes without explicit class prefixes. If you know how to do this with multiple C# files, please let me know.
scm2c
When called with file.c or file.scm, scm2c produces files file.h and file.c. When called with file.h, scm2c produces just file.h (from file.scm).
A coder experienced with Schlep can use it to generate C code (using scm2c) which is nearly as tight as can be written directly. Testing and debugging the source in SCM speeds development and eliminates range errors which are difficult to find in compiled C.
For Water running a spectralnorm benchmark, Sun's Java-1.6 HotSpot Virtual Machine runs Water-J nearly as fast (within a few percent) as GCC-4.3 compiled Water-C on Linux. The Water implementation compiled by the Mono JIT C# compiler version 2.4.2.3 runs more than 2 times slower.
The scm2java, scm2cs, and scm2c programs generate Texinfo files with a .txi extension if the Scheme source file has Schmooz format comments. Nearly any documentation format can be generated from Texinfo files. Schmooz was written by Radey Shouman.
Each translator generates documentation for its target language API. So that .txi files generated for different languages don't overwrite each other, the translated sources should be directed to distinct directories.
Not part of the Schlep technology, Pas2scm is a Pascal-to-Scheme translator I wrote to revive some nifty graphics programs I wrote for Apollo Computer workstations. Pas2scm demonstrates that programming language translation can have Scheme as the target language. That being said, Wirth's Pascal language is small, easily parsed, and not object-oriented. Translating from C++ would prove more challenging.
The compiler that translates the visual blocks language for implementation on Android uses the Kawa Language Framework and Kawa's dialect of the Scheme programming language, developed by Per Bothner and distributed as part of the GNU Operating System by the Free Software Foundation.
I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory.
My actions and comments do not reflect in any way on MIT.
agj @ alum.mit.edu