[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: What's so cool about Scheme?



Steve Dekorte wrote
> On Friday, June 6, 2003, at 07:58 AM, Anton van Straaten wrote:
> > If Scheme is a scripting language, it's the fastest one on the planet,
> > by a wide margin.
>
> That's impressive. Are one or more of the languages tested here:
> http://www.bagley.org/~doug/shootout/
> a flavor of Scheme? xemacs, guile maybe?

Xemacs and guile are interpreters, and not necessarily the fastest around.
Take a look at this page: http://www.bagley.org/~doug/shootout/craps.shtml

The Scheme implementation Bigloo is in the top 7, along with OCaml, SML,
Common Lisp, and oh yes, C and C++.  They all have fairly similar scores.
Java brings up the rear at #9, and it's downhill from there (look at the
performance-based scores).  Play with the weightings if you like, this isn't
a trick.

Of course, these are only micro-benchmarks.  A language like Scheme pays a
price for features like garbage collection, which doesn't work in its favor
in these micro-benchmarks.

The fastest language of any of these ought to be hand-coded assembler - why
aren't all the performance-sensitive folk using that?  C and C++ occupy a
very similar role today that assembler once did.  As actual deployed
language and compiler technology has improved, there's less and less reason
to write substantial amounts of code in a language that's designed to mirror
the architecture of a CPU.  The strongest justification you can give today
for C is that it's portable and highly-tuned across multiple architectures,
i.e. the legacy argument.  That one, I'll grant you.

> Which Scheme would result in a smaller code base? Scheme48 is about 25K
> lines of code. Given that the book makes no mention(I didn't see
> anything and there's not index entry) of garbage collection, I assume
> the code base would need to include Scheme itself for the runtime?

Yes, I was talking about picking a Scheme that compiles to C, and
implementing an interpreter in that.  The source code for an interpreter
like that doesn't need to be very large at all, of course depending on how
large the language it interprets is.

Scheme48 is a very comprehensive Scheme implementation that has been
extended to do things like migrate continuations across networks.  If you
want smaller, it's easy to achieve.  Your code base in Scheme is certainly
going to be smaller than an *equivalent* code base in C.

> A concern with developing a high level language is that there are heavy
> costs involved with doing things dynamically.

Scheme is actually very well suited to reasoning about statically.  That's
why it can so successfully be compiled to C.  The only exception, really, is
garbage collection.  I'll agree there may be situations in which you don't
want your implementation language to have garbage collection.  More on this
below.

> With C, I get the feeling that I can optimize the important bits,
> know what the important bits are and see to a reasonable extent
> what sort of instructions the machine will get from the ones I
> give it. With Scheme, these things aren't clear to me and I wonder
> if Scheme might make too many of these decisions for me.

I'm not suggesting that everyone should write all language implementations
in an off-the-shelf Scheme implementation from now on, although there are
certainly many areas in which that's very viable.

My real point was that what EOPL is teaching has relevance beyond simply
writing languages in Scheme.  Scheme is simply a very well-defined
computational substrate - like an idealized virtual machine, but one that
can be implemented practically and efficiently.  A program written in Scheme
lends itself to all sorts of automatic optimizations and transformations -
compiling to C being one of them.  A well-designed Scheme program can be a
sort of executable specification.

You mentioned Scheme48 - that's a good example.  The PreScheme dialect in
which the core of Scheme48 is implemented is a statically-typed Scheme
subset.  It has no garbage collection, and it compiles to C in very
predictable ways.  It gives leverage that you don't have in C - macros and
optimizations (in particular, partial evaluation) - that C can't do.

As a trivial example, with PreScheme, (factorial 5) will emit "120" in the
compiled C code.  You can do something like that with fancy template tricks
in C++, but it's a built-in automatic capability in PreScheme.  If you were
using this to implement some other language, any macros you use to achieve
the base syntax you're looking for in your language will be highly optimized
in the compiled code - not only translated to their underlying
representation, but partially evaluated to eliminate any unnecessary
generality.

One criticism of the use of Scheme for language implementation that you
didn't mention, is that it already has a type system which may not be what
you want in your language, and may not be optimal as a basis for the type
system you do want.  But PreScheme demonstrates that even this isn't a
fundamental barrier: it does static type inference and maps types directly
onto C types.

> For example, what if I need to integrate well with C libraries?
> What if the concurrency model needs to call back and forth
> between C and my language?

All of these things have been dealt with rather well by multiple Scheme
implementations.  You're no worse off here than you would be with your own
hand-implemented C-based language.

As just one example, in the small Scheme implementation SIOD, there's a
"hostile FFI" available which lets you call any C code directly, i.e. it
doesn't need specific wrapper or glue code, just give it the function
signature and the name of the DLL, and you can call that function.

If you have some specific requirements for C interfacing between your
language and Scheme, you might want to look at what some of the Scheme
implementations out there have already done.  Their requirements are likely
to be more demanding than yours, in fact - e.g. supporting continuations in
the presence of external C and OS code is tricky, but it's been done.

> What if lists aren't the the ideal data structure for my language?

You don't have to use lists for everything.  You can use structures that map
to C structures, you can use arrays with homogeneous types (Bigloo is one
implementation that supports this), etc.

> What if I need a different sort of garbage collector?

The Scheme implementation Larceny is mostly implemented in itself (except
for some OS interface stuff) and has pluggable garbage collectors.  That
doesn't mean you should use Larceny specifically.  I'm just trying to point
out the variety in what's possible.  Scheme is not like Java where there's a
single definition of a virtual machine and a single syntactic & semantic
specification.  R5RS is just a common base on which real implementations
build.

But I want to restate and make it clear: I'm not trying to say that all
projects should be implemented in Scheme, or that everyone should write in
Scheme, or that Scheme is the be-all and end-all of languages.  It's almost
the opposite: Scheme represents something close to a lowest common
denominator, computer language boiled down to its essential abstract
concepts.  That's exactly why it makes a good language for teaching things
like how to design and develop languages.  But, to absorb those lessons, you
first have to understand what Scheme "really" is.

Those same qualities mean that in the process of transforming a language
from its source form to machine code, you're likely to pass through a level
at which it resembles a Scheme-like language, whether you realize it or not.
To take a concrete example: static single assignment (SSA).  This is a
compiler optimization technique, used to some extent even in gcc (iirc), in
which variable mutation in a given scope is converted so that no variable's
value ever changes.  This simplifies the process of applying dataflow
analyses, and allows the kind of guaranteed meaning-preserving algebraic
transformations of code that are much more difficult in the presence of
variable mutation.  SSA form is essentially a functional form - equivalent,
in Scheme, to a series of nested lets.

The fact that I can concisely describe this low-level internal compiler
operation as "a series of nested lets" is itself indicative of how close
Scheme is to what's going on inside language implementations.  What you
might be thinking of as a scripting language is actually a kind of language
assembly language, which also happens to make an excellent scripting
language - as I said, the fastest on the planet.

Anton