Questions for a language designer
A friend at work has been toying with the idea of what
his "ideal programming language" would look like. I don't
think he's serious about actually implementing it, but he's
a really smart guy who wants to learn something about doing
language design. He wanted pointers to literature that
might give him some insight. I said I thought that, while
there are lots of books and papers that describe particular
languages, and books and papers that describe the details
of building compilers, run-times, type systems, semantics,
etc., there isn't really a whole lot out there that describes
how to design a "good" language, where "good" is defined
as "meets some set of [measurable] criteria defining some
set of needs".
I've been involved in designing and implementing a couple
of languages, and I've built a lot of tools and environments,
and generally kept up with what people have been doing for
a long time. What this means, of course, is that I've been
directly involved in or witness to the making of a bunch of
mistakes that might have been avoided if I/we had been able
to formulate the right questions at the outset.
So I decided to send him a list of questions that I would
ask myself, and describe why these questions are worth
asking. After it reached a certain length, I realized
that it would probably be worth doing a proper job of it,
since I've never seen all of these questions asked in the
I wonder if some of you might be willing to collaborate a
bit on turning this starting point into a *really* good
list of questions. What I need for each entry are one
or more related questions, a rationale for why these are
good or important questions, and (secondarily) a couple
of possible answers. The answers are to stimulate further
thinking, not to provide "the one right answer".
Public discussion is welcome, as are private replies. I
will collate all the contributions into this document
(properly credited, of course) and periodically send out
the result. If it turns out well, we can "publish" the
result someplace useful.
Thanks! The list (so far) follows:
- What need are you trying to fill? Don't fall into
the trap of "a scripting language", because they
always turn into general-purpose languages. In
particular, is high performance an issue? This
says something about whether you want to implement
a VM-based or a natively compiled language.
- What about debuggability? If you plan to compile
it, you need to think about how to store debugging
- How do you want to bootstrap it? This, too, says
something about what kind of back-end you might build.
Perhaps you build a tiny VM in C, then compile to
C. This way, you avoid fun but time-consuming work
on code generation for modern super-scalar hardware,
register allocation, etc.
- Do you want to be able to catch type errors early or
late? That says something about your type system.
If you allow type declarations, do you want to think
about parameterized types? If you go whole-hog
with F-bounded polymorphism, you can get performance
*and* type-safety *and* ease of use, but it's hard
to get this exactly right.
- What about namespaces? Do you want to have a simple-
minded scheme like Java, where classes, namespaces,
and files are roughly equivalent? Lisp-style packages?
Dylan-style modules and libraries? Within a single
first-class namespace, how many second-class namespaces
are there? Java has 7 or 8: class names, function
names, local variable names, slot names, etc. Common
Lisp has at least 3 (function, variable, and class
names). Dylan and Scheme have one, which greatly
simplifies things at a small loss of generality which
can usually be worked around with name conventions.
- What about encapsulation? Do you want to do information-
hiding on a class basis like C++ and Java, or on a
"module" basis like Dylan?
- Do you want first-class functions? What about
lexical closures? First-class continuations? The
answer to those questions will tell you things about
heap- and stack-allocation, and will also tell you
how important it might be to do a continuation-based
compiler. It also tells you how hard your compiler
has to work to avoid consing environments unnecessarily.
Lots of sophisticated language designers go with
simple closures and avoid full continuations, because
full-scale environment capture is hard to do well.
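As a rough illustration of why lexical closures push variables
onto the heap, here is a minimal Python sketch. The names
(make_counter, increment) are made up for the example; the point
is only that 'count' must outlive the call that created it.

```python
def make_counter():
    count = 0  # captured by the closure, so it cannot live in a stack frame

    def increment():
        nonlocal count  # refers to the enclosing function's variable
        count += 1
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter()  # a second, independent environment

first = counter_a()
second = counter_a()
other = counter_b()
```

Each call to make_counter needs its own environment, which is the
"consing" a compiler would like to avoid when it can prove that
the closure never escapes.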
- Do you want a first-class object system? Should it
extend all the way to the primitive types, or do you
want to special-case those like Java? Do you want a
Smalltalk/Java-style single-receiver object orientation,
or a CLOS-style multi-method generic function dispatch?
If the former, do you need some sort of static overloading
like C++? If the latter and performance is important,
do you need some sort of Dylan-style "sealing" so that
you can do some compile-time optimizations? Do you
want single inheritance, single inheritance with
interfaces, multiple inheritance, or a hybrid single
inheritance with mixins? If you've got a more static
type system, you'll need to deal with casts. Do you
additionally want auto-conversion?
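To make the single-receiver vs. multi-method distinction concrete,
here is a deliberately naive Python sketch of dispatch on the
types of *both* arguments. Everything in it (defmethod, collide,
the Shape classes) is hypothetical, and the lookup order is a
simplification, not the CLOS or Dylan precedence rules.

```python
_methods = {}  # maps a (type, type) pair to an implementation


def defmethod(*types):
    """Register a function as the method for the given argument types."""
    def register(fn):
        _methods[types] = fn
        return fn
    return register


def collide(a, b):
    """Generic function: pick a method using the classes of both arguments."""
    for type_a in type(a).__mro__:
        for type_b in type(b).__mro__:
            fn = _methods.get((type_a, type_b))
            if fn is not None:
                return fn(a, b)
    raise TypeError("no applicable method")


class Shape: pass
class Asteroid(Shape): pass
class Ship(Shape): pass


@defmethod(Shape, Shape)
def _(a, b):
    return "generic bump"


@defmethod(Asteroid, Ship)
def _(a, b):
    return "ship destroyed"


specific = collide(Asteroid(), Ship())
fallback = collide(Ship(), Asteroid())
```

The table lookup on every call is the kind of cost that something
like "sealing" would let a compiler resolve ahead of time.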
- If you've got an object-system, how much of a meta-object
system do you want to expose? Do you want it to be
purely reflective, or more than that? In Dylan, we
separated 'make' from 'initialize', which was a good
idea, but do you also want to separate out 'allocate',
so that you have control over where an object is
created, e.g., in a "persistent memory" pool that
might be back-ended by a database?
- Are there different semantics for "pointer-ish" and
"non-pointer-ish" objects, like in C? Or is everything
a first-class object reference, like in Lisp?
- Do you need hairy CLOS-style method combination, or is
a simpler style like we did in Dylan enough? Do you care
about what Gregor Kiczales calls "aspects", which might
change your decision?
- Do you want to support threading? Do you want to roll
your own threads or use OS threads? Do you want to
support massive concurrency like Erlang? The answers
to those questions will tell you about aspects of the
run-time, memory allocation/GC, and performance. Oh
yeah -- it also tells you if you can actually take
advantage of the multiple processors sitting in most
of the machines we all have. Do you want Java-style
synchronization where it is built in to objects, or
should that be handled orthogonally?
- How well do you want to be able to integrate with
native libraries? This decision affects your memory
model, how you plan to represent run-time type info,
how function call/return works, how signalling works,
etc. By "memory model", I also mean to include what
sorts of objects are boxed or tagged. (Opinion: the
Harlequin/FunO Dylan compiler got it wrong -- I think we
should have boxed everything, and then concentrated our
efforts on box/unbox optimizations. This would have
*hugely* simplified FFI issues.) Good integration
with native code probably means that you will end up
using a conservative collector, and that will affect
the semantics of "finalization" (if you have it).
- Do you want to be able to return multiple values?
How about &rest arguments? These affect function
call/return, tail-call elimination, and stack vs. heap
- What's your order of evaluation in expressions? This
affects what sort of optimizations can be safely done.
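A small Python sketch of why this matters: once operands can have
side effects, the order is observable, so a compiler is not free
to reorder them. The helper 'operand' and the 'trace' list are
invented for the example. Python itself specifies left-to-right
evaluation of operands.

```python
trace = []  # records the order in which operands are evaluated


def operand(name, value):
    trace.append(name)  # the side effect that makes the order visible
    return value


# Precedence decides how the values combine (b * c first),
# but the operands themselves are still evaluated left to right.
result = operand("a", 1) + operand("b", 2) * operand("c", 3)
```

A language that leaves the order unspecified gives the optimizer
more room, at the price of programs whose behavior can differ
between implementations.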
- What compilation model do you want? Lots of include
files like C[++]? Lots of "packages" like Java?
Whole-worlds like Lisp? Separate libraries like Dylan?
This affects a lot of things, not least of which is
the ability to deliver small applications. It also
informs the design of your core run-time.
- Is the core run-time tiny like Scheme? Small like
Dylan? Huge like Common Lisp? If you like the Common
Lisp model, it's worth looking at EuLisp to see how
to re-package it in a more layered way.
- Even in a small run-time, you need to get the basic
types right. Are your numeric types "closed" (that is,
do they include reals -- rationals and irrationals --
and complex numbers)? Are your string and character
types rich enough to model Unicode?
- Think hard about collections. How do the following
relate to each other: sets, tables, vectors, arrays,
lists, sequences, ranges? In Dylan, we decided too
late that having the tail of a list be a "cons" was maybe
not such a great idea; what about that? How do your
collections interact with your threading model?
- Think hard about iteration, especially over collections.
If all collections obey a uniform iteration protocol,
it means that you can do things like 'for e in c ...'.
Note that if iterators are done in a first-class way,
this has performance implications that your compiler
needs to worry about.
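For a sense of what a uniform iteration protocol buys you, here
is a minimal sketch using Python's protocol as a stand-in. The
Countdown class is made up; any object that supplies __iter__
can be used with 'for e in c'.

```python
class Countdown:
    """A user-defined collection that takes part in the iteration protocol."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each loop gets a fresh iterator object (here, a generator).
        n = self.start
        while n > 0:
            yield n
            n -= 1


collected = [e for e in Countdown(3)]

total = 0
for e in Countdown(3):
    total += e
```

Note that every loop allocates an iterator object here; that is
the sort of first-class cost a compiler would want to optimize
away for the common cases.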
- Do you want macros? Lisp-style macros? Dylan-style
pattern-matching non-procedural hygienic macros?
Scheme-style 'syntax-case' pattern-matching procedural
hygienic macros? This says a lot about the syntax of
your language, and it also says a lot about the model
you choose for compile-time evaluation environments.
- What syntax do you want? Parentheses unaccountably
give lots of people hives, but S-expressions make a
lot of things much simpler. Infix syntax is quite
nice when it's done well, but you've got to get the
"kernel" of that exactly right if you want your infix
macro system ever to be usable. If you decide on
S-expressions, should they be represented as lists
and conses, or do you want a first-class object for