
Questions for a language designer



A friend at work has been toying with the idea of what
his "ideal programming language" would look like.  I don't
think he's serious about actually implementing it, but he's
a really smart guy who wants to learn something about doing
language design.  He wanted pointers to literature that
might give him some insight.  I said I thought that, while
there are lots of books and papers that describe particular
languages, and books and papers that describe the details
of building compilers, run-times, type systems, semantics,
etc., there isn't really a whole lot out there that describes
how to design a "good" language, where "good" is defined
as "meets some set of [measurable] criteria defining some
set of needs".

I've been involved in designing and implementing a couple
of languages, and I've built a lot of tools and environments,
and generally kept up with what people have been doing for
a long time.  What this means, of course, is that I've been
directly involved in or witness to the making of a bunch of
mistakes that might have been avoided if I/we had been able
to formulate the right questions at the outset.

So I decided to send him a list of questions that I would
ask myself, and describe why these questions are worth
asking.  After it reached a certain length, I realized
that it would probably be worth doing a proper job of it,
since I've never seen all of these questions asked in the
same place.

I wonder if some of you might be willing to collaborate a
bit on turning this starting point into a *really* good
list of questions.  What I need for each entry are one
or more related questions, a rationale for why these are
good or important questions, and (secondarily) a couple
of possible answers.  The answers are to stimulate further
thinking, not to provide "the one right answer".

Public discussion is welcome, as are private replies.  I
will collate all the contributions into this document
(properly credited, of course) and periodically send out
the result.  If it turns out well, we can "publish" the
result someplace useful.

Thanks!  The list (so far) follows:

  - What need are you trying to fill?  Don't fall into
    the trap of "a scripting language", because they
    always turn into general-purpose languages.  In
    particular, is high performance an issue?  This
    says something about whether you want to implement
    a VM-based or a natively compiled language.
  - What about debuggability?  If you plan to compile
    it, you need to think about how to store debugging
    information.
  - How do you want to bootstrap it?  This, too, says
    something about what kind of back-end you might build.
    Perhaps you build a tiny VM in C, then compile to
    C.  This way, you avoid fun but time-consuming work
    on code generation for modern super-scalar hardware,
    register allocation, etc.
  - Do you want to be able to catch type errors early or
    late?  That says something about your type system.
    If you allow type declarations, do you want to think
    about parameterized types?  If you go whole-hog
    with F-bounded polymorphism, you can get performance
    *and* type-safety *and* ease of use, but it's hard
    to get this exactly right.
  - What about namespaces?  Do you want to have a simple-
    minded scheme like Java, where classes, namespaces,
    and files are roughly equivalent?  Lisp-style packages?
    Dylan-style modules and libraries?  Within a single
    first-class namespace, how many second-class namespaces
    are there?  Java has 7 or 8: class names, function
    names, local variable names, slot names, etc.  Common
    Lisp has at least 3 (function, variable, and class
    names).  Dylan and Scheme have one, which greatly
    simplifies things at a small loss of generality which
    can usually be worked around with name conventions.
  - What about encapsulation?  Do you want to do information-
    hiding on a class basis like C++ and Java, or on a
    "module" basis like Dylan?
  - Do you want first-class functions?  What about
    lexical closures?  First-class continuations?  The
    answer to those questions will tell you things about
    heap- and stack-allocation, and will also tell you
    how important it might be to do a continuation-based
    compiler.  It also tells you how hard your compiler
    has to work to avoid consing environments unnecessarily.
    Lots of sophisticated language designers go with
    simple closures and avoid full continuations, because
    full-scale environment capture is hard to do well.
  - Do you want a first-class object system?  Should it
    extend all the way to the primitive types, or do you
    want to special-case those like Java?  Do you want a
    Smalltalk/Java-style single-receiver object orientation,
    or a CLOS-style multi-method generic function dispatch?
    If the former, do you need some sort of static overloading
    like C++?  If the latter and performance is important,
    do you need some sort of Dylan-style "sealing" so that
    you can do some compile-time optimizations?  Do you
    want single inheritance, single inheritance with
    interfaces, multiple inheritance, or a hybrid single
    inheritance with mixins?  If you've got a more static
    type system, you'll need to deal with casts.  Do you
    additionally want auto-conversion?
  - If you've got an object-system, how much of a meta-object
    system do you want to expose?  Do you want it to be
    purely reflective, or more than that?  In Dylan, we
    separated 'make' from 'initialize', which was a good
    idea, but do you also want to separate out 'allocate',
    so that you have control over where an object is
    created, e.g., in a "persistent memory" pool that
    might be back-ended by a database?
  - Are there different semantics for "pointer-ish" and
    "non-pointer-ish" objects, like in C?  Or is everything
    a first-class object reference, like in Lisp?
  - Do you need hairy CLOS-style method combination, or is
    a simpler style like we did in Dylan enough?  Do you care
    about what Gregor Kiczales calls "aspects", which might
    change your decision?
  - Do you want to support threading?  Do you want to roll
    your own threads or use OS threads?  Do you want to
    support massive concurrency like Erlang?  The answers
    to those questions will tell you about aspects of the
    run-time, memory allocation/GC, and performance.  Oh
    yeah -- it also tells you if you can actually take
    advantage of the multiple processors sitting in most
    of the machines we all have.  Do you want Java-style
    synchronization where it is built in to objects, or
    should that be handled orthogonally?
  - How well do you want to be able to integrate with
    native libraries?  This decision affects your memory
    model, how you plan to represent run-time type info,
    how function call/return works, how signalling works,
    etc.  By "memory model", I also mean to include what
    sorts of objects are boxed or tagged.  (Opinion: the
    Harlequin/FunO Dylan compiler got it wrong -- I think we
    should have boxed everything, and then concentrated our
    efforts on box/unbox optimizations.  This would have
    *hugely* simplified FFI issues.)  Good integration
    with native code probably means that you will end up
    using a conservative collector, and that will affect
    the semantics of "finalization" (if you have it).
  - Do you want to be able to return multiple values?
    How about &rest arguments?  These affect function
    call/return, tail-call elimination, and stack vs. heap
    allocation optimizations.
  - What's your order of evaluation in expressions?  This
    affects what sort of optimizations can be safely done.
  - What compilation model do you want?  Lots of include
    files like C[++]?  Lots of "packages" like Java?
    Whole-worlds like Lisp?  Separate libraries like Dylan?
    This affects a lot of things, not least of which is
    the ability to deliver small applications.  It also
    informs the design of your core run-time.
  - Is the core run-time tiny like Scheme?  Small like
    Dylan?  Huge like Common Lisp?  If you like the Common
    Lisp model, it's worth looking at EuLisp to see how
    to re-package it in a more layered way.
  - Even in a small run-time, you need to get the basic
    types right.  Are your numeric types "closed" (that is,
    do they include reals -- rationals and irrationals --
    and complex numbers)?  Are your string and character
    types rich enough to model Unicode?
  - Think hard about collections.  How do the following
    relate to each other: sets, tables, vectors, arrays,
    lists, sequences, ranges?  In Dylan, we decided too
    late that having the tail of a list be a "cons" was maybe
    not such a great idea; what about that?  How do your
    collections interact with your threading model?
  - Think hard about iteration, especially over collections.
    If all collections obey a uniform iteration protocol,
    it means that you can do things like 'for e in c ...'.
    Note that if iterators are done in a first-class way,
    this has performance implications that your compiler
    needs to worry about.
  - Do you want macros?  Lisp-style macros?  Dylan-style
    pattern-matching non-procedural hygienic macros?
    Scheme-style 'syntax-case' pattern-matching procedural
    hygienic macros?  This says a lot about the syntax of
    your language, and it also says a lot about the model
    you choose for compile-time evaluation environments.
  - What syntax do you want?  Parentheses unaccountably
    give lots of people hives, but S-expressions make a
    lot of things much simpler.  Infix syntax is quite
    nice when it's done well, but you've got to get the
    "kernel" of that exactly right if you want your infix
    macro system ever to be usable.  If you decide on
    S-expressions, should they be represented as lists
    and conses, or do you want a first-class object for
    that?
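
To make the iteration question above a bit more concrete, here is a
minimal sketch of a uniform iteration protocol.  It is written in
Python purely for illustration (Python is not one of the languages
discussed above), and the names Range, ConsList, and total are made
up for this example.  The point is only that two unrelated collection
types can both be consumed by the same 'for e in c' loop once they
agree on one protocol.

```python
class Range:
    """A hypothetical half-open integer range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # The iteration protocol: hand back a fresh iterator each time.
        i = self.start
        while i < self.stop:
            yield i
            i += 1


class ConsList:
    """A hypothetical singly linked list built from (head, tail) pairs."""

    def __init__(self, *items):
        # Build the chain back to front; None marks the empty list.
        self.cell = None
        for item in reversed(items):
            self.cell = (item, self.cell)

    def __iter__(self):
        cell = self.cell
        while cell is not None:
            head, cell = cell
            yield head


def total(collection):
    """Sum the elements of any collection that obeys the protocol."""
    result = 0
    for e in collection:
        result += e
    return result
```

With these definitions, total(Range(1, 4)), total(ConsList(1, 2, 3)),
and total([1, 2, 3]) all go through the same loop.  Note that every
loop here creates an iterator object at run time, which is the sort
of cost a compiler for a language with first-class iterators would
probably want to optimize away.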