
Re: Curl, multiple inheritance, and interfaces (was Re: cheerful static typing)

On Fri, 11 Jan 2002, Dan Weinreb wrote:

>                              remember that email I sent a while ago
> about how it would be nice if interfaces, in addition to simply
> specifying the entrypoint names and their arg types and return types,
> could also say more interesting things about their semantics, like
> that write("foo") followed by write("bar") is the same as
> write("foobar") in the absence of exceptions, or that addition is
> commutative, etc? 

I'm curious how far one could take this approach.  Suppose you start with
a list of methods with type information for parameters and return values.  
Now you start adding more and more details about the interface's
"semantics".  This basically amounts to laying constraints on the set of
allowed implementations of that interface.  E.g. in your above example, if
an implementation doesn't fulfill the constraint that write("foo")
followed by write("bar") is equivalent to write("foobar"), then that
implementation is broken -- it violates an interface constraint.
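
To make that concrete, here's a rough sketch in Java.  The Writer
interface and its toString()-based observation are inventions of mine for
illustration; the point is just that the constraint becomes an executable
property that every conforming implementation must pass:

    // Hypothetical interface; assume toString() exposes what was written.
    interface Writer {
        void write(String s);
    }

    class WriterConstraints {
        // Must hold for any two fresh, equivalent Writer instances,
        // in the absence of exceptions.
        static boolean splitEqualsJoined(Writer a, Writer b) {
            a.write("foo");
            a.write("bar");
            b.write("foobar");
            return a.toString().equals(b.toString());
        }
    }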

I believe you have been using the word "contract" to refer to what I'm
calling "constraints".  But the reason I call them constraints is that
the cool idea I'm trying to get at is this: one could build a
constraint solver into the compiler, and thereby make implementations
unnecessary -- all you'd specify is the interface, and the compiler would
generate one or more implementations automatically.

This would obviously require interfaces to specify a lot more about their
semantics than they currently do.  In fact, an "interface" might actually
resemble a unit test more than a set of methods.  Or it might look like
pre/post conditions, with the implementation elided.
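
For example (pure speculation on syntax), the pre/post-condition style
might read like the stack below, where the contract lives entirely in
comments because Java itself can't express it:

    interface Stack<E> {
        // pre:  true
        // post: size() == old(size()) + 1, top() == e
        void push(E e);

        // pre:  size() > 0
        // post: size() == old(size()) - 1, result == old(top())
        E pop();

        // pre:  size() > 0
        E top();

        // post: result >= 0
        int size();
    }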

I think most people will agree that, in the general case, this idea is at
least as difficult as hard AI... perhaps even amounting to solving the
halting problem.  So I've been thinking about a less general case -- I've
been trying to restrict the idea to data structures, and ignoring
algorithms (and hoping that "a data structure is a simple programming
language" won't come back to bite me).

For example, think about Java's List interface.  Its semantics can be
completely described by pre/post conditions.  There are several
implementations in Java's standard library, but they differ only in
time/space tradeoffs, not in actual behavior.  My goal would be to replace
all of the standard List implementations, and even some of the custom
implementations, with a single List *specification*.
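
As a taste of what such a specification would pin down, here are two of
List's conditions written as plain Java assertions (the helper is
hypothetical, but it runs against any java.util.List -- ArrayList,
LinkedList, whatever -- which is exactly the point):

    import java.util.List;

    class ListLaws {
        // add(e) appends at the old end and grows the size by one.
        static <E> void checkAddThenGet(List<E> xs, E e) {
            int n = xs.size();
            xs.add(e);
            assert xs.get(n) == e;      // stored by reference
            assert xs.size() == n + 1;
        }
    }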

I've tried to explain this idea before and people frequently think that
this List specification would compile down to one and only one
implementation.  That's not what I'm thinking at all.  I'm thinking that
the compiler would examine the context in which a List was used, and
*generate* a custom implementation tailored for that usage.  For example,
if the list needs to be persistent across program invocations, perhaps it
should be disk-backed and not actually memory resident at all.  Or maybe
it needs to be shared between two different address spaces, or two
different processor architectures.  Or maybe it becomes read-only after
creation.  Some of these things the compiler might figure out
automatically; others the programmer might have to state explicitly.
But all of them would affect the implementation that actually
gets created.
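
Inventing syntax freely (none of these annotations exist), the "state
explicitly" case might look something like this, with the representation
choice left entirely to the compiler:

    import java.util.List;

    @interface Persistent {}             // survives program invocations
    @interface SharedAcrossProcesses {}  // visible to other address spaces
    @interface FrozenAfterInit {}        // read-only once constructed

    class Catalog {
        // Might become a disk-backed B-tree, never memory-resident.
        @Persistent
        List<String> productIds;

        // Might become a packed immutable array in shared memory.
        @SharedAcrossProcesses @FrozenAfterInit
        List<String> locales;
    }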

Given that I'm posting this to the LL1 list, some people will probably
think that what I'm describing is most definitely *not* lightweight.  But
I would disagree -- to me, lightweight languages are all about being able
to ignore the details (e.g. GC).  And the choice of array versus linked
list is most definitely a detail.  I *am* currently thinking of this in
terms of a language that has a definite compile phase, where it can analyze
its options and come up with an optimal implementation.  But that's just
for my own convenience in trying to figure out how to do this in the first
place; it's not inherent in the idea at all.

One of the important things to realize is that a specification would
actually specify a *set* of implementations (e.g. a disk-backed list
versus a memory-backed list).  One could then imagine extending a
specification with even more constraints (e.g. accessing the element at a
particular index must be constant time).  This extension could be done by
"subclassing" the spec, or even on a variable-by-variable basis (i.e.
"this particular list should persist across program invocations").

Another objection that I've gotten when trying to explain this idea in the
past is that it's not worth building such a complex compiler simply to
get rid of a few data structure classes, especially since these classes
only have to be written once and then put in the standard libs.
The thing is, I work on big programs -- usually 100K to 1M lines of code.
Programs like these frequently have extremely complex data structures at
their core.  Over time, the program is optimized for speed, which usually
involves making the data structures even more complex (often increasing
the lines of code by a factor of 5 to 10).  At this point, the
algorithms become so infested with data structure optimization logic that
they're very difficult to follow.

So my hope is to make a language where it's *impossible* for programmers
to lay out their own data structures.  Instead, they'll only be able to
specify how the data should behave, and the time/space constraints they
want satisfied.

This has gotten a lot longer than I intended, so I'll stop here, and
simply ask whether anybody knows of any existing work that sounds similar
to what I've described.  Most of the research I've found has been
influenced by logic languages, but I'm specifically interested in
supporting imperative programming.  Also, all the research I've found has
focused on algorithms, not data structures (perhaps that should be telling
me something?).


Kimberley Burchett
Endeca, Inc