
RE: Optional types



Hmm. I just finished reading most of this whole thread on "Optional
types" for the first time.

A few comments based on my implementation and design experience with a
dynamic language VM, its object model, and a corresponding language that
has optional typing.

-------
TOPIC 1: "Compilation"
-------
The concept of "compilation" in a modern dynamic language virtual
machine means something different from what one might, at first, think.

In classic static languages we think of compilation as a single
process for whole-cloth translation of source code into a binary
executable/linkable form. At that point the process ends, and any
interesting information about the program semantics is typically also
discarded [which is what makes the binaries fragile].

In modern dynamic language implementations, compilation from source code
produces a rich intermediate-code form which for all intents and
purposes is treated as a binary executable which depends on a well-known
shared-library [the VM/runtime]. That intermediate form retains (in a
well designed system) the important program semantics [extensible
metadata]. The binary form may then be "interpreted"; but more likely,
for many reasons, it will be jitted (just-in-time-compiled).

This latter notion of "JIT" takes on an entirely new meaning when your
intermediate form is designed exclusively for jitting (which is very
reasonable, since jitters for modern CPUs can be produced that are both
smaller and faster than an interpreter).

In that model, the "static language notions" of "compilation and related
optimizations" shift (temporally speaking) to become the responsibility
of the jit-architecture.

And runtime (type, constraint, etc) heuristics and the "optional type"
declarations are similarly shifted to play (almost) the "same" role in
the jit-compilation-process that one sees static-type declarations
playing in the static-language compilation process. We call this
"adaptive compilation", and it is a basic principle/feature of a modern
dynamic language VM design.

In a JIT architecture, sufficient information exists to restructure any
object and regenerate affected code accordingly. Since behaviors
(classes or prototypes) and methods are also objects, this means that
methods can be modified at any time, and the corresponding
binary/machine code for those methods can be regenerated at any time.
More importantly, it means that such code does not need to be generated
until it will be invoked. 
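
To make the lazy-generation point concrete, here is a minimal sketch in
Python (not SmallScript). Everything in it is an assumption made for
illustration: the Method class, its redefine/invoke names, and the use
of Python's exec as a stand-in for real machine-code generation.

```python
class Method:
    """Hypothetical method object: holds source, generates code lazily."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        self._code = None        # nothing generated until first invocation
        self.compile_count = 0   # bookkeeping for this illustration only

    def redefine(self, source):
        # Modifying a method just discards the stale generated code.
        self.source = source
        self._code = None

    def invoke(self, *args):
        if self._code is None:
            # Stand-in for JIT code generation (assumption: exec/compile
            # plays the role of the machine-code generator here).
            namespace = {}
            exec(compile(self.source, self.name, "exec"), namespace)
            self._code = namespace[self.name]
            self.compile_count += 1
        return self._code(*args)


area = Method("area", "def area(w, h):\n    return w * h\n")
first = area.invoke(3, 4)     # code is generated here, on first use
second = area.invoke(5, 6)    # reuses the generated code
area.redefine("def area(w, h):\n    return w * h / 2\n")
third = area.invoke(3, 4)     # regenerated after the redefinition
```

In this sketch the code is generated twice in total: once on first use
and once after the method is redefined.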

And any such generation or re-generation can incorporate heuristic or
explicit type information and has the option to then perform the EXACT
same semantic analysis that a static-language compiler could/would have
done. But it can do so any time metadata/schema changes occur.
Therefore a modern dynamic language VM architecture has the capability
to eliminate the fragile nature of static compilation, while also
having the possibility of exceeding static compilation performance
through exploitation of CPU cache knowledge, inline-cache dispatching,
and utilization of heuristically gathered type information.
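
As a rough illustration of inline-cache dispatching with heuristically
gathered type information, the following Python sketch shows a
monomorphic cache at a single call site. The CallSite class and its
counters are hypothetical names, and a real VM would patch machine code
rather than update Python attributes. Cache invalidation on method
redefinition is omitted for brevity.

```python
class CallSite:
    """Hypothetical call site with a one-entry (monomorphic) inline cache."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_type = None     # receiver type seen most recently
        self.cached_target = None   # method found for that type
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Fast path: one identity check, no method lookup.
            self.hits += 1
        else:
            # Slow path: full lookup, then remember what was observed.
            self.misses += 1
            self.cached_type = receiver_type
            self.cached_target = getattr(receiver_type, self.selector)
        return self.cached_target(receiver, *args)


site = CallSite("upper")
results = [site.send(s) for s in ["a", "b", "c"]]   # one miss, two hits
other = site.send(b"d")                              # new type: a miss
```

The type recorded in the cache is exactly the kind of runtime-gathered
information that a later regeneration of the calling code could use.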

What I am describing is not something "new". It has been used to one
degree or another in Smalltalk VMs since the mid-80s, where it was
pioneered. There are quite a few OOPSLA papers on related topics in this
space which have "submerged" under the Java static-language tidal wave.
It evolved from Smalltalk into the Self work, which in turn came back as
an adaptively compiled Smalltalk VM known as "animorphic hotspot", which
in turn was bought by Sun for exclusive use with Java and became known
as Java "hotspot" minus the useful Smalltalk dynamic language features.


-------
Topic 2: Optional Typing (in a dynamic language VM)
-------
Based on my last ten years' work in designing and evolving the AOS
Platform as a dynamic language VM technology, I have learned a few
valuable lessons regarding type facilities, namespaces, modules,
interfaces, and foreign function calling facilities. 

Put simply, they are all intrinsically related and fundamentally
required within a dynamic language virtual machine architecture if the
goal is to have overall high performance, including cross-language
inter-op with static languages.

An optional type declaration for a dynamic language VM's object model
(at least for the AOS platform) means something different than in a
static language compiler. The object-model already has 100% safe and
accurate information internally about the runtime-type of every object,
so it does not need nor does it trust what a human has declared
regarding type. 

Thus correctness for "safety" in compilation is not a factor like it
would be for a statically compiled language. Such information can be
used statically or with runtime-gathered heuristics to infer and
analyze a design like one might do in a static type analysis -- we just
shift the temporal frame of reference for this process -- as mentioned
indirectly in the section above on "compilation".

Instead, a dynamic language object-model uses type information as a
guide with regard to the design intent (semantics) behind the code. In
other words it uses it to enforce additional execution semantics beyond
the default assumptions it would otherwise make.

For example, if you declare that a method's first argument should be an
<Integer> then the execution machinery will guarantee that the method
will *never* be invoked unless the first argument is an <Integer>.

Or, if you said that a method's first argument should be an object that
conforms to the <IStream> interface then it will guarantee that the
method will *never* be invoked unless the first argument conforms to the
given interface. 

These are the multi-method binding predicate constraints. The type
system allows arbitrary type combinations, including parameterized
types. Some of the key constraints include the ability to specify exact
type matching (ignoring inheritance), so that you can require that an
argument must-be (or) must-not-be an exact match to a member of an
explicit type set within the signature.
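
The binding behavior described above can be sketched in Python along
the following lines. This is not how the AOS Platform implements it.
The Exact, Conforms, and MultiMethod names are invented for the
example, and picking the first matching variant is a simplifying
assumption (a real multi-method system would select the most specific
match).

```python
class Exact:
    """Constraint: the argument's type must be exactly cls (no subclasses)."""
    def __init__(self, cls):
        self.cls = cls

    def accepts(self, value):
        return type(value) is self.cls


class Conforms:
    """Constraint: the argument is an instance of cls or of a subclass."""
    def __init__(self, cls):
        self.cls = cls

    def accepts(self, value):
        return isinstance(value, self.cls)


class MultiMethod:
    """Hypothetical dispatcher: a variant runs only if its constraints hold."""
    def __init__(self, name):
        self.name = name
        self.variants = []

    def add(self, *constraints):
        def register(fn):
            self.variants.append((constraints, fn))
            return fn
        return register

    def __call__(self, *args):
        for constraints, fn in self.variants:
            if len(constraints) == len(args) and all(
                c.accepts(a) for c, a in zip(constraints, args)
            ):
                return fn(*args)
        raise TypeError(f"no variant of {self.name} accepts these arguments")


describe = MultiMethod("describe")

@describe.add(Exact(int))
def _describe_exact_int(n):
    return "exact int"

@describe.add(Conforms(int))
def _describe_int_like(n):
    return "int subclass"

plain = describe(3)        # type is exactly int
flag = describe(True)      # bool subclasses int, so Exact(int) rejects it
```

Calling describe("x") raises TypeError here, since no variant's
constraints accept a string: the method body is never entered with an
argument that violates its declaration.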

The object-model also uses optional type information to provide fully
extensible automatic foreign function marshalling on call-in and
call-out. But, again, it doesn't assume anything is "correct" about the
human type declarations. Instead, it uses them to guide the marshalling
process by requesting the declared types to marshal the supplied values
dynamically. Remember that everything is an object: types [behaviors]
are objects, so types can have methods, and therefore the type itself
can handle the marshalling. That makes it all extensible and very fast
(much faster than Java JNI, for example).
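
A small Python sketch of the idea that the declared type itself performs
the marshalling. All names here (Int32, Utf8String, call_out,
fake_strlen) are hypothetical, the byte layouts are assumptions chosen
for the example, and fake_strlen merely stands in for a foreign C
function.

```python
import struct


class Int32:
    """Hypothetical type object that marshals 32-bit signed integers."""
    @staticmethod
    def marshal_out(value):
        return struct.pack("<i", int(value))

    @staticmethod
    def marshal_in(data):
        return struct.unpack("<i", data)[0]


class Utf8String:
    """Hypothetical type object that marshals NUL-terminated UTF-8."""
    @staticmethod
    def marshal_out(value):
        return str(value).encode("utf-8") + b"\x00"

    @staticmethod
    def marshal_in(data):
        return data.rstrip(b"\x00").decode("utf-8")


def call_out(foreign_fn, declared_types, return_type, *args):
    # The declarations are not trusted as facts about the arguments;
    # each declared type is asked to marshal whatever value was supplied.
    raw_args = [t.marshal_out(a) for t, a in zip(declared_types, args)]
    return return_type.marshal_in(foreign_fn(*raw_args))


def fake_strlen(raw):
    # Stand-in for a C function: length of a NUL-terminated byte string.
    return struct.pack("<i", len(raw) - 1)


from_text = call_out(fake_strlen, [Utf8String], Int32, "hello")
from_number = call_out(fake_strlen, [Utf8String], Int32, 12345)
```

New foreign types can be supported by adding another type object with
the same two methods, which is the extensibility being described.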

The object model presumes objects will contain both "value" and
"reference" type fields: what some terminology would refer to as a
record/struct (value-fields) or OOP-slots (reference-fields). The
optional type system allows all fields to be fully typed with extensible
annotations. For value-types the object model uses the type information
to compose the struct portion of an object, and correspondingly this is
used to generate accessor operations via methods on a value-type (which
the JIT can optimize).

The object model has an extensible mechanism for representing the
internal forms of objects, which includes safe interior-pointers that
integrate efficiently in the GC system. Similarly structs form a
transparent means of directly passing in/out to classic struct based
languages like C/C++ with no marshalling required at all -- the GC
supports zero-cost automatic pinning to make it fully safe in the
pre-emptive multi-threaded architecture of the AOS Platform.

So we also see optional type declarations providing a dynamic language
VM's object model with direct facilities for declaring and managing
structures in a fashion that is fully compatible with and
interchangeable with that seen for structs in Pascal, C, C++, etc.
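
For readers without an AOS system at hand, Python's standard ctypes
module gives a loose analogy: typed field declarations compose a
C-compatible struct layout and generate the field accessors. This is
only an analogy and says nothing about how the AOS object model or its
GC pinning works. The Point type is an invented example.

```python
import ctypes
import struct


class Point(ctypes.Structure):
    # The typed field declarations determine the struct layout.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


p = Point(3, 4)
p.x += 10                          # accessors come from the declarations

size = ctypes.sizeof(Point)        # two 32-bit fields
raw = bytes(p)                     # the bytes a C callee would see
unpacked = struct.unpack("=ii", raw)

# The same bytes can be viewed as a Point again without field-by-field
# conversion code.
q = Point.from_buffer_copy(raw)
```

Because the in-memory layout matches what a C compiler would produce for
two int32 fields, such an object can be handed to C code by address.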

In summary here, I would strongly suggest that what is considered to be
"an optional type system" should really be characterized as an
"extensible annotation system" that encompasses types.

NOTE: Behaviors are types. Classes are namespaces. Modules are classes.
Mixins are a special category of subclasses of Object. Interfaces are a
subclass of Mixins.

P.S., you're welcome to download the SmallScript system and play with
these ideas in a working system. The Native AOS Platform VM has no ($)
cost associated with it [it's free], and that will remain true into the
future.

Cheers,

-- Dave S. [SmallScript LLC / www.smallscript.org]

SmallScript for the AOS & .NET Platforms
David.Simmons@SmallScript.com | http://www.smallscript.org


> -----Original Message-----
> From: owner-ll1-discuss@ai.mit.edu [mailto:owner-ll1-discuss@ai.mit.edu]
> On Behalf Of Paul Prescod
> Sent: Friday, December 07, 2001 3:10 PM
> To: ll1-discuss@ai.mit.edu
> Subject: Re: Optional types
> 
> Eric Kidd wrote:
> >
> > 1) Every object belongs to a class.  Class membership--even for
> > parameterized classes--can be tested in a handful of instructions.
> 
> Here's the problem in the Python world. Python people are very used to
> being able to say: okay, X is not *really* a Y, but it looks enough
> like a Y to pass. Certainly this is common in any dynamically typed
> language. Now imagine module implementor decides to define an
> interface using static declarations for all of the maintenance and
> performance benefits. All of a sudden, the programmer can't pull this
> looks-like-a trick anymore.
> 
> One approach is to go down the constraint path, but you lose O(1)
> typecheck performance. Another approach is to have some kind of "trust
> me" keyword but you force the programmer to add the keyword all over
> the place. Yet another approach is to default to "trust me" but allow
> a programmer to say that they WANT stricter checking (perhaps choosing
> whether to do it dynamically or statically).
> 
> What do existing mixed-mode languages do to allow this style of
> programming?
> 
>  Paul Prescod