
Re: Optional types



The following is a message from David Simmons that he intended to post to this list, but for some reason his posts don't make it through. Copied below in its entirety:

---------------------8<------------------------
From: "David Simmons" <quasar@qks.com>
Subject: RE: Optional types
Newsgroups: comp.lang.smallscript,comp.lang.smallscript.advocacy,comp.lang.smallscript.aos
Date: Fri, 7 Dec 2001 17:32:35 -0800

=======================================================================
The following is a post from the MIT Dynamic Languages discussion group.
=======================================================================

Hmm. I just finished reading most of this whole thread on "Optional types"
for the first time.

A few comments based on my implementation and design experience with a
dynamic language VM, its object model, and a corresponding language that has
optional typing.

-------
TOPIC 1: "Compilation"
-------
The concept of "compilation" in a modern dynamic language virtual machine
means something different than one might, at first, think.

In classic static languages we think of compilation as a single
process for whole-cloth translation of source code into a binary
executable/linkable form, at which point the process ends and any
interesting information about the program semantics is typically
discarded [which is what makes the binaries fragile].

In modern dynamic language implementations, compilation from source code
produces a rich intermediate-code form which for all intents and purposes is
treated as a binary executable which depends on a well-known shared-library
[the VM/runtime]. That intermediate form retains (in a well designed system)
the important program semantics [extensible metadata]. The binary form may
then be "interpreted"; but more likely, for many reasons, it will be jitted
(just-in-time-compiled).

This latter notion of "JIT" takes on an entirely new meaning when your
intermediate form is designed exclusively for jitting (which is very
reasonable since jitters for modern CPUs can be produced that have both
better size and performance than an interpreter).

In that model, the "static language notions" of "compilation and related
optimizations" shift (temporally speaking) to become the responsibility of
the jit-architecture.

And runtime (type, constraint, etc) heuristics and the "optional type"
declarations are similarly shifted to play (almost) the "same" role in the
jit-compilation-process that one sees static-type declarations playing in
the static-language compilation process. We call this "adaptive
compilation", and it is a basic principle/feature of a modern dynamic
language VM design.

In a JIT architecture, sufficient information exists to restructure any
object and regenerate affected code accordingly. Since behaviors (classes or
prototypes) and methods are also objects, this means that methods can be
modified at any time, and the corresponding binary/machine code for those
methods can be regenerated at any time. More importantly, it means that such
code does not need to be generated until it will be invoked.
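
A minimal sketch of the generate-on-first-call and regenerate-on-change
idea described above. This is not the AOS implementation: the Method class
and its fields are hypothetical, and Python's built-in compile()/exec()
stand in for a real machine-code generator.

```python
# Hypothetical sketch: lazy code generation with invalidation.
# compile()/exec() stand in for a JIT back end.

class Method:
    """A method object that keeps its source next to the generated code."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        self._code = None        # nothing is generated until first invocation
        self.generations = 0     # how many times code has been (re)generated

    def redefine(self, new_source):
        # Changing the method only discards the stale code. Nothing is
        # regenerated until the method is actually called again.
        self.source = new_source
        self._code = None

    def invoke(self, *args):
        if self._code is None:
            namespace = {}
            exec(compile(self.source, f"<{self.name}>", "exec"), namespace)
            self._code = namespace[self.name]
            self.generations += 1
        return self._code(*args)


area = Method("area", "def area(w, h):\n    return w * h\n")
print(area.generations)      # 0: defined, but no code generated yet
print(area.invoke(3, 4))     # 12: generated on first call
print(area.invoke(5, 6))     # 30: reuses the generated code
area.redefine("def area(w, h):\n    return w * h / 2\n")
print(area.invoke(3, 4))     # 6.0: regenerated after the change
print(area.generations)      # 2
```

Because the source (the metadata) is retained, a change costs nothing until
the next call, which is the point of deferring generation.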

And any such generation or re-generation can incorporate heuristic or
explicit type information and has the option to then perform the EXACT same
semantic analysis that a static-language compiler could/would have done. But
it can do so any time metadata/schema changes occur. A modern dynamic
language VM architecture therefore has the capability to eliminate the
fragile nature of static compilation, while also having the possibility of
exceeding static-compilation performance by exploiting CPU cache knowledge,
inline-cache dispatching, and heuristically gathered type information.
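
To make "inline-cache dispatching" and "heuristically gathered type
information" concrete, here is a rough sketch of a monomorphic inline cache
for a single send site. The CallSite class and its fields are hypothetical
names for illustration. A real VM patches machine code at the call site
rather than consulting a Python object.

```python
# Hypothetical sketch of a monomorphic inline cache for one send site.

class CallSite:
    def __init__(self, selector):
        self.selector = selector
        self.cached_type = None      # receiver type seen on the last lookup
        self.cached_target = None    # method found for that type
        self.hits = 0
        self.misses = 0
        self.seen_types = {}         # runtime profile: receiver type -> count

    def send(self, receiver, *args):
        rtype = type(receiver)
        # Gather type feedback that a regenerating compiler could consult.
        self.seen_types[rtype] = self.seen_types.get(rtype, 0) + 1
        if rtype is self.cached_type:
            self.hits += 1           # fast path: one identity check
        else:
            self.misses += 1         # slow path: full lookup, then re-cache
            self.cached_target = getattr(rtype, self.selector)
            self.cached_type = rtype
        return self.cached_target(receiver, *args)


site = CallSite("upper")
print([site.send(s) for s in ["a", "b", "c"]])   # ['A', 'B', 'C']
print(site.hits, site.misses)                    # 2 1
print(site.send(b"d"))                           # b'D'
print(site.hits, site.misses)                    # 2 2
```

The seen_types profile is the kind of heuristic information a later
regeneration pass could use to specialize the code for the common case.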

What I am describing is not something "new". It has been used to one degree
or another in Smalltalk VMs since the mid-80s, where it was pioneered.
There are quite a few OOPSLA papers on related topics in this space which
have "submerged" under the Java static-language tidal wave. It evolved from
Smalltalk into the Self work, which in turn came back as an adaptively
compiled Smalltalk VM known as "animorphic hotspot", which in turn was
bought by Sun for exclusive use with Java and became known as Java "hotspot"
minus the useful Smalltalk dynamic language features.


-------
Topic 2: Optional Typing (in a dynamic language VM)
-------
Based on my last ten years' work in designing and evolving the AOS Platform
as a dynamic language VM technology, I have learned a few valuable lessons
regarding type facilities, namespaces, modules, interfaces, and foreign
function calling facilities.

Put simply, they are all intrinsically related and fundamentally required
within a dynamic language virtual machine architecture if the goal is to
have overall high performance, including cross-language interop with static
languages.

An optional type declaration for a dynamic language VM's object model (at
least for the AOS platform) means something different than in a static
language compiler. The object-model already has 100% safe and accurate
information internally about the runtime-type of every object, so it
neither needs nor trusts what a human has declared regarding type.

Thus correctness for "safety" in compilation is not a factor like it would
be for a statically compiled language. Such information can be used
statically or with runtime gathered heuristics to infer and analyze a
design like one might do in a static type analysis -- we just shift the
temporal frame of reference for this process -- as mentioned indirectly in
the section above on "compilation".

Instead, a dynamic language object-model uses type information as a guide
with regard to the design intent (semantics) behind the code. In other words
it uses it to enforce additional execution semantics beyond the default
assumptions it would otherwise make.

For example, if you declare that a method's first argument should be an
<Integer> then the execution machinery will guarantee that the method will
*never* be invoked unless the first argument is an <Integer>.

Or, if you said that a method's first argument should be an object that
conforms to the <IStream> interface then it will guarantee that the method
will *never* be invoked unless the first argument conforms to the given
interface.

These are the multi-method binding predicate constraints. The type system
allows arbitrary type combinations, including parameterized types. Key
constraints include exact type matching (ignoring inheritance), so that a
signature can require that an argument must-be (or must-not-be) an exact
match to a member of an explicit type set.
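
The binding-predicate idea can be sketched roughly as follows. This is an
illustration under stated assumptions, not SmallScript's mechanism: the
names MultiMethod, Exactly, Conforms, and Implements are hypothetical, and
the dispatcher simply picks the first applicable definition where a real
system would rank candidates by specificity.

```python
import io

# Hypothetical predicate objects used in method signatures.

class Exactly:
    """Matches only the listed types themselves, ignoring inheritance."""
    def __init__(self, *types):
        self.types = types
    def matches(self, value):
        return type(value) in self.types

class Conforms:
    """Matches instances of the type or of any subtype."""
    def __init__(self, base):
        self.base = base
    def matches(self, value):
        return isinstance(value, self.base)

class Implements:
    """Matches any object that has all the named methods (an interface)."""
    def __init__(self, *names):
        self.names = names
    def matches(self, value):
        return all(callable(getattr(value, n, None)) for n in self.names)

class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.cases = []              # (predicates, function) in definition order

    def define(self, *predicates):
        def register(fn):
            self.cases.append((predicates, fn))
            return fn
        return register

    def __call__(self, *args):
        # A body runs only if every argument satisfies its declared predicate.
        for predicates, fn in self.cases:
            if len(predicates) == len(args) and all(
                p.matches(a) for p, a in zip(predicates, args)
            ):
                return fn(*args)
        raise TypeError(f"no applicable method for {self.name}")


describe = MultiMethod("describe")

@describe.define(Exactly(bool))
def _(x):
    return "boolean"

@describe.define(Conforms(int))
def _(x):
    return "integer"

@describe.define(Implements("read"))
def _(x):
    return "stream"

print(describe(True))                 # boolean
print(describe(7))                    # integer
print(describe(io.StringIO("text")))  # stream
```

Calling describe("text") raises TypeError, since no declared signature
applies: the method bodies are never entered with a non-matching argument.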

The object-model also uses optional type information to provide fully
extensible automatic foreign function marshalling on call-in and call-out.
But, again, it doesn't assume anything is "correct" about the human type
declarations. Instead, it uses them to guide the marshalling process by
requesting the declared types to marshal the supplied values dynamically.
Remember that everything is an object: types [behaviors] are objects, so
types can have methods, and therefore the type itself can handle the
marshalling. This makes it all extensible and very fast (much faster than
Java JNI for example).
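
As a loose sketch of "the type itself handles the marshalling": the
declared types below are objects with their own marshal methods, and the
call-out helper asks each declared type to convert whatever value was
actually supplied. All names here (Int32, CString, call_out, fake_foreign)
are hypothetical, and the "foreign" function is a plain Python stand-in.

```python
import struct

# Hypothetical marshalling types. Each type knows its own wire format.

class Int32:
    @staticmethod
    def marshal_out(value):
        # The type converts the supplied value; the declaration is a request,
        # not a trusted claim about what the value already is.
        return struct.pack("<i", int(value))

    @staticmethod
    def marshal_in(raw):
        return struct.unpack("<i", raw)[0]

class CString:
    @staticmethod
    def marshal_out(value):
        return str(value).encode("utf-8") + b"\0"

    @staticmethod
    def marshal_in(raw):
        return raw.split(b"\0", 1)[0].decode("utf-8")

def call_out(foreign, declared_types, *values):
    """Marshal each value through its declared type, then call out."""
    wire = [t.marshal_out(v) for t, v in zip(declared_types, values)]
    return foreign(*wire)

def fake_foreign(count_raw, name_raw):
    # Stand-in for a foreign function: just concatenates the raw arguments.
    return count_raw + name_raw


result = call_out(fake_foreign, (Int32, CString), "42", "hi")
print(result)                        # b'*\x00\x00\x00hi\x00'
print(Int32.marshal_in(result[:4]))  # 42
print(CString.marshal_in(result[4:]))  # hi
```

Supporting a new foreign type means defining one more class with these two
methods. The call_out helper does not change.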

The object model presumes objects will contain both "value" and "reference"
type fields: what some terminology would call record/struct fields
(value-fields) and OOP slots (reference-fields). The optional type system
allows all fields to be fully typed with extensible annotations. For
value-types the object model uses the type information to compose the struct
portion of an object, and correspondingly to generate accessor operations
via methods on a value-type (which the JIT can optimize).

The object model has an extensible mechanism for representing the internal
forms of objects, which includes safe interior-pointers that integrate
efficiently in the GC system. Similarly, structs form a transparent means of
directly passing in/out to classic struct based languages like C/C++ with no
marshalling required at all -- the GC supports zero-cost automatic pinning
to make it fully safe in the pre-emptive multi-threaded architecture of the
AOS Platform.

So we also see optional type declarations providing a dynamic language VM's
object model with direct facilities for declaring and managing structures in
a fashion that is fully compatible with and interchangeable with that seen
for structs in Pascal, C, C++, etc.
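
For readers who want to see typed value-fields with a C-compatible layout
in a dynamic language, Python's standard ctypes module offers a loose
analogy. It is not how the AOS Platform works, and the Point type is a
hypothetical example. The field declarations determine the in-memory
layout, and the object's address can be handed to C code directly.

```python
import ctypes
import struct

class Point(ctypes.Structure):
    # Typed value-fields: the declarations define the struct layout.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

p = Point(3, 4)
print(ctypes.sizeof(Point))      # 8: two 32-bit fields
print(Point.y.offset)            # 4: y sits right after x

# The object's memory is already in C layout; no conversion step is needed.
print(bytes(p) == struct.pack("=ii", 3, 4))   # True

# Pass the struct by address to a C routine (memmove) without marshalling.
q = Point()
ctypes.memmove(ctypes.byref(q), ctypes.byref(p), ctypes.sizeof(Point))
print(q.x, q.y)                  # 3 4
```

The accessors p.x and p.y are generated from the field declarations, which
mirrors the generated accessor operations described earlier.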

In summary here, I would strongly suggest that what is considered to be "an
optional type system" should really be characterized as an "extensible
annotation system" that encompasses types.

NOTE: Behaviors are types. Classes are namespaces. Modules are classes.
Mixins are a special category of subclasses of Object. Interfaces are a
subclass of Mixins.

P.S., you're welcome to download the SmallScript system and play with these
ideas in a working system. The Native AOS Platform VM has no ($) cost
associated with it [it's free], and that will remain true into the future.

Cheers,

-- Dave S. [SmallScript LLC / www.smallscript.org]

SmallScript for the AOS & .NET Platforms David.Simmons@SmallScript.com |
http://www.smallscript.org


> -----Original Message-----
> From: owner-ll1-discuss@ai.mit.edu [mailto:owner-ll1-discuss@ai.mit.edu]
> On Behalf Of Paul Prescod
> Sent: Friday, December 07, 2001 3:10 PM
> To: ll1-discuss@ai.mit.edu
> Subject: Re: Optional types
>
> Eric Kidd wrote:
> >
> > 1) Every object belongs to a class.  Class membership--even for
> > parameterized classes--can be tested in a handful of instructions.
>
> Here's the problem in the Python world. Python people are very used to
> being able to say: okay, X is not *really* a Y, but it looks enough like
> a Y to pass. Certainly this is common in any dynamically typed language.
> Now imagine a module implementor decides to define an interface using
> static declarations for all of the maintenance and performance benefits.
> All of a sudden, the programmer can't pull this looks-like-a trick
> anymore.
>
> One approach is to go down the constraint path, but you lose O(1)
> typecheck performance. Another approach is to have some kind of "trust
> me" keyword but you force the programmer to add the keyword all over the
> place. Yet another approach is to default to "trust me" but allow a
> programmer to say that they WANT stricter checking (perhaps choosing
> whether to do it dynamically or statically).
>
> What do existing mixed-mode languages do to allow this style of
> programming?
>
>  Paul Prescod
---------------------8<------------------------