# Re: Paul Graham's PyCon Keynote

```
[Paul Graham <pg@archub.org>]
> There's now an article derived from this talk at
> http://www.paulgraham.com/hundred.html  --pg
```

1) "If Moore's Law continues to put out..."
This is a great expression...

2) "How far will this flattening of data structures go? I can think of
possibilities that shock even me, with my conscientiously broadened
mind. Will we get rid of arrays, for example? After all, they're just
a subset of hash tables where the keys are vectors of integers. Will
we replace hash tables themselves with lists?"

A lot of this is actually the approach taken by Haskell (and, I
imagine, other related languages), in which the only real data
construct is the algebraic (tagged) data type. Everything else is
basically syntactic sugar (or "semantic sugar", which I'll explain
in a bit).

So:
- strings are just lists of characters, and all lists of characters are
  printed as strings by default. It's literally just a type alias.
- lists are just:

      data List x = [] | Cons x (List x)

  which defines cons cells and the list terminator in the obvious way
  (a list of x's is either nil or a pair of an x and a list of x's).
  The parser also provides syntactic sugar for this data type, in
  that [1, 2, 3] == Cons 1 (Cons 2 (Cons 3 [])). Of course, one
  could (and probably should) argue that a more general solution
  would be to define [...] as a macro, so you could redefine it or add
  similar constructs.
- integers are, _in_principle_, just constructors for another algebraic
  data type. Of course, it's one that we could never write,
  because it has infinitely many constructors (or, in a practical
  computer, a finite but enormously large number of constructors),
  but conceptually, we can say:

      data Integer = 0 | 1 | -1 | 2 | -2 | 3 | -3 | ...
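
Most of this can be checked in real (GHC) Haskell: `String` is literally
declared as `type String = [Char]`, and bracket syntax is sugar for the
cons constructor `(:)`. The infinite `Integer` type above can't actually
be written, but its flavor can be approximated with Peano-style naturals
(a standard construction, added here as my own illustration):

```haskell
-- [1,2,3] is sugar for conses terminated by []; a String is just [Char].
listSugar :: Bool
listSugar = [1, 2, 3] == (1 : 2 : 3 : [] :: [Int])

stringSugar :: Bool
stringSugar = "abc" == ['a', 'b', 'c']

-- A writable stand-in for "integers as constructors": Peano naturals,
-- an ordinary two-constructor algebraic data type.
data Nat = Zero | Succ Nat deriving (Eq, Show)

toInt :: Nat -> Int
toInt Zero     = 0
toInt (Succ n) = 1 + toInt n

main :: IO ()
main = print (listSugar, stringSugar, toInt (Succ (Succ (Succ Zero))))
```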

This approach has many nice features. The reasoning goes something like
this: since there's absolutely nothing special about
lists/strings/etc., compiler designers need to really spend time
figuring out how to implement the general case of algebraic data in
the best possible way. This has a huge payoff for the programmer,
who knows that once the compiler designer has figured out how to
get good performance for strings and lists, that performance will
carry over into his own custom data types. This is in sharp
contrast to, for example, Java, where built-in operations on primitive
types are expected to be much more efficient than any operations I
could write on semantically similar custom types.
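
To make that contrast concrete (my own illustration, not from the talk):
a hand-rolled list type in Haskell is declared with exactly the same
`data` mechanism as the built-in one, so whatever representation tricks
the compiler knows apply to it equally:

```haskell
-- A custom list type, defined the same way the built-in one conceptually is.
data MyList a = MyNil | MyCons a (MyList a)

myLength :: MyList a -> Int
myLength MyNil         = 0
myLength (MyCons _ xs) = 1 + myLength xs

-- Converting from a built-in list shows the two are structurally
-- interchangeable; only the surface syntax differs.
fromList :: [a] -> MyList a
fromList = foldr MyCons MyNil

main :: IO ()
main = print (myLength (fromList "hello"))
```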

The basic insight is that whatever mechanism you give the
programmer to define his data types should be efficient and
powerful, and making sure that this is true is central to writing a
decent compiler/interpreter. Both to commit to achieving this goal,
and as a litmus test for your success, you should use this same
mechanism to define all built-in types.

In Haskell, this has worked remarkably well.
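
And indeed, the Haskell Prelude passes its own litmus test: `Bool` and
`Maybe` really are defined there as plain algebraic data types
(`data Bool = False | True`, `data Maybe a = Nothing | Just a`). A sketch
of the same idea, with primed names only to avoid clashing with the real
Prelude definitions:

```haskell
-- These mirror the actual Prelude declarations of Bool and Maybe;
-- the primes exist only so the names don't collide with the real ones.
data Bool' = False' | True' deriving (Eq, Show)
data Maybe' a = Nothing' | Just' a deriving (Eq, Show)

-- User code consumes them by ordinary pattern matching, exactly as it
-- would consume any custom type.
describe :: Maybe' Int -> String
describe Nothing'  = "no value"
describe (Just' n) = "got " ++ show n

main :: IO ()
main = putStrLn (describe (Just' 42))
```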

A brief aside on the subject of arrays/hash tables. What you
describe is more or less the approach taken by Standard ML, where
vectors (arrays) are really just syntactic sugar for records (hash
tables) with integer keys... So, for instance:

    - (5, "baz");
    val it = (5,"baz") : int * string
    - {foo=5, bar="baz"};
    val it = {bar="baz",foo=5} : {bar:string, foo:int}
    - {1=5, 2="baz"};
    val it = (5,"baz") : int * string

Note that "int * string" is really just a type alias for
"{1:int, 2:string}"... Pretty neat! And of course, the actual core
language is defined solely in terms of the general case (records),
with the restricted case given as a derived form.

Matt

--
Matt Hellige                  matt@immute.net
http://matt.immute.net
