
Re: Paul Graham's PyCon Keynote



[Paul Graham <pg@archub.org>]
> There's now an article derived from this talk at
> http://www.paulgraham.com/hundred.html  --pg
> 


1) "If Moore's Law continues to put out..."
   This is a great expression...

2) "How far will this flattening of data structures go? I can think of
   possibilities that shock even me, with my conscientiously broadened
   mind. Will we get rid of arrays, for example? After all, they're just
   a subset of hash tables where the keys are vectors of integers. Will
   we replace hash tables themselves with lists?"

   A lot of this is actually the approach taken by Haskell (and, I
   imagine, other related languages), in which the only real data
   construct is the algebraic (tagged) data type. Everything else is
   basically syntactic sugar (or "semantic sugar", which I'll explain
   in a bit).
   
   So:
     - strings are just lists of characters, and all lists of characters are
       printed as strings by default. It's literally just a type alias.
     - lists are just:
         data List x = [] | Cons x (List x)
       which defines cons cells and the list terminator in the obvious way
       (a list of x's is either nil or a pair of an x and a list of x's).
        The parser also provides syntactic sugar for this data type, so
        that [1, 2, 3] == Cons 1 (Cons 2 (Cons 3 [])). Of course, one
        could (and probably should) argue that a more general solution
        would be to define [...] as a macro, so you could redefine it or
        add similar constructs. (A short sketch in legal Haskell follows
        this list.)
     - integers are, _in_principle_, just constructors for another algebraic
       data type. Of course, it's one that we could never write,
       because it has infinitely many constructors (or, in a practical
       computer, a finite but enormously large number of constructors),
       but conceptually, we can say:
         data Integer = 0 | 1 | -1 | 2 | -2 | 3 | -3 | ...
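
   To make that concrete, here is the same idea written out as a short
   sketch in legal Haskell. The names List, Nil, Cons, len and toBuiltin
   are mine (user code can't literally reuse the built-in [] and :
   names), but the mechanism is exactly the one the built-in list uses:

     -- A user-defined list, declared as an ordinary algebraic data type.
     data List a = Nil | Cons a (List a)

     -- Structural recursion over the custom type, just as over [a].
     len :: List a -> Int
     len Nil         = 0
     len (Cons _ xs) = 1 + len xs

     -- Conversion to the built-in type, to show the correspondence:
     --   toBuiltin (Cons 1 (Cons 2 (Cons 3 Nil)))  ==  [1, 2, 3]
     toBuiltin :: List a -> [a]
     toBuiltin Nil         = []
     toBuiltin (Cons x xs) = x : toBuiltin xs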

   This approach has many nice features. The reasoning goes something like
   this: since there's absolutely nothing special about
   lists/strings/etc., compiler designers really need to spend time
   figuring out how to implement the general case of algebraic data
   types in the best possible way. This has a huge payoff for the
   programmer, who knows that once the compiler designer has figured
   out how to get good performance for strings and lists, that
   performance will carry over into his own custom data types. This is
   in sharp contrast to, for example, Java, where built-in operations
   on primitive types are expected to be much more efficient than any
   operations I could write on semantically similar custom types.
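
   To put that last point in Haskell terms (the names Pair and first
   below are mine, just for the sketch): a user-defined type that is
   semantically just like a built-in one is declared and consumed
   through exactly the same mechanism, so there is no separate,
   privileged path for the built-in version.

     -- A user-defined pair: an ordinary algebraic data type...
     data Pair a b = Pair a b

     first :: Pair a b -> a
     first (Pair x _) = x

     -- ...and the same function over the built-in pair, for comparison.
     firstBuiltin :: (a, b) -> a
     firstBuiltin (x, _) = x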

   The basic insight is that whatever mechanism you give the
   programmer to define his data types should be efficient and
   powerful, and making sure that this is true is central to writing a
   decent compiler/interpreter. Both to commit to achieving this goal,
   and as a litmus test for your success, you should use this same
   mechanism to define all built-in types.

   In Haskell, this has worked remarkably well.
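
   For instance, the standard Prelude really does define its basic
   types with ordinary data declarations; roughly (eliding the
   deriving clauses and class instances):

     data Bool       = False | True
     data Ordering   = LT | EQ | GT
     data Maybe a    = Nothing | Just a
     data Either a b = Left a | Right b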

   A brief aside on the subject of arrays/hash tables. What you
   describe is more or less the approach taken by Standard ML, where
   tuples (in effect, fixed-size arrays) are really just syntactic sugar
   for records (hash tables) with integer keys... So, for instance:
     - (5, "baz");
     val it = (5,"baz") : int * string
     - {foo=5, bar="baz"};
     val it = {bar="baz",foo=5} : {bar:string, foo:int}
     - {1=5, 2="baz"};
     val it = (5,"baz") : int * string
   Note that "int * string" is really just a type alias for
   "{1:int, 2:string}"... Pretty neat! And of course, the actual core
   language is defined solely in terms of the general case (records),
   with the restricted case given as a derived form.

Matt

-- 
Matt Hellige                  matt@immute.net
http://matt.immute.net