[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Macros Make Me Mad

Michael Vanier writes:
> Given that, I'm curious as to why needle doesn't use s-expression
> syntax. For me, the big win with sexps is that macros don't look any
> different from standard function calls; only the evaluation is
> different (I'm excluding read-macros here).  My suspicion is that in
> a non-sexp-based syntax with operator precedence etc., macros would
> be much more likely to become incomprehensible.

Lisp s-expressions are basically arbitrary binary trees, and I wanted
to expose an actual AST instead as part of the macro model. Once
you've made this decision, there isn't a really big difference between
a surface s-exp and infix syntax.


As a complete digression, I'm going to talk about parsing infix
operators using recursive descent, without any fuss. Now, the normal
story of infix operators, associativity, and precedence in LL(1)
languages is very messy -- you need to have a production for each
precedence level and then do left factorization and left recursion
removal, which leads to some icky grammars. If you want the macro
system to reveal an AST based on the language's BNF grammar (as I do),
this is a very bad thing!

However, it's possible to evade all that cruft thanks to a very clever
trick I learned about on comp.compilers a few years ago. (The trick is
due to Keith Clarke, and I learned it from a webpage by Theodore
Norvell.) It's not a "big picture" language design thing, but is such
a pretty hack that I feel compelled to share.

First, let's look at an infix expression with operator precedence:

  3 + 4 * 6 ^ 2

If, for a moment, we forgot all about operator precedence and
associativity, we see a pattern in which every number is separated by
an operator. Or, we have a grammar that looks like this (using '*' to
mean Kleene closure):

 expr : number (binop number)*

This has no left-recursion, and is thus trivial to parse even with
naive recursive descent. But: the price is that we've lost all the
precedence information -- when we reach a binary operator, this
grammar doesn't give us any guidance on how to build a parse tree that
respects precedence. The solution is to put this information in a side
table that you can consult as you proceed during parsing. Our table
looks like this:

  Op  Assoc Prec
  --  ----- ----
   +   Left    1
   *   Left    2
   ^  Right    3

We can then rewrite the grammar as follows:

  expr   : exp(0)
  exp(p) : number (binop exp(q))*

The loop implied by (binop exp(q))* is terminated with the following

  1. If the next token is a binop with a precedence greater than p,
     then keep parsing.
  2. Otherwise, stop growing.

The p to parameterize the exp(p) rule is chosen as follows:

  1. Let prec = the precedence of the operator in question
  2. If the operator is left-associative, p = prec + 1
     If the operator is right-associative, p = prec

This isn't hugely complicated, but it takes a little while to convince
yourself that it is correct. (This description is much longer than the
actual implementation, which is only 28 lines of code!)

The payoff is this: you can add infix operators to your language more
or less at will, because all you need to do is specify the precedence
and associativity information with a new binary operator. It makes a
*great* combinator for a parser-combinator library, because it fits
naturally into a recursive-descent model, and makes infix binary
operators ridiculously easy to define.

As an aside, I've discovered that I'm a big fan of writing parsers
using recursive descent, in a combinatorial style. Recursive descent
forces the language to be LL(1), which prevents the more rococo
atrocities of syntax (eg, Ocaml, C++). Using combinators to abstract
out common patterns encourages one to make the language syntax more

Neel Krishnaswami