[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Macros Make Me Mad



At first glance your parsing scheme seems similar to

   Compact Recursive-Descent Parsing of Expressions, Software-Practice
and Experience 15 (12), 1205-1212, Dec. 1985.


Todd

-----Original Message-----
From: Neel Krishnaswami [mailto:neelk@alum.mit.edu] 
Sent: Sunday, November 17, 2002 5:34 PM
To: ll1-discuss@ai.mit.edu
Subject: Re: Macros Make Me Mad


Michael Vanier writes:
>
> Given that, I'm curious as to why needle doesn't use s-expression 
> syntax. For me, the big win with sexps is that macros don't look any 
> different from standard function calls; only the evaluation is 
> different (I'm excluding read-macros here).  My suspicion is that in a

> non-sexp-based syntax with operator precedence etc., macros would be 
> much more likely to become incomprehensible.

Lisp s-expressions are basically arbitrary binary trees, and I wanted to
expose an actual AST instead as part of the macro model. Once you've
made this decision, there isn't a really big difference between a
surface s-exp and infix syntax.

-*-*-

As a complete digression, I'm going to talk about parsing infix
operators using recursive descent, without any fuss. Now, the normal
story of infix operators, associativity, and precedence in LL(1)
languages is very messy -- you need to have a production for each
precedence level and then do left factorization and left recursion
removal, which leads to some icky grammars. If you want the macro system
to reveal an AST based on the language's BNF grammar (as I do), this is
a very bad thing!

However, it's possible to evade all that cruft thanks to a very clever
trick I learned about on comp.compilers a few years ago. (The trick is
due to Keith Clarke, and I learned it from a webpage by Theodore
Norvell.) It's not a "big picture" language design thing, but is such a
pretty hack that I feel compelled to share.

First, let's look at an infix expression with operator precedence:

  3 + 4 * 6 ^ 2

If, for a moment, we forgot all about operator precedence and
associativity, we see a pattern in which every number is separated by an
operator. Or, we have a grammar that looks like this (using '*' to mean
Kleene closure):

 expr : number (binop number)*

This has no left-recursion, and is thus trivial to parse even with naive
recursive descent. But: the price is that we've lost all the precedence
information -- when we reach a binary operator, this grammar doesn't
give us any guidance on how to build a parse tree that respects
precedence. The solution is to put this information in a side table that
you can consult as you proceed during parsing. Our table looks like
this:

  Op  Assoc Prec
  --  ----- ----
   +   Left    1
   *   Left    2
   ^  Right    3

We can then rewrite the grammar as follows:

  expr   : exp(0)
  exp(p) : number (binop exp(q))*

The loop implied by (binop exp(q))* is terminated with the following
rule:

  1. If the next token is a binop with a precedence greater than p,
     then keep parsing.
  2. Otherwise, stop growing.

The p to parameterize the exp(p) rule is chosen as follows:

  1. Let prec = the precedence of the operator in question
  2. If the operator is left-associative, p = prec + 1
     If the operator is right-associative, p = prec

This isn't hugely complicated, but it takes a little while to convince
yourself that it is correct. (This description is much longer than the
actual implementation, which is only 28 lines of code!)

The payoff is this: you can add infix operators to your language more or
less at will, because all you need to do is specify the precedence and
associativity information with a new binary operator. It makes a
*great* combinator for a parser-combinator library, because it fits
naturally into a recursive-descent model, and makes infix binary
operators ridiculously easy to define.

As an aside, I've discovered that I'm a big fan of writing parsers using
recursive descent, in a combinatorial style. Recursive descent forces
the language to be LL(1), which prevents the more rococo atrocities of
syntax (eg, Ocaml, C++). Using combinators to abstract out common
patterns encourages one to make the language syntax more regular.

-- 
Neel Krishnaswami
neelk@alum.mit.edu