[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Syntax extension of infix languages

I'm stuck in a terrible bind. :-/ 

I'm designing an infix language, and I want to support syntax
extension.  There are some interesting papers in the literature, but
most of them seem to have rough edges. 

I'm aware of two major approaches. 

Extensible Grammars 

I'm vaguely aware of some languages that allow programmers to add new
BNF productions to the parser.  These new productions are then
transformed by user-specified patterns (or code) into a parse tree in
the base language. 

Presumably, the transformations could either be procedural or based on
declarative rewrite rules. 

Is anybody familiar with these systems?  Can programmers use them
successfully?  What are the biggest drawbacks? 

I've never used any of these systems, so I don't have good intuitions
about where they work and where they fail.  If anybody has pointers to
major papers--or better yet, downloadable compilers--I'd be in your

The Dylan/JSE Approach 

The Dylan macro system takes a very different approach: 

A) The program is parsed using a "phrase" grammar.  This grammar doesn't
understand much beyond raw tokens and balanced delimiters.  The output
is (essentially) a tree with tokens for leaves, and balanced delimiters
for interior nodes. 

Note that the phrase grammar recognizes the beginning and end of macros,
and treats them as balanced delimiters.  For example: 

  1 + if (x) 2 else 3 end + 4 


  program('1', '+', macro:if(parens('x'), '2', 'else', '3'), '+', '4') 

Bachrach and Playford refer to this as a "skeleton syntax tree" (SST). 

B) The parser walks down the SST, parsing built in forms and looking for
macros to expand. 

C) When the parser finds a macro, it compares the body to a number of
rewrite rules: 

  { if (?e:expression) ?then:body else ?else:body end } 
    => { case 
           ?e  => ?then; 
           otherwise => ?else; 
         end } 
  { if (?e:expression) ?then:body end } 
    => { case 
           ?e => ?then; 
           otherwise => #f; 
         end } 

The left-hand side of these rules explain how to parse the body of
'if'.  The compiler tries to match each rule in turn, using some fairly
hairy rules that attempt to Do The Right Thing.  The ':expression' and
':body' constraints cause the matching tokens to be recursively parsed
according to the BNF grammar (including recursive macro expansion). 

The first matching rule is used. 

D) The right-hand side of each rule is (essentially) a backquote
containing raw tokens and variable substitutions.  The values of ?e and
friends are dumped into the new token stream as opaque, fully-parsed
program constituents. 

E) The output of the macro is reparsed.

Jonathan Bachrach and Keith Playford have designed a procedural
macro-expander for Java based on this approach (see
http://www.ai.mit.edu/~jrb/jse/index.htm ).  It's very sweet; definitely
a large advance beyond Dylan's system.

However, I'm not comfortable with the ad hoc matching in step (C).  I've
been burnt too many times in Dylan by bizarre match behavior, and wonder
whether the Dylan/JSE approach could be adapted to work with a more
formalized matching algorithm.