[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Syntax extension of infix languages

Eric Kidd <eric.kidd@pobox.com> writes:

> I'm stuck in a terrible bind. :-/ 
> I'm designing an infix language, and I want to support syntax
> extension.  There are some interesting papers in the literature, but
> most of them seem to have rough edges. 
> ...
> The Dylan/JSE Approach 
> ---------------------- 
> ...
> C) When the parser finds a macro, it compares the body to a number of
> rewrite rules: 
>   { if (?e:expression) ?then:body else ?else:body end } 
>     => { case 
>            ?e  => ?then; 
>            otherwise => ?else; 
>          end } 
>   { if (?e:expression) ?then:body end } 
>     => { case 
>            ?e => ?then; 
>            otherwise => #f; 
>          end } 
> The left-hand side of these rules explain how to parse the body of
> 'if'.  The compiler tries to match each rule in turn, using some fairly
> hairy rules that attempt to Do The Right Thing.  The ':expression' and
> ':body' constraints cause the matching tokens to be recursively parsed
> according to the BNF grammar (including recursive macro expansion). 

I wouldn't necessarily call the rules hairy:

  1. Pattern matching on a particular pattern proceeds from left to right. 
  2. Patterns match if and only if all of their subpatterns match.
  3. Non-pattern variable tokens appearing in a pattern match only the
     same token in the input.
  4. A largest first priority is used for matching non-wildcard
     variables.  A simple backup and retry algorithm is used to try to
     find a match by binding the variable to fewer and fewer tokens.
  5. A shortest first priority is used for wildcard variables.  
  6. Simple folding rules are applied to commas to make matching lists

but then you tell me.

my analysis is as follows. Rules 1-3 are intuitive.  rule 4 is
motivated by the fact that a parser would consume the biggest valid
input as well.  thus 'f(1, 2)' would have priority over 'f' (e.g.,
:expression).  this seems simple and understandable.  if you want the
'f' by itself you would have use the :name constraint.  rule 5 is
perhaps the most contentious but again it seems to match one's
intuitions in that wildcards should consume just enough to make the
whole rule match like tex's stretchy boxes. for example it makes sense
that in the following pattern, { start ?wild:* end }, ?wild matches
the minimum number of tokens between start and end.  now rule 6 might
be a bit ad hoc but the whole focus of JSE is on convenience.

> The first matching rule is used. 
> ...
> E) The output of the macro is reparsed.
> Jonathan Bachrach and Keith Playford have designed a procedural
> macro-expander for Java based on this approach (see
> http://www.ai.mit.edu/~jrb/jse/index.htm ).  It's very sweet; definitely
> a large advance beyond Dylan's system.
> However, I'm not comfortable with the ad hoc matching in step (C).  I've
> been burnt too many times in Dylan by bizarre match behavior, 

it would be interesting to hear of some actual cases of where the
match behavior was 'bizarre'.  perhaps the problem is that dylan's
macro system documentation is pretty daunting.  another thing that
would help would be some design patterns on writing rules.  finally, i
would like to know whether some of the extra complexity of dylan's
rewrite rule only system caused some of these problems for you and
whether you would have similar troubles with JSE.

> and wonder whether the Dylan/JSE approach could be adapted to work
> with a more formalized matching algorithm.

part of the point was to avoid having to dive into full blown AST and
grammar idiosyncrasies.  the 'simple' pattern matching factors the
matching problem shielding the programming from the complexity of
constraints details.  i would though be interested in what you might
have in mind when you talk about formalized matching algorithms.