Re: text processing as *the* problem



   [Dan Sugalski:]
   I've been thinking that regexes would be really useful if they could be 
   extended out past the character level to the token level, so I could write 
   something like:

       if (/(\{Adj})*\{Noun}/) {
	 ...
       }

   [Don Blaheta:]
   One deficiency in the usual regex model is that sometimes you want to
   run a nested search-and-replace.  That is, search for certain strings,
   then perform a s/// (or several in sequence!) on only those substrings.
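
To make the second request concrete: with today's tools, the nested
search-and-replace is usually written as an outer substitution whose
replacement is computed by an inner one. Here is a minimal sketch in
Python, assuming the standard `re` module; the helper name `nested_sub`
and the sample patterns are made up for illustration and do not come
from either message.

```python
import re

def nested_sub(outer, inner, replacement, text):
    """Apply s/inner/replacement/ only inside substrings matched by outer.

    Hypothetical helper: the outer pattern selects the regions, and the
    callback rewrites each region with an ordinary inner substitution.
    Text outside the outer matches is left untouched.
    """
    def rewrite(match):
        # match.group(0) is the whole substring found by the outer pattern
        return re.sub(inner, replacement, match.group(0))
    return re.sub(outer, rewrite, text)

# Replace runs of whitespace with "_", but only inside double-quoted strings.
result = nested_sub(r'"[^"]*"', r"\s+", "_", 'say "hello big world" to all')
print(result)  # say "hello_big_world" to all
```

This works, but the user has to step outside the regex notation and write
a callback, which is presumably the deficiency being pointed at.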

The boring response that first came to my mind is: these are jobs for
a parser, not a regex engine.  However, in contemplating the boring
response, I came up with some follow-up questions:

(1) How can a language designer protect the user (i.e., the person who
    will write programs in that language) from having to care about the
    difference between lexing and parsing?  In theory, you could have
    a language with a single syntax for both, and a "super-regex
    engine" that is really a parser generator.  But the simple-minded
    way to implement that -- translate the user's super-regex into a
    grammar for the parser generator, break up the input string into
    single-character tokens, and feed the input through the parser
    with the given grammar -- is probably not very efficient.

(2) How can a parser generator create parsers that are easier to
    debug?  The last time I had to debug a parser generated by
    Parse::RecDescent or byacc -P, it was an awful pain.  The path
    from an error message saying "Expected a FOO token but got a BAR
    instead" or "shift-reduce conflict in state 17" to the source of
    the mistake in my grammar specification went through a maze of
    trace statements or generated code.
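
To illustrate question (1), here is a rough sketch, in Python, of the
simple-minded approach: one small matching engine that works on any
sequence of items, so the same notation covers characters (lexing) and
tagged tokens (parsing). Every name in it (`item`, `seq`, `many`,
`search`) is invented for this example, and the (word, tag) token format
is an assumption. The repetition is greedy and never backtracks, so this
is much weaker than a real regex engine or parser generator.

```python
from typing import Callable, Optional, Sequence

# A parser takes (items, position) and returns the position just past
# what it matched, or None if it does not match at that position.
Parser = Callable[[Sequence, int], Optional[int]]

def item(pred) -> Parser:
    """Match exactly one item (a character or a token) satisfying pred."""
    def parse(items, pos):
        if pos < len(items) and pred(items[pos]):
            return pos + 1
        return None
    return parse

def seq(*parsers) -> Parser:
    """Match each parser in turn, one after the other."""
    def parse(items, pos):
        for p in parsers:
            pos = p(items, pos)
            if pos is None:
                return None
        return pos
    return parse

def many(parser) -> Parser:
    """Match parser zero or more times (greedy, no backtracking)."""
    def parse(items, pos):
        while True:
            nxt = parser(items, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def search(parser, items):
    """Return (start, end) of the leftmost match, or None."""
    for start in range(len(items) + 1):
        end = parser(items, start)
        if end is not None:
            return (start, end)
    return None

# Character level: the input is broken up into single-character items.
identifier = seq(item(str.isalpha), many(item(str.isalnum)))
print(search(identifier, list("x1 = 42")))   # (0, 2)

# Token level: the same engine over (word, tag) pairs, roughly (Adj)*Noun.
tokens = [("the", "Det"), ("big", "Adj"), ("red", "Adj"), ("dog", "Noun")]
adj = item(lambda t: t[1] == "Adj")
noun = item(lambda t: t[1] == "Noun")
print(search(seq(many(adj), noun), tokens))  # (1, 4)
```

The cost is visible even at this size: every input character goes through
at least one function call, which is the kind of overhead that makes me
doubt the simple-minded implementation would be efficient.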

-- 
"...conventional economic concepts of scarcity apply to intellectual
activity.  For example, you are wasting all of our attention with this
extremist blather."  --Stephen J. Turnbull
== Seth Gordon == sethg@ropine.com == http://ropine.com/ == std. disclaimer ==