Re: text processing as *the* problem
[Dan Sugalski:]
I've been thinking that regexes would be really useful if they could be
extended out past the character level to the token level, so I could write
something like:
if (/(\{Adj})*\{Noun}/) {
...
}
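(You can fake something like this today by encoding each token's tag
as a single character and running an ordinary character-level regex
over the encoded string.  A minimal sketch in Perl -- the tag set,
the token list, and the one-character codes are all invented:)

    my %code   = (Adj => 'A', Noun => 'N', Verb => 'V');
    my @tokens = (['red', 'Adj'], ['shiny', 'Adj'], ['ball', 'Noun']);
    # Concatenate one code character per token, in order.
    my $tags = join '', map { $code{$_->[1]} } @tokens;
    if ($tags =~ /A*N/) {    # zero or more adjectives, then a noun
        print "matched an adjective-noun phrase\n";
    }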
[Don Blaheta:]
One deficiency in the usual regex model is that sometimes you want to
run a nested search-and-replace. That is, search for certain strings,
then perform a s/// (or several in sequence!) on only those substrings.
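(For the nested case, Perl's /e modifier on s/// gets you most of the
way there already: match the outer pattern, then run further
substitutions on the captured text inside the replacement block.
A minimal sketch -- the patterns are just placeholders:)

    my $text = 'say "foo bar" and then "bar baz"';
    $text =~ s{"([^"]*)"}{
        my $inner = $1;          # the substring we matched
        $inner =~ s/foo/FOO/g;   # first nested substitution
        $inner =~ s/bar/BAR/g;   # second one, applied in sequence
        qq{"$inner"};            # splice the rewritten text back in
    }ge;
    print "$text\n";             # say "FOO BAR" and then "BAR baz"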
The boring response that first came to my mind is: these are jobs for
a parser, not a regex engine. However, in contemplating the boring
response, I came up with some follow-up questions:
(1) How can a language designer protect the user (i.e., the person who
will write programs in that language) from caring about the
difference between lexing and parsing? In theory, you could have
a language with a single syntax for both, and a "super-regex
engine" that is really a parser generator. But the simple-minded
way to implement that -- translate the user's super-regex into a
grammar for the parser generator, break up the input string into
single-character tokens, and feed the input through the parser
with the given grammar -- is probably not very efficient. (See
the first sketch after these questions.)
(2) How can a parser generator create parsers that are easier to
debug? The last time I had to debug a parser generated by
Parse::RecDescent or byacc -P, it was an awful pain. The path
from an error message saying "Expected a FOO token but got a BAR
instead" or "shift-reduce conflict in state 17" to the source of
the mistake in my grammar specification went through a maze of
trace statements or generated code. (See the second sketch below.)
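On (1): Parse::RecDescent already blurs the line a little, since its
terminals are ordinary regexes, so a single grammar carries both the
lexing and the parsing.  A minimal sketch along the lines of Dan's
example -- the word lists are invented:

    use Parse::RecDescent;

    my $grammar = q{
        phrase : adj(s?) noun
        adj    : /red|shiny|big/
        noun   : /ball|box/
    };
    my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";
    # phrase() returns undef if the input doesn't match the rule.
    print defined $parser->phrase('red shiny ball') ? "match\n" : "no match\n";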
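On (2): the only help I know of is the generators' own trace
switches, which show you the parse as it runs rather than pointing
at the offending rule in the grammar.  For Parse::RecDescent that
looks like this (same toy grammar as above):

    use Parse::RecDescent;

    $::RD_HINT  = 1;   # more detailed grammar-error diagnostics
    $::RD_TRACE = 1;   # log every rule attempt during the parse
    my $parser = Parse::RecDescent->new(q{
        phrase : adj(s?) noun
        adj    : /red|shiny|big/
        noun   : /ball|box/
    });
    $parser->phrase('red box');   # the trace goes to STDERR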
--
"...conventional economic concepts of scarcity apply to intellectual
activity. For example, you are wasting all of our attention with this
extremist blather." --Stephen J. Turnbull
== Seth Gordon == sethg@ropine.com == http://ropine.com/ == std. disclaimer ==