Re: text processing as *the* problem
[Dan Sugalski:]
I've been thinking that regexes would be really useful if they could be
extended out past the character level to the token level, so I could write
something like:
if (/(\{Adj})*\{Noun}/) {
...
}
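(You can fake something like this today by encoding each token's tag
as a single character and running an ordinary character-level regex
over the encoded string.  A minimal sketch in Perl -- the tag set,
the token list, and the one-character codes are all invented:)

    my %code   = (Adj => 'A', Noun => 'N', Verb => 'V');
    my @tokens = (['red', 'Adj'], ['shiny', 'Adj'], ['ball', 'Noun']);
    # Concatenate one code character per token, in order.
    my $tags = join '', map { $code{$_->[1]} } @tokens;
    if ($tags =~ /A*N/) {    # zero or more adjectives, then a noun
        print "matched an adjective-noun phrase\n";
    }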
[Don Blaheta:]
One deficiency in the usual regex model is that sometimes you want to
run a nested search-and-replace. That is, search for certain strings,
then perform a s/// (or several in sequence!) on only those substrings.
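(For the nested case, Perl's /e modifier on s/// gets you most of the
way there already: match the outer pattern, then run further
substitutions on the captured text inside the replacement block.
A minimal sketch -- the patterns are just placeholders:)

    my $text = 'say "foo bar" and then "bar baz"';
    $text =~ s{"([^"]*)"}{
        my $inner = $1;          # the substring we matched
        $inner =~ s/foo/FOO/g;   # first nested substitution
        $inner =~ s/bar/BAR/g;   # second one, applied in sequence
        qq{"$inner"};            # splice the rewritten text back in
    }ge;
    print "$text\n";             # say "FOO BAR" and then "BAR baz"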
The boring response that first came to my mind is: these are jobs for
a parser, not a regex engine. However, in contemplating the boring
response, I came up with some follow-up questions:
(1) How can a language designer protect the user (i.e., the person who
will write programs in that language) from caring about the
difference between lexing and parsing? In theory, you could have
a language with a single syntax for both, and a "super-regex
engine" that is really a parser generator. But the simple-minded
way to implement that -- translate the user's super-regex into a
grammar for the parser generator, break up the input string into
single-character tokens, and feed the input through the parser
with the given grammar -- is probably not very efficient. (See
the first sketch after these questions.)
(2) How can a parser generator create parsers that are easier to
debug? The last time I had to debug a parser generated by
Parse::RecDescent or byacc -P, it was an awful pain. The path
from an error message saying "Expected a FOO token but got a BAR
instead" or "shift-reduce conflict in state 17" to the source of
the mistake in my grammar specification went through a maze of
trace statements or generated code. (See the second sketch below.)
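On (1): Parse::RecDescent already blurs the line a little, since its
terminals are ordinary regexes, so a single grammar carries both the
lexing and the parsing.  A minimal sketch along the lines of Dan's
example -- the word lists are invented:

    use Parse::RecDescent;

    my $grammar = q{
        phrase : adj(s?) noun
        adj    : /red|shiny|big/
        noun   : /ball|box/
    };
    my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";
    # phrase() returns undef if the input doesn't match the rule.
    print defined $parser->phrase('red shiny ball') ? "match\n" : "no match\n";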
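On (2): the only help I know of is the generators' own trace
switches, which show you the parse as it runs rather than pointing
at the offending rule in the grammar.  For Parse::RecDescent that
looks like this (same toy grammar as above):

    use Parse::RecDescent;

    $::RD_HINT  = 1;   # more detailed grammar-error diagnostics
    $::RD_TRACE = 1;   # log every rule attempt during the parse
    my $parser = Parse::RecDescent->new(q{
        phrase : adj(s?) noun
        adj    : /red|shiny|big/
        noun   : /ball|box/
    });
    $parser->phrase('red box');   # the trace goes to STDERR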
--
"...conventional economic concepts of scarcity apply to intellectual
activity. For example, you are wasting all of our attention with this
extremist blather." --Stephen J. Turnbull
== Seth Gordon == sethg@ropine.com == http://ropine.com/ == std. disclaimer ==