Re: text processing as *the* problem
KELLEHER,KEVIN (Non-HP-Roseville,ex1) wrote:
>
> As a language user, I am looking for a language that I can
> fall in love with, and have been following the appearance of
> new languages for several years. However, there is a problem
> space that seems neglected, and that is text processing.
>
> I am well acquainted with regular expressions and the sort of
> work that can be done with Perl, for example, but it does not
> have the sort of *feel* that I am looking for.
>
> Are there any languages, even big languages, that were *built*
> with text processing in mind? Are there approaches that are not
> limited to an implementation of regular-expression matching?
There are two or three that I'm aware of.
First, there's Icon, which is Ralph Griswold's successor to SNOBOL. To
make string processing easy, it has generators and co-expressions built
in: generators behave rather like Prolog-style backtracking, and
co-expressions rather like cooperative threads. This makes it easy to
express string matching in code in much the same terms you would use to
describe it to a human -- "find the third 'foo' in the file, back up two
lines, and take the second word on that line". (I'm describing it very
badly, and simultaneously overselling it and overstating its
capabilities. The best thing to do is to try out the language.)
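Icon code isn't shown here. What follows is only a rough Python approximation of the idea: a generator produces candidate matches lazily, and the caller pulls just as many as it needs. Python has no goal-directed evaluation or automatic backtracking, so "failure" is modelled as returning None. The function names are hypothetical.

```python
from itertools import islice
from typing import Iterator, List, Optional


def occurrences(lines: List[str], needle: str) -> Iterator[int]:
    """Lazily generate the line index of every occurrence of `needle`.

    A line containing the needle twice is generated twice, so the caller
    counts occurrences, not lines.
    """
    for i, line in enumerate(lines):
        pos = line.find(needle)
        while pos != -1:
            yield i
            pos = line.find(needle, pos + 1)


def third_hit_back_two_second_word(text: str, needle: str = "foo") -> Optional[str]:
    """Find the third `needle`, back up two lines, take the second word.

    Returns None where an Icon expression would simply fail.
    """
    lines = text.splitlines()
    # next() on islice(..., 2, None) pulls exactly three values from the
    # generator, so the rest of the text is never scanned.
    hit = next(islice(occurrences(lines, needle), 2, None), None)
    if hit is None or hit < 2:
        return None  # fewer than three hits, or no line two above the hit
    words = lines[hit - 2].split()
    return words[1] if len(words) >= 2 else None
```

For example, on the text `"header line\nfirst second third\nfoo\nalpha beta foo\nbaz\nfoo end"` the third `foo` is on the last line, two lines up is `"alpha beta foo"`, and the result is `"beta"`.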
Second, from the functional-programming world, there are parser
combinators. Parser combinators are a set of higher-order functions that
can be composed in ways that closely resemble the usual BNF rules for
context-free grammars. So you can programmatically "build up" a grammar
for your language (with associated semantic actions) and then call the
resulting parser function on a text stream.
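As a rough illustration, here is a minimal sketch of parser combinators in Python. Real combinator libraries (Parsec in Haskell, for instance) are much richer. Here a parser is assumed to be a plain function from `(text, pos)` to `(value, new_pos)`, or None on failure. All names are made up for this example. The grammar built at the bottom is `expr ::= term ('+' term)*`, `term ::= number | '(' expr ')'`, and its semantic actions compute the sum directly.

```python
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, pos) to (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    def p(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return p


def satisfy(pred: Callable[[str], bool]) -> Parser:
    """Match one character for which pred is true."""
    def p(text, pos):
        if pos < len(text) and pred(text[pos]):
            return text[pos], pos + 1
        return None
    return p


def seq(*parsers: Parser) -> Parser:
    """BNF juxtaposition: run parsers in order, collecting their results."""
    def p(text, pos):
        results = []
        for q in parsers:
            r = q(text, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return results, pos
    return p


def alt(*parsers: Parser) -> Parser:
    """BNF '|' as ordered choice: the first alternative that succeeds wins."""
    def p(text, pos):
        for q in parsers:
            r = q(text, pos)
            if r is not None:
                return r
        return None
    return p


def many(q: Parser) -> Parser:
    """EBNF '*': zero or more repetitions, stopping on failure or no progress."""
    def p(text, pos):
        results = []
        while True:
            r = q(text, pos)
            if r is None or r[1] == pos:
                return results, pos
            value, pos = r
            results.append(value)
    return p


def fmap(f: Callable[[Any], Any], q: Parser) -> Parser:
    """Attach a semantic action f to q's result."""
    def p(text, pos):
        r = q(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return p


def lazy(thunk: Callable[[], Parser]) -> Parser:
    """Defer lookup so a rule can refer to itself (or to a later rule)."""
    return lambda text, pos: thunk()(text, pos)


is_digit = lambda c: c in "0123456789"
number = fmap(lambda r: int(r[0] + "".join(r[1])),
              seq(satisfy(is_digit), many(satisfy(is_digit))))
term = alt(number, fmap(lambda r: r[1], seq(lit("("), lazy(lambda: expr), lit(")"))))
expr = fmap(lambda r: r[0] + sum(r[1]),
            seq(term, many(fmap(lambda r: r[1], seq(lit("+"), term)))))


def parse(parser: Parser, text: str) -> Any:
    """Run parser on the whole of text, raising ValueError on leftover input."""
    r = parser(text, 0)
    if r is None or r[1] != len(text):
        raise ValueError(f"parse error in {text!r}")
    return r[0]
```

With these definitions, `parse(expr, "1+2+(3+4)")` evaluates to 10. The shape of `expr` and `term` follows the BNF rules almost one-to-one, which is the appeal of the approach. `alt` here is ordered choice with no backtracking into earlier alternatives, a simplification real libraries handle in various ways.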
Third, there's the sgrep utility, which was basically written to be
the next step up from regexes -- you can specify balanced delimiters
naturally. It's not a full language, and it isn't nearly as well known
as it should be.
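sgrep's own query syntax isn't reproduced here. The Python sketch below only shows why balanced delimiters are a step beyond a plain regex. The hypothetical `outermost_regions` keeps a depth counter, so it finds each outermost parenthesized region however deeply it nests. A regex without recursion, such as `\([^()]*\)`, can only pick out the innermost groups. It assumes single-character delimiters.

```python
import re
from typing import List


def outermost_regions(text: str, open_: str = "(", close: str = ")") -> List[str]:
    """Return every outermost balanced open_ ... close region in text.

    Stray closers are ignored; an opener that is never closed yields nothing.
    """
    regions = []
    depth = 0
    start = -1
    for i, ch in enumerate(text):
        if ch == open_:
            if depth == 0:
                start = i  # beginning of a new outermost region
            depth += 1
        elif ch == close and depth > 0:
            depth -= 1
            if depth == 0:
                regions.append(text[start:i + 1])
    return regions


def innermost_via_regex(text: str) -> List[str]:
    """What a non-recursive regex can do: only groups with no nested parens."""
    return re.findall(r"\([^()]*\)", text)
```

On `"f(a, g(b), (c)) + h(d)"`, `outermost_regions` returns `["(a, g(b), (c))", "(d)"]`, while `innermost_via_regex` returns `["(b)", "(c)", "(d)"]`.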
In all cases you face the usual tradeoff -- the more powerful your
formalism for describing a language, the less you can prove about it
and the harder it is to use. Compare fixing shift-reduce conflicts in a
YACC grammar with figuring out why a regex doesn't match, for example.
The second is easier because regexes are intrinsically less powerful.
(IIRC Olin Shivers made this exact point during his talk, actually.)
--
Neel Krishnaswami
neelk@cswcasa.com