
Re: text processing as *the* problem

KELLEHER,KEVIN (Non-HP-Roseville,ex1) wrote:
> As a language user, I am looking for a language that I can 
> fall in love with, and have been following the appearance of 
> new languages for several years. However, there is a problem 
> space that seems neglected, and that is text processing.
> I am well acquainted with regular expressions and the sort of 
> work that can be done with Perl, for example, but it does not 
> have the sort of *feel* that I am looking for.
> Are there any languages, even big languages, that were *built* 
> with text processing in mind?  Are there approaches that are not 
> limited to an implementation of regular-expression matching?

There are three that I'm aware of.

First, there's Icon, which is Griswold's successor to SNOBOL. To
make string processing easy, it has generators and co-expressions
built in: generators give you something like Prolog-style
backtracking, and co-expressions are rather like cooperative threads.
This makes it easy to express string matching in code in much the
same terms you would use to describe it to a human -- "find the third
'foo' in the file, back up two lines, and take the second word on
that line". (I'm describing it very badly, and simultaneously
overselling it and overstating its capabilities. The best thing to
do is to try out the language.)
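
To give a rough flavor of the generator style, here is a minimal
sketch in Python rather than Icon. Python generators are lazy but do
not backtrack the way Icon's goal-directed evaluation does, so this
only approximates the idea. The helper names (find_all, nth,
second_word_two_lines_before_third) are made up for illustration.

```python
from typing import Iterator, List, Optional


def find_all(needle: str, lines: List[str]) -> Iterator[int]:
    """Lazily yield a line index once for every occurrence of needle."""
    for i, line in enumerate(lines):
        start = line.find(needle)
        while start != -1:
            yield i
            start = line.find(needle, start + len(needle))


def nth(items: Iterator[int], n: int) -> Optional[int]:
    """Return the n-th item (1-based) produced by a generator, or None."""
    for count, item in enumerate(items, start=1):
        if count == n:
            return item
    return None


def second_word_two_lines_before_third(needle: str, text: str) -> Optional[str]:
    """Find the third needle, back up two lines, take the second word there."""
    lines = text.splitlines()
    hit = nth(find_all(needle, lines), 3)
    if hit is None or hit < 2:
        return None  # fewer than three matches, or no line two lines above
    words = lines[hit - 2].split()
    return words[1] if len(words) >= 2 else None
```

The search stops as soon as the third match is produced, which is the
part that resembles Icon: the consumer pulls results from the matcher
on demand instead of collecting every match up front.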

Second, from the functional world, there are parser combinators.
These are higher-order functions that can be composed in ways that
strongly resemble the usual BNF rules for context-free grammars. So
you can programmatically "build up" a grammar for your language (with
associated semantic actions) and then call the parser function you've
constructed on a text stream.
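
As a minimal sketch of the idea, written in Python rather than a
functional language: a parser here is assumed to be a function taking
(text, pos) and returning (value, new_pos) on success or None on
failure. The combinator names (token, seq, alt, many, action) and the
toy grammar are illustrative choices, not any particular library's API.

```python
import re


def token(pattern):
    """Parser matching a regex token, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    """Try each alternative in order; return the first success."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser):
    """Apply parser zero or more times, collecting the results."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def action(parser, fn):
    """Attach a semantic action: transform the parsed value with fn."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos
    return parse


def fold(parts):
    """Semantic action for expr: evaluate left to right."""
    total, rest = parts
    for op, n in rest:
        total = total + n if op == "+" else total - n
    return total


# Grammar, mirroring the BNF:  expr ::= number (("+" | "-") number)*
number = action(token(r"\d+"), int)
add_op = alt(token(r"\+"), token(r"-"))
expr = action(seq(number, many(seq(add_op, number))), fold)


def evaluate(text):
    """Parse and evaluate text; return None unless all input is consumed."""
    result = expr(text, 0)
    if result is None:
        return None
    value, pos = result
    return value if text[pos:].strip() == "" else None
```

The definition of expr reads almost like the BNF rule in the comment
above it, which is the main appeal: the grammar is ordinary code, so
it can be built up, parameterized, and reused like any other function.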

Third, there's the sgrep utility, which was basically written to
be the next step up from regexes -- you can specify balanced 
delimiters naturally. It's not a full language, but it is not 
nearly as well known as it should be.

In all cases you suffer from the usual tradeoff -- the more powerful
your formalism for describing a language, the less you can prove 
about it and the harder it gets to use. Consider the problem of fixing
shift-reduce conflicts in a YACC grammar versus figuring out why a
regex doesn't match, for example. The second is easier because regexes
are intrinsically less powerful. (IIRC Olin Shivers made this exact 
point during his talk, actually.)
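
A small illustration of the power gap, sketched in Python: a
classical regular expression can find a parenthesized group with no
nesting, but matching arbitrarily nested balanced delimiters needs
something that can count, and Python's re module has no recursive
patterns. The names FLAT_GROUP and balanced_span are made up for this
example.

```python
import re

# A regex handles the flat case: one group with no parentheses inside.
FLAT_GROUP = re.compile(r"\([^()]*\)")


def balanced_span(text, start=0):
    """Return the end index (exclusive) of the balanced group that
    opens at text[start], or None if there is no such group."""
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # ran out of input before the group closed
```

On "(a (b c) d) e" the regex can only pick out the innermost "(b c)",
while the counter-based function recovers the whole outer group. The
price is the one described above: the counting version is plain code,
so there is less you can say about it by inspection.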

Neel Krishnaswami