
Re: text processing as the problem

To: <address@hidden>, "Dan Sugalski" <address@hidden>
Subject: Re: text processing as *the* problem
From: "Oliver Steele" <address@hidden>
Date: Sun, 2 Dec 2001 21:33:01 -0500
References: <5.1.0.14.2.20011128214935.01c27b10@pop.sidhe.org>
Sender: address@hidden
Xref: oroboros.ai.mit.edu ll1-discuss:251

"Dan Sugalski" <dan@sidhe.org> writes:

> I've been thinking that regexes would be really useful if they could be
> extended out past the character level to the token level, so I could write
> something like:
>
>     if (/(\{Adj})*\{Noun}/) {
>       ...
>     }
>
> to match a string that's got zero or more adjectives preceding a noun. (And
> yes, I'm painfully aware of how difficult the classification of words into
> parts of speech is--it's an example easily understandable by us but not
> necessarily easy to implement) When contemplating how to make Parrot's
> parser, I've really wanted to be able to do regular expressions against
> streams of tokens rather than streams of characters.

I've got a Python library that lets you do this: you can write
  if gre.match("adj* noun", tokens): ...
to match against a sequence of objects that have 'type' attributes with
values such as 'adj' or 'noun'.  (You can also compile an "adj* noun"
pattern string into a DFA and use it later, or build a chart parser out of
a set of patterns, etc.)
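
To give a rough idea of what token-level matching means, here is a minimal sketch. It is not the library's implementation. It assumes type names are plain identifiers, supports only the `*`, `+` and `?` quantifiers, and anchors the match at the start of the sequence. The names `Token`, `compile_types` and `match_types` are made up for this illustration. It cheats by encoding the sequence of token types as a string and handing the work to the ordinary `re` module.

```python
import re
from collections import namedtuple

# Hypothetical token class: all that matters is the 'type' attribute.
Token = namedtuple("Token", ["type", "text"])

# One pattern element: a type name plus an optional quantifier.
_ELEMENT = re.compile(r"(\w+)([*+?]?)")


def compile_types(pattern):
    """Translate e.g. "adj* noun" into a character-level regex.

    Each token type is encoded as its name followed by one space, so
    "adj* noun" becomes "(?:adj )*(?:noun )".
    """
    parts = []
    for element in pattern.split():
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        name, quantifier = m.groups()
        parts.append("(?:%s )%s" % (re.escape(name), quantifier))
    return re.compile("".join(parts))


def match_types(pattern, tokens):
    """Return True if the pattern matches a prefix of the token sequence."""
    encoded = "".join(tok.type + " " for tok in tokens)
    return compile_types(pattern).match(encoded) is not None


tokens = [Token("adj", "big"), Token("adj", "red"), Token("noun", "ball")]
if match_types("adj* noun", tokens):
    print("matched")
```

The trailing space after each type name keeps matches aligned on token boundaries, so a type such as 'ad' cannot accidentally match the front of 'adj'.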

Let me know if anyone's interested in beta testing this, and it will give me
an incentive to dust it off and finish documenting it.

Follow-Ups:
- Re: text processing as *the* problem
  - From: "Ronald D Stephens" <rdsteph@earthlink.net>
- Re: text processing as *the* problem
  - From: Terrence Brannon <metaperl@mac.com>

References:
- Re: text processing as *the* problem
  - From: Dan Sugalski <dan@sidhe.org>
