The START natural language system (SynTactic Analysis using Reversible Transformations) consists of two modules which share the same grammar (Katz [1980]). The understanding module analyzes English text and produces a knowledge base which incorporates the information found in the text. Given an appropriate segment of the knowledge base, the generating module produces English sentences. A user can retrieve the information stored in the knowledge base by querying it in English. The system will then produce an English response.
START has been used by researchers at MIT and other universities and
research laboratories for constructing and querying knowledge bases
using English (Katz and Winston [1983], Winston et al. [1983],
Doyle [1984], Katz and Brooks [1987], Keshi and Katz [1991], Winston
[1992], Katz [1994]).
Given an English sentence containing various relative clauses, appositions, multiple levels of embedding, etc., the START system first breaks it up into smaller units, called kernel sentences (usually containing one verb). After separately analyzing each kernel sentence, START rearranges the elements of all parse trees it constructs into a set of embedded representational structures. These structures are made up of a number of fields corresponding to various syntactic parameters of a sentence, but the three most salient parameters (the subject of a sentence, the object, and the relation between them) are singled out as playing a special role in indexing. These parameters are explicitly represented in a discrimination network for efficient retrieval. As a result, all sentences analyzed by START are indexed as embedded ternary expressions (T-expressions), <subject relation object>. Certain other parameters (adjectives, possessive nouns, prepositional phrases, etc.) are used to create additional T-expressions in which prepositions and several special words may serve as relations. For instance, the following simple sentence
(1) Bill surprised Hillary with his answer
will produce two T-expressions:
(2)
<<Bill surprise Hillary> with answer>
<answer related-to Bill>
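The nesting in (2) can be pictured as a small recursive data structure. The following Python sketch is only an illustration, not START's actual representation: the class name TExpr and its field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TExpr:
    """Hypothetical ternary expression: <subject relation object>."""
    subject: Union[str, "TExpr"]   # a word or an embedded T-expression
    relation: str
    obj: Union[str, "TExpr"]       # a word or an embedded T-expression

    def __str__(self) -> str:
        # Render in the angle-bracket notation used in the text.
        return f"<{self.subject} {self.relation} {self.obj}>"


# The two T-expressions of example (2), built by hand.
kernel = TExpr("Bill", "surprise", "Hillary")
t_expressions = [
    TExpr(kernel, "with", "answer"),
    TExpr("answer", "related-to", "Bill"),
]

for t in t_expressions:
    print(t)
```

Because the subject slot of the first expression holds another TExpr, the printed form reproduces the embedded notation of (2).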
The remaining parameters (adverbs and their position, tense,
auxiliaries, voice, negation, etc.) are recorded in a
representational structure called a history. The history has a
page pertaining to each sentence which yields the given
T-expression. When we index the T-expression in the knowledge base,
we cross-reference its three components and attach the history to it.
One can thus think of the resulting entry in the knowledge base as a
"digested summary" of the syntactic structure of an English
sentence.
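As a rough illustration of cross-referencing and attached histories, the Python sketch below stores each T-expression as a tuple and indexes it under each of its three components. It is a sketch under stated assumptions: a plain dictionary stands in for the discrimination network, and the class KnowledgeBase, its methods, and the history field names are all hypothetical.

```python
from collections import defaultdict

SLOTS = ("subject", "relation", "object")


class KnowledgeBase:
    """Toy knowledge base; a dict index stands in for a discrimination network."""

    def __init__(self):
        self.entries = []               # list of (t_expr, history) pairs
        self.index = defaultdict(set)   # (slot, value) -> set of entry ids

    def add(self, t_expr, history_page):
        # If the T-expression is already known, add one more page to its history.
        for entry_id, (known, history) in enumerate(self.entries):
            if known == t_expr:
                history.append(history_page)
                return entry_id
        entry_id = len(self.entries)
        self.entries.append((t_expr, [history_page]))
        # Cross-reference the entry under each of its three components.
        for slot, value in zip(SLOTS, t_expr):
            self.index[(slot, value)].add(entry_id)
        return entry_id

    def lookup(self, slot, value):
        ids = self.index.get((slot, value), set())
        return [self.entries[i] for i in sorted(ids)]


kb = KnowledgeBase()
# Hypothetical history page for sentence (1); the field names are assumptions.
page = {"sentence": "Bill surprised Hillary with his answer",
        "tense": "past", "voice": "active", "negation": False}
kb.add((("Bill", "surprise", "Hillary"), "with", "answer"), page)
kb.add(("answer", "related-to", "Bill"), page)

for t_expr, history in kb.lookup("relation", "with"):
    print(t_expr, history[0]["tense"])
```

Each entry carries a list of history pages, one per sentence that yielded the T-expression, so the same T-expression produced by a second sentence adds a page rather than a duplicate entry.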
In order to handle embedded sentences, START allows any T-expression
to take another T-expression as its subject or object. START can
analyze and generate sentences with arbitrarily complex embedded
structures.
Questions are requests for information from START's knowledge base.
In order to answer a question, START must translate the question into a
T-expression template which can be used to search the knowledge base
for T-expressions which contain information relevant to providing an
answer to the question. Let us assume that as a result of analyzing
and indexing a text containing sentence (1), the
knowledge base currently includes T-expressions
(2). Now suppose that a user asks START the
following wh-question:
(3) Whom did Bill surprise with his answer?
In the context of (1), the answer is
Hillary. In order to determine this, the system must first turn the
question (3) into a T-expression template that can be
used to search the knowledge base. The first step in this process is
to undo the effects of the wh-movement transformation that is
used to create English wh-questions. To do this, START must
find the place in sentence (3) where the wh-word
whom came from and then insert the wh-word in this position:
(4) Bill surprised whom with his answer.
Next, the language understanding
system runs sentence (4) through the same flow of
control as any declarative sentence and produces the following
T-expressions, which serve as patterns used to query the knowledge
base:
(5)
<<Bill surprise whom> with answer>
<answer related-to Bill>
Treating whom as a matching variable, the system feeds
query (5) through a matcher in order to determine
whether there is anything in the knowledge base that matches
(5). The matcher finds T-expressions (6) created from sentence (1):
(6)
<<Bill surprise Hillary> with answer>
<answer related-to Bill>
and the language generation system then uses these
T-expressions to produce the English response to question (3):
(7) Bill surprised Hillary with his answer.
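The matching step from (5) to (6) can be sketched as a small recursive pattern matcher. This is a minimal Python illustration, not START's matcher: T-expressions are written as nested tuples, the set WH_VARIABLES and the functions match and answer are hypothetical names, and the search is greedy with no backtracking.

```python
WH_VARIABLES = {"whom", "who", "what"}  # assumed set of matching variables


def match(pattern, fact, bindings=None):
    """Return variable bindings if pattern matches fact, else None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, tuple):
        if not isinstance(fact, tuple) or len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            bindings = match(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    if pattern in WH_VARIABLES:
        # A variable matches anything, but must stay consistent once bound.
        if pattern in bindings and bindings[pattern] != fact:
            return None
        bindings[pattern] = fact
        return bindings
    return bindings if pattern == fact else None


def answer(query, knowledge_base):
    """Match every pattern in the query against some stored T-expression."""
    bindings = {}
    for pattern in query:
        for fact in knowledge_base:
            result = match(pattern, fact, bindings)
            if result is not None:
                bindings = result
                break
        else:
            return None  # no stored T-expression matches this pattern
    return bindings


knowledge_base = [
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]
query = [
    (("Bill", "surprise", "whom"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]
print(answer(query, knowledge_base))  # {'whom': 'Hillary'}
```

A successful match with no variables returns an empty dictionary, so callers should test the result against None rather than rely on its truth value.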
START handles yes-no questions in a similar fashion. Suppose
that START had been asked the yes-no question:
(8) Did Bill surprise Hillary with his answer?
As in the wh-case, START would turn this question into
a T-expression template that could be matched against the
T-expressions in the knowledge base. The difference between
yes-no and wh-questions is that the T-expression templates
generated by a yes-no question would contain no
wh-variables. Still, a match will be found, allowing the system to
answer:
(9) Yes, Bill surprised Hillary with his answer.
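Since a yes-no template contains no variables, matching reduces to checking that every template T-expression is present in the knowledge base. The Python sketch below illustrates this under simplifying assumptions: T-expressions are nested tuples, the knowledge base is a plain set, and the function name answer_yes_no is hypothetical.

```python
def answer_yes_no(query, knowledge_base):
    """True if every variable-free pattern is stored in the knowledge base."""
    return all(pattern in knowledge_base for pattern in query)


# T-expressions (2), stored after analyzing sentence (1).
knowledge_base = {
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
}

# Template for question (8); it contains no wh-variables.
query = [
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]

print("Yes" if answer_yes_no(query, knowledge_base) else "No match found")
```

The sketch reports only whether a match exists; producing the full response (9) would be the job of the generation module.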
Boris Katz
Thu Feb 27 15:34:49 EST 1997