
An Overview of the START system

The START natural language system (SynTactic Analysis using Reversible Transformations) consists of two modules which share the same grammar (Katz [1980]). The understanding module analyzes English text and produces a knowledge base which incorporates the information found in the text. Given an appropriate segment of the knowledge base, the generating module produces English sentences. A user can retrieve the information stored in the knowledge base by querying it in English. The system will then produce an English response.

START has been used by researchers at MIT and other universities and research laboratories for constructing and querying knowledge bases using English (Katz and Winston [1983], Winston et al. [1983], Doyle [1984], Katz and Brooks [1987], Keshi and Katz [1991], Winston [1992], Katz [1994]).

Given an English sentence containing various relative clauses, appositions, multiple levels of embedding, etc., the START system first breaks it up into smaller units, called kernel sentences (usually containing one verb). After separately analyzing each kernel sentence, START rearranges the elements of all parse trees it constructs into a set of embedded representational structures. These structures are made up of a number of fields corresponding to various syntactic parameters of a sentence, but the three most salient parameters (the subject of a sentence, the object, and the relation between them) are singled out as playing a special role in indexing. These parameters are explicitly represented in a discrimination network for efficient retrieval. As a result, all sentences analyzed by START are indexed as embedded ternary expressions (T-expressions), <subject relation object>. Certain other parameters (adjectives, possessive nouns, prepositional phrases, etc.) are used to create additional T-expressions in which prepositions and several special words may serve as relations. For instance, the following simple sentence

(1) Bill surprised Hillary with his answer

will produce two T-expressions:

(2) <<Bill surprise Hillary> with answer>

    <answer related-to Bill>
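
As an illustration only, the T-expressions in (2) can be written down as a small nested data structure. The sketch below is not START's implementation: the class name `TExpr` and its field names are hypothetical, and the two expressions are entered by hand rather than produced by a parser.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TExpr:
    """Hypothetical container for <subject relation object>.

    The subject and object may themselves be T-expressions,
    which is how embedding is represented in this sketch.
    """
    subject: Union[str, "TExpr"]
    relation: str
    obj: Union[str, "TExpr"]

    def __str__(self) -> str:
        # Render in the angle-bracket notation used in the text.
        return f"<{self.subject} {self.relation} {self.obj}>"


# Sentence (1), entered by hand as the two T-expressions shown in (2).
inner = TExpr("Bill", "surprise", "Hillary")
sentence_1 = [
    TExpr(inner, "with", "answer"),
    TExpr("answer", "related-to", "Bill"),
]

rendered = [str(t) for t in sentence_1]
print(rendered)
```

The first expression takes another T-expression as its subject, so the prepositional phrase attaches to the whole event rather than to Bill or Hillary alone.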

The remaining parameters (adverbs and their position, tense, auxiliaries, voice, negation, etc.) are recorded in a representational structure called a history. The history has a page pertaining to each sentence that yields the given T-expression. When we index the T-expression in the knowledge base, we cross-reference its three components and attach the history to it. One can thus think of the resulting entry in the knowledge base as a "digested summary" of the syntactic structure of an English sentence.
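
The indexing step just described can be sketched roughly as follows. This is a toy stand-in under stated assumptions, not the discrimination network START uses: T-expressions are plain nested tuples, the class `KnowledgeBase` and its methods are hypothetical, and a history page is modeled as a dictionary holding a few of the parameters listed above.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy index: cross-references each T-expression by its three components."""

    def __init__(self):
        # component -> set of T-expressions in which it appears
        self.by_component = defaultdict(set)
        # T-expression -> list of history pages, one per source sentence
        self.histories = defaultdict(list)

    def index(self, texpr, page):
        subject, relation, obj = texpr
        for component in (subject, relation, obj):
            self.by_component[component].add(texpr)
        self.histories[texpr].append(page)

    def lookup(self, component):
        # Return every T-expression that has this component in any position.
        return self.by_component.get(component, set())


kb = KnowledgeBase()
core = ("Bill", "surprise", "Hillary")
full = (core, "with", "answer")
related = ("answer", "related-to", "Bill")

# One history page for sentence (1); the keys are assumed names.
page = {"tense": "past", "voice": "active", "negation": False}
kb.index(full, page)
kb.index(related, page)

print(kb.lookup("answer"))
print(kb.histories[full])
```

Looking up `"answer"` returns both entries, while looking up the embedded expression `core` returns only the outer T-expression that contains it.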

In order to handle embedded sentences, START allows any T-expression to take another T-expression as its subject or object. START can analyze and generate sentences with arbitrarily complex embedded structures.

Questions are requests for information from START's knowledge base. In order to answer a question, START must translate it into a T-expression template that can be used to search the knowledge base for T-expressions relevant to the answer. Let us assume that as a result of analyzing and indexing a text containing sentence (1), the knowledge base currently includes T-expressions (2). Now suppose that a user asks START the following wh-question:

(3) Whom did Bill surprise with his answer?

In the context of (1), the answer is Hillary. In order to determine this, the system must first turn the question (3) into a T-expression template that can be used to search the knowledge base. The first step in this process is to undo the effects of the wh-movement transformation that is used to create English wh-questions. To do this, START must find the place in sentence (3) where the wh-word whom came from and then insert the wh-word in this position:

(4) Bill surprised whom with his answer.

Next, the language understanding system leads sentence (4) through the same flow of control as any declarative sentence and produces the following T-expressions, which serve as patterns used to query the knowledge base:

(5) <<Bill surprise whom> with answer>

    <answer related-to Bill>

Treating whom as a matching variable, the system feeds query (5) through a matcher in order to determine whether there is anything in the knowledge base that matches (5). The matcher finds T-expressions (6) created from sentence (1):

(6) <<Bill surprise Hillary> with answer>

    <answer related-to Bill>

and the language generation system then uses these T-expressions to produce the English response to question (3):

(7) Bill surprised Hillary with his answer.
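
The matching step in (5) and (6) can be sketched as a small recursive pattern matcher. This is a minimal illustration, not START's matcher: T-expressions are nested tuples, the set `WH_WORDS` and the functions `match` and `answer` are hypothetical, and the search is greedy with no backtracking.

```python
# Assumed set of words treated as matching variables.
WH_WORDS = {"who", "whom", "what"}


def match(template, fact, bindings=None):
    """Return variable bindings if template matches fact, else None."""
    bindings = dict(bindings or {})
    if isinstance(template, str):
        if template in WH_WORDS:
            # A wh-word matches anything, but must bind consistently.
            if template in bindings and bindings[template] != fact:
                return None
            bindings[template] = fact
            return bindings
        return bindings if template == fact else None
    # The template is an embedded T-expression, so the fact must be one too.
    if not isinstance(fact, tuple) or len(template) != len(fact):
        return None
    for t_part, f_part in zip(template, fact):
        bindings = match(t_part, f_part, bindings)
        if bindings is None:
            return None
    return bindings


def answer(templates, knowledge_base):
    """Match every template against some stored fact (first match wins)."""
    bindings = {}
    for template in templates:
        for fact in knowledge_base:
            result = match(template, fact, bindings)
            if result is not None:
                bindings = result
                break
        else:
            return None  # no stored fact matched this template
    return bindings


# T-expressions (2), stored after analyzing sentence (1).
knowledge_base = [
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]
# Query (5), derived from sentence (4).
query = [
    (("Bill", "surprise", "whom"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]
print(answer(query, knowledge_base))
```

With the wh-word bound to Hillary, the matched T-expressions are the ones that would be handed to the generator to produce response (7).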

START handles yes-no questions in a similar fashion. Suppose that START had been asked the yes-no question:

(8) Did Bill surprise Hillary with his answer?

As in the wh-case, START would turn this question into a T-expression template that could be matched against the T-expressions in the knowledge base. The difference between yes-no and wh-questions is that the T-expression templates generated by a yes-no question contain no wh-variables. A match will still be found, allowing the system to answer:

(9) Yes, Bill surprised Hillary with his answer.
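
Because a yes-no template contains no variables, matching reduces to checking that each template is already stored. The sketch below makes that point under the same assumptions as before (nested tuples for T-expressions, a hypothetical function name); it returns only a truth value and does not model how START words its reply.

```python
def holds(templates, knowledge_base):
    """True if every variable-free template equals some stored T-expression."""
    return all(template in knowledge_base for template in templates)


# T-expressions (2), stored after analyzing sentence (1).
knowledge_base = {
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
}
# Templates for question (8): identical in shape, with no wh-word.
question_8 = [
    (("Bill", "surprise", "Hillary"), "with", "answer"),
    ("answer", "related-to", "Bill"),
]
confirmed = holds(question_8, knowledge_base)
print(confirmed)
```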


Boris Katz
Thu Feb 27 15:34:49 EST 1997