An Overview of the START system

The START natural language system (SynTactic Analysis using Reversible Transformations) consists of two modules that share the same grammar (Katz [1980]). The understanding module analyzes English text and produces a knowledge base that incorporates the information found in the text. Given an appropriate segment of the knowledge base, the generating module produces English sentences. A user can retrieve the information stored in the knowledge base by querying it in English; the system then produces an English response.

Given an English sentence containing various relative clauses, appositions, multiple levels of embedding, etc., the START system first breaks it up into smaller units, called kernel sentences (usually containing one verb). After separately analyzing each kernel sentence, START rearranges the elements of all parse trees it constructs into a set of embedded representational structures. These structures are made up of a number of fields corresponding to various syntactic parameters of a sentence, but the three most salient parameters, the subject of a sentence, the object, and the relation between them, are singled out as playing a special role in indexing. These parameters are explicitly represented in a discrimination network for efficient retrieval. As a result, all sentences analyzed by START are indexed as embedded ternary expressions (T-expressions), <subject relation object>. Certain other parameters (adjectives, possessive nouns, prepositional phrases, etc.) are used to create additional T-expressions in which prepositions and several special words may serve as relations. For instance, the following simple sentence

(1) Bill surprised Hillary with his answer

will produce two T-expressions:

(2) <<Bill surprise Hillary> with answer>

<answer related-to Bill>
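
To make the shape of these expressions concrete, the two T-expressions in (2) can be written down as a small recursive data structure. The following is only a minimal sketch in Python, not START's own representation: the class name `TExpr` and its field names are hypothetical, chosen here for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TExpr:
    """Hypothetical stand-in for a T-expression: <subject relation object>."""
    subject: Union[str, "TExpr"]   # a word or another T-expression
    relation: str                  # verb, preposition, or special word
    obj: Union[str, "TExpr"]       # a word or another T-expression

    def __str__(self) -> str:
        # Render in the angle-bracket notation used in the text.
        return f"<{self.subject} {self.relation} {self.obj}>"


# Sentence (1): "Bill surprised Hillary with his answer"
inner = TExpr("Bill", "surprise", "Hillary")
with_answer = TExpr(inner, "with", "answer")
possessive = TExpr("answer", "related-to", "Bill")

print(with_answer)  # <<Bill surprise Hillary> with answer>
print(possessive)   # <answer related-to Bill>
```

Making the class frozen keeps each expression hashable, which is convenient if expressions are later used as keys in an index.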

The remaining parameters (adverbs and their position, tense, auxiliaries, voice, negation, etc.) are recorded in a representational structure called a history. The history has a page pertaining to each sentence which yields the given T-expression. When we index the T-expression in the knowledge base, we cross-reference its three components and attach the history to it. One can thus think of the resulting entry in the knowledge base as a "digested summary" of the syntactic structure of an English sentence.
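
The indexing step described above can be sketched as follows. This is an illustration under stated assumptions, not START's implementation: plain Python dictionaries stand in for the discrimination network, T-expressions are written as nested tuples, and the names `KnowledgeBase`, `index`, `lookup`, and the keys of each history page are all hypothetical.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy index: cross-references each T-expression by its three components."""

    def __init__(self):
        self.histories = {}                   # T-expression -> list of history pages
        self.by_subject = defaultdict(set)    # subject  -> T-expressions
        self.by_relation = defaultdict(set)   # relation -> T-expressions
        self.by_object = defaultdict(set)     # object   -> T-expressions

    def index(self, texpr, page):
        """Store texpr and attach one history page for the sentence that yielded it."""
        subject, relation, obj = texpr
        self.histories.setdefault(texpr, []).append(page)
        self.by_subject[subject].add(texpr)
        self.by_relation[relation].add(texpr)
        self.by_object[obj].add(texpr)

    def lookup(self, subject=None, relation=None, obj=None):
        """Return the T-expressions matching every component that is given."""
        candidates = set(self.histories)
        if subject is not None:
            candidates &= self.by_subject.get(subject, set())
        if relation is not None:
            candidates &= self.by_relation.get(relation, set())
        if obj is not None:
            candidates &= self.by_object.get(obj, set())
        return candidates


kb = KnowledgeBase()
outer = (("Bill", "surprise", "Hillary"), "with", "answer")
# The page fields below are assumed examples of what a history might record.
kb.index(outer, {"tense": "past", "voice": "active", "negation": False})
kb.index(("answer", "related-to", "Bill"), {"source": "possessive 'his'"})

matches = kb.lookup(relation="with")
```

Here `matches` contains only `outer`, and `kb.histories[outer]` holds the single page recorded for it. A second sentence yielding the same T-expression would append another page to the same entry.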

In order to handle embedded sentences, START allows any T-expression to take another T-expression as its subject or object. START can analyze and generate sentences with arbitrarily complex embedded structures.
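
Because a T-expression may contain other T-expressions, any procedure over them is naturally recursive. The sketch below assumes T-expressions are written as nested Python tuples. The helper names `depth` and `render` and the example sentence are hypothetical illustrations, not output from START.

```python
def depth(expr):
    """Number of nested T-expression levels; a bare word has depth 0."""
    if isinstance(expr, str):
        return 0
    subject, _relation, obj = expr
    return 1 + max(depth(subject), depth(obj))


def render(expr):
    """Write a nested tuple in the <subject relation object> notation."""
    if isinstance(expr, str):
        return expr
    subject, relation, obj = expr
    return f"<{render(subject)} {relation} {render(obj)}>"


# Illustrative embedded sentence: "Hillary knew that Bill surprised the reporter"
embedded = ("Hillary", "know", ("Bill", "surprise", "reporter"))

print(render(embedded))  # <Hillary know <Bill surprise reporter>>
print(depth(embedded))   # 2
```

Neither function sets a limit on nesting beyond Python's recursion limit, which mirrors the point that embedding can be arbitrarily deep.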

Boris Katz
Thu Apr 17 17:51:51 EDT 1997