Natural Language Viewed as an Interlingua


Consider the following observation, by Lewis Carroll:

... take the following: 'If a cat can kill a rat in a minute, how many would be needed to kill it in the thousandth part of a second?' The mathematical answer, of course, is '60,000,' and no doubt less than this would not suffice; but would 60,000 suffice? I doubt it very much. I fancy that at least 50,000 of the cats would never even see the rat, or have any idea of what was going on.
Or take this: 'If a cat can kill a rat in a minute, how long would it be killing 60,000 rats?' Ah, how long, indeed! My private opinion is that the rats would kill the cat.

We might, in the spirit of Lewis Carroll, ask: If one knowledge engineer can add one axiom in one day, then how long will it take him to build a knowledge base of 60,000 axioms? Ah, how long indeed! For the 60,000 axioms will eat that knowledge engineer just as surely as Carroll's 60,000 rats will kill the cat (no sane knowledge engineer would even undertake the task).

How long would it take 60,000 domain experts to enter just one axiom? As with Carroll's 60,000 cats, more organization is necessary: The Rabbis said that three Jews discussing an issue of the law would surely produce five opinions; 60,000 experts discussing one issue cannot be expected to do better.

Finally, we might ask what seems the equivalent of Lewis Carroll's simplest question: If one domain expert can enter one axiom in one day, how many days will it take for 60,000 domain experts to enter 60,000 axioms? Ah, how long indeed, for domain experts don't speak axioms.

Although large-scale knowledge bases have long been identified as critical enablers for more effective Intelligent Systems, and although much work has gone into the problem of creating them, we still have little to show for our efforts. We have identified four crucial barriers to making progress, each illustrated by our discussion of cats, rats, and knowledge engineers:

The Knowledge Engineer Bottleneck: In all attempts to build large scale knowledge bases of which we are aware, the effort has been undertaken by a small (usually academic) team. The task becomes overwhelming, the focus shifts to a meta-level topic (understanding the truly right ontology, construction of tools, etc.) and the original goal is lost. A way around the bottleneck is to construct knowledge bases by way of distributed collaborations among large numbers of domain experts.
The Formal Language Bottleneck: Domain experts can be taught to use formal knowledge representation tools. Most Knowledge Based application systems are in fact built by small teams of Knowledge Engineers, who learn enough of the domain to become "talented amateurs" in it, collaborating with Domain Experts, who learn enough of the knowledge engineers' tools to become "talented amateurs" in turn. But this approach still leaves us with a small team building a finely honed, special purpose, small scale knowledge based system. We are looking to solve a different problem, with a different approach. If each of the 60,000 domain experts is to enter one axiom, then it hardly pays them to spend 6 weeks learning the syntax and semantics of PowerLoom. We must instead make the interaction take place in the natural language of the contributing experts.
The Uniform Representation Bottleneck: Different tasks within a common domain are best solved using different Problem Solving Paradigms which in turn dictate different representational support. Tools crafted for a given Paradigm give strong direction on their appropriate use, but they also erect strong barriers to communicating with tools crafted to support other Paradigms. This is tolerable if our goal is to build a specific Knowledge Based Application system, but we are looking to solve a different problem -- that of providing knowledge based support for a broad variety of tasks in a common domain. The solution is to devise a protocol of inference which allows diverse reasoning and representational techniques to cooperate within a common framework.
The Consistency Bottleneck: A small team of knowledge engineers working closely together on a modest-sized problem can manage to construct a knowledge based system which is clean and consistent. A large distributed team of domain experts, in contrast, will almost certainly produce many different viewpoints on the problem, and even within a single viewpoint may disagree on the fine-grained details of how to attack the problem. The solution is twofold: First, we must be tolerant of multiple viewpoints and provide the dependency tracing necessary to understand which viewpoints have contributed what information to a solution. Second, within a single perspective we must use the collaborative capabilities of the system to help bring disagreeing experts into consensus.
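
As a rough illustration of the last two remedies, the sketch below lets two differently built reasoners answer through one shared interface and tags every answer with the viewpoint and assertions it depends on, so that disagreements stay visible instead of being silently merged. It is not the Protocol of Inference itself, whose details are not given here; the names `Answer`, `Reasoner`, `TableReasoner`, `RuleReasoner`, and `ask_all` are hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Answer:
    value: str
    viewpoint: str              # whose viewpoint produced the answer
    support: Tuple[str, ...]    # assertions the answer depends on


class Reasoner(Protocol):
    """Hypothetical common interface that every reasoner must offer."""
    viewpoint: str

    def ask(self, question: str) -> Optional[Answer]: ...


class TableReasoner:
    """Answers by direct lookup in a table of stored facts."""

    def __init__(self, viewpoint: str, facts: Dict[str, str]) -> None:
        self.viewpoint = viewpoint
        self.facts = facts

    def ask(self, question: str) -> Optional[Answer]:
        if question in self.facts:
            return Answer(self.facts[question], self.viewpoint,
                          (f"fact: {question}",))
        return None


class RuleReasoner:
    """Answers with the first rule whose test accepts the question."""

    def __init__(self, viewpoint: str,
                 rules: List[Tuple[str, Callable[[str], bool], str]]) -> None:
        self.viewpoint = viewpoint
        self.rules = rules  # (rule name, test on the question, conclusion)

    def ask(self, question: str) -> Optional[Answer]:
        for name, test, conclusion in self.rules:
            if test(question):
                return Answer(conclusion, self.viewpoint, (f"rule: {name}",))
        return None


def ask_all(reasoners: List[Reasoner], question: str) -> Dict[str, List[Answer]]:
    """Group the answers by value while keeping each answer's provenance."""
    grouped: Dict[str, List[Answer]] = {}
    for reasoner in reasoners:
        answer = reasoner.ask(question)
        if answer is not None:
            grouped.setdefault(answer.value, []).append(answer)
    return grouped
```

If `ask_all` returns more than one key, the viewpoints disagree, and the `viewpoint` and `support` fields show who asserted what. That is the kind of information a consensus-building step would need.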

To address these problems we first propose to build a complement to our highly functional Web Based, Natural Language query environment START. Specifically, we will develop HAWK, a system that Helps to Accumulate the World's Knowledge. HAWK will embody the techniques necessary to enable information to be accepted from a broad variety of both general informants and domain experts as part of their normal working activities.

Building HAWK involves several major tasks:

Types of Knowledge to be acquired:
1. We will develop a Web-based interface which allows users of HAWK to provide Natural Language annotations of text, multi-media and procedural assets. These annotations will act as a large knowledge base aiding a consumer to locate information.
2. We will develop a Web-based Natural Language interface which will interactively aid users of HAWK to build Ontologies and Domain Theories.
3. We will develop a Web-based Natural Language interface which will interactively aid users of HAWK to provide the Know-How of expert problem solvers in the domain of interest.
Other Sources of Knowledge to be tapped:
1. Literal and Explicit Texts: Although today it is still intractable to build knowledge bases simply by reading and representing the full content of unrestricted texts (e.g. an Encyclopedia or a newspaper), there are classes of texts, such as Doctrine descriptions or Catalogues of Indicators and Warnings, which may be amenable to this approach. We will conduct experiments with such texts and, if successful, use them to populate our knowledge base.
2. Databases: Many relational databases contain valuable world knowledge but in an awkward form. Often the constraints of database design take knowledge whose impact is clear when rendered linguistically and turn it into data which is amenable to high speed data processing but which is relatively opaque for knowledge processing. We will develop techniques based on our existing IMPACT system which will help recover this knowledge from a database and make it accessible to our knowledge level processing technology.
Techniques in Support of Broad Scale Knowledge Acquisition:
1. Techniques for managing diversity and building consensus: HAWK's goal is to help many informants contribute their knowledge. But it is inevitable that these informants will have differing conceptualizations of the problem domain. Furthermore, even informants who agree to the first order will have disagreements about choice of vocabulary and the exact partitioning of reasoning capabilities. We will build on previous work by Davis et al. to help the contributors reach consensus when their perspectives are close enough; we will also develop new techniques for maintaining multiple differing conceptualizations and using these cooperatively to achieve greater problem solving capability.
2. Techniques for Heterogeneity: We anticipate that other projects besides our own will develop useful knowledge bases, representations, and reasoning techniques; we assume that other contributors to HPKB share this attitude. An overall architecture for Intelligent Systems, in our view, must therefore allow a variety of disparate systems to cooperate in solving problems even though they maintain different representations, data structures, control structures and problem solving methods. We will extend previous work by Rowley, Shrobe et al. on the Protocol of Inference, which is a technique for allowing such heterogeneity of representation.
3. Use of Multi-Modal HCI (the Intelligent Room): HAWK is intended to provide a natural medium for capturing knowledge. We believe that natural interactions almost always combine Language and Vision: Indicators and Warnings are drawn as trees, Plans and procedures as block diagrams, etc. The visual system is particularly adept at representing structure, while language provides the content. The Intelligent Room is a facility at the MIT AI Laboratory combining active vision, speech recognition and language understanding (in particular START). This allows interactions in which a user can point at a location on a map projected on the wall and ask "How many planes are near here?" in spoken English. We will build upon these existing capabilities of the Intelligent Room to allow the mixed use of diagrams and language as a means for conveying know-how to the system.
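
To make the first task in the list above more concrete, here is a minimal sketch of how natural language annotations might serve as an index for locating assets. The matching is naive word overlap, chosen only for brevity; it is an assumption of this example and not a description of how START or HAWK match questions to annotations. The names `Annotation`, `tokenize`, and `find_assets`, the stopword list, and the asset identifiers in the usage note are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Small illustrative stopword list (an assumption, not a linguistic resource).
STOPWORDS = {"a", "an", "the", "of", "is", "are", "in", "on", "to",
             "how", "what", "which", "many", "this"}


@dataclass(frozen=True)
class Annotation:
    sentence: str   # the contributor's natural language description
    asset: str      # pointer to the text, image, or procedure described


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and keep its content words as a set."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def find_assets(question: str, annotations: List[Annotation],
                min_overlap: int = 2) -> List[str]:
    """Return assets whose annotation shares enough words with the question."""
    question_words = tokenize(question)
    scored = []
    for ann in annotations:
        overlap = len(question_words & tokenize(ann.sentence))
        if overlap >= min_overlap:
            scored.append((overlap, ann.asset))
    # Best overlap first; ties are broken by asset name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [asset for _, asset in scored]
```

For example, given the annotations "This map shows the airfields near the northern border." (pointing at `assets/map-north`) and "This procedure describes refueling a transport plane." (pointing at `assets/refuel-procedure`), the question "Which airfields are near the border?" shares three content words with the first annotation and none with the second, so only the map is returned.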

Boris Katz
Thu Apr 17 17:51:51 EDT 1997