Consider the following observation, by Lewis Carroll:
... take the following: `If a cat can kill a rat
in a minute, how many would be needed to kill it
in the thousandth part of a second?' The
mathematical answer, of course, is `60,000,'
and no doubt less than this would not suffice;
but would 60,000 suffice? I doubt it very much.
I fancy that at least 50,000 of the cats would
never even see the rat, or have any idea of what
was going on.
Or take this: `If a cat can kill a rat in a
minute, how long would it be killing 60,000
rats?' Ah, how long, indeed! My private
opinion is that the rats would kill the cat.
We might, in the spirit of Lewis Carroll, ask: If one knowledge engineer
can add one axiom in one day, then how long will it take him to build a
knowledge base of 60,000 axioms? Ah, how long indeed! For the 60,000
axioms will eat that knowledge engineer just as surely as Carroll's 60,000
rats will eat the cat (no sane knowledge engineer would even undertake
the task).
How long would it take 60,000 domain experts to enter just one axiom? As
with Carroll's 60,000 cats, more organization is necessary: The Rabbis said
that three Jews discussing an issue of the law would surely produce five
opinions; 60,000 experts discussing one issue cannot be expected to do
better.
Finally, we might ask what seems the equivalent of Lewis Carroll's simplest
question: If one Domain Expert can enter one axiom in one day, how many
days will it take for 60,000 domain experts to enter 60,000 axioms? Ah,
how long indeed, for domain experts don't speak axioms.
Although large scale knowledge bases have long been identified as critical
enablers for more effective Intelligent Systems and although much work has
been put into attacking the problem of creating them, we still have little
to show for our efforts. We have identified several crucial barriers to
making progress, each illustrated by our discussion of cats, rats, and
knowledge engineers:
- The Knowledge Engineer Bottleneck: In all attempts to build large
scale knowledge bases of which we are aware, the effort has been
undertaken by a small (usually academic) team. The task becomes
overwhelming, the focus shifts to a meta-level topic (understanding
the truly right ontology, construction of tools, etc.) and the
original goal is lost. A way around the bottleneck is to construct
knowledge bases by way of distributed collaborations among large
numbers of domain experts.
- The Formal Language Bottleneck: Domain experts can be taught to use
formal knowledge representation tools; most Knowledge Based
application systems are in fact built by small teams of Knowledge
Engineers who learn enough of the domain to become ``talented
amateurs'' in the domain, collaborating with Domain Experts who learn
enough of the knowledge engineers' tools to become ``talented
amateurs.'' But this approach still leaves us with a small team
building a finely honed, special purpose, small scale knowledge based
system. We are looking to solve a different problem, with a
different approach. If each of 60,000 domain experts is to enter one
axiom, then it hardly pays them to spend 6 weeks learning the syntax
and semantics of PowerLoom. We must instead make the
interaction take place in the natural language of the contributing
experts.
- The Uniform Representation Bottleneck: Different tasks within a
common domain are best solved using different Problem Solving
Paradigms which in turn dictate different representational support.
Tools crafted for a given Paradigm give strong direction on their
appropriate use, but they also erect strong barriers to communicating
with tools crafted to support other Paradigms. This is tolerable if
our goal is to build a specific Knowledge Based Application system,
but we are looking to solve a different problem -- that of providing
knowledge based support for a broad variety of tasks in a common
domain. The solution is to devise a protocol of inference which
allows diverse reasoning and representational techniques to cooperate
within a common framework.
- The Consistency Bottleneck: A small team of knowledge engineers
working closely together on a modest sized problem can manage to
construct a knowledge based system which is clean and consistent. A
large distributed team of domain experts, in contrast, will almost
certainly produce many different viewpoints on the problem and even
within a single viewpoint may disagree on the fine grained details of
how to attack the problem. The solution is twofold: First, we must be
tolerant of multiple viewpoints and provide the
dependency tracing necessary to understand which viewpoints have
contributed what information to a solution. Second, within a
single perspective, we must use the collaborative capabilities
of the system to help bring disagreeing experts into consensus.
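The dependency tracing called for under the Consistency Bottleneck can be
pictured with a minimal sketch. It is an illustration only, not a design for
the system proposed here: the Assertion class, the contributing_viewpoints
function, and the example statements are hypothetical names introduced for
this sketch. Each statement records either the viewpoint that entered it or
the statements it was derived from, so that any conclusion can be traced back
to the viewpoints it rests on.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Assertion:
    """A statement plus the bookkeeping needed to trace its origin (hypothetical)."""
    statement: str
    # Set when an informant entered the statement directly.
    viewpoint: Optional[str] = None
    # Set when the statement was derived from other statements.
    supports: Tuple["Assertion", ...] = ()


def contributing_viewpoints(assertion: Assertion) -> Set[str]:
    """Walk the support graph and collect every viewpoint the assertion rests on."""
    found: Set[str] = set()
    stack = [assertion]
    while stack:
        current = stack.pop()
        if current.viewpoint is not None:
            found.add(current.viewpoint)
        stack.extend(current.supports)
    return found


# Invented example: two informants each enter one statement,
# and a conclusion is derived from both.
entered_a = Assertion("A cat can kill a rat in a minute", viewpoint="expert_a")
entered_b = Assertion("There are 60,000 rats", viewpoint="expert_b")
derived = Assertion("One cat needs 60,000 minutes", supports=(entered_a, entered_b))

print(sorted(contributing_viewpoints(derived)))  # ['expert_a', 'expert_b']
```

Keeping the viewpoint on the entered statements, rather than on the
conclusions, means that a disagreement between two informants can be
localized to the entries that actually differ.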
To address these problems, we first propose to build a complement to START,
our highly functional Web-based Natural Language query environment.
Specifically, we will develop HAWK, a system that Helps to Accumulate the
World's Knowledge. HAWK will embody the techniques necessary to enable
information to be accepted from a broad variety of both general informants
and domain experts as part of their normal working activities.
Building HAWK involves several major tasks:
- Types of Knowledge to be acquired:
- We will develop a Web-based interface which allows users of HAWK
to provide Natural Language annotations of text, multi-media and procedural
assets. These annotations will act as a large knowledge base aiding a
consumer to locate information.
- We will develop a Web-based Natural Language interface which will
interactively aid users of HAWK to build Ontologies and Domain Theories.
- We will develop a Web-based Natural Language interface
which will interactively aid users of HAWK to provide the Know-How of
expert problem solvers in the domain of interest.
- Other Sources of Knowledge to be tapped:
- Literal and Explicit Texts: Although today it is still
intractable to build knowledge bases simply by reading and representing
the full content of unrestricted texts (e.g. an Encyclopedia or a
newspaper), there are classes of texts such as Doctrine descriptions, or
Catalogues of Indicators and Warnings which may be amenable to this
approach. We will conduct experiments with such texts and if successful
use them to populate our knowledge base.
- Databases: Many relational databases contain valuable world
knowledge but in an awkward form. Often the constraints of database
design take knowledge whose impact is clear when rendered
linguistically and turn it into data which is amenable to high speed
data processing but which is relatively opaque for knowledge processing.
We will develop techniques based on our existing IMPACT system which
will help recover this knowledge from a database and make it accessible
to our knowledge level processing technology.
- Techniques in Support of Broad Scale Knowledge Acquisition:
- Techniques for managing diversity and building consensus: HAWK's
goal is to help many informants contribute their knowledge. But it is
inevitable that these informants will have differing conceptualizations
of the problem domain. Furthermore, even informants who agree to the
first order will have disagreements about choice of vocabulary and the
exact partitioning of reasoning capabilities. We will build on previous
work by Davis et al. to help the contributors reach consensus when
their perspectives are close enough; we will also develop new techniques
for maintaining multiple differing conceptualizations and using these
cooperatively to achieve greater problem solving capability.
- Techniques for Heterogeneity: We anticipate that other projects
besides our own will develop useful knowledge bases, representations,
and reasoning techniques; we assume that other contributors to HPKB
share this attitude. An overall architecture for Intelligent Systems,
in our view, must therefore allow a variety of disparate systems to
cooperate in solving problems even though they maintain different
representations, data structures, control structures and problem solving
methods. We will extend previous work by Rowley, Shrobe et al. on the
Protocol of Inference, which is a technique for allowing such
heterogeneity of representation.
- Use of Multi-Modal HCI (the Intelligent Room): HAWK is intended to
provide a natural medium for capturing knowledge. We believe that
natural interactions almost always combine Language and Vision.
Indicators and Warnings are drawn as trees, Plans and procedures as block
diagrams, etc. The visual system is particularly adept at representing
structure; language provides the content. The Intelligent Room is a
facility at the MIT AI Laboratory combining active vision, speech recognition
and language understanding (in particular START). This allows
interactions in which a user can point at a location on a map projected
on the wall and ask ``How many planes are near here?'' in spoken English.
We will build upon these existing capabilities of the Intelligent Room to
allow the mixed use of diagrams and language as a means for conveying
know-how to the system.
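The idea behind the Techniques for Heterogeneity item, that disparate systems
can cooperate while keeping their own representations, can be pictured with
a small sketch. It does not reproduce the Protocol of Inference of Rowley,
Shrobe et al.; the Reasoner interface, the two toy reasoners, and the tell
and ask helpers are assumptions made for illustration. Each reasoner stores
facts in its own way, and callers see only a shared pair of operations.

```python
from typing import Iterable, List, Protocol


class Reasoner(Protocol):
    """Hypothetical shared interface; each system keeps its own representation."""

    def tell(self, subject: str, relation: str, value: str) -> bool: ...

    def ask(self, subject: str, relation: str) -> Iterable[str]: ...


class TaxonomyReasoner:
    """Stores 'is_a' links as a parent map and answers by walking up the hierarchy."""

    def __init__(self) -> None:
        self._parent = {}

    def tell(self, subject, relation, value):
        if relation != "is_a":
            return False  # not this reasoner's kind of fact
        self._parent[subject] = value
        return True

    def ask(self, subject, relation):
        if relation != "is_a":
            return []
        result, current, seen = [], subject, {subject}
        while current in self._parent and self._parent[current] not in seen:
            current = self._parent[current]
            seen.add(current)
            result.append(current)
        return result


class TripleStore:
    """Stores any fact as a flat (subject, relation, value) triple."""

    def __init__(self) -> None:
        self._triples = set()

    def tell(self, subject, relation, value):
        self._triples.add((subject, relation, value))
        return True

    def ask(self, subject, relation):
        return sorted(v for (s, r, v) in self._triples if s == subject and r == relation)


def tell(reasoners: List[Reasoner], subject: str, relation: str, value: str) -> bool:
    """Offer the fact to each reasoner in turn; the first one to accept it stores it."""
    return any(r.tell(subject, relation, value) for r in reasoners)


def ask(reasoners: List[Reasoner], subject: str, relation: str) -> List[str]:
    """Pool the answers of all reasoners, dropping duplicates but keeping order."""
    answers: List[str] = []
    for reasoner in reasoners:
        for value in reasoner.ask(subject, relation):
            if value not in answers:
                answers.append(value)
    return answers


reasoners = [TaxonomyReasoner(), TripleStore()]
tell(reasoners, "cat", "is_a", "mammal")
tell(reasoners, "mammal", "is_a", "animal")
tell(reasoners, "cat", "hunts", "rat")
print(ask(reasoners, "cat", "is_a"))   # ['mammal', 'animal']
print(ask(reasoners, "cat", "hunts"))  # ['rat']
```

The caller never learns which reasoner answered, so a new system can be added
to the list without changing the code that poses the questions.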
Boris Katz
Thu Apr 17 17:51:51 EDT 1997