Annotating the World Wide Web

The World Wide Web is a vast collection of information in digitized form, including text, relational databases, pictures, audio, video, and multimedia information. The good news in this development is that this information has been growing exponentially; the bad news is that we can make little use of it. Several problems stand in our way:

The Web is an unstructured collection of information spanning the entire range of human experience and expression. No representation or reasoning technology we now have is capable of dealing with it.
We can't find what we need: the Web's size and its almost completely random organization make searching difficult.
The speed of growth would seem to render almost any cataloging effort pointless.

So what can we do to make better use of all this knowledge? Any good researcher faced with an imposingly large and unstructured collection of information would solve the problem by simply finding someone who knows where to look. Asking a good reference librarian in the Library of Congress would be much more useful than going to Alta Vista. Notice, however, that the reference librarian doesn't need to understand all the details of the material she locates for us; she only needs to know that it contains relevant information.

Hence we propose to create a smart reference librarian for the World Wide Web. Instead of attempting to capture and analyze each Web resource in detail, we will focus on more general knowledge about that knowledge, such as when it is relevant, to whom, and for what. We propose to attach such descriptive information to everything available on the Web. Size and speed of growth would seem to render this task impossible. But the key is to get everyone involved. To ensure that creating annotations takes less work than it is worth, we make it possible to create those annotations using a knowledge representation language that everyone knows: natural language.

By allowing thousands of people to build up knowledge about knowledge, we will create a knowledge base of an interesting form. The Web will continue to be built out of "opaque" information segments: text, maps, charts, audio, video, etc.; but attached to each of these will be natural language annotations that facilitate retrieval. By giving humans access to relevant information that humans can further interpret and understand, we will transform the Web into an intelligent, high-performance knowledge base.
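
To make the idea of annotated "opaque" segments concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the mechanism used by START: sentence analysis is replaced by plain content-word overlap, and every name in it (AnnotationIndex, annotate, lookup, the stop-word list, the example.org URLs) is hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is",
              "what", "where", "show", "me", "for", "and"}


def content_words(text: str) -> set[str]:
    """Lowercase the text and return its tokens minus the stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


@dataclass
class AnnotatedResource:
    """An opaque Web segment plus the sentences people attached to it."""
    url: str
    annotations: list[str] = field(default_factory=list)


class AnnotationIndex:
    """Stores annotations and retrieves resources by word overlap.

    Assumption: word overlap is a crude stand-in for real natural
    language matching; the resource itself is never inspected.
    """

    def __init__(self) -> None:
        self._resources: dict[str, AnnotatedResource] = {}

    def annotate(self, url: str, sentence: str) -> None:
        """Attach one natural language annotation to a resource."""
        resource = self._resources.setdefault(url, AnnotatedResource(url))
        resource.annotations.append(sentence)

    def lookup(self, question: str) -> list[str]:
        """Return URLs whose annotations share words with the question,
        best match first (ties broken alphabetically by URL)."""
        wanted = content_words(question)
        scored = []
        for resource in self._resources.values():
            best = max((len(wanted & content_words(a))
                        for a in resource.annotations), default=0)
            if best > 0:
                scored.append((best, resource.url))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [url for _, url in scored]


index = AnnotationIndex()
index.annotate("http://example.org/mars-map.gif",
               "This map shows the surface of Mars.")
index.annotate("http://example.org/boston-weather.html",
               "This page gives the weather forecast for Boston.")
matches = index.lookup("Show me a map of Mars")
```

The image and the weather page are never analyzed; only the sentences attached to them are matched against the question, which is the "knowledge about knowledge" the text describes. With the two annotations above, the question about Mars retrieves only the map.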

Boris Katz
Thu Feb 27 15:34:49 EST 1997