A variety of techniques can be used to acquire annotations; they allow
a tradeoff between the effectiveness of the annotation and its
intrusiveness on the user. In all cases, the asset being annotated may
be any information segment reachable through the Web: a text file, a
multi-media Web page, a procedure, etc.
- User provided annotation: In this mode, the user supplies to HAWK
a URL (or other indication of the location of the asset) together with a
set of sentences. These sentences are meant to be answers to the
questions which the asset itself answers. HAWK provides the annotations and
the information segment to START but monitors the processing to make sure that the
sentences can in fact be analyzed correctly. Optionally, the user may provide HAWK a
set of questions which she believes the annotations should answer. HAWK
will check that each question can be answered and that the asset
provided is among the possible answers. Failures to answer the provided
questions or failures to parse the annotations initiate interactions
aimed at acquiring missing knowledge. Notice, however, that this missing
knowledge is of a different type; we will discuss acquiring ontologies,
domain theories and know-how later.
- Annotations inferred from the content: In this mode, the user
provides a useful asset to HAWK without annotation. This is
particularly relevant when the ``user'' is in fact a ``Web-crawler''. We
will explore several techniques for finding sentences which might act as
good annotations, including:
  - Statistical Methods: Statistical techniques are the power behind
    today's Web indexers. To simplify a complex story, they work by forming
    a statistical profile of the vocabulary used in a document. Hill
    climbing techniques can be used to find a set of sentences within the
    document whose profile best approximates that of the whole document.
    These sentences might be anticipated to form a good set of annotations.
  - Linguistic Clues: A variety of linguistic markers indicate the
    presence of topic sentences. For example, certain words (e.g.,
    ``therefore'', ``consequently'') indicate that a conclusion has been
    drawn. Such sentences can be anticipated to answer some important
    questions about the document and therefore might be useful annotations.
- Annotations inferred by the context of use: HAWK will observe the
patterns of access of its users. When a link from an annotated page to
one without annotations is repeatedly followed, HAWK can ask the user
why she followed the link and, in particular, what question she sought to
answer. The answer to this question could provide a good annotation for
the previously unannotated asset.
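
The two content-based techniques above can be illustrated with a minimal sketch. It is not HAWK's implementation: the text does not say how vocabulary profiles are compared or how the hill-climbing search moves, so the sketch assumes word-count profiles, cosine similarity as the measure of fit, and a search step that swaps one selected sentence for an unselected one. All function names are hypothetical, and the cue-word list contains only the two markers mentioned above.

```python
import math
import re
from collections import Counter

# Only the two markers named in the text; a real list would be longer.
CUE_WORDS = {"therefore", "consequently"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(sentences):
    """Vocabulary profile: word counts over a collection of sentences."""
    return Counter(w for s in sentences for w in tokenize(s))


def cosine(p, q):
    """Cosine similarity between two word-count profiles (assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def hill_climb(sentences, k):
    """Pick k sentences whose joint profile approximates the whole document.

    Starts from the first k sentences and repeatedly swaps one selected
    sentence for an unselected one whenever the swap raises the score.
    Stops at a local optimum, which need not be the global best.
    """
    target = profile(sentences)
    chosen = list(range(min(k, len(sentences))))

    def score(indices):
        return cosine(profile(sentences[i] for i in indices), target)

    best = score(chosen)
    improved = True
    while improved:
        improved = False
        for pos in range(len(chosen)):
            for cand in range(len(sentences)):
                if cand in chosen:
                    continue
                trial = chosen[:pos] + [cand] + chosen[pos + 1:]
                trial_score = score(trial)
                if trial_score > best + 1e-12:  # accept strict improvements only
                    chosen, best, improved = trial, trial_score, True
    return [sentences[i] for i in sorted(chosen)], best


def has_cue(sentence):
    """True if the sentence contains one of the conclusion markers."""
    return any(w in CUE_WORDS for w in tokenize(sentence))
```

Calling `hill_climb(sentences, k)` on a document's sentence list returns the selected sentences and their similarity to the whole document; filtering the same list with `has_cue` yields the candidates suggested by linguistic markers. Because the search accepts only improving swaps, the result depends on the starting selection, so a practical system would probably restart from several initial choices.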
Boris Katz
Thu Apr 17 17:51:51 EDT 1997