Losing Freedom, One Lens-Cap at a Time

7/24/2009

[Photo: attendees at the ontology meeting]
I am attending the International Conference on Biomedical Ontology in Buffalo, NY.  Here are some thoughts about the state of the art of ontologizing.

Too bad we didn’t get time to visit Niagara Falls—it would have made a much better picture!

Many of the speakers are presenting quite impressive detailed analyses of ontologies for specific topics ranging across molecular biology, cells, organs, and diseases.  Some are relatively straightforward attempts to create a systematic terminology for a previously virginal domain, some provide insightful critiques of existing ontologies, and some analyze requirements and uses of ontologies. 

I am impressed by the progress made since the 1990s, when, as I recall, the KR-92 meeting presented only a limited set of ontologies for liquids (Pat Hayes’ ground-breaking work), cooking tasks, lumped-parameter models, piecewise-linear systems, and qualitative versions of physics, probability, utility, etc.  At that time, few of these were very formal, and none attempted the large scope of many contemporary projects.  Despite this progress, I still see enormous open problems in the current approach, and I wonder how they will be closed.

My colleague at MIT, the late Bill Martin, in the 1970s used to assign students the task of constructing modest-sized ontologies for limited but interesting domains, such as business processes, symptomatic diagnosis of diseases, personal preferences in the choice of stereo equipment, etc.  [Incidentally, the name of his representation system was OWL!]  The best students indeed produced valuable taxonomies that captured some essence of their domains and supported a variety of workable reasoning methods.  The development of a broadly comprehensive ontology, however, was well beyond their abilities.  Bill’s research program was to develop such a broad ontology, which he based on insights from linguistic expression and on his notion that analogical reasoning underlies much of human thought.  For example, he believed that prepositions such as “toward” and “along” are intimately tied to our conceptions of one-, two-, and three-dimensional space.  Alas, his death interrupted this research program, which might have produced highly valuable results.  Today’s attempts at building ontologies, at least as represented by the papers at this conference, don’t seem to have similarly deep philosophical and linguistic underpinnings, and therefore seem likely to produce only locally useful definitional taxonomies, like many of the old student projects.

The notion that one should formalize knowledge is based on the idea that such formalized statements can be re-used for many purposes, and thus save vast effort in building systems.  To serve this desideratum, ontologies must be quite broad, or must at least mesh perfectly at their margins, in order to cover the parts of the intellectual landscape for which they might be re-used.  This is a very difficult challenge, however.  The best-known attempt to codify a substrate of general knowledge has been the CYC project, now in its third decade of development.  CYC, however, has not solved the challenge of broad, seamless knowledge representation.  Instead, it has been forced to rely on “microtheories” that do well at describing the facts of small domains but provide little help in spanning multiple such theories.  This experience does not provide hopeful indicators for the ontology program.  In an earlier era, the great logician and philosopher Ludwig Wittgenstein began his career as a disciple of the Russell and Whitehead school of thought: all knowledge should and could be formalized, if only the right formalism were found.  His Tractatus closed with the famous aphorism, “What we cannot speak about we must pass over in silence,” and after further decades of work he reversed course entirely, concluding that no formalism could be adequate to the way language is actually used.

Tim Berners-Lee, creator of the World Wide Web and passionate advocate for the “semantic web,” has also come to the conclusion that reaching interoperability among many independently developed formalisms is fundamentally a social problem, not a technical one.  He has expressed his belief that such integration will happen bottom-up, as separate groups recognize the need to share ontologies and undertake the lengthy, difficult negotiations needed to reconcile divergent views.  He has no illusion that this is easy, and he thinks that ordinarily it will require significant adjustments to the ontologies being merged.  Again, not a completely positive prognosis for the ontology enterprise.

In parallel with (but lagging) database technologies, formalisms for knowledge representation arose from the chaos of inconsistent and often incoherent methods of the 1950s to 1970s.  By the time of Brachman and Levesque’s influential 1985 analysis of the fundamental tradeoff between expressiveness and tractability, these had evolved toward a basic consensus based on the following assumptions, which continue to dominate current research:
  1. Representation languages should be limited in expressive power, in order to make tractable inference over their representations possible.
  2. One special kind of knowledge, namely taxonomic knowledge, should be treated specially and centrally.  In their proposal, the “T-Box,” or terminological box, was assigned the task of maintaining is-a (subclass) relations among formally defined concepts and of supporting efficient reasoning about subsumption: determining when anything that satisfies one description must also satisfy another.  This has become a frequent basis for classification, which itself has become a critical component of all kinds of reasoning.  Today, the T-Box has become the domain of ontologists.
  3. The remainder of knowledge could be encoded in the “A-Box,” or assertional box, which could use more expressive languages and thus might require incomplete, exponentially complex, or even undecidable inference methods.  Most importantly, the operation of the T-Box cannot rely on the operation of the A-Box; thus, “ontological” reasoning can be done only with the more limited facilities of the T-Box.
Contemporary ontology work essentially adopts these ideas, except that it drops the A-Box and therefore encourages all uses of the reasoning system to rely solely on the capabilities of the T-Box.  Fortunately, current versions of OWL support a somewhat stronger description logic than most of the T-Boxes of the 1980s.  Nevertheless, my students and I have previously written about the fatal charm of trying to do all the reasoning of a clinical decision-making system in such an impoverished framework.
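
To make the tradeoff concrete, here is a deliberately tiny sketch, in Python, of the kind of reasoning such a T-Box supports.  It is not any real description-logic reasoner, and all of the concept names in it are invented for illustration: each defined concept is just a conjunction of primitive concepts, so subsumption reduces to a subset test, exactly the sort of limited-but-tractable inference the 1980s consensus aimed for.

# A toy "T-Box" in which every defined concept is a conjunction of primitive
# concepts, so subsumption reduces to a subset test.  This is only an
# illustration of the expressiveness/tractability tradeoff, not a real
# description-logic reasoner; all concept names are invented.

TBOX = {
    # defined concept -> the set of primitive concepts it conjoins
    "Cell":           {"MaterialEntity", "HasMembrane"},
    "EukaryoticCell": {"MaterialEntity", "HasMembrane", "HasNucleus"},
    "Neuron":         {"MaterialEntity", "HasMembrane", "HasNucleus", "Excitable"},
}

def conjuncts(concept):
    """Primitive conjuncts of a concept; a primitive concept stands for itself."""
    return TBOX.get(concept, {concept})

def subsumes(general, specific):
    """general subsumes specific iff every conjunct of general is also a conjunct of specific."""
    return conjuncts(general) <= conjuncts(specific)

def classify(concept):
    """Place a concept in the taxonomy: find every defined concept that subsumes it."""
    return sorted(c for c in TBOX if c != concept and subsumes(c, concept))

print(subsumes("Cell", "Neuron"))    # True: every Neuron is a Cell
print(subsumes("Neuron", "Cell"))    # False
print(classify("Neuron"))            # ['Cell', 'EukaryoticCell']

The point of the toy is what it cannot express: there is no negation, no disjunction, no numeric constraint, and no way to state contingent facts about particular individuals.  That was the work assigned to the A-Box, and it is exactly what disappears when the T-Box becomes the whole story.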

Optimists about ontologies insist that if one simply views them in a more limited role, their advantages are legion.  One argued to me at this meeting that an ontology is meant to represent only the “eternal truths” about the world of discourse, not contingent definitions such as the criteria for deciding whether a patient has rheumatoid arthritis, which may change at the whim of the American Rheumatism Association.  However, even solid truths change.  The May/June 2009 issue of American Scientist contains a short article titled “A Tangled Tale of Plant Evolution,” which reports that red algae (marine plants) contain lignin, the polymer that gives wood its strength and that had previously been known to occur only in land plants.  As one of the quoted scientists says, “People thought they knew what was going on, ... But all that is changing.”  Judging by examples from other domains discussed at the conference, it is easy to imagine lignin production being defined in an ontology as a capability of land plants alone.  Then, of course, we also need to deal with the old chestnuts of non-monotonic logic, such as whether a three-legged dog is, by definition, not a dog.
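
To see how an “eternal truth” can curdle into a bug, here is a small hypothetical sketch in the same spirit as the one above; the predicates and the rules are invented for illustration and are not drawn from any real plant ontology.

# Toy illustration (invented predicates, not any real plant ontology) of how a
# definitional axiom baked into a taxonomy yields wrong inferences once the
# underlying "eternal truth" turns out to be false.

def classify_organism(facts):
    """Monotonic, definition-driven classification over a set of string-valued facts."""
    inferred = set(facts)
    if "produces_lignin" in inferred:
        # The ontology's "eternal truth": only land plants make lignin.
        inferred.add("land_plant")
    if "land_plant" in inferred and "lives_in_ocean" in inferred:
        # The taxonomy also "knows" that land plants are terrestrial.
        inferred.add("INCONSISTENT")
    return inferred

# Before the red-algae finding, the axiom looks harmless:
print(classify_organism({"produces_lignin", "has_roots"}))       # includes 'land_plant'

# Afterward, a lignin-producing marine alga is classified as a land plant, and
# the ontology contradicts itself:
print(classify_organism({"produces_lignin", "lives_in_ocean"}))  # includes 'INCONSISTENT'

Nothing inside such a purely definitional framework repairs this; one must either revise the axiom by hand or reach for the kind of default-and-exception (non-monotonic) machinery that the T-Box-only view does not offer.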

I don’t mean to argue that thinking clearly about definitions is unnecessary or impossible.  It is simply harmful to allow that type of reasoning to become the exclusive means of representing knowledge about the world, because its techniques are too weak to deal with much of that knowledge.  I was troubled, therefore, to hear one speaker identify himself as an ontologist, not a knowledge representation person.  Our real goal is to represent knowledge in the computer in such a way that programs can act intelligently on it.  Ontologies form just one component of that striving.
