Abstract: Recent work in AI has made clear the advantages of combining probability theory with the expressive power of first-order logic. One promising approach is based on the concept of possible worlds, in which a probability measure is defined over the interpretations (possible worlds) of a logical knowledge base. This approach has been used successfully to add probabilistic elements to representations based on semantic networks and on logic programming. However, all of the representations developed to date have made the unique names assumption: they assume that the constants of the language uniquely identify the objects they denote, so that distinct constants always refer to distinct objects. This is not always reasonable, since objects in the real world are rarely labeled with easily observable unique identifiers. Often there is a great deal of uncertainty about the identity of observed objects, that is, about which observations refer to the same underlying object. We call this identity uncertainty. It is a pervasive problem in real-world data analysis, arising in numerous settings such as database merging, feature correspondence, and object tracking. We propose an extension to the possible-worlds approach in which the uncertainty over the mapping from terms to objects is represented explicitly, by extending the language used to define the probability distribution over possible worlds. We show that this extended language defines a unique and consistent distribution. We also suggest an approximate inference method for this setting, based on Markov chain Monte Carlo, and we have applied it to several domains, including vehicle matching and citation clustering, with promising results.
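The abstract describes the inference method only at a high level. As a rough illustration of what MCMC over identity mappings can look like, here is a minimal Python sketch; it is not the authors' algorithm or model. It assumes one-dimensional observations, a hypothetical Gaussian model (each object's true value v ~ N(mu0, tau^2), each observation ~ N(v, sigma^2)), a Chinese restaurant process prior over partitions with concentration alpha, and Gibbs sampling (one kind of MCMC) as the move. All function names and parameters are illustrative.

```python
import math
import random
from collections import defaultdict


def cluster_log_marginal(ys, mu0, tau, sigma):
    """Log marginal likelihood of the observations ys of ONE latent object.

    Assumed toy model: the object's value v ~ N(mu0, tau^2) and each
    observation y ~ N(v, sigma^2). Integrating out v gives a multivariate
    normal with covariance sigma^2 I + tau^2 11^T (inverted via Sherman-Morrison).
    """
    n = len(ys)
    s2, t2 = sigma ** 2, tau ** 2
    d = [y - mu0 for y in ys]
    quad = (sum(x * x for x in d) - t2 * sum(d) ** 2 / (s2 + n * t2)) / s2
    return (-0.5 * n * math.log(2 * math.pi * s2)
            - 0.5 * math.log(1 + n * t2 / s2)
            - 0.5 * quad)


def log_joint(labels, xs, alpha, mu0, tau, sigma):
    """Unnormalized log P(partition, observations).

    labels[i] names the object that observation i refers to. The partition
    prior is a CRP; terms that are constant across partitions are dropped.
    """
    groups = defaultdict(list)
    for i, z in enumerate(labels):
        groups[z].append(xs[i])
    lp = 0.0
    for ys in groups.values():
        lp += math.log(alpha) + math.lgamma(len(ys))  # CRP: alpha * (n_k - 1)!
        lp += cluster_log_marginal(ys, mu0, tau, sigma)
    return lp


def canonical(labels):
    """Relabel objects 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(z, len(mapping)) for z in labels]


def gibbs_sweep(labels, xs, rng, **model):
    """Resample each observation's object, conditioned on all the others."""
    for i in range(len(xs)):
        others = sorted({z for j, z in enumerate(labels) if j != i})
        fresh = max(labels) + 1  # a label no observation currently uses
        candidates = others + [fresh]  # join an existing object, or a new one
        scores = []
        for c in candidates:
            labels[i] = c
            scores.append(log_joint(labels, xs, **model))
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        labels[i] = rng.choices(candidates, weights=weights)[0]
    return canonical(labels)


def sample_identities(xs, sweeps=50, seed=0, **model):
    """Run the sampler and return the highest-scoring partition it visited."""
    rng = random.Random(seed)
    labels = list(range(len(xs)))  # start: every observation is its own object
    best, best_lp = labels[:], log_joint(labels, xs, **model)
    for _ in range(sweeps):
        labels = gibbs_sweep(labels, xs, rng, **model)
        lp = log_joint(labels, xs, **model)
        if lp > best_lp:
            best, best_lp = labels[:], lp
    return canonical(best)
```

For example, with six noisy readings of what are really two objects, `sample_identities([0.0, 0.05, -0.05, 10.0, 10.05, 9.95], alpha=1.0, mu0=5.0, tau=10.0, sigma=0.1)` should typically group the first three and the last three readings together. Rescoring the full joint for every candidate keeps the sketch short and obviously correct; a practical sampler would update only the two affected clusters and would use richer proposals than single-observation moves.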