Algorithms and Complexity Seminar

Thursday, November 29, 2007, 4-5:15pm in 32-G575.

Combinatorial Approach to Data Mining

Yury Lifshits (Caltech)

Host: Piotr Indyk

In many algorithmic problems like nearest neighbor search, clustering, near-duplicate detection or decentralized navigation input dataset is described either by some explicit representation (e.g. vectors) or by oracle access to pairwise distances.

In this talk we assume that our knowledge about dataset is much more limited. All informaton provided to an algorithm has the form "A is more similar to C than B is". Thus, we have only "comparative" information. It turns out that with some reasonable "consistancy assumption" all mentioned problems can be solved efficiently without any access to actual distance values. We also show a connection between combinatorial framework and concepts of doubling dimension and Karger-Ruhl dimension.

(The Algorithms and Complexity Seminar series talks are usually held Thursdays from 4-5:15pm in 32-G575.)