Algorithms and Complexity Seminar

Thursday, March 6, 2008, 4:00-5:15pm in 32-G575.

Overcoming the L1 non-embeddability barrier: choose your host space wisely

Alexandr Andoni (MIT)

We propose a new technique for solving computational problems over "hard" distance metrics, such as designing nearest neighbor data structure for the edit distance (aka Levenshtein distance). The usual approach to such problems is through embeddings, which map a "hard" space into an "easy" host space, most notably L1. Although embedding into L1 has proved successful for some spaces, it has inherent limitations: it would provably require high approximation for some important metrics, including the edit distance.

Our new technique overcomes this "L1 non-embeddability barrier" for a variant of the edit distance, Ulam metric, which is edit distance on non-repetitive strings (formally, permutations). The Ulam metric captures the hardness caused by the misalignments in the edit distance, and in fact all known lower bounds for the edit distance hold for Ulam metric to almost the same degree. For example, embedding Ulam into L1 requires ~(log d) distortion (approximation), where d is the strings' length.

We achieve a nearest neighbor data structure for Ulam with exponentially better approximation, namely O(log log d). We do so by constructing a constant approximation mapping of the Ulam metric into an alternative, richer host space, which is an iterated product of standard spaces like L1 and L_infinity. Then we show that, despite the added richness, the new host space admits efficient nearest neighbor data structure.

This is joint work with Piotr Indyk and Robert Krauthgamer.

Host: Piotr Indyk

(The Algorithms and Complexity Seminar series talks are usually held Thursdays 4:00-5:15pm in 32-G575.)