Algorithms and Complexity Seminar

Thursday, March 22, 2007, 4-5:15pm in 32-G575.

Lower Bounds for Edit Distance Estimation

Alexandr Andoni (MIT CSAIL)

Edit distance between two strings is the number of insertions/deletions/substitutions needed to transform one string into the other. This distance is of key importance in fields like computational biology, text processing, and others. Some of the most interesting problems involving edit distance are:
1) Compute distance between two strings;
2) Design a small-space sketching algorithm (sketch=a short representation of a string, permitting distance estimation between two strings given sketches only).

Edit distance is a notoriously "hard" distance. Although there is a classic quadratic time algorithm for computing the edit distance, all known near-linear algorithms achieve at best a polynomial approximation. Furthermore, the best constant space sketching algorithm achieves only roughly 2^{sqrt{log d}} approximation, where d is the length of the strings.

Natural question is thus: why is edit distance hard (or is it)? The only known (non-trivial) lower bounds for edit distance are the non-embeddability results. The best such result says that we cannot embed edit distance into l_1 with distortion better than Omega(log d). (We note that the upper bound for sketching is obtained through an embedding into l_1.) While non-embeddability into l_1 is a good evidence for hardness, it does not imply much about other approaches to the problem. For example, it does not rule out embedding into the square of l_2 with O(1) distortion -- which would be enough for, say, constant space sketching with O(1) approximation. More generally, non-embeddability results are statements about /geometric properties/ of edit distance, and do not necessarily say much about its /computational properties/.

In this work, we prove first lower bounds on computational properties of edit distance. For this, we study the communication complexity setting, where two players, Alice and Bob, each have a string, and they estimate the edit distance up to an approximation, by exchanging a number of bits. We lower bound the number of bits exchanged, obtaining as corollaries, \log^c d non-embeddability into the square of l_2, and \Omega(log log d) sketching space lower bound for constant approximation.

Joint work with Robert Krauthgamer (IBM Almaden).

Host: Ronitt Rubinfeld

(The Algorithms and Complexity Seminar series talks are usually held Thursdays from 4-5:15pm in 32-G575.)