Algorithms and Complexity Seminar
Thursday, March 22, 2007, 4-5:15pm in 32-G575.
Lower Bounds for Edit Distance Estimation
Alexandr Andoni (MIT CSAIL)
Edit distance between two strings is the number of
insertions/deletions/substitutions needed to transform one string into
the other. This distance is of key importance in fields like
computational biology, text processing, and others. Some of the most
interesting problems involving edit distance are:
1) Compute distance between two strings;
2) Design a small-space sketching algorithm (sketch=a short
representation of a string, permitting distance estimation between two
strings given sketches only).
Edit distance is a notoriously "hard" distance. Although there is a
classic quadratic time algorithm for computing the edit distance, all
known near-linear algorithms achieve at best a polynomial
approximation. Furthermore, the best constant space sketching algorithm
achieves only roughly 2^{sqrt{log d}} approximation, where d is the
length of the strings.
Natural question is thus: why is edit distance hard (or is it)? The
only known (non-trivial) lower bounds for edit distance are the
non-embeddability results. The best such result says that we cannot
embed edit distance into l_1 with distortion better than Omega(log d).
(We note that the upper bound for sketching is obtained through an
embedding into l_1.) While non-embeddability into l_1 is a good
evidence for hardness, it does not imply much about other approaches to
the problem. For example, it does not rule out embedding into the
square of l_2 with O(1) distortion -- which would be enough for, say,
constant space sketching with O(1) approximation. More generally,
non-embeddability results are statements about /geometric properties/ of edit distance, and do not
necessarily say much about its /computational properties/.
In this work, we prove first lower bounds on computational properties
of edit distance. For this, we study the communication complexity
setting, where two players, Alice and Bob, each have a string, and they
estimate the edit distance up to an approximation, by exchanging a
number of bits. We lower bound the number of bits exchanged, obtaining
as corollaries, \log^c d non-embeddability into the square of l_2, and
\Omega(log log d) sketching space lower bound for constant
approximation.
Joint work with Robert Krauthgamer (IBM Almaden).
Host: Ronitt Rubinfeld
(The Algorithms
and Complexity Seminar series talks are usually held Thursdays from
4-5:15pm in 32-G575.)