Algorithms and Complexity Seminar

Thursday, March 16, 2006, 4-5:15pm in 32-G575.

Sublinear Algorithms for String Compressibility and the Distribution Support Size

Sofya Raskhodnikova (Weizmann Institute)

Imagine having to choose between a few compression schemes to compress a very long file. Before deciding on the scheme, you might want to obtain a rough estimate on how well each scheme performs on your file. We consider the question of approximating compressibility of a string with respect to a fixed compression scheme, in sublinear time.

In the talk, we will concentrate on the run-length encoding and a version of Lempel-Ziv as our example compression schemes. We present algorithms and lower bounds for approximating compressibility with respect to both schemes. We show that compressibility with respect to Lempel-Ziv is related to approximating the support size of a distribution. This problem has been considered under different guises in the literature. We prove a lower bound for it, at the heart of which is a construction of two positive integer random variables, X and Y, with very different expectations and the following condition on the moments up to k:

    E[X]/E[Y] = E[X^2]/E[Y^2] = ... = E[X^k]/E[Y^k].

Joint work with Dana Ron, Ronitt Rubinfeld, Amir Shpilka and Adam Smith.

Host: Ronitt Rubinfeld