Algorithms and Complexity Seminar

Monday, December 11, 2006, 4-5:15pm in 32-G575.

Testing k-wise and Almost k-wise Independence

Ning Xie (MIT)

In this work, we consider the problems of testing whether a distribution over $\{0,1\}^n$ is $k$-wise or $(\eps,k)$-wise independent using samples drawn from that distribution.

To distinguish $k$-wise independent distributions from those that are $\delta$-far in statistical distance, we upper bound the number of required samples by $\tilde{O}(n^k/\delta^{2})$ and lower bound it by $\Omega(n^{\frac{k-1}{2}}/\delta)$ (these bounds hold for constant $k$, and essentially the same bounds hold for general $k$).  To achieve these bounds, we use Fourier analysis to relate a distribution's distance from $k$-wise independence to its  \emph{biases} (a measure of the parity imbalance it induces on a set of variables). The relationships we derive are tighter than previously known, and are of independent interest.

To distinguish $(\eps,k)$-wise independent distributions from those that are $\delta$-far in statistical distance, we upper bound the number of required samples by $O\left({k \log n \over \delta2\epsilon2}\right)$ and lower bound it by $\Omega\left({\sqrt{k\log n} \over (\eps+\delta)\sqrt{\log 1/(\eps+\delta)}}\right)$.  Although these bounds are an exponential improvement (in terms of $n$ and $k$) over the corresponding bounds for testing $k$-wise independence, we show that the {\em time} complexity of testing $(\eps,k)$-wise independence is unlikely to be $\poly(n,1/\eps,1/\delta)$ for $k=\Theta(\log n)$, since this would disprove a plausible conjecture about the hardness of finding hidden cliques in random graphs. Under the conjecture, our result implies that for, say, $k =\log n$ and $\eps = 1 / n^{0.99}$, there is a set of $(\eps,k)$-wise independent distributions, and a set of distributions at distance $\delta=1/n^{0.51}$ from $(\eps,k)$-wise independence, which are indistinguishable by polynomial time algorithms.

Joint work with Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, and Ronitt Rubinfeld.

Host: Madhu Sudan

(The Algorithms and Complexity Seminar series talks are usually held Mondays from 4-5:15pm in 32-G575.)