Algorithms and Complexity Seminar
Revisiting Norm Estimation in Data Streams
Jelani Nelson (MIT)
The problem of estimating the pth moment Fp (p nonnegative
and real) in a data stream is as follows. There is a vector x which
starts as 0, and many updates of the form xi ←
xi + v come sequentially in a stream (v can be negative).
The algorithm also receives an error parameter 0 < ε < 1. The
goal is to output an approximation with relative error ε to
Fp = ||x||pp.
Previously, it was known that polylogarithmic space (in the vector
length n) was achievable if and only if p ≤ 2. We make several new
contributions in this regime, including:
- An optimal space algorithm for 0 < p < 2, which, unlike previous
algorithms which had optimal dependence on 1/ε but sub-optimal
dependence on n, does not rely on Nisan's PRG.
-
A near-optimal space algorithm for p = 0 with optimal update and query time.
- A near-optimal space algorithm for "distinct elements" (p = 0 and
all updates have v = 1) with optimal update and query time.
- Improved L2 → L2 dimensionality
reduction in a stream.
- New 1-pass lower bounds to show optimality and near-optimality of
our algorithms, as well as of some previous algorithms (the "AMS
sketch" for p = 2, and the L1-difference algorithm of
Feigenbaum et al.).
As corollaries of our work, we discover a few separations in
complexity: F0 in 1 pass vs. 2 passes, p = 0 vs. p > 0, and
F0 with strictly positive updates vs. arbitrary updates.
Joint work with Daniel Kane (Harvard) and David Woodruff (IBM Almaden).