Samson Zhou
Nearly-Optimal Distinct Elements and Heavy Hitters in the
Sliding Window Model (co-authors Vladimir Braverman, Elena Grigorescu, Harry Lang, David Woodruff)
Abstract: We study the distinct elements and L_p-heavy hitters
problems in the sliding window model, where only the most recent n
elements in the data stream form the underlying set. We first
introduce the composable histogram, a simple twist on the exponential
(Datar et al., SODA 2002) and smooth histograms (Braverman and
Ostrovsky, FOCS 2007) that may be of independent interest. We then
show that the composable histogram along with a careful combination of
existing techniques to track either the identity or frequency of a few
specific items suffices to obtain algorithms for both distinct
elements and L_p-heavy hitters that are nearly optimal in both n and
epsilon.
Applying our new composable histogram framework, we provide an
algorithm that outputs a (1+epsilon)-approximation to the number of
distinct elements in the sliding window model and uses O(1/epsilon^2 *
log n log 1/epsilon log log n + 1/epsilon * log^2 n) bits of space.
For L_p-heavy hitters, we provide an algorithm using space
O(1/epsilon^p * log^2 n * (log log n+ log 1/epsilon)) for 0