### Algorithms and Complexity Seminar

##
Data Structure Lower Bounds using Statistical Reasoning

### Elad Verbin (Tsinghua University)

In this talk I will describe a new technique for proving data
structure lower bounds based on statistical reasoning. I will present
it in the context of new (as in submitted-3-days-ago new) lower bounds
for dynamic membership in the external memory model; the general
technique, however, might (and should!) have further applications.

The external memory model is just like the cell probe model, except
that it has a free-to-access cache of size m, and the cell size is
typically w=polylog(n). The cell-probe model for data structures
counts only cell-accesses, so computation is free. One of the most
interesting features of the external memory model is that it allows to
achieve sub-constant update time by writing multiple items to the
cache, and then writing them to memory at the same time (just like
what is done in paging). This is called *buffering*. There is a data
structure called the buffer tree, which achieves update time roughly
O(log^2(n)/b) and query time O(logn); it works for multiple problems,
among them membership, predecessor search, rank select, 1-d range
counting, etc. . For b=log^9(n), for example, the update time here is
subconstant.

We prove that if one wants to keep the update time less than 1, it is
impossible to reduce the query time to less than logarithmic. Thus one
has a choice between two sides of a dichotomy: either buffer very well
but take at least logarithmic query time, or use the old data
structures from the RAM model who do not buffer at all, and have a
shot at sublogarithmic query time. More specifically, We prove that
for membership data structures, in order to get update time 0.999 (or
any constant < 1), the query time has to be at least logarithmic. This
is a *threshold phenomenon for data structures*, since when the update
time is allowed to be 1+o(1), then a bit vector or hash table give
query time O(1).

The proof of our lower bound is based on statistical reasoning, and in
particular on a new combinatorial lemma called the Lemma Of Surprising
Intersections (LOSI) The LOSI allows us to use a proof methodology
where we first analyze the intersection structure of the positive
queries by using encoding arguments, and then use statistical
arguments to deduce properties of the intersection structure of *all*
queries, even the negative ones. All other data structure arguments
that we know cannot argue anything about the negative queries, since
the structure of such queries cannot be understood by conventional
arguments.

This is joint work with Qin Zhang from HKUST