Algorithms and Complexity Seminar

Data Structure Lower Bounds using Statistical Reasoning

Elad Verbin (Tsinghua University)

In this talk I will describe a new technique for proving data structure lower bounds based on statistical reasoning. I will present it in the context of new (as in submitted-3-days-ago new) lower bounds for dynamic membership in the external memory model; the general technique, however, might (and should!) have further applications.

The external memory model is just like the cell probe model, except that it has a free-to-access cache of size m, and the cell size is typically w=polylog(n). The cell-probe model for data structures counts only cell-accesses, so computation is free. One of the most interesting features of the external memory model is that it allows to achieve sub-constant update time by writing multiple items to the cache, and then writing them to memory at the same time (just like what is done in paging). This is called *buffering*. There is a data structure called the buffer tree, which achieves update time roughly O(log^2(n)/b) and query time O(logn); it works for multiple problems, among them membership, predecessor search, rank select, 1-d range counting, etc. . For b=log^9(n), for example, the update time here is subconstant.

We prove that if one wants to keep the update time less than 1, it is impossible to reduce the query time to less than logarithmic. Thus one has a choice between two sides of a dichotomy: either buffer very well but take at least logarithmic query time, or use the old data structures from the RAM model who do not buffer at all, and have a shot at sublogarithmic query time. More specifically, We prove that for membership data structures, in order to get update time 0.999 (or any constant < 1), the query time has to be at least logarithmic. This is a *threshold phenomenon for data structures*, since when the update time is allowed to be 1+o(1), then a bit vector or hash table give query time O(1).

The proof of our lower bound is based on statistical reasoning, and in particular on a new combinatorial lemma called the Lemma Of Surprising Intersections (LOSI) The LOSI allows us to use a proof methodology where we first analyze the intersection structure of the positive queries by using encoding arguments, and then use statistical arguments to deduce properties of the intersection structure of *all* queries, even the negative ones. All other data structure arguments that we know cannot argue anything about the negative queries, since the structure of such queries cannot be understood by conventional arguments.

This is joint work with Qin Zhang from HKUST