IEEE Symposium on Foundations of Computer Science (FOCS '04)
Rome, October 2004
The need to deal with massive data sets in many practical applications has led to a growing interest in computational models appropriate for large inputs. The most important quality of a realistic model is that it can be efficiently implemented across a wide range of platforms and operating systems.
In this paper, we study the computational model that results if the streaming model is augmented with a sorting primitive. We argue that this model is highly practical, and that a wide range of important problems can be efficiently solved in this (relatively weak) model. Examples are undirected connectivity, minimum spanning trees, and red-blue line segment intersection, among others. This suggests that using more powerful, harder to implement models may not always be justified.
Our main technical contribution is to show a hardness result for the "streaming and sorting" model, which demonstrates that the main limitation of this model is that it can only access one data stream at a time. Since our model is strong enough to solve "pointer chasing" problems, the communication complexity based techniques commonly used in showing lower bounds for the streaming model cannot be adapted to our model. We therefore have to employ new techniques to obtain these results.
Finally, we compare our model to a popular restriction of external memory algorithms that access their data mostly sequentially.