We study the time and query complexity of approximation algorithms that access only a minuscule fraction of the input, focusing on two classical sources of problems: combinatorial graph optimization and manipulation of strings. The tools we develop find applications outside of the area of sublinear algorithms. For instance, we obtain a more efficient approximation algorithm for edit distance and distributed algorithms for combinatorial problems on graphs that run in a constant number of communication rounds.
Combinatorial Graph Optimization Problems: The graph optimization problems considered by us include vertex cover, maximum matching, and dominating set. A graph algorithm is traditionally called a constant-time algorithm if it runs in time that is a function of only the maximum vertex degree, and in particular, does not depend on the number of vertices in the graph.
We show a general local computation framework that allows for transforming many classical greedy approximation algorithms into constant-time approximation algorithms for the optimal solution size. By applying the framework, we obtain the first constant-time algorithm that approximates the maximum matching size up to an additive ε*n, where ε is an arbitrary positive constant, and n is the number of vertices in the graph.
It is known that a purely additive ε*n approximation is not computable in constant time for vertex cover and dominating set. We show that nevertheless, such an approximation is possible for a wide class of graphs, which includes planar graphs (and other minor-free families of graphs) and graphs of subexponential growth (a common property of networks). This result is obtained via locally computing a good partition of the input graph in our local computation framework.
The tools and algorithms developed for these problems find several other applications:
Edit Distance: We study a new asymmetric query model for edit distance. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner (without charge), while being charged for querying every symbol of x.
We design an algorithm in the asymmetric query model that makes a small number of queries to distinguish the case when the edit distance between x and y is small from the case when it is large. Our result in the asymmetric query model gives rise to a near-linear time algorithm that approximates the edit distance between two strings to within a polylogarithmic factor. For strings of length n and every fixed ε > 0, the algorithm computes a (log n)O(1/ε) approximation in n1+ε time. This is an exponential improvement over the previously known near-linear time approximation factor 2Õ(sqrt(log n)) (Andoni and Onak, STOC 2009; building on Ostrovsky and Rabani, J. ACM 2007). The algorithm of Andoni and Onak was the first to run in O(n2-δ) time, for any fixed constant δ > 0, and obtain a subpolynomial, no(1), approximation factor, despite a sequence of papers.
We provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being ``repetitive'', which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation on the complexity of approximation between edit distance and Ulam distance.