CS294-152 Lower Bounds                                        9/10/18

Time-Space Lower Bounds for Functions in P
=============================

Announcements:
- Lec 1 notes on webpage (bit.ly/LB-COURSE). Scribe notes still coming; I need to edit them.
- Simons workshop this week! Students enrolled: please send your "review" of a talk by next Monday. Scribe?

Last time we saw time-space lower bounds for SAT, and we saw a bit about relativization and oracles, in particular an oracle relative to which the SAT lower bound is false. Of course SAT is an NP-complete problem... what about decision problems in P? Lower bounds for these have mainly been studied in a *non-uniform* model of space-bounded computation.

[Borodin-Cook 1982] Multi-Way Branching Programs.
(Generalizes the branching program model seen in boot camp.)

Let Sigma be a finite alphabet and n > 0 an integer.

Def. A |Sigma|-way BP of length L(n) and size S(n) on strings from Sigma^n is:
- an S(n)-node DAG with one source node (the "start node"),
- every path in the DAG has length <= L(n),
- every node u is labeled with an index i_u in [n],
- every non-sink node u has |Sigma| outgoing edges, each labeled with a (unique) letter from Sigma,
- at each node v, we also have an output letter o_v in Sigma cup {\varepsilon} (\varepsilon means "no output").

Such a BP computes a function from Sigma^n to Sigma^*, as follows. Let x = x_1 ... x_n in Sigma^n be an input. Starting at the start node s, look up the index i_s on s, and follow the edge labeled x_{i_s} to a new node u. Repeat until a sink is reached. This process traces a path v_1, ..., v_t of t <= L(n) nodes in the BP, from the source to a sink. The output of the BP on x is the string o_{v_1} ... o_{v_t}.

"Space" of the BP: log(S(n)) -- imagine storing the current node of the BP in bits.
"Time" of the BP: the length L(n) -- each edge followed is one step of the computation.

This is an extremely generic model: it captures random-access machines of all kinds, and word RAMs with O(1) word-size.
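To make the model concrete, here is a minimal Python sketch (my own toy encoding, not part of the lecture) of evaluating such a BP. A node is a tuple (index, edges, output): `index` is the queried input position i_u, `edges` maps each letter of Sigma to a successor node id (None at a sink), and `output` is the output letter o_v, with '' playing the role of \varepsilon.

```python
def run_bp(nodes, start, x):
    """Follow the unique path of the BP on input x; return the output string."""
    out = []
    v = start
    while True:
        index, edges, output = nodes[v]
        out.append(output)            # emit o_v (possibly '', i.e. no output)
        if edges is None:             # reached a sink
            return ''.join(out)
        v = edges[x[index]]           # follow the edge labeled x_{i_v}

# Tiny example over Sigma = {'0','1'}: a 2-way BP computing AND(x_0, x_1).
nodes = {
    0: (0, {'0': 2, '1': 1}, ''),    # query x_0
    1: (1, {'0': 2, '1': 3}, ''),    # query x_1
    2: (None, None, '0'),            # sink outputting '0'
    3: (None, None, '1'),            # sink outputting '1'
}
```

The "space" of this toy BP is log of the number of keys in `nodes`, and its "time" is the longest root-to-sink path.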
Theorem: Let f : Sigma^* -> Sigma^* be computable by a random-access TM in time t(n) >= n and space s(n) >= log(n). Then f_n : Sigma^n -> Sigma^* has a |Sigma|-way branching program of length t(n) and size 2^{O(s(n))}.

Proof Sketch: Let M be a random-access TM. The nodes of our branching program correspond to the possible configurations of M on an n-length input, of which there are 2^{O(s(n))}. The start node is the initial configuration of M, and the sinks are the configurations of M in halting states. We label each node with the input index that is read in the corresponding configuration of M, and we use the transition function of M to determine the edges between configurations and the output letters on each node. QED

Since Sigma can depend on n, the multi-way BP model even captures the so-called "word-RAM" model, where the input is stored in words of, say, log_2(n) bits, and we can output words of log_2(n) bits. This corresponds to |Sigma| = n.

==== DECISION PROBLEMS IN P ====

State of the art for branching program lower bounds for decision problems in P:

[Ajtai'99, Beame-Saks-Sun-Vee'03] There are problems on Boolean inputs, solvable in O(n log n) time, that require branching program (families) of length Omega(n (log n)^{delta}) to solve with n^{.999} space (2^{n^{.999}} size), for various delta <= 1.

But when the space is lowered to e.g. O(log n), the length lower bound is still only around n (log n)^{1/2}.

===

[Note: In the *uniform* setting, for strong enough computational models (e.g., certain random-access models), it's not hard to show a problem solvable in O(n log n) time that's not solvable in o(n log n) time, by a form of the time hierarchy theorem...]

What do these problems look like?
Roughly speaking, they're decision versions of vector convolution:

  Given vectors x, y in F_2^n,
  for all i = 0, ..., n-1, compute z_i = sum_{j=0}^{n-1} x_j * y_{i+j (mod n)}.
  Output z.

[can be done in O(n log n) time and O(n) space]

Seems open: prove there's an O(n)-time problem that requires n^{1.1} time when space is restricted to O(log n). In notation: is TIME[O(n)] contained in TS[n^{1.1}, O(log n)]? Or even: is TIME[O(n)] contained in TS[O(n), O(log n)]?

It's not clear to me whether these lower bounds of Ajtai, Beame, et al. actually establish something like this, because the problems they lower-bound typically take Omega(n log n) time to solve, but the time lower bounds they prove are o(n log n)...

==== FUNCTION PROBLEMS ====

For function problems, which output many bits (instead of decision problems, which output just one bit), it is easier to prove time-space lower bounds. Consider families of functions f_n : Sigma^n -> Sigma^n, where Sigma is some finite alphabet (which could depend on n).

[Borodin-Cook 1982] Sorting n integers from the range [n, n^2] requires time-space product Omega(n^2/log n). The proof shows: for every multi-way BP for sorting, (length) * (log(size)) >= Omega(n^2/log n).

[Beame 1991] Optimal Omega(n^2) lower bound.

We will give an Omega(n^2) lower bound for a problem even simpler than sorting (which also implies the lower bound for sorting). The following exposition comes from joint work with a PhD student, Dylan McKay. (We used this slightly simplified proof to prove other general results.)

The simple problem:

Non-Occurring Elements (NOE): Given a list L of n elements from [n] (possibly repeating), print the elements of [n] - L, in any order. That is, we want to print the elements that do not occur in L. Note that since L may have repeated elements, this is a non-trivial problem!

Proposition: For any T(n), S(n) with n <= T(n) <= n^2/(log n) and S(n) >= log(n), NOE can be solved on the (log n)-word RAM in time T(n) and space S(n), where T(n)*S(n) <= O(n^2).
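A minimal Python sketch of a block-scanning algorithm achieving this tradeoff: the candidates [n] are processed in blocks of S at a time, with one pass over the input list per block. (The function and parameter names are mine; `S` plays the role of the space bound S(n).)

```python
def non_occurring_elements(L, S):
    """Return the elements of [n] not occurring in L (n = len(L)),
    using roughly n/S passes over L and an S-bit vector per pass."""
    n = len(L)
    out = []
    # Process candidate elements 1..n in blocks of S at a time.
    for lo in range(1, n + 1, S):
        hi = min(lo + S - 1, n)
        seen = [False] * (hi - lo + 1)   # the S-bit vector for this block
        for x in L:                      # one pass over the input list
            if lo <= x <= hi:
                seen[x - lo] = True
        out.extend(v for v in range(lo, hi + 1) if not seen[v - lo])
    return out
```

There are ceil(n/S) passes of n reads each, so the time is O(n^2/S) and the working space is O(S) bits, matching the Proposition's T(n)*S(n) <= O(n^2).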
Idea: Partition the set [n] into n/S(n) blocks of S(n) items each. Perform n/S(n) passes over the input list of n items; in the i-th pass, determine which items of the i-th block do not occur in the list. This takes just an S(n)-bit vector. The space is O(S(n)), and the total time is O(n^2/S(n)).

Theorem [adapting Beame 1991]: For every n-way BP for NOE of length T(n) and size 2^{S(n)}, T(n)*S(n) >= Omega(n^2).

Recalling that length = time and size = 2^{space}, this implies the above algorithm is time-space optimal, even for the (log n)-word RAM!

Let's prove the lower bound. Like the SAT time-space tradeoffs, the proof has two ingredients that are combined cleverly. But this is where the similarity ends: the ingredients are very different from those in the SAT time-space tradeoff lower bounds! We will exploit two basic properties of *random* inputs to NOE.

-- PART 1: A UNIFORM RANDOM INPUT TO NOE REQUIRES A LONG OUTPUT.

Proposition: There is a delta > 0 such that for sufficiently large n,
  Pr_{L in [n]^n}[L has at least delta*n non-occurring elements] >= delta.

Proof Sketch: Balls-and-bins argument. When we throw n balls into n bins, what's the probability that at least delta*n of the bins are empty? Let Z be the random variable counting the number of empty bins. We want a delta > 0 such that Pr[Z >= delta*n] >= delta.

Note that Pr_L[a fixed i in [n] does not occur in L] = (1 - 1/n)^n, so
  E[Z] = n*(1 - 1/n)^n ~ n/e for large n.

It's not hard to show that Pr[Z > n/(2e)] is high for large n. For example, the "Method of Bounded Differences" shows that for all t,
  Pr[|Z - E[Z]| > t] <= 2 e^{-2t^2/n}.
Setting t = alpha*n, for large n we have
  Pr[Z < n/e - alpha*n] <= 2 e^{-2 alpha^2 n},  thus
  Pr[Z >= n/e - alpha*n] >= 1 - 2 e^{-2 alpha^2 n}.
So we can set delta to be any constant less than 1/e (say, delta = 1/(2e)), and the probability will be greater than delta for large enough n.
QED

-- PART 2: EVERY SHORT BP HAS LOW PROBABILITY OF PRINTING MANY OUTPUTS OF A RANDOM INPUT.

Lemma: For every BP P of length <= n/2 which makes m outputs, where m <= n/2,
  Pr_{L in [n]^n}[P outputs m NOE of L] <= e^{-m/2}.

Proof: Let pi range over the paths in the BP P from the start node to a sink. The desired probability equals

(*)  sum_{computation paths pi in P} Pr_L[P follows pi] * Pr_L[P outputs m NOE of L | P follows pi].

We'll show that for all computation paths pi in P,

(**)  Pr_L[P outputs m NOE of L | P follows pi] <= e^{-m/2}.

Then (*) <= e^{-m/2} * sum_{computation paths pi of P} Pr_L[P follows pi] = e^{-m/2}, since those probabilities sum to 1.

Now we turn to proving (**). Fix a path pi in P. We observe that

  Pr_{L in [n]^n}[P outputs m NOE of L | P follows pi]

      (# lists in [n]^n consistent with path pi and the m outputs of P)
   =  ---------------------------------------------------------------
      (# lists in [n]^n consistent with path pi)

So we just need to lower-bound the denominator and upper-bound the numerator.

Let q <= n/2 be the number of distinct input variables queried along the path pi. There are n^{n-q} lists consistent with pi: there are n-q unread variables, and n possibilities for each of them. So the denominator equals n^{n-q}.

How many lists are such that path pi outputs m NOE correctly? All n-q variables not queried in pi must *not* take any of the m output values, because the m outputs are supposed to be non-occurring elements! Thus there are at most (n-m)^{n-q} lists which are consistent with pi and for which all m outputs of P are correct.

So the probability is at most
  (n-m)^{n-q}/n^{n-q} = (1 - m/n)^{n-q} <= (1 - m/n)^{n/2} <= e^{-(n/2)*(m/n)} = e^{-m/2},
where we used 1 - x <= e^{-x} at the end. QED

Proof of Theorem: Let P be a BP for NOE of length T(n) and size 2^{S(n)}.
We will show that if T(n)*S(n) < alpha*n^2 for a certain alpha > 0, then there must be an input on which P outputs a wrong answer.

First we make P "layered": P gets T(n) layers, each with O(2^{S(n)}) nodes; the start node is in layer 1, and all edges from layer i go to nodes in layer i+1. WLOG we can do this by simply making copies of nodes; the new size is O(T(n)*2^{S(n)}), but the length is unchanged.

Consider a uniformly random input list L in [n]^n to P. By Part 1, there is a delta > 0 such that
  Pr_L[the number of NOE in L is >= delta*n] >= delta.

We now claim:

(*) If T(n)*S(n) < (delta/10)*n^2, then Pr_L[P outputs >= delta*n NOE of L] << delta.

Therefore there must be some L on which P outputs a wrong answer (it does not output enough NOEs of L).

To prove (*), we "reduce" the problem to one about "short" branching programs: those for which we can prove limitations. Consider the event that P outputs >= delta*n NOE of L. Partition P into p = 2T(n)/n PARTS, so that each PART consists of n/2 contiguous layers. Think of each part as a collection of length-(n/2) "subprograms", one for each possible start node in the part.

When P outputs >= delta*n NOE of L, by the pigeonhole principle some part outputs >= (delta*n)/p = delta*n^2/(2T(n)) NOE of L. Call this the "important part of P on L" (which part it is depends on L). We have

  Pr_L[P outputs >= delta*n NOE of L] <= Pr_L[the important part of P on L outputs >= delta*n^2/(2T(n)) NOE].

By Part 2, for every length-(n/2) BP P' making m := delta*n^2/(2T(n)) outputs,
  Pr_L[P' outputs m NOE of L] <= e^{-m/2}.

However, on a random input L we don't know which path will be taken in the important part: it depends on which node of that part we start from, and there are at most 2^{S(n)} nodes to start from. So by a union bound,

  Pr_L[important part on L outputs >= delta*n^2/(2T(n)) NOE]
  <= Pr_L[*SOME* subprogram in the important part on L outputs >= delta*n^2/(2T(n)) NOE]
  <= 2^{S(n)} * e^{-delta*n^2/(4 T(n))}.
If T(n)*S(n) < (delta/10)*n^2, then S(n) < (delta/10)*n^2/T(n), so the above bound is
  < 2^{(delta/10)*n^2/T(n)} * e^{-(delta/4)*n^2/T(n)} = e^{-(n^2/T(n)) * delta * (1/4 - ln(2)/10)}.
Since 1/4 > ln(2)/10, this probability goes to zero as n increases. Thus (*) holds, and therefore the lower bound holds. QED

Let's recap. We showed:
(1) On a random input, the correct answer is long, with decent probability.
(2) On a random input, short BPs have exponentially low probability of outputting a long-ish correct answer.
(3) Partitioning a long, small BP into a bunch of short BPs, we can use (2) to show that the long BP outputs a long correct answer with only low probability on a random input. Contradiction with (1).

Because the NOE problem is so simple, it implies lower bounds for many other problems:

UE: Given a list L of 2n elements from [n], output the sublist of elements that appear exactly once in L.

We'll show that computing UE efficiently implies that NOE can be solved efficiently.

Lemma: Suppose UE has a BP on [n]^{2n} of time t(n) and space s(n). Then NOE has a BP on [n]^n of time t(n) and space s(n).

Proof: Take a BP for UE on lists of length 2n over [n]; suppose it's leveled, with 2^{s(n)} nodes at each level. Note this BP cannot print any outputs before the 2n-th layer; otherwise it would not be a valid BP: it would output a supposedly unique element before reading all of the input, so we could simply include that output among the remaining unread input, making the output wrong.

Our BP for NOE will be a "subprogram" of the BP for UE. We set the start node of our BP for NOE to be the node in the (n+1)-th layer that the original BP would have reached assuming it read "1" in the first layer, "2" in the second layer, ..., "n" in the n-th layer (any later queries to these first n positions are answered by the same hardwired values). This is a BP for NOE: each of the n elements occurring after this point is not unique, since it already occurred among 1, ..., n, so the original BP will print exactly those elements that do *not* occur after this point -- which is precisely the list of NOE. QED

Corollary [Beame 1991]: Every (log n)-word RAM computing UE in time t(n) and space s(n) requires t(n)*s(n) >= Omega(n^2).
Corollary [Beame 1991]: Every (log n)-word RAM for Sorting running in time t(n) and space s(n) requires t(n)*s(n) >= Omega(n^2).

Thm [McKay-Williams 201?]: #SAT and printing SAT assignments require t(n)*s(n) >= Omega(n^2) on random-access TMs (even with a random oracle: random access to 2^{s(n)} bits of randomness).

=====

QUESTION: Is the above proof method subject to any barriers? Yes, there is at least one...

For one, there are *no* functions f : Sigma^n -> Sigma^n such that T(n)*S(n) >> n^2 for all T(n), S(n).

Observation: Every f : Sigma^n -> Sigma^m has a |Sigma|-way BP of length O(n+m) and size O(|Sigma|^n * m).

Proof: Make a complete |Sigma|-ary tree of depth n, with |Sigma|^n leaves, which simply reads all n inputs. Each leaf then corresponds to a distinct x in Sigma^n. From the leaf for x, add a path of m nodes, where the nodes on the path output the characters of f(x) in order. QED

Corollary: For every f : Sigma^n -> Sigma^n, there *exists* a space bound S(n) such that there's a BP with time-space product T(n)*S(n) <= O(n^2 log(|Sigma|)). (Namely, S(n) = n log(|Sigma|) + Theta(log n).)

Therefore it's impossible to prove a super-linear time lower bound in this model when the space is allowed to be linear.

But what if the BP is small, e.g., of poly(n) size? Then BPs characterize LOGSPACE:

Claim (follows from earlier observations): f : {0,1}^* -> {0,1} is in LOGSPACE <=> f has a (uniform) family of poly(n)-size 2-way BPs.
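For a concrete instance of the easy direction of the Claim, here is a Python sketch (my own example, not from the lecture) of a width-2 BP family for PARITY, a LOGSPACE function: node 2*i+p means "at layer i with running parity p", giving 2n+2 nodes total, so poly(n) size.

```python
def parity_bp(n):
    """Build a width-2 BP for PARITY on n bits.
    Node format: (query_index, edges, output_letter); edges is None at sinks."""
    nodes = {}
    for i in range(n):
        for p in (0, 1):  # p = parity of the bits read so far
            edges = {'0': 2 * (i + 1) + p, '1': 2 * (i + 1) + (1 - p)}
            nodes[2 * i + p] = (i, edges, '')
    nodes[2 * n] = (None, None, '0')      # sink: even parity
    nodes[2 * n + 1] = (None, None, '1')  # sink: odd parity
    return nodes

def eval_parity_bp(nodes, x):
    """Follow the BP's path on the bit-string x; return the sink's output letter."""
    v = 0
    while nodes[v][1] is not None:
        i, edges, _ = nodes[v]
        v = edges[x[i]]
    return nodes[v][2]
```

The "space" of this family is log(2n+2) = O(log n), matching the BP space = log(size) correspondence from the start of the lecture.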