CS294-152 Lower Bounds 10/15/18 and 10/22/18 (two "short" lectures, due to other overlapping events) === Lower bounds for AC0 with PARITY gates Last time, we showed PARITY is not in AC0. Natural question: What happens when we add PARITY as part of the gate basis? Then those lower bounds clearly fail... AC0[2] = AC0 with PARITY = problems solvable with O(1)-depth, poly-size, over PARITY,OR,AND,NOT AC0[m] = AC0 with MODm where MODm(x1,...,xn) = 1 <=> sum_i x_i is div by m In this case, one can show lower bounds for different functions, such as MODp (for p != 2^k) and MAJORITY(x_1,...,x_n) = 1 <=> sum_i x_i >= n/2. Thm [Razborov 87] MAJORITY is not in AC0[2]. Thm [Smolensky 88] For all primes p != q, MODq is not in AC0[p], MAJORITY not in AC0[p] Note: Random restrictions cannot help here! You can set all but k+1 inputs in a PARITY gate, and it will still not simplify to a k-CNF! We are forced to take a different strategy. Polynomial Method: - Represent "easy" f : {0,1}^n -> {0,1} as "easy" multilinear polynomials (i.e. polys with no squares). Notation: x = (x_1,...,x_n), [n] = {1,...,n} Every n-variable multilinear polynomial has the form: sum_{S : S subset [n]} c_S * prod_{i in S} x_i for some 2^n coefficients c_S. Fact: For every f : {0,1}^n -> {0,1} and every field F, there are *unique* c_S in F such that for all a in {0,1}^n, f(a) = sum_{S : S subset [n]} c_S * prod_{i in S} a_i. Examples: AND(x) = prod_{i=1}^n x_i. OR(x) = 1-prod_{i=1}^n (1-x_i). Proof: For all a in {0,1}^n, define delta_a(x) = 1 iff x = a. Expressing delta_a as a polynomial, we can write delta_a(x) = prod_{i=1}^n (1-x_i + a_i*(2x_i-1)). Note (1-x_i + a_i*(2x_i-1)) = (1-x_i) if a_i = 0, and x_i if a_i = 1. Every function f can be written as the polynomial sum_{a in {0,1}^n} f(a)*delta_a(x). To show this representation is unique, we can use the following fact (which we won't prove). Fact: Let p be a multilinear polynomial (no squares). Suppose for all a in {0,1}^n, p(a) = 0. Then p is *identically* zero. (Note this follows from a version of the Schwartz-Zippel Lemma: If degree-d multilinear p is NOT identically zero, then Pr_{a in {0,1}^n}[p(a) != 0] >= 1/2^d.) So if there were two different multilinear polys q and p for the same function f, we would have (p-q)(a) = 0 on all a in {0,1}^n ==> p - q is identically zero => p = q. QED In this lecture, we'll show MAJORITY not in AC0[2]. Using MOD2 (PARITY) instead of MODp avoids a lot of extra worries about finite fields, but I will highlight how you can prove MAJORITY not in AC0[p] and even MODq not in AC0[p]. Plan: (1) All AC0[2] circuits are well-approximated by low-degree polys over F2. (2) MAJORITY aren't well-approximable. The key concept of part (1) is: Probabilistic Polynomial over field F for f : {0,1}^n -> {0,1} of degree d and error epsilon: Distribution D of F-polynomials p(x_1,...,x_n) of degree d s.t. for all x in {0,1}^n, Pr_{p from D}[p(x) = f(x)] >= 1-epsilon. This is like a "randomized algorithm" for f where the computational model is a polynomial: You want to evaluate f at some point x. You randomly draw a poly p, and evaluating p on x gives you the right answer whp. The key theorem behind part (1) is that small AC0[2] circuits have low-degree polys over F2 (field of two elements): *** Prob Poly Thm: For all C with n inputs, size s, depth d, over {OR,AND,PARITY}, for all k, there's a prob poly over F2 for C with degree <= k^d and error <= s/2^k. A similar theorem holds for circuits over {OR,AND,MODp}: the degree changes to ((p-1)k)^d, and the error changes to s/p^k. Example: k = 2*log(s). Then the degree is 2^d*(log s)^d and the error is <= 1/s. Pf: We replace each gate of C with a prob poly of degree <= k, and error <= 1/2^k. Since there are s gates, total error is <= s/2^k by union bound. Since there are d layers of depth, total degree is <= k^d. Let's go through the possible gate types. The first two are very easy: (a) NOT(x_i) -> 1-x_i (b) PARITY(x_1,...,x_t) -> sum_{i=1}^t x_i, because we are working over F2 (the field Z/2, i.e. modulo 2). (c) polynomial for OR? *** OR Lemma: For all k and n, there's a prob poly for OR_n over F2 of degree k and error <= 1/2^k. Note, these parameters are independent of n(!). Pf (Lemma): Pick random S subset of [n], set p_1(x) := sum_{i in S} x_i. OR(x) = 0 => p_1(x) = 0. OR(x) = 1 => Pr_S [p_1(x) = 0] = 1/2. Why? If OR(x) = 1, then x != 0^n. We have p(x) = 1 <=> S picks an odd number of 1s from x. Precisely of the subsets S choose an odd number of 1s from x, and half choose an even number of 1s. (Recall PARITY is true on exactly 1/2 the inputs.) [Note the probability of success can be slightly higher if we do not allow the empty set. E.g. for n=2, we only have choose among S = {1}, S = {2}, S = {1,2}, and have probability 2/3. Then P_S[p_1(x) = 1] = 2^{n-1}/(2^n-1) = 1/2*(1 + 1/(2^n-1)), slightly better for small n.] The error 1/2 can be reduced by increasing the degree: "multiplying repeated trials" Pick independent random subsets S_1,...,S_k of [n]. Define p_j(x) := sum_{i in S_j} x_i, and set P(x) := OR(p_1(x),...,p_k(x)) = 1 - prod_j(1-p_j(x)). OR(x) = 0 => All p_j(x) = 0 => P(x) = 0. OR(x) = 1 => Pr[P(x) = 0] = Pr[(for all j)p_j(x) = 0] = 1/2^k. QED (Lemma) (d) AND(x) = 1-P(1-x_1,...,1-x_n). This poly also has degree k and error <= 1/2^k. Draw one prob poly for OR, and use it to get a poly for AND. Replace each AND/OR gate of C with these two prob polys. Each gate gets replaced by a sum of products of fan-in <= k. Now we have an "arithmetic circuit" representing the original circuit C: every gate acts like a sum over F2, or a product over F2. Since C has depth d, by the distributive law (converting products of sums into sums of products), can write the entire expression as a sum of <= s^{k^d} products of fan-in <= k^d. Let this final polynomial be Q. We want to upper bound the probability Q outputs the wrong bit. Every assignment x to the inputs induces outputs to all the gates. Fix an assignment x to the inputs and fix a gate i in C. Let y_{i,x} be the bits that are input to gate i, given input x. For the prob poly p_i representing i, Pr[p_i outputs the wrong bit on input y_{i,x}] <= 1/2^k, because for every input, p_i behaves like the gate it is simulating, with probability >= 1-1/2^k. By the union bound, Pr_Q[Q(x) != C(x)] <= Pr[there's an i s.t. p_i outputs wrong bit] <= s/2^k. QED We can use the Prob Poly Thm to show that every AC0[2] circuit has a low-degree polynomial that "agrees" with the circuit on a large fraction of inputs. Cor: Let C be as above (n inputs, size s, depth d, AC0+parity). There is a polynomial p(x) of degree <= k^d over F2 such that Pr_{x in {0,1}^n}[C(x) = p(x)] >= 1-s/2^k. Pf: Let D be prob poly distribution for C. We will show E_{p \sim D}[Pr_x[C(x)=p(x)]] >= 1-s/2^k, thus there must be at least one p where Pr_x[C(x) = p(x)] >= 1-s/2^k. Define random variable Z_{p,x} = 1 <=> C(x) = p(x). Then E_p[Pr_x[C(x)=p(x)]] = E_p[sum_x Z_{p,x}/2^n] = sum_x E_p[Z_{p,x}]/2^n (linearity of expectation) = sum_x Pr_p[C(x) = p(x)]/2^n. By the prob poly thm, Pr_p[C(x) = p(x)] >= 1-s/2^k for *every* x. Thus the above sum is >= 1-s/2^k. QED The above observation is standard, and useful to know (if you don't already): given a randomized worst-case algorithm that works whp, there is a "fixed" algorithm that works on random inputs whp. In general, deg-d probabilistic polynomials for f with error eps ==> there's a fixed deg-d polynomial agreeing with f on >= 1-eps fraction of points in {0,1}^n. *** MAJORITY Lower Bound So far, we've just been proving results about AC0[2], showing that its functions can be represented well by low-degree probabilistic polynomials. Now we turn to analyzing MAJORITY. We'll show that MAJORITY cannot be well-approximated by any low-degree polynomial: Inapproximability Thm: There's a delta > 0 s.t. for all F2-polys of degree <= sqrt{n}, Pr_{x in {0,1}^n} [p(x) != MAJORITY(x)] >= delta. That is, every polynomial of degree <= sqrt{n} *disagrees* with MAJORITY on >= a delta-fraction of points. It is not hard to show this is optimal: Thm: There is a degree-O(sqrt{n}) polynomial that *agrees with* MAJORITY on at least 99% of {0,1}^n. How would you construct this optimal polynomial? Proof Sketch (for working over the integers). Consider the degree-O(sqrt{n)}) univariate polynomial p(x), which is 0 on all integers x in [n/2 - 100*sqrt{n},n/2] and is 1 on all integers x in [n/2+1,n/2+100*sqrt{n}]. (Can be obtained by polynomial interpolation.) By Chernoff bounds, or tail inequalities for binomials, or Chebyshev's inequality, for MOST x in {0,1}^n, we have sum_i x_i in [n/2-100sqrt{n},n/2+100sqrt{n}]. So take p(sum_i x_i). This polynomial agrees with MAJORITY on MOST points. QED of Proof Sketch. More complicated methods will achieve this over finite fields as well. [[[Note it was open whether a stronger lower bound held for computing MAJORITY with probabilistic polynomials. Josh Alman and I proved in FOCS15 that this lower bound cannot be improved for probabilistic polynomials: there is a probabilistic polynomial of degree O(sqrt{n}) and error eps that computes MAJORITY, for any constant eps > 0.]]] Assuming the Inapproximability Thm, we can prove: Thm: MAJORITY does not have depth-d AC0[2] circuits of size o(2^{n^{1/(2d)}}). Pf: In fact we can show an exponential-size lower bound. By the Prob Poly Thm (Corollary), (1) For all AC0[2] C of size s and depth d, there's a degree k^d poly p s.t. Pr_x[C(x) != p(x)] <= s/2^k. By the Inapproximability Thm, (2) There's a delta > 0 s.t. for all C, if C computes MAJ then for all p of degree <= sqrt{n}, Pr_x[C(x) != p(x)] >= delta. Set k such that k^d <= sqrt{n}. E.g. set k = n^{1/(2d)}. Assuming C computes MAJ, (1) and (2) together => s/2^k >= delta. => s >= delta*2^{n^{1/(2d)}}. QED. ======= Announcements (for 10/22): - Please send me your "review" of a talk from last week's workshop, today! - Office hours tomorrow at 4pm, Simons Institute, 2nd floor common area. ** OPEN PROBLEM: The above shows that MAJORITY does not have depth-d AC0[2] circuits of size o(2^{n^{1/(2d)}). But the best upper bound is 2^{O(n^{1/(d-1))}. Which can be improved? Upper or lower bound? Note for d=2, the upper bound is known to be optimal... *** Let's turn to proving the inapproximability thm for MAJORITY. The key claim is that if MAJORITY has a low-degree polynomial on some set of points, then EVERY Boolean function has a "somewhat low" degree poly on that set of points! In the literature, the terminology is that "MAJORITY is versatile" Versatility Lemma: Let T be a subset of {0,1}^n, and let p be a degree-d F2-poly. Suppose for all a in T, MAJORITY(a) = p(a). Then for *every* f : {0,1}^n -> {0,1}, there is a degree-(n/2+d) polynomial p_f such that for all a in T, f(a) = p_f(a). Pf: Recall every f can be written as a multilinear polynomial over F2: f(x) = sum_{S subset of [n]} c_S prod_{i in S} x_i, where the c_S in F2. We can *also* find another set of coefficients d_S in F2 such that f(x) = sum_{S subset of [n]} d_S prod_{i in S} (1-x_i). (**) That is, the functions g_S(x) = prod_{i in S} (1-x_i) also form a basis for the vector space of functions f : {0,1}^n -> {0,1}. Another way of saying this is that we can represent any f by a linear combination of AND functions, *as well as* a linear combination of OR functions. (Note OR_{in S} x_i = 1-prod_{i in S}(1-x_i), as a polynomial.) Now consider the expression: h(x) = MAJ(x)*(sum_{S : |S| <= n/2} d_S prod_{i in S} (1-x_i) + (1-MAJ(x))*(sum_{S : |S| <= n/2} c_S prod_{i in S} x_i). Claim: h(a) = f(a), for all a in {0,1}^n. If MAJ(a) = 1, then at least n/2 of the a_i's are 1. Thus any subset S of > n/2 bits of a contains at least one 1. But (there is an i in S such that a_i = 1) => prod_{i in S}(1-a_i) = 0. Therefore f(a) = (**) = (sum_{S : |S| <= n/2} d_S prod_{i in S} (1-a_i)), because the other terms are zero! So for those a where MAJ(a) = 1, f(a) = h(a): the second part is zeroed out by (1-MAJ(a)), and the first part equals f(a). Similarly, if MAJ(a) = 0, then at least n/2 of the a_i's are 0, and any set of > n/2 bits of a contains at least one 0. Therefore f(a) = (sum_{S : |S| <= n/2} c_S prod_{i in S} a_i), again because all other terms of (*) are zero, and h(a) = f(a). Over all a in T, the MAJ(a) can be replaced with a degree-d poly, by assumption. Thus the degree of f is n/2+d over T. QED Now we can prove the inapproximability theorem. Pf of Inapx Thm: Suppose p has degree <= sqrt{n} and agrees with MAJORITY on some set T. We want to show there's a delta > 0 such that |T| <= (1-delta)*2^n. Claim: There's a delta > 0 s.t. (# multilinear monomials over {x_1,...,x_n} of degree <= n/2+sqrt{n}) <= (1-delta)*2^n. (Note: Chebyshev's inequality and Chernoff bounds would only prove *upper bounds* on the tail of the distribution, i.e., upper bounds on the probability of having more than n/2+sqrt{n} heads. Here we want a *lower bound* on the probability that the number of heads is > n/2+sqrt{n}) In fact we can set delta = 1/50. Note the LHS (# monomials) equals (# of subsets [n] of cardinality <= n/2+sqrt{n}) = sum_{i=0}^{n/2+sqrt{n}} {n choose i}. So the Claim follows from bounds on binomial coefficients. It is equivalent to analyzing the following experiment: Toss n fair coins, what's the probability that > n/2+sqrt{n} coins are heads? It's actually > 1/50 (but not much bigger...) Proof is omitted (it's technical, and not too informative). [[ Here's a simple proof of the Claim with "sqrt{n}" replaced by "sqrt{n}/2", which would already imply a 2^{Omega(n^{1/(2d)}) lower bound. We use the facts: (1) for all eps > 0, and all suff large n, {n choose n/2} <= (1+eps)*2^n/sqrt{pi*n} < 2^n/(3 sqrt{n}). (2) for positive k, {n choose n/2 + k} = {n choose n/2 - k} < {n choose n/2} (binomial coefficients are maximized at k=n/2) (3) sum_{i=0}^{n/2} {n choose i} <= 2^{n-1}. Then sum_{i=0}^{n/2+sqrt{n}/2} {n choose i} <= 2^{n-1}/2 + sum_{i=n/2+1}^{n/2+sqrt{n}/2} {n choose i} < 2^{n-1}/2 + sqrt{n}/2 * 2^n/(3 sqrt{n}) = 2^n*(1/2 + 1/6) = 2/3 * 2^n. So in this case, we can set delta = 1/3. To obtain the original claimed bound, we have to use more exact expressions for {n choose n/2 + k}. ]] The Claim implies (# multilinear polys of deg <= n/2+sqrt{n} over F2) <= 2^{(1-delta)*2^n}. Why? (The number of deg <= d polynomials is the number of ways to choose a distinct coefficient in {0,1} for each of the deg <= d monomials.) In comparison, the total number of functions f from T to {0,1} is (#f : T -> {0,1}) = 2^{|T|}. Versatility Lemma => Every f : T -> {0,1} has a (n/2+sqrt{n})-degree polynomial. Therefore (#f: T -> {0,1}) <= (# of multilinear polys of degree <= n/2+sqrt{n}). The LHS is 2^{|T|}, and the RHS is 2^{(1-delta)2^n}. So we must have |T| <= (1-delta)*2^n, and our original p agrees with MAJORITY on at most (1-delta)*2^n inputs. This completes the proof of the inapproximability theorem. QED For prime p != 2, one can prove MOD2 is "versatile" over Fp as well. The proof is somewhat similar: instead of {0,1}, rewrite functions to be from {1,1}^n -> {-1,1}. Then the products in our multilinear representation function like MOD2 gates. We can take any f:{1,1}^n -> {-1,1} and write it as a degree <= n/2 polynomial with MOD2 "for free", just as we can take any f:{0,1}^n -> {0,1} and write it as a degree <= n/2 poly with MAJORITY "for free". So we get lower bounds for MOD2 over Fp. In general, MODq is versatile for functions from {0,1}^n to Fp, where p != q. OPEN: MAJORITY in AC0 + PARITY + MOD3? MAJORITY in AC0 + MODp + MODq where p != q are prime? (It is conjectured that MAJORITY is not in such a class) Define ACC0 := union_m AC0[m] = problems solvable with O(1)-depth, poly-size, over MODm,OR,AND,NOT, for some fixed m >= 2. OPEN: Is TIME[2^n] contained in ACC0? ======================================= ACC0: A Nice Polynomial Representation The best general lower bound we know for ACC0 is much weaker than what is known for AC0[p]: Thm [MW18] There are functions in NTIME(n^{polylog n}) which are not in poly-size ACC0. This lower bound also exploits a polynomial representation for ACC0. However, this representation is not quite as "nice" as the probabilistic/approximate polynomial representation we saw for AC0[p]. Today we'll discuss this polynomial representation. Later, we will see how to use it for lower bounds. SYM = Symmetric Boolean function, a function that depends only on the number of 1s in the input. Formally, f : {0,1}^n -> {0,1} is symmetric if there is a g : {0,1,...,n} -> {0,1} such that for all x, f(x) = 1 <=> g(sum_i x_i) = 1. Thm [Beigel-Tarui 91 (building on Yao)] For every symmetric function f, there is a symmetric function f' s.t. f of s AC0[m] circuits of size s and depth d can be represented as a f' of s^{polylog(s)} ANDs, where the polylog factor depends on d and m. Moreover the transformation can be computed in (deterministic) s^{polylog(s)} time, given a starting "f of ACC" circuit as input. Sketch of Simpler Beigel-Tarui (still technical!): AC[6] size-s depth-d can be represented as a SYM of s^{polylog(s)} ANDs. With a probabilistic construction. (Can be derandomized) Some optimization due to Chen and Papakonstantinou (2016). 0. Note: can write OR as MOD2 of AND of MOD2 (DeMorgan's laws) 1. First, think of the AC0[6] circuit C as being ``layered'' (similar to what we did for AC0): Layers of (MOD3 of MOD2 of AND) repeated <= O(d) times, by sticking in extra gates that don't really do anything. Size is still <= s^{c} for some c <= O(d). 2. Construct 10n circuits C_i as follows: 2.1 Replace each AND (over {0,1}) with a probabilistic F_3-polynomial of degree 2c*log(S). Has error <= 1/s^{2c}, so overall the probabilistic circuit has error <= s^c/s^{2c} <= 1/s^c. Can write each such AND (with fan-in S) as a probabilistic polynomial as a MOD3 of S^{2c log(S)} ANDs, each AND has fan-in <= 2c*log(S). 2.2 Next, write each MOD3 of AND of fan-in <= 2c*log(S) as a MOD3 of S^{2c} MOD2s of fan-in <= 2c*log(S). In particular, AND on v variables can be written as a sum (mod 3) of <= 2^v MOD2's. (Why can we do this? Every function on v variables can be written in the Fourier basis, i.e. as a real-valued sum of <= 2^v MOD2s on v variables. Moreover, each "Fourier coefficient" for these MOD2s has the form j/2^v, where j in [-2^v,...,2^v]. In F_3, we can divide by 2, so the answer is preserved over F3.) 2.3 Now each C_i is O(d) layers of (MOD3 of MOD2). 3. Let D = MAJORITY(C_1,...,C_{10n}). Chernoff bound ==> Over each input x of length n, Pr[D(x) != C(x)] << 1/2^n. Union bound ==> Pr[exists x D(x) != C(x)] << 1. So D is correct on all 2^n inputs with good probability. 4. Now we want to transform the MAJORITY of (MOD3 of MOD2)^* into a SYMMETRIC function of ANDs. In fact, we will give a generic way to take: - any SYM of MOD3 of MOD2 and convert it into a SYM of AND, and any SYM of MOD2 of MOD3 and convert it into a SYM of AND. Concretely, suppose we have a SYM of t MOD3 of subcircuits C_{i,1},...,C_{i,u} which output 0/1, where each C_{i,j} has a MOD2 at its output gate. Note there are t*u subcircuits in total. Let the symmetric function be f, has t inputs. The total circuit is then equivalent to f((sum_{j=1}^u C_{1,j} mod 3),...,(sum_{j=1}^u C_{t,j} mod 3)) Modulus-Amplifying Polynomial: Let L in {1,2,...}. Polynomial P_L(x) is L-mod-amplifying if for all m, x \in Z, where m >= 2, x = 0 mod m ==> P_L(x) = 0 mod m^L x = 1 mod m ==> P_L(x) = 1 mod m^L. P_L "amplifies the modulus from m to m^L." Thm[B-T] For every L, there's a mod-amplifying P_L of degree <= 2L. Can be efficiently constructed. (Omitted) Set L:= log_2(t). Consider: (*) = sum_{i=1}^t P_L(sum_{j=1}^u C_{i,j}) By the mod-amplifying property, (*) = sum_{i=1}^t P(sum_{j=1}^u C_{i,j})) = sum_{i=1}^t P(sum_{j=1}^u C_{i,j}) mod 3^L (since t < 3^L) = sum_{i=1}^t [sum_{j=1}^u C_{i,j} mod 3] mod 3^L ([property] = 1 if property is true, [property] = 0 if property is false) Key Observation: Since t < 3^L, knowing the sum (*) determines the number of original inputs to f which are set true. Thus there is another symmetric function g such that f((sum_{j=1}^u C_{1,j} mod 3),...,(sum_{j=1}^u C_{t,j} mod 3)) = g(sum_{i=1}^t P(sum_{j=1}^u C_{i,j})). Writing the polynomial P as a sum of ANDs of fan-in L = log_2(t), we get a SYM of ANDs of fan-in log_2(t) of the subcircuits C_{i,1},...,C_{i,u}. 5. Write the ANDs of fan-in log_2(t) as a sum of <= O(t) MOD2s of fan-in log_2(t). Then we have a SYM of MOD2s of C_{i,1},..,C_{i,u} where each C_{i,j} has a MOD2 at the output gate But MOD2 of MOD2 = MOD2, so this is just a SYM of MOD2s. We have completely eliminated the MOD3 layer just below the SYM function! 6. We can do a similar thing for a SYM of MOD2 of MOD3s: -> Use the mod-amplifying polynomial with a power 2^L instead of 3^L. -> Get a SYM of ANDs of fan-in L of MOD3s. Next, we can write the ANDs as a sum of O(3^L) MOD3s (writing AND in the "MOD3" basis). However, we can't just merge the layers of MOD3; we don't have MOD3 of MOD3 = MOD3 easily. What we can say is: MOD3 of s MOD3 of t can be written as MOD3 of s*t^2 ANDs of fan-in 2. Idea: "square" the sum in each bottom MOD3, to force it to output a 0/1 value. (for all a in {0,1,2}, a^2 = 0 or 1 mod 3) Then sum up those squares modulo 3. Using AND of 2 |--> sum (mod 3) of O(1) MOD2s of 2, we can get: MOD3 of s*t^2 ANDs of fan-in 2 |--> MOD3 of O(s*t^2) MOD2s of 2. So now we are back to SYM of MOD3 of MOD2 again. Either (1) you put the ANDs between a MOD3 and a MOD2 layer, or (2) your ANDs reach the bottom of the circuit (can stop). Example: start with the circuit AND of s MOD2 of s MOD3 of s MOD2 of s -- Use probabilistic polynomial for AND: MAJORITY of O(n) (MOD3 of s^{O(log s)} MOD2s of log(s)) MOD2 of s MOD3 of s MOD2 of s = MAJORITY of O(n) MOD3 of s^{O(log s)} MOD2 of s*log(s) MOD3 of s MOD2 of s Use mod-amplifying polynomial on MAJORITY of MOD3: = SYM of s^{O(log s)^2} AND of O(log s)^2 MOD2 of s*log(s) MOD3 of s MOD2 of s -- Convert AND into MOD2s: SYM of s^{O(log s)^2} MOD2 of O(log s)^2 MOD2 of s*log(s) MOD3 of s MOD2 of s -- Merge MOD2s: SYM of s^{O(log s)^2} MOD2 of s*log(s)^3 MOD3 of s MOD2 of s -- Use mod-amplifying polynomial on SYM of MOD2: SYM of s^{O(log s)^3} AND of O(log s)^3 MOD3 of s MOD2 of s -- Write ANDs as sum of MOD3s: SYM of s^{O(log s)^3} MOD3s of O(log s)^3 MOD3 of s MOD2 of s -- "Merge MOD3s": Write MOD3 of s MOD3 of t as MOD3 O(s*t^2) of MOD2s of 2: SYM of s^{O(log s)^3} MOD3 of O(s^2(log s)^3) MOD2 of s -- Use mod-amplifying polynomial on SYM of MOD3: SYM of s^{O(log s)^4} MOD2s of O(log s)^4 -- Finally, we could convert each MOD2 to a sum of 2^{O(log s)^4} ANDs. Note we get a SYM of s^{O(log s)^{O(d)} ANDs!