CS294-152 Lower Bounds 9/24/18

=== Functions With and Without Small Circuits

Recall: SIZE(s(n)) := class of f:{0,1}^* -> {0,1} such that for all but finitely many n, f_n has a circuit of size <= s(n). ioSIZE(s(n)) is the same definition, but "all but finitely many" is replaced with "infinitely many".

Last time we saw that E^{Sigma2 P} is not in SIZE(2^n/(cn)) for some c >= 1. Today we'll consider lower bounds against SIZE(n^c) for a fixed c >= 1, and lower bounds against PSIZE = union_c SIZE(n^c).

Our goal is to find the "weakest" functions which do not have small circuits: to put these hard functions in the smallest complexity classes possible. Why the smallest class possible? Although the question is arguably very mathematically interesting in itself, here are two concrete complexity-theoretic reasons:

1. We mentioned last time: [IW97] If E = TIME[2^{O(n)}] is not in io-SIZE(2^{alpha*n}) for some alpha > 0, then P = BPP.

2. If a class C is not in PSIZE, then P != C. (Think NP, PSPACE, PP.)

But it is open whether E^{NP} is in SIZE(n^2)! (Even though we have E != P, by the time hierarchy theorem!)

Why is it so hard to prove *circuit* lower bounds for algorithmic problems? Here is one very good reason: there are functions of *arbitrary* time complexity that have trivial circuits.

Proposition: There is an undecidable f in SIZE(O(1)).

Pf: Let PI : {0,1}^* -> {0,1} be your favorite undecidable problem. Let bin : N -> {0,1}^* be an efficient 1-1 encoding of all natural numbers in binary. Then

  PI'(x) := 1 if bin(|x|) in PI, and 0 otherwise

is also undecidable. (There is an exptime mapping reduction from PI to PI': given y, let n = int(y) and output 1^n. Then PI(y) = PI'(1^n).) However, for every n, either PI'_n = all-ones, or PI'_n = all-zeroes. Both can be implemented with one gate: e.g., x1 OR not(x1), x1 AND not(x1).
QED

The point is that the "infinite" computational model of circuit families (where we get one circuit C_n for each input length n) has very different behavior in general from the algorithmic model, where programs are assumed to be finite strings. This means we will have to devise much more interesting arguments to prove even that E is not in SIZE(n^2) (open).

===== P-Time Algorithms have Small Circuits

Efficient algorithms do have small circuits. We can relate algorithmic time to circuit size in the following way:

Thm: There's a c >= 1 such that f in TIME(t(n)) => f in SIZE(t^c(n)).

The exact constant c depends on the computational model. Generally speaking, we can put c <= 3.

Pf: First, for all "reasonable" deterministic models, a time-t(n) algorithm in that model can be converted into a one-tape TM M running in t^b(n) time, for some b. We will show how to simulate M on length-n inputs with a circuit C_n of O(t^{2b}(n)) size.

The basic idea is to work with the computation tableau T of M on x, an O(t^b(n)) by O(t^b(n)) size matrix, and exploit the locality of one-tape TMs to use O(1) gates for each cell of the tableau. Each cell holds k = O(1) bits of information, corresponding to one tape cell of M on x, and possibly the state of M on x at that step. Each row encodes a configuration of M; each column corresponds to a particular tape cell of M and how it changes over time.

We replace each cell T(i,j) of the tableau with a circuit that takes 3k bits from 3 cells in the previous row, and computes the value T(i,j) as a k-bit string. On the top row of T we hard-code all the constants of the initial configuration, and have our input gates point into where the input goes. For all cells T(i,j) below the top row, we note that T(i,j) is a function of <= 3 cells on the row above it: T(i-1,j-1), T(i-1,j), T(i-1,j+1).
Indeed, at each cell we just need a "transition computer" function f : {0,1}^{3k} -> {0,1}^k which takes in T(i-1,j-1), T(i-1,j), T(i-1,j+1) and outputs the value for T(i,j). By our results on circuit complexity from last time, f has a circuit of size O(2^{3k}): a circuit of size O(2^{3k}/k) for each of the k output bits. On the bottom row of T we want to know if M is in an accept state. We can take an OR over all cells in the last row to check if any are in an accept state. For n-bit inputs, the total circuit size is O(2^{3k} t^{2b}(n)). QED

Cor: P = union_c TIME(n^c) is a subset of union_c SIZE(n^c) = SIZE(n^{O(1)}).

Cor: If a class C is not in PSIZE, then P != C. (Think NP, PSPACE, PP.)

Open: Is P in SIZE(n^2)? Is P in SIZE(n^k) for some *fixed* k?

Note: This is *NOT* the same as P in PSIZE!
P in PSIZE: for all ptime algs A, there is a k_A and an n^{k_A}-size circuit family for A.
P in SIZE(n^k): for all ptime algs A, there is an n^k-size circuit family for A.

==== Randomized P-Time Algorithms have Small Circuits

Def. f in BPP <=> there is a polynomial p(n) and a deterministic p(n)-time alg A(*,*) such that for all x, Pr_{r in {0,1}^{p(|x|)}}[A(x,r) = f(x)] > 2/3. Can make 2/3 very close to 1, e.g. 1 - 1/2^{poly(n)}.

Thm: BPP in P/poly.

Two ways to look at this:

(1) "Adleman's argument". Define an "amplified" randomized algorithm: pick r_1,...,r_{100n} at random, output Majority(A(x,r_1),...,A(x,r_{100n})). Use Chernoff bounds to show the probability of success has gone up, from > 2/3 to > 1 - 1/2^n. By a union bound, there is now a single choice of r_1,...,r_{100n} which outputs the correct answer on all 2^n inputs x. Can hard-code r_1,...,r_{100n} into a psize circuit implementing Majority(A(x,r_1),...,A(x,r_{100n})).

(2) "Pseudorandom sets". For each n, find S subset of {0,1}^{p(n)} such that |S| = O(p(n)^2) and |Pr_{r in S}[A(x,r)=f(x)] - Pr_{r in {0,1}^{p(|x|)}}[A(x,r)=f(x)]| < 1/10. Then just hard-code those O(p(n)^2) strings into a psize circuit for A(*,*).
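As a toy illustration of Adleman's argument (way (1) above), here is a minimal Python sketch. The target function f and the error pattern of the "randomized algorithm" A are invented for the demo; the point is the amplify-then-union-bound step, which produces a single seed tuple that is correct on all inputs and could be hard-coded into a circuit:

```python
import random

N = 4  # input length; inputs are the integers 0..2^N - 1

def f(x):
    """Target function: parity of the bits of x (a stand-in for any BPP problem)."""
    return bin(x).count("1") % 2

def A(x, r):
    """A made-up randomized algorithm for f: for every fixed x, it errs on
    fewer than 1/3 of the seeds r (here the error rate is about 1/5)."""
    err = 1 if (31 * x + 17 * r) % 5 == 0 else 0
    return f(x) ^ err

def majority_vote(x, seeds):
    """The 'amplified' algorithm: majority of A(x, r_i) over the seed tuple."""
    votes = sum(A(x, s) for s in seeds)
    return 1 if 2 * votes > len(seeds) else 0

# Amplification drives the per-input error exponentially low, so by the
# union bound a random tuple of O(n) seeds is, with high probability,
# simultaneously correct on ALL 2^N inputs. We just sample until we find one.
random.seed(0)
while True:
    seeds = [random.randrange(10**6) for _ in range(25)]
    if all(majority_vote(x, seeds) == f(x) for x in range(2**N)):
        break  # `seeds` is the advice we would hard-code into a circuit
```

After the loop, `majority_vote(x, seeds)` is a deterministic procedure (randomness hard-coded) agreeing with f on every input, mirroring how the r_1,...,r_{100n} get baked into the psize circuit.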
(2) requires more randomness, but gives a stronger guarantee: it works for all *circuits* of size p(n), not just algorithms running in p(n) time. Amplifying success probability (how much randomness do you need to get better success probability? what resources do you need? etc.) is a topic with many references, and has been studied extensively over decades.

Open: NEXP = BPP??
Open: BPTIME(O(n)) = BPP = BPTIME(n^{polylog n})?
Known: BPP is properly contained in BPTIME(2^{n^{eps}}), for all eps > 0 (simple padding argument).

Main problem: if I give you the code of a deterministic or a nondeterministic alg, you can simply run it on the input and not worry. But if I give you arbitrary code for a "randomized" algorithm, it might not satisfy the BPP condition (that on all inputs, the probability it accepts is > 2/3, or the probability it rejects is > 2/3). So a "universal simulator" for BPP would seemingly have to check this condition before running a given randomized algorithm; otherwise it would not be a BPP machine itself!

[Fortnow-Santhanam'04] For all k, BPTIME(n^k)/1 != BPTIME(n^{k+1})/1 -- with one bit of advice!

==== PSIZE = P-Time Algorithms With Advice

Circuit families constitute one non-uniform model of computation. There is a more general way to characterize how "non-uniform" programs can help solve problems. Consider the following situation: suppose I give you polynomial time to solve a problem, but for inputs of length n, I let you write a program of f(n) lines of code. (The program must still run in poly-time, but the code size is allowed to grow with the input.) What problems can you then solve? It's a natural question! And it's inherently related to Boolean circuit complexity.

Def. Alg A decides f with s(n) advice if for all n, there is an a_n in {0,1}^{s(n)} such that for all x in {0,1}^n, f(x) = A(x,a_n).

Note: this definition looks similar to NP, but there is a very important difference! Namely, the advice a_n is given *prior* to seeing the input.
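To make the advice model concrete, here is a minimal Python sketch of one natural instantiation: the algorithm A is a fixed circuit evaluator, and the advice string for length n is (an encoding of) a circuit for that input length. The gate encoding is invented purely for illustration:

```python
def eval_circuit(x, gates):
    """The fixed poly-time algorithm A: evaluate a circuit on input bits x.
    Wires 0..len(x)-1 are the inputs; each gate appends one new wire.
    The last wire is the output. Gate format: (op, in1[, in2])."""
    vals = list(x)
    for op, *ins in gates:
        if op == "NOT":
            vals.append(1 - vals[ins[0]])
        elif op == "AND":
            vals.append(vals[ins[0]] & vals[ins[1]])
        elif op == "OR":
            vals.append(vals[ins[0]] | vals[ins[1]])
    return vals[-1]

# The "advice" a_2 for input length 2: a 4-gate circuit computing XOR as
# (x0 OR x1) AND NOT(x0 AND x1).
advice_2 = [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3), ("AND", 2, 4)]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert eval_circuit(x, advice_2) == x[0] ^ x[1]
```

This is exactly the easy direction PSIZE in P/poly of the theorem below: the advice a_n encodes the circuit C_n, and A is circuit evaluation.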
Think of A as an interpreter or universal simulator, and a_n as the program to run on n-bit inputs. Then a "program" in this model is an infinite sequence of strings {a_n}. Again we have "infinite" program length: a non-uniform model.

Def. P/s(n) = decision problems solvable by ptime algs with s(n) bits of advice. P/poly := union_c P/n^c.

Thm: P/poly = PSIZE.

Proof Idea:
-- P/poly in PSIZE: Take a P-time algorithm with poly(n) advice, convert the algorithm to a circuit, hard-code the poly(n) advice into the input of this circuit.
-- PSIZE in P/poly: Let the advice a_n be an encoding of the circuit, and let the P-time alg be a circuit evaluation alg.

Obs: There are undecidable problems in P/1.

==== Sigma2 P and Circuit Lower Bounds

First, we give a weak circuit lower bound:

Thm [Fixed-Poly Lower Bound]: For all k, there's an f_k in P^{Sigma_2 P} s.t. f_k is not in SIZE(n^k).

Note: P^{Sigma2 P} is contained in Sigma3 P = NP^{Sigma2 P}. The theorem says that there is no universal k such that everything in P^{Sigma2 P} has n^k-size circuits. If such a k existed, it would mean that even if a problem in P^{Sigma2 P} took n^{k^k^k^k^k} time to solve, it would still have a circuit family where each circuit is at most n^k size. This would be a kind of "compression" of computation.

Pf: We simply "scale down" our argument which showed that E^{Sigma2 P} has functions of max circuit complexity. Let m be a natural number. Last class, we showed how to find a function g : {0,1}^m -> {0,1} with maximum circuit complexity (e.g., > 2^m/(2m)) in time 2^{O(m)} with a Sigma2 P oracle (the function COMPLEX).

Fix k. Here's a P^{Sigma2 P} function f: On input x of length n, set m := (k+1)*log(n). If x is not of the form 0^{n-m}y, reject. In n^{O(k)} time, use an oracle for COMPLEX to construct a hard function g on m inputs. Output g(y).

Let C_n be any circuit for f on n-bit inputs. From C_n, we can easily get a circuit C'_m for g on m-bit inputs: C'_m(y) := C_n(0^{n-m}y).
(hard-code the n-m inputs to be 0)

So size(C_n) >= size(C'_m) >= n^{k+1}/(2(k+1) log n) > n^k for large n. Therefore f is not in SIZE(n^k). QED

We can reduce the complexity of the hard function above by being more clever. However, our argument will be somewhat indirect. Suppose NP is not in P/poly. Then we get that SAT does not have n^k size circuits, for all k. What if NP is in P/poly? What interesting consequences do we get then?

Thm [K-L'80]: NP in P/poly => Sigma_3 P = Sigma_2 P.

(In fact the hypothesis implies the poly-time hierarchy collapses: PH = union_k Sigma_k P is contained in Sigma_2 P. I am writing it this way to hopefully reduce the notation that grad students need to know.)

Pf. Every f in Sigma_3 P has the form: for all x, f(x) = 1 <=> (exists^p y_0)(forall^p y_1)(exists^p y_2)[A(x,y_0,y_1,y_2) accepts]. Define B(x,y_0,y_1) := (exists^p y_2)[A(x,y_0,y_1,y_2) accepts]. Note B is in NP.

Assume NP in P/poly. Then SAT is in P/poly, and by a standard search-to-decision reduction, there are polynomial-size circuits {C_n} which, given any satisfiable SAT formula F, will *output* a satisfying assignment for F. Let m = poly(n) for a large enough polynomial. The key is to guess a SAT circuit C_m *before* the "forall y_1" quantifier. We won't need the "exists y_2" quantifier any more, because we can use our SAT circuit C_m to print a string y_2 for A(x,y_0,y_1,y_2). Let R be a ptime reduction from B to SAT. Then

  f(x) = 1 <=> (exists^p y_0)(exists a psize C_m)(forall^p y_1)[A(x,y_0,y_1,C_m(R(x,y_0,y_1))) accepts].

Therefore f is in Sigma_2 P. QED

Can use K-L to improve the weak circuit lower bound!

Thm [Kannan'81]: For all k, there's an f in Sigma_2 P s.t. f is not in SIZE(n^k).

Pf.
(1) NP is not in P/poly => set f = SAT.
(2) NP is in P/poly => P^{Sigma2 P} is in Sigma3 P = Sigma_2 P by the K-L thm => there's an f in Sigma_2 P not in SIZE(n^k), by the Fixed-Poly Lower Bound thm. QED

Note: Kannan's theorem is non-constructive! It does not give an explicit function f_k.
(But there are ways to get one... Cai-Watanabe.)

Cor [Kannan]: Sigma2 TIME[2^n] is not in P/poly.

Pf. If Sigma2 TIME[2^n] is in P/poly, then NP is in P/poly. By K-L, Sigma3 P = Sigma2 P. By padding, Sigma3 EXP = Sigma2 EXP. But E^{Sigma2 P} is in Sigma3 EXP, and has functions of *maximum* circuit complexity (last lecture). This is a contradiction. QED

Can improve the "P/poly" to "SIZE(s(n))" for any function s(n) such that s(s(n)) << 2^n.

OPEN: Is Sigma2 TIME(2^{O(n)}) a subset of SIZE(2^{o(n)})?

In general, improved Karp-Lipton theorems imply improved circuit lower bounds:

Corollary: If (NP in P/poly => PH = C), then for all k, C isn't in SIZE(n^k).

[[KNOWN: Can put C = ZPP^{NP}, "zero-error randomized P with an NP oracle". This uses algorithms (with an NP oracle) for learning a circuit for a function from black-box queries. So if SAT were truly "easy in practice", then we'd be able to learn arbitrary functions that have small circuits, with only black-box queries to a SAT circuit. Bshouty, Cleve, Gavaldà, Kannan, Tamon 1996.]]

OPEN: Is P^{NP} in SIZE(O(n))? (Recall NP in P^{NP} in Sigma_2 P.)

Do circuit lower bounds imply Karp-Lipton theorems? It seems the answer would be obviously no: one could imagine applying hard-core combinatorics to prove e.g. P^{NP} not in SIZE(O(n)), and not care at all about whether the poly-time hierarchy collapses. In ongoing work with Cody, we can in fact prove an equivalence between certain lower bounds and Karp-Lipton theorems. (If you are interested, we can talk later.) Examples:
- P^{NP} not in SIZE(n^k) <=> (NP in P/poly => PH is in io-P^{NP}/n)
- NP not in SIZE(n^k) => (PP in P/poly => PP is in io-NP/n)

Conjecture: P is not in SIZE(O(n)). If the conjecture is false, that would mean the poly-size circuits for every poly-time computation can be "compressed" to equivalent circuits of only *linear* size!

Best known circuit size lower bounds for a function in P:
There is f in P s.t. f not in SIZE(5n). <-- AND/OR/NOT gates.
There is f in P s.t. f not in SIZE(3.01n). <-- AND/OR/NOT/XOR gates.

==== Merlin-Arthur and Circuit Lower Bounds

(This part requires knowing some more complexity theory; if you don't get it right now, that's OK, you can read more in Arora and Barak.)

Now we'll prove a fixed poly-size lower bound for an apparently even easier complexity class. The lower bound is due to Santhanam (2007).

Def. Merlin-Arthur (MA): f in MA if there is a polytime V such that:
If f(x) = 1, then there is a psize y such that Pr_z[V(x,y,z) accepts] = 1.
If f(x) = 0, then for all psize y, Pr_z[V(x,y,z) accepts] < 1/10.
(where the z are also psize)

MA is a probabilistic version of NP: we have a randomized verification algorithm. If f(x) = 1 and it is faced with a "good proof", the verifier V always accepts. If f(x) = 0 and it is faced with a bad proof, it has some small chance of error. (Note: if f(x) = 1 but we give V a bad proof, there is no guarantee on the success probability...)

Obs: MA is in Sigma2 P.

Lemma [BFNW'93]: If PSPACE is in P/poly, then PSPACE = MA.

This is a stronger hypothesis than Karp-Lipton, and a stronger collapse than Karp-Lipton. It can be proved using interactive proofs, and the fact that IP = PSPACE [Shamir'92]. We'll use an "alternative" to IP = PSPACE due to [Trevisan-Vadhan02] and [Santhanam07], because it will be necessary later on:

Thm: There is a PSPACE-complete function g and a ptime algorithm V^g with the properties:
- On an input x, V^g(x,r) only makes oracle calls of length |x| to g. (important later!)
- If g(x) = 1, then Pr_r[V^g(x,r) accepts] = 1.
- If g(x) = 0, then for all oracles g', Pr_r[V^{g'}(x,r) accepts] < 1/10.

[[Notes: Without the first property, the theorem is true for all PSPACE-complete and EXP-complete functions. It's equivalent to having a MIP (Multi-Prover Interactive Proof system) for the function, where the prover's strategy is implementable in the same complexity class. There is an E-complete function that makes oracle calls of length O(|x|). See Trevisan-Vadhan02.
But to get *exactly* length |x|, Santhanam uses downward self-reducibility of PSPACE-complete problems, which cannot be true of EXP-complete problems unless EXP = PSPACE.]]

Sometimes this condition is called being "randomly self-checkable" or "randomly self-reducible". Unfortunately, we have to leave this theorem as a black box... sorry! It builds on the proof of IP = PSPACE; the function g is essentially QBF encoded in an interesting way. It makes the above Lemma very easy to prove...

Proof of Lemma: Let g be the above PSPACE-complete function. It suffices to show that g in P/poly => g in MA. Let V^g be the poly-time verifier above. Assuming g in P/poly, let {C_n} be a poly-size circuit family for g. We compute g by the following procedure: On input x,
1. Guess a circuit C_{|x|}, intended to be a circuit for g.
2. Run V^{C_{|x|}}(x,r) with a random string r, and output its answer.

Claim: This is a Merlin-Arthur protocol for g! Merlin's message: the circuit C_{|x|}. The rest is a randomized computation. By the properties of TV'02:
If g(x) = 1, then there is a C such that Pr[V^C(x,r) accepts] = 1.
If g(x) = 0, then for all C, Pr[V^C(x,r) accepts] < 1/10. QED

We can use this collapse to prove two more circuit lower bounds. One proof is easy and one proof is not so easy.

Theorem [BFT'98]: MAEXP (exponential-time MA) is not in P/poly.

Pf. Suppose MAEXP is in P/poly. Then PSPACE is in P/poly (as PSPACE is in EXP, which is in MAEXP), so by BFNW, PSPACE = MA. By padding, EXPSPACE = MAEXP. But there are functions in EXPSPACE of *maximum* circuit complexity, so this is a contradiction. QED

Theorem [Santhanam'07]: For all k, MA/1 is not in SIZE(n^k).

Proof. We try to mimic Kannan's lower bound. It had two cases, NP in P/poly and NP not in P/poly. But we don't know if NP in P/poly implies Sigma3 P = MA. So we use BFNW instead.

-- PSPACE in P/poly: BFNW => PSPACE = MA, and Sigma2 P is in PSPACE = MA, so MA is not in SIZE(n^k).
-- PSPACE not in P/poly: In this case, we design an f_k that "pads" the input just enough, so that:
(a) f_k in MA/1: there's so much padding that we can guess a circuit encoding the ideal prover for the IP of a PSPACE-complete language, and verify it in randomized polytime. The advice tells us if we have enough padding.
(b) f_k not in SIZE(n^k).

Let k > 0 be fixed. Let g be the PSPACE-complete function from TV'02. Let C(n) be the circuit complexity of g on inputs of length n. By our assumption (PSPACE not in P/poly), it must be that for all k, C(n) > n^k for infinitely many n.

Key Observations:
(1) g has a poly(C(n))-time MA protocol, ***if we knew C(n)***. (On inputs of length n, guess a circuit for g_n of size C(n), use randomness and the ptime V to verify. Here it is important that in order to verify inputs of length n, V only makes oracle calls on inputs of length n, and not polynomially longer! Otherwise, we would only have a poly(C(n^c))-time protocol for some c > 1, and the argument below would not work.)
(2) For all n, g_n has no circuit of size less than C(n).

If C(n) were time-constructible (i.e., we could efficiently compute it), we would then have MATIME(poly(C(n))) not in SIZE(C(n)). But it may not be! The circuit complexity of g may behave very badly! Our idea is to make an f_k with a *padded* input, so that it runs in polynomial MA time, but does not have n^k-size circuits. Since we don't know how to compute C(n) efficiently, we don't know how much to pad, exactly! Our extra bit of advice will take care of this, telling us when we have enough padding.

Define

  f_k(y) := 1 if y = x1^{2^L}, g(x) = 1, and [for the L s.t. 2^{L+1} > |y| >= 2^L: 2^L >= (C(n)^{1/k}-n)/2 > n, where n = |y|-2^L],
            0 otherwise.

(Note: 2^{L+1} > |y| >= 2^L just means that 2^L is the largest power of two that's at most |y|.)

Let (*) be the condition [for the L s.t. 2^{L+1} > |y| >= 2^L and n = |y|-2^L: 2^L >= (C(n)^{1/k}-n)/2 > n].

Key points:
-- Condition (*) can be determined solely from the input length |y|.
In particular, we parse |y| into 2^L and n as follows: write |y| in binary; 2^L is the high-order bit of |y|, and the rest of the bits of |y| give n. From those numbers we can check condition (*).

-- f_k computes g on infinitely many input lengths, when given enough 2^L padding. In particular, there are infinitely many n such that condition (*) holds for some |y| = 2^L + n. We can always set L to be large enough; note that
  there are infinitely many n such that (C(n)^{1/k}-n)/2 > n
  <=> there are infinitely many n such that C(n) > (3n)^k (which is true).

Now we show that f_k has the desired properties.

--- f_k in MA/1: The advice bit for input length |y| says whether |y| satisfies condition (*) or not. Note this one bit may be very hard to compute, but we don't care... it is advice! Given y = x1^{2^L}, if the advice bit is 0, then reject (not enough padding). Otherwise, guess a circuit C of size O(|y|^{2k}) for g on |x| inputs, then use V^C to compute g(x) in poly(|y|^{2k}) time.

--- f_k not in SIZE(n^k): Assume the opposite. We'll use a circuit for f_k to get a contradiction: for infinitely many n, we'll get a circuit for g on inputs of length n with size less than C(n). Let n be an input length where (C(n)^{1/k}-n)/2 > n (recall there are infinitely many). Suppose on all inputs x of length n, we pad by "just enough" ones: take the L such that 2^L >= (C(n)^{1/k}-n)/2 > 2^{L-1}. Then f_k(x1^{2^L}) computes g(x) correctly on all x of length n (2^L is long enough), but the input length of x1^{2^L} is only 2^L + n < (C(n)^{1/k} - n) + n = C(n)^{1/k}. Assuming f_k has N^k-size circuits on inputs of length N, we therefore have a circuit for g on inputs of length n of size at most (2^L + n)^k < C(n). This is a contradiction, as C(n) is the minimum size of a circuit for g. QED

OPEN: Is MA in SIZE(O(n)) (with NO advice)??
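As a final concrete note, the |y|-parsing step in Santhanam's construction can be sketched in a few lines of Python. The complexity function C passed in below is a made-up stand-in: the real C(n) is not efficiently computable, which is exactly why the construction needs the one bit of advice to report whether (*) holds.

```python
def parse_padded_length(y_len):
    """Split |y| = 2^L + n, where 2^L is the largest power of two <= |y|.
    In binary: 2^L is the high-order bit of |y|; the remaining bits give n."""
    L = y_len.bit_length() - 1
    pow2 = 1 << L          # = 2^L
    n = y_len - pow2       # note n < 2^L automatically
    return pow2, n

def condition_star(y_len, C, k):
    """Check condition (*) for input length |y|, given (hypothetical) oracle
    access to the circuit-complexity function C(n). In the real construction
    this check is impossible to do efficiently -- the advice bit supplies it."""
    pow2, n = parse_padded_length(y_len)
    return pow2 >= (C(n) ** (1.0 / k) - n) / 2 > n

# Example with a made-up complexity value C(n) = 14400 and k = 2:
# |y| = 64 + 37 parses into padding 2^L = 64 and n = 37, and (*) holds,
# since (14400^{1/2} - 37)/2 = 41.5 satisfies 64 >= 41.5 > 37.
assert parse_padded_length(2**10 + 37) == (1024, 37)
assert condition_star(64 + 37, lambda n: 14400, 2)
```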