CS294-152 Lower Bounds
11/9/18 (special extra lecture)
===
Announcements:

*** Reminder: Project progress reports due next week (Monday)

Today: Natural Proofs 

Way back in the first lecture of this course, we spoke about the relativization barrier to proving lower bounds (proposed 40 years ago). 

Today we'll discuss another (more modern) complexity barrier. 
It was posed over 20 years ago, but it is still extremely relevant to current research!

Natural Proofs: Why many circuit lower bounds are difficult to prove
[Razborov-Rudich'94]
What are common properties of circuit LB proofs?

Think of a function f : {0,1}* -> {0,1}
as an infinite family of functions {f_n : {0,1}^n -> {0,1}}

(0) Identify P : Boolean functions -> {True,False}

Examples:
"f is non-degenerate"
 (depends on all its variables)

"f is an affine disperser for dimension 10O*log(n)"
 (non-constant on all affine subspaces of dimension >= 100*log(n))

"f doesn't become trivial under a restriction to many of its variables"
Used to show PARITY not in AC0

"f is versatile: *many* functions g can be written as g = p_1*f + p_2 where the p_i's are degree <= n/2 polynomials"
Used to show MAJORITY not in AC0[2]

(1) Show P is not too hard to check:
There is an algorithm implementing P (outputs 1 when P is True, and 0 when P is False)
 which takes poly(2^n) time on inputs of length 2^n.

(2) Show for all f={f_n : {0,1}^n->{0,1}},
if P(f_n) = True for almost all (or infinitely many) n, then there is a circuit lower bound for f

(3) Show Pr_{f_n : {0,1}^n -> {0,1}}[P(f) = True] >= 1/2^n
The probability a random f_n is true is non-negligible.

Why (3)? Well, (3) should be true of any property P that's correlated with high circuit complexity
(it's True when circuit complexity is high),
because the vast majority of Boolean functions have > 2^n/n^2 circuit complexity.

Def: Algorithm A : {0,1}^* -> {0,1} is a *natural property* if
(Constructive) For all n, for all f_n in {0,1}^{2^n},
            A(f_n) runs in poly(2^n) time.
(Large) For all n, 
           Pr_{f_n in {0,1}^{2^n}}[A(f_n) = 1] >= 1/2^n.

So A is natural if it's a poly-time algorithm which outputs 1 a non-negligible fraction of the time.

Def: A is *useful* against a circuit class C (AC0,AC0[m],P/poly) if
for all f in C, there are infinitely many n such that:
- A(g_n) = 1 for some g_n : {0,1}^{2^n}
- A(f_n) = 0 (thinking of f_n as a 2^n-bit string)

(Sometimes also say: A is useful against g)

So A is useful if it "distinguishes" between all f in C and some function g, on infinitely many n.

Recall MCSP: Given (f,s), does f has circuit complexity <= s?

Intuition: A is P-natural and useful against size s
<=> A is an "average-case" ptime algorithm for MCSP with size parameter s. 

***
Example: Non-degeneracy is natural. 
(Recall we used this to prove C_{B_2}(XOR) >= n-1.)

Given f:{0,1}^n -> {0,1}, want to check for all i in [n], there's an a in {0,1}^n such that
 f(a_1,...,a_{i-1},0,a_{i+1},...,a_n) != f(a_1,...,a_{i-1},1,a_{i+1},...,a_n).
 
Constructive: Given f, can check this in O(n*2^n) time! 
For each i, check all 2^n possible a's, probing the truth table in two places.  

Largeness: The probability f(x) = f(y) for any x != y is only 1/2.
So
Pr_f[f degenerate] <= sum_{i=1}^n Pr_f[for all a, f(a) = f(a^{-i})] <= O(n/2^{2^n}).

Example: The Restriction Method is natural.

P(f_n) = "for all possible restrictions rho to subsets of <= n - 2*log(n) variables, f|_{rho} != constant"
This is true of PARITY, for example, but not true of small AC0 circuits!

Theorem: This P is natural.

Constructive: Here's an algorithm.

Given f as a 2^n truth table,
 Try all ways to pick a restriction rho on i variables:
 Choose a subset of i <= n-2*log(n) variables, and an assignment to those i variables. 
 Output 0 <=> f is constant on some rho.
 
Only takes time O(3^n*2^n): 
-- 3^n is an upper-bound on the number of ways to choose a partial assignment to variables 
(number of ways to choose a string of length n from the alphabet {0,1,?}.)
-- takes 2^n time to check whether the 2^{n-i}-length substring defined by rho is all-ones or all-zeroes

Large: Suppose P(f_n) = False. Then f|_{rho} = constant for some rho.

Then I claim we can encode f in <= 2n + 1 + 2^n - n^2 bits:
Write down log_2(3^n) <= 2n bits to encode a restriction rho on <= n-2*log(n) variables, 
such that f|_{rho} = constant.
Write down more one bit for the constant that f equals on rho.
That already covers n^2 bits of f. Then write down the remaining 2^n-n^2 bits of f.

This uniquely encodes any function f such that f|_{rho} is constant for some rho.

Therefore there are <= 2^{2n+1+2^n-n^2} functions on which P is false.
But there are 2^{2^n} total functions, so P is false on a random function with probability only 
<= 2^{2n+1+2^n-n^2}/2^{2^n} = 2^{2n+1}/2^{n^2} = o(1). P is true with very high probability!
QED

Similarly, testing for an affine disperser is natural.

Testing versatility of a function (which we did in R-S) is also natural.

Thm (Informally) [RR94] 
There is a poly(2^n)-time algorithm that, given a 2^n truth table, 
outputs 1 on random inputs whp,
outputs 0 on all f such that at most 2^{3/4 * 2^n} functions g : {0,1}^n -> {0,1}
can be expressed as p_1 + p_2*g where p_i are degree <= n/2.

(==> outputs 0 on all AC0[2] circuits of poly(n) size) 

The big theorem of Razborov-Rudich is:

Thm[RR94] Suppose there is an N^{polylog N}-time natural property useful against P/poly (or even TC0)
Then there are no PRFs implementable in P/poly (or poly-size TC0, constant-depth MAJORITY+NOT circuits)
(Translation: No exponentially-secure one-way functions, by other works.)

Contrapositive: 
If we "believe" in crypto, then no natural properties can prove lower bounds against TC0! 

Thm[NR'04] Assuming Integer Factoring is not in 2^{n^{eps}} time for some eps > 0,
there are no natural properties useful against TC0.

Natural properties *cannot* prove strong lower bounds without breaking the possibility of strong cryptography!

Intuition: (Worst-case) lower bounds based on natural properties are "self-defeating": 
    they rule out stronger average-case lower bounds (which we also believe should hold) 

So we can't apply conditions like versatility, restrictions, degeneracy, etc. even for TC0 lower bounds.

Proof of RR94: 

First let's define what we mean by a PRF.

Def. eps-PRF is a polytime alg F(r,x) which takes two inputs of n bits, such that 
for all probabilistic 2^{O(n^{eps})}-time algorithms D with oracle access to an f : {0,1}^n -> {0,1},
"F looks like a random function to D":
(*) |Pr_r[D^{F(r,*)}(1^n) = 1] - Pr_g[D^g(1^n) = 1]| <= 1/2^{n^{eps}}. 
"D can't distinguish between completely random g (with 2^n bits of randomness) and F with only n bits of randomness"

[Note: It is widely believed that such PRFs exist for fixed eps > 0. 
E.g. assuming Factoring takes 2^{n^{alpha}} size circuits for some alpha > 0, such PRFs exist.]

Fix an eps > 0. Suppose there's an N^{log N}-time natural property A useful against SIZE(n^c) with c >= 1.
We use A to define a distinguisher D_{eps} that can tell all such F apart from uniform random g:

D_{eps}^f(1^n): Query f on 0^{n-n^{eps}}y for all y in {0,1}^{n^{eps}}. 
			Form a truth table of a function f' : {0,1}^{n^{eps}} -> {0,1}.
			Output A(f').
			
(1) D_{eps} runs in poly(2^{n^{2*eps}}) time 
 because we run A on 2^{n^{eps}}-length input, and A runs in N^{log N} time.
(2) Suppose F(r,x) runs in n^{c/2} time. Then F has O(n^c)-size circuits.
 Consider D_{eps}^(F(r,*))(1^n). 
 So the f' constructed would have O(n^c)-size circuits (since f'(y) = F(r,0^{n-n^{eps}}y).
 A is useful against SIZE(n^c) => A(f') = 0.
 Therefore Pr_r[D_{eps}^(F(r,*))(1^n) = 1] = 0.
(3) For random g, Pr_g[D_{eps}^g(1^n) = 1] = Pr_{g'}[A(g') = 1] >= 1/2^{n^{eps}}.

Thus for every eps > 0, for every c >= 1, and for every n^{c/2}-time F, 
there is a D running in 2^{O(n^{eps})} time such that (*) does *NOT* hold.  
That is, for every eps > 0, there is no eps-PRF as described above.
QED

***

How to get around natural properties?

Usefulness is necessary. We must find proofs avoiding at least one of Constructivity or Largeness. But which? 

Heuristic arguments.

Why We Should Avoid Largeness: “Specific functions are hard for special reasons. We must find properties that are true of specific functions, but fail to work for random (hard) functions.”

Why We Should Avoid Constructivity: “P != NP is hard to prove because human reasoning is polynomial-time. Somehow we must ‘go beyond’ such primitive reasoning.”

One of these heuristics is not quite right... 

MORAL: AVOID BEING LARGE!
(Good health advice as well...)

Thm [W'13/'16] NEXP is not in P/poly <=> there is a constructive and useful property against P/poly

In fact, for many circuit classes C
Thm [W'13/'16] NEXP is not in C <=> there is a constructive and useful property against C

That is, there's a poly-time algorithm which can distinguish "some" function from all P/poly functions, but does not necessarily work on a random function. So the way to prove NEXP not in P/poly is to find a property that's actually false of random functions but happens to work on some particular cooked-up functions.

Future progress in circuit complexity is equivalent to finding a property that satisfies 2 of the 3 axioms in Natural Proofs. WLOG, we may as well try to avoid largeness! (Because our proof of NEXP not in P/poly will imply a non-large constructive useful property.)

Proof Idea: Expand on the easy witness lemma for NEXP.

(*)
NEXP in P/poly
 <=>
 For every NTIME[2^n] verifier V, there's a c such that for almost all YES instances x for V, 
 there's an n^c circuit family encoding an accepting witness for V on x.

(Earlier, we claimed the => direction without proof, but also the <= direction is true!
That is, "easy witnesses" for all NEXP verifiers imply that NEXP problems have small circuits!) 
 
Negating this,

NEXP not in P/poly
 <=>
 For some NTIME[2^n] verifier V, for all c, and infinitely many "bad" YES instances x_n,
 every n^c-size circuit family *fails* to encoding an accepting witness for V on x_n.

Imagine we "knew" these bad x_n's: we could compute them efficiently. 
Then we could get a constructive and useful property that works for infinitely many input lengths.
Our property P will take O(2^n)-length strings y:

P(y) = True <=> V(x_n,y) accepts.

Now observe:
-- P is constructive: It runs in time O(|y|), linear in the input y. 
-- Useful: For all c, and all size-n^c circuits C,
  there are infinitely many n such that V(x_n,tt(C)) rejects, because C fails to encode a witness.

To "know" these x_n's, we could do various things (which I won't go into). 
- We could hold them as "advice" for inputs of length |y|.
- We could use the fact that |y| is exponentially larger than x_n, and encode x_n in the input length itself (by expanding the input length by some artificial amount).

This shows NEXP not in P/poly => there's a constructive useful property against P/poly.

To go the other direction, we show how any constructive useful property could be naturally turned into an NEXP verfier which doesn't have polynomial-size witness circuits. 
This bad verifier can then be used to prove NEXP not in P/poly by referring to the equivalence (*).
 
End of Proof Idea.
 
Q: Does the natural proofs barrier apply to results like NEXP not in AC0[6]?
Not clear!

OPEN: Is there a natural property useful against AC0[6]?