\documentclass[11pt]{article}
\usepackage{amsfonts}
\newtheorem{define}{Definition}
%\usepackage{psfig}
\oddsidemargin=0.15in
\evensidemargin=0.15in
\topmargin=-.5in
\textheight=9in
\textwidth=6.25in
\newcommand{\AC}{\mbox{AC}}
\begin{document}
\input{preamble.tex}
\lecture{5}{February 22, 2005}{Madhu Sudan}{Alissa Reyzin}
During the last lecture we used the Switching Lemma to show that PARITY is not
in $\AC_0$. This time we show a different proof of this fact that is due to
Razborov and Smolensky ('87). As a reminder, we defined $\AC_0$ as the class of
constant depth, polynomial size circuits that have unbounded fan-in for the OR and
AND gates.
\\\\
\begin{proof-idea}
Assume that some circuit is too complex to analyze directly. Then we can try to
approximate the function computed by that circuit by replacing the AND and OR
gates by some approximations, which creates a simpler circuit. We want to show
that parity is complex even to approximate. We now need to choose appropriate
definitions of simplicity, complexity and approximation to make this proof
idea work.
We want to define a simple function as one that can be computed by a low degree
polynomial and a complex function as one that requires a high degree
polynomial. That is, we think of arithmetic operations as being the gates in
our circuit, rather than the standard AND and OR gates.
We first try to work with polynomials over GF(2) = $\{0,1\}$. This gives us
\begin{quote}
AND(x,y) = $xy$ $|$ AND($x_1, x_2 ... x_n) = x_1x_2...x_n$
PARITY(x,y) = $x + y$ $|$ PARITY$(x_1, x_2... x_n) = x_1+x_2 + ... + x_n$
\end{quote}
However, that means that PARITY is a degree one polynomial! Since we are
looking to show that PARITY is complex, using degree of polynomials in
$\mathbb{Z}_2 $ will not work.
What if we tried using a field of three elements so that everything wasn't
automatically taken modulo two? Let's use GF(3) =
$\{-1, 0, +1\}$. (We could just as well use $\{0, 1, 2\}$ but this is less
convenient for our purposes). Now we have
\[\mbox{PARITY}(x_1, x_2... x_n) = \prod_{i=1}^n x_i\] where $x_i \in \{1, -1\}$.
which gives us PARITY as a high degree function. Unfortunately, if simply use
the degree of the corresponding polynomial, then AND and OR are also
``complex''. But since AND and OR can both be computed by a single gate, this
does not provide us with a useful notion of complexity. Therefore we will need
to find some way of approximating AND and OR with lower degree polynomials.
\end{proof-idea}
\begin{lemma}
If $f: \{0, 1\}^n \rightarrow \{0, 1\}$ is computed by a circuit with depth
$d$ and size $s$, then there exists a set $S \subseteq \{0, 1\}^n$
such that $|S| \geq \frac{3}{4} 2^n$ and a polynomial $p$ over
GF(3) of degree $(\log s)^{\bigO(d)}$ such that for every $(x_1...x_n) \in
S$,
$f(x_1... x_n) = p(x_1...x_n)$
\end{lemma}
In essence, this lemma states that all $\AC^0 $ circuits are approximated by
by simple functions. That is, there is a low degree polynomial which will get
the same answer as the original circuit for $\frac{3}{4}$ of the inputs. The
number $\frac{3}{4}$ is important here. For example,
if we stated that we could find the correct output for $\frac{1}{2}$ of the
inputs, the statement would be meaningless, since the constant 0 or 1
functions would do this.
\begin{lemma}
If there exists a degree $D$ polynomial GF$(3)^n \rightarrow $GF(3) and a set
$S \subseteq \{0, 1\}^n$ such that $p(x) = \mbox{PARITY}(x)$ for all $x \in S$
then every boolean function $g : S \rightarrow \{0, 1\}$ is computed by a
degree $\frac{n}{2} + D$ multilinear polynomial over GF(3).
\end{lemma}
\begin{lemma}
Any set of functions generating all functions from $S$ to $\{0, 1\}$ must
have cardinality $\geq |S|$.
\end{lemma}
\begin{proofof}{Theorem}
We assume, for the sake of contradiction, that PARITY has a depth $D$ size
$s$ circuit.
\begin{itemize}
\item By Lemma 1, we can compute PARITY on a set $S$ such that $|S|
\geq \frac{3}{4} 2^n$ using a polynomial of degree $(\log s)^{\bigO(D)}$.
\item By Lemma 4, we know that for every boolean function on the
set $S$ there is polynomial of degree $\frac{n}{2}+(\log s)^{\bigO(D)}$.
\item We know that all polynomials are just linear combinations of monomials.
But we also no that there is a limited number of monomials of degree $\leq
\frac{n}{2} +D$. More precisely can bound the number of monomials $N$ as:
\begin{eqnarray*}
N \leq \sum_{i=0}^{\frac{n}{2}+D} \left(\begin{array}{c} n \\ i \end{array}\right)\\
N \leq \sum_{i=0}^{\frac{n}{2}} \left(\begin{array}{c} n \\
i \end{array}\right)
+\sum_{i=\frac{n}{2}+1}^{\frac{n}{2}+D}\left(\begin{array}{c} n \\ i
\end{array}\right) \\
N \leq 2^{n-1} + D\left(\frac{2^n}{\sqrt n}\right)
\end{eqnarray*}
We know therefore, by Lemma 3, that $2^{n-1} + D\left(\frac{2^n}{\sqrt
n}\right) \geq |S| \geq \frac{3}{4}2^n$. If $|S|$ were less than
exponential($n^{\bigO(d)})$, then we would have $N<\frac{3}{4}2^n$, which is
impossible according to the above. So, there is no circuit of polynomial
size and constant depth that calculates PARITY.
\end{itemize}
\end{proofof}
\begin{proof-of-lemma}{3}
We view the set of functions $f: S \rightarrow{0, 1}$ as vectors of size $|S|$
over $\{0, 1\}$. Suppose that vectors $f_1$ through $f_k$ can generate every
boolean function. Then, in particular, they can generate all the unit
functions $\{\delta_x\}_{x \in S}$ where $\delta_x(y) = 1$ if and only if $x
= y$. Clearly, the vectors corresponding to these functions are all linearly
independent of each other. Therefore, we have $k \geq |S|$, and we are done.
\end{proof-of-lemma}
\begin{proof-of-lemma}{2}
The idea of this proof is to switch back and forth between functions of
$\{0, 1\}$ and functions on $\{1, -1\}$. So, given a function $f : \{0,
1\}^n \rightarrow \{0, 1\}$ we define $\hat{f} : \{1, -1\}^n \rightarrow
\{1, -1\}$ as the translation of $f$.
\[\hat{f}(y_1... y_n) = -1 \mbox{ iff } f(x_1...x_n) = 1 \mbox{ where } y_i =
-1 \Leftrightarrow x_i =1\]
We claim that the degrees of $f$ and $\hat{f}$ will be the same since the
mapping between the two functions is linear. We have:
\begin{eqnarray*}
\hat{f} = 1 - 2f \\
y_i = 1 - 2x_i \\
\hat{f}(y_1, y_2 ... y_n) = 1-2f(\frac{1-y_1}{2}, \frac{1-y_2}{2} ...
\frac{1-y_n}{2}) \\
\mbox{deg}(f) = \mbox{deg}(\hat{f})
\end{eqnarray*}
We define PARITY$(x_1...x_n) = \sum x_i \bmod 2$ and
$\widehat{\mbox{PARITY}}(x_1...x_n) = \prod x_i$
Assume that PARITY can be approximated by a low degree polynomial. More
formally, we let $T \subseteq \{-1, 1\}^n$ such that $\widehat{\mbox{PARITY}}$
is equal to $q(y_1...y_n)$ on $T$ and the degree of $q \leq D$. We take
$\hat{f}(y_1...y_n)$ to be some mapping $T \rightarrow \{ 1, -1\}$. We know
that we can express $\hat{f}$ as a polynomial $p(y)$, and therefore as a summation
of monomials. We separate the monomials into the sets $A$ and $B$ where
the terms in $A$ have total degree less than $\frac{n}{2}$ and those in $B$
have degree greater than or equal to $\frac{n}{2}$. Then let
\begin{eqnarray*}
p_1(y_1...y_n) = \sum_i \alpha_iA_i \\ p_2(y_1...y_n) = \sum_i \beta_jB_j
\end{eqnarray*}
We let $C_j = \prod_{i=1}^n y_i/B_j$ We then know that each monomial $B_j
={\mbox{PARITY}}(y_1... y_n) C_j$. Since this is the case, we get:
\begin{eqnarray*}
p_3(y_1...y_n) = \sum_{i=1}^n \beta_j C_j \\
\hat{f}(y_1...y_n) = p_1(y_1...y_n) + \widehat{\mbox{PARITY}}(y_1... y_n)*p_3(y_1...y_n)
\end{eqnarray*}
Since both $p_1$ and $p_3$ are polynomials of degree no more than $n/2$
and $\widehat{\mbox{PARITY}}$ has degree $D$, the total degree of
$\hat{f}(x)$ can be no more than $\frac{n}{2}+D$ and therefore, the
degree of $f(x)$ can not be more than $\frac{n}{2}+D$ either.
\end{proof-of-lemma}
\begin{proof-of-lemma}{1}
The idea of this proof is to probabilistically replace every gate by a low
degree polynomial that gives the correct output almost all of the time.
This will give us a circuit of low degree overall which will still output
the correct result with high probability. Here is the proof summary:
\begin{itemize}
\item For the sake of simplicity we can assume that our circuit contains
no AND gates. This can be done with no loss of generality, since using
DeMorgan's law we can convert any circuit containing AND gates to one
containing only OR and NOT gates with only a constant factor increase in
size and depth.
\item We need to create low degree polynomials that approximate the NOT
and OR gates. For the NOT gate, we simply use $p = 1-x$ which clearly
gives the correct value when $x =1 $ or $ x = 0$. We need not worry
about $x = -1$.
\item For each OR gate, we need to randomly generate some polynomial of
degree $\log s$ that will calculate the correct value most of the time.
While simply using the constant 1 would be a reasonable approximation at
the base level of our circuit, this becomes unreasonable as we look
higher up in the circuit and the distribution of inputs becomes
non-uniform.
\item We then show that for each gate that we replace by a polynomial,
the probability that the polynomial computes the correct value is at
least $1-\frac{1}{4s}$.
\item From this, we can use the Union bound to demonstrate that since the
probability of an error in each gate is at most $\frac{1}{4s}$, the
probability of an error in the entire circuit is no more that
$\frac{1}{4}$ and therefore our approximation works for at least
$\frac{3}{4}$ of the inputs.
\item We can then show that the total degree of our approximation
polynomial is no more than $\log(s)^{\bigO(d)}$.
\end{itemize}
We now describe the resulting polynomial that represents the circuit. (This
section is taken verbatim from a previous year's lecture notes).
In order to see what the function computed from the ``poly-replaced'' circuit
looks like, we notice that throughout the circuit we are replacing OR gates
(with fan-in of size $k$) with polynomials of degree $\log s$ (in $x_1,
x_2...x_k$). At the lowest level of the circuits, where all the inputs to the
gates (polynomials) are constants in $\{0, 1\}$, we have polynomials of degree
$\log s$. At the second level, where the inputs to the polynomials are
themselves polynomials of degree $\log s$, we will have polynomials of degree
$\log^2 s$. In general, if we have polynomials of degree $d_1$ and all its
inputs are polynomials of degree $d_2$, the output will be a polynomial of
degree $d_1d_2$ (can be proved using induction). So continuing all the way to
depth d, we see that the resulting polynomial (which computes the output of
the circuit) has degree $\leq(\log s)^d$.
This polynomial will not be computing the correct value for all inputs.
However, for a fixed input and an \emph{independent}, random choice of
polynomials to replace the OR gates, we will show that the probability that a
(replaced) gate computes the wrong result is at most $\frac{1}{4s}$. By the
union bound, we get the right values throughout the circuit with probability
$\geq 3/4$. Notice that this requires that the choice of polynomials be made
\emph{independently} of the choice of inputs (i.e. the polynomials were not
chosen to suit the input, or the other way around).
Since for a particular input we have a $3/4$ probability of getting the right
result when the replacing polynomials (not necessarily the same for all
gates) which produces the correct result for at least $3/4$ of all possible
inputs to the circuit.
We now need only to find this $\log s -$degree polynomials to construct our
OR gates. One way to approximate an OR gate is by considering the sum of a
random subset of its inputs. That is: \[p(z_1... z_m) = \left(\sum_{i=1}^m
\alpha_i z_i\right)^2\] The reason we square the resulting sum is to remove
the possibility of getting an output of -1. In this way, if all of the $z_i$
are 0, then $p$ is guaranteed to be 0. If at least one of the $z_i = 1$, then
$p =1$ with probability $\geq 2/3$. So how do we make sure that the
probability of an error is only $\frac{1}{4s}$? We do this by observing that
if we compute $p$ on several random subsets of variables and then OR together
the results of these computations, our probability of making a mistake will
decrease. Therefore, our OR-replacement polynomial will be
\[\mbox{OR-P}(p_1(z_1...z_m), p_2 (z_1...z_m)... p_k(z_1...z_m))\]
where OR-P is the exact OR polynomial. If we pick $k = \log s$, we then get a
polynomial that is of degree $2\log s$ and will only make a mistake with
probability $(1/3)^{\log s}$ which is less than our desired result of
$\frac{1}{4s}$.
We can repeat this process independently for every OR gate in the circuit in
order to generate the polynomial that represents the circuit.
\end{proof-of-lemma}
\end{document}