\documentclass[10pt]{article}
\usepackage{amssymb}
\newtheorem{define}{Definition}
%\newcommand{\Z}{{\mathbb{Z}}}
%\usepackage{psfig}

\oddsidemargin=0.15in
\evensidemargin=0.15in
\topmargin=-.5in
\textheight=9in
\textwidth=6.25in

\begin{document}
\input{preamble.tex}

%{lecture number}{lecture date}{Ronitt Rubinfeld}{Your name}
\lecture{17}{April 19, 2006}{Ronitt Rubinfeld}{Pouya Kheradpour}

%%%% body goes in here %%%%

\section{Outline}
In this lecture we discuss the use of pairwise independent random numbers in algorithms. We will first show an algorithm that only requires pairwise independence of the random bits used. Next, we will show how to generate these pairwise random bits using very few truly random bits allowing us to apply enumeration technique for derandomization. We will also show how to efficiently generate pairwise random \emph{numbers} (as opposed to bits). Finally, we will see how to reduce the error rate of randomized algorithms with one-sided error using pairwise independent numbers.
\section{Definitions}
We begin with formal definitions of independence. Consider the random variables $x_1 \ldots x_n \in T^n$ where $\forall {b \in T}, \forall i, \Pr[x_i = b] = \frac{1}{|T|}$. We say that $x_1 \ldots x_n$ are 
\begin{itemize}
\item \emph{independent} when $\forall {b_1 \ldots b_n \in T^n}, \Pr[x_1 \ldots x_n = b_1 \ldots b_n] = \frac{1}{|T|^n}$.
\item \emph{pairwise independent} when $\forall {b_1b_2 \in T^2}, \forall {i \neq j}, \Pr[x_ix_j = b_1 b_2] = \frac{1}{|T|^2}$.
\item \emph{k-wise independent} when $\forall {b_1 \ldots b_k \in T^n}, \forall {\textrm{ distinct } i_1\ldots i_k}, \Pr[x_{i_1} \ldots x_{i_k} = b_1 \ldots b_k] = \frac{1}{|T|^k}$.
\end{itemize}
Notice that pairwise independence is equivalent to 2-wise independence.
\section{Large Cut using pairwise independence}
Here we present and analyze a randomized algorithm for finding a large cut of a graph requiring only pairwise independent bits. 
\begin{center}
\begin{minipage}{5cm}
\texttt{
\begin{tabbing}
AAA\=AAA\=AAA\= \kill
\underline{Algorithm LargeCut($G$)}: \\
\>  initialize $V_1, V_2 \gets \{ \}, \{ \}$ \\
\> for every node $i = 1, \ldots , n$\\ 
\> \> flip coin $r_i$ \\
\> \> if $r_i = \textrm{heads}$ \\
\> \> \> $V_1 \gets V_1 \cup \{ i\}$  \\
\> \> else \\
\> \> \> $V_2 \gets V_2 \cup \{ i\}$ 
\end{tabbing} } 
\end{minipage}
\end{center}

Now, to show that $V_1$ and $V_2$ do define a large cut, consider the following:
\begin{eqnarray*}
E[ \textrm{\# of edges in cut}]  &=&  \sum_{(i,j)\in E} \Pr[ v_i \neq v_j ]  \qquad \qquad\qquad \qquad \qquad \qquad  \quad \quad \textrm{(linearity of expectation)} \\
&=& \sum_{(i,j)\in E} \Pr[ v_i = 1 \cap v_j = 0] + \Pr[ v_i = 0 \cap v_j = 1] \\
&=& \sum_{(i,j)\in E} \Pr[ v_i = 1 ] \Pr[v_j=0] + \Pr[ v_i = 0 ] \Pr[v_j=1] \,\,\,\, \textrm{(\emph{pairwise} independence)} \\
&=& \sum_{(i,j)\in E} \frac{1}{2} = \frac{|E|}{2} 
\end{eqnarray*}
Thus, all we need for this algorithm is $n$ coins that are pairwise independent (not truly independent). 
\section{Efficiently generating pairwise random bits}
Now we provide an algorithm for producing $N$ pairwise independent random bits from only $O(\log N)$ truly random bits. Take $k$ truly random bits $b_1 \ldots b_k$. $\forall {S \subseteq [k]}$ such that $S \neq \emptyset$ set $C_S = \oplus_{i \in S} b_i$. Each of the $2^k-1$ $C_S$ $S$ are pairwise independent (proof left as exercise). Thus, to generate $N$ pairwise random bits we only need $\log N$ truly random bits.
\section{Derandomized Large Cut}
Because we can generate $N$ pairwise random bits using $\log N$ truly random bits, we can
\begin{enumerate}
\item Enumerate all $N$ possibilities for truly random bits of length $\log N$.
\item Use each string of independent bits to generate a string of $N$ pairwise independent bits.
\item Run it through the randomized Large Cut algorithm presented above.
\item Take the cut that is the largest. By the argument above it will have expected size of at least $\frac{|E|}{2}$.
\end{enumerate}
\section{Generating pairwise independent random numbers}
Pairwise independent random numbers $r_1, \ldots , r_q$ between $0$ and some prime $q>N$ can be generated using the following algorithm:
\begin{center}
\begin{minipage}{5cm}
\texttt{
\begin{tabbing}
AAA\=AAA\=AAA\= \kill
\underline{Algorithm GeneratePI($N$)}: \\
\>  $q \gets $ some prime $ > N$ \\
\> choose $a, b \in_R \Z_q$ \\
\> for $i \gets 1 \ldots q$ \\
\> \> $r_i \gets a \cdot i + b \pmod{q}$
\end{tabbing} }
\end{minipage}
\end{center}
Notice that this algorithm requires only $2\log q$ truly random bits to generate $a$ and $b$. 

To see that these number are pairwise independent, consider any $x, w, y, z \in [q]$ where $x \neq w$. It follows that there is only one $a, b$ such that $a \cdot w + b = y $ and $a \cdot x + b = z$ because in
\[
\left( \begin{array}{cc}
w & 1 \\
x & 1 
\end{array} \right)
\left( \begin{array}{c}
a \\
b 
\end{array} \right) =
\left( \begin{array}{c}
y \\
z 
\end{array} \right)
\]
the matrix $\left( \begin{array}{cc}
w & 1 \\
x & 1 
\end{array} \right)$
has non-zero determinant $w - x$. Thus, $Pr_{a,b} [  a \cdot w + b = y \cap a \cdot x + b = z ] = \frac{1}{q^2}$ indicating pairwise independence.
\section{Generalization of pairwise independence generation}
Define $\mathcal{H} = \{ h_i : [N] \rightarrow [M] \}$ to be a pairwise independent family of functions if
\[
\forall {x \neq y \in [N]}, \forall {a,b \in [M]}, Pr_{h \in \mathcal{H}} [h(x) = a \cap h(y) = b] = \frac{1}{M^2}
\]
Thus, we notice that our construction for pairwise independent bits was the special case of $M=2$ and for pairwise independent numbers was $M=q$. 
\section{``Two-point sampling'' with pairwise independent numbers}
Consider an algorithm $\mathcal{A}$ with ``one-sided error'', i.e. 
\[
x \in L \Rightarrow  Pr_R [ \mathcal{A} (x,R) = 1 ] \geq \frac{1}{2}
\]
and 
\[
x \notin L  \Rightarrow Pr_R [ \mathcal{A} (x,R) = 0 ] = 1
\]
It is that that if $\mathcal{A}(x,R) = 1 $ then $R$ is a ``witness to $X \in L$''. 

To improve the error probability, we can do one of the following:
\begin{enumerate}
\item Repeat $k$ times, use Chernoff bounds to obtain error $ \leq 2^{-k}$ with $O(kr)$ random bits.
\item Use a random walk on rapid mixing graph with $r + O(k)$ bits.
\item Use pairwise random bits requiring the generation of only $O(r+k)$ bits.
\end{enumerate}
In previous lectures we saw the first two methods. We will discuss the last method now. We compute $\mathcal{A}(x,r_i)$ for all $1 \leq i \leq t$. If any $\mathcal{A}(x,r_i) = 1$, then we output $1$. Otherwise, we output $0$. 

Consider the probability of misclassification. If $x \notin L$ then $\mathcal{A}(x,r_i)$ is always $0$ and we do not misclassify. 

If $x \in L$ then the probability of misclassification is equal to the probability that we never see a $1$. Let $Y \equiv \sum_{i=1}^t \mathcal{A}(x,r_i)$. By linearity of expectation, we see that $E[Y] = \sum_{i=1}^t E[\mathcal{A}(x,r_i)] \geq \frac{t}{2}$. We cannot use a Chernoff bound to evaluate $\Pr[Y = 0]$ because the $r_i$s are not independent and the Markov inequality is too weak. However, we can apply the Chebyshev inequality. 

First we must compute the variance of $Y$. Because the $r_i$s are pairwise independent, 
\[
\sigma_Y^2 = \sum \sigma^2 [ \mathcal{A}(x,r_i) ] = \sum \left( E[\mathcal{A}(x,r_i)^2] - E^2 [ \mathcal{A}(x,r_i) ] \right) = \sum (p - p^2) \leq \sum \frac{1}{4} = \frac{t}{4}
\]
Thus, $\sigma_Y \leq \frac{\sqrt{t}}{2}$. The Chebyshev inequality states that
\[
Pr [ | x - E[x] | \geq s \sigma ] \leq \frac{1}{s^2}
\]
Thus, the probability that we encounter no $\mathcal{A}(x,r_i) = 1$ when $x \in L$ is
\[
Pr [ Y = 0] \leq Pr [ | Y - \frac{t}{2} | > \frac{t}{2} ] \leq \frac{1}{\sqrt{t}^2} = \frac{1}{t}
\]
Thus, we obtain our desired error rate of $2^{-k}$ for $t = 2^k$ which requires $O(\log t) = O(k)$ truly random bits.

The name ``Two-point sampling'' comes from the fact that we only need random numbers $a$ and $b$. Their size is bounded by $q$ where $q \geq 2^r$ and $q \geq t$ (the numbers have to be big enough and we must be able to generate enough of them). Thus $q = \Omega(r+\log t) = \Omega(r + k)$.
\end{document}