\documentclass[10pt]{article}
\usepackage{amsfonts}
\usepackage{amsmath}
\newtheorem{define}{Definition}

%\newcommand{\Z}{{\bf Z}}
\usepackage{psfig}
\newcommand{\e}{\epsilon}
\newcommand{\getsr}{\gets_{\mbox{\tiny R}}}
\newcommand{\bits}{\{0,1\}}
\newcommand{\st}{\mbox{ s.t. }}

\oddsidemargin=0.15in
\evensidemargin=0.15in
\topmargin=-.5in
\textheight=9in
\textwidth=6.25in

\begin{document}
\input{preamble.tex}

\lecture{20}{November 17, 2004}{Madhu Sudan}{Anup Rao}

\section{Overview}

\begin{itemize}
\item Extractors
\item Extractors vs Codes
\end{itemize}

\section{Extractors}

Suppose we have access to a source of randomness that isn't completely
random, but does contain a lot of randomness. Is there a way to
convert this source into a source of truly random bits? Ideally we'd
like an algorithm (called an Extractor) that does something like this:

\psfig{figure=extractor.eps}

To formalize this, we need to decide what we mean by a \emph{weak}
source and what it means to say that a distribution is \emph{close} to
being random. 

\begin{definition} Let $D_1$ and $D_2$ be two distributions on a set
$S$. Their \emph{statistical distance} is
\begin{align*}
\parallel D_1-D_2 \parallel \eqdef \max_{T \subseteq S}(|D_1(T) -
D_2(T)|) = \frac{1}{2} \sum_{s \in S}|D_1(s)-D_2(s)|
\end{align*}
If $\|D_1-D_2\| \leq \e$ we shall say that $D_1$ is $\e$-close to $D_2$.
\end{definition}

Note that this definition of \emph{closeness} is as strong as we might
hope for. Specifically, if $D_1$ and $D_2$ are $\e$-close, then for
any algorithm $A$, $\| \Pr_{x \getsr D_1}[A(x)=1] - \Pr_{x \getsr
D_2}[A(x)=1] \| \leq \e$.

Different models have been considered to formalize the notion of
\emph{weak} sources. von-Neumann considered sources where each bit
from the source is independent, but biased, i.e. the probability of
the bit being 1 is $p$, where $p$ may not be $\frac{1}{2}$. Santha and
Vazirani \cite{SanthaV84} considered sources where each bit is biased
and the bias of the bit depends on the bits that have come before
it. The most general notion of a \emph{weak} source to date was first
introduced by Nisan and Zuckerman \cite{NisanZ93}. Here, we bound the
\emph{min-entropy} of the source:

\begin{definition} \label{def:minentropy} The \emph{min-entropy} of a
source $D$ is defined to be:
\[ H_{\infty}(D) \eqdef -\log(\max_{x \in D}(D(x)) \]

We will call a distribution over $\bits^n$ that has min-entropy $k$ an
$(n,k)$ source. 
\end{definition}

So if a source has min-entropy $k$, the probability that the source
gives a particular string is at most $2^{-k}$. This definition is
fairly general. If a source has min-entropy $k$, then we expect that
we will have to take more than $2^{k/2}$ samples from the source
before we see a repetition. Such a source must also have a support
that contains more than $2^k$ elements.

\begin{fact} \label{fact:flat} Every $(n,k)$ source is a convex combination
of sources which are uniform distributions on sets of size $2^k$.
\end{fact}

Our extractor problem is now well defined. We'd like to construct an
efficiently computable deterministic function $EXT$ that takes $n$
bits of input from any source of min-entropy $k$ and produces $m$ bits
that are statistically close to being uniformly random. Unfortunately
such a function does not exist! To see this, suppose $EXT$ was an
extractor that could extract just one bit from a source of min-entropy
$n-1$. At least half the points ($2^{n-1}$ points) in $\bits^n$ must
get mapped to either $0$ or $1$ by the extractor, so if we choose our
input weak source to be the flat distribution on these points, we see
that our source has min-entropy $n-1$, but the output of the extractor
is statistically far from being uniformly random.

Notice that in the argument above there was a strong relation between
the source that we picked as a counterexample to the extractor and the
extractor itself. This suggests a relaxation to the problem: pick a
function randomly from a family and use that as an extractor.

\begin{definition} \label{def:extractor} We say a function
$EXT:\{0,1\}^n \times \{0,1\}^d \rightarrow \{0,1\}^m$ is a $(k,\e)$
extractor if for any $(n,k)$ source X and for an Y chosen
independently uniformly at random from $\{0,1\}^d$, we have

\[ \parallel EXT(X,Y) - U_{m} \parallel \leq \e \]

where $U_m$ is the uniform distribution on $m$ bits. 
\end{definition}

\begin{definition} \label{def:strongextractor} We say a function
$EXT:\{0,1\}^n \times \{0,1\}^t \rightarrow \{0,1\}^m$ is a
\emph{strong} $(k,\e)$ extractor if for any $(n,k)$ source X and for Y
chosen uniformly at random from $\{0,1\}^t$, we have

\[ \parallel Y \circ EXT(X,Y) - Y \circ U_{m} \parallel \leq \e \]
\end{definition}

A strong extractor ensures that even when given the seed used, the
output of the extractor is statistically close to being uniformly
random.

Fact \ref{fact:flat} implies that if our extractor can extract from
any flat distribution with min-entropy $k$, then it can also extract
from a general source that has min-entropy $k$.

\subsection{Graph theoretic view of Extractors}

Let $N = 2^n$, $M = 2^m$, $D = 2^d$, $K = 2^k$. Then we can view an
extractor as a bipartite graph with $N$ $D$-regular vertices on the
left and $M$ vertices on the right.

\par
\centerline{\psfig{figure=extdiag.eps}}
\par

In terms of the extractor graph the extractor property translates to
the following. If $S$ is a subset of the vertices on the left with
$|S| \geq K$ and $T$ is any subset of the vertices on the right, the
number of edges going from $S$ to $T$ is close to what we'd expect if
the graph was chosen randomly, i.e., 

$\bigg \lvert \text{\# of edges going from } S \text{ to } T -
D|S||T|/M \bigg \rvert \leq
\e D|S|$

This property looks very similar to something we've seen
before. Recall the definition of $(D,\e)$-uniform graphs that we saw
in lecture 18:

\begin{definition} Let $G$ be a $D$-regular, bipartite graph with a
set $L$ of left vertices and a set $R$ of right vertices satisfying
$|L|=|R|=N$. G is a $(D,\epsilon)$-$uniform$ graph if $\forall X
\subseteq L, Y \subseteq R, \beta = \frac{|X|}{N}, \gamma =
\frac{|Y|}{N}$, the number of edges between $X$ and $Y$ is $\geq
(\beta \cdot \gamma - \epsilon) \cdot D \cdot N$.
\end{definition}

The key differences to note between the two properties are that:

\begin{itemize}
\item An extractor graph may be imbalanced, while a uniform graph
must be balanced.
\item An extractor graph is required to satisfy the \emph{uniformness}
property only with regards to large (cardinality $\geq K$) sets on the
left, whereas a uniform graph allows a more general choice of sets.

\item The uniformness condition for an extractor graph is somewhat
stronger in the sense that the number of edges going from a particular
set $S$ on the left to a set $T$ on the right is allowed to deviate by
only $\e D |S|$ from the expected number for a random graph. On the
other hand in a uniform graph the number of edges is allowed to
deviate by $\e D N$ from the expected number.
\end{itemize}

\section{Extractors vs Codes}

We have already seen some ways to use bipartite graphs to construct
codes. Now we will see a new way to interpret a code as a bipartite
graph that will allow us to get a connection between extractors and
codes. Given any $[n,k,?]_q$ code with an encoding function $E$, we
construct the bipartite graph $(L,R,G)$, where $L$ is the set of left
vertices, $R$ is the set of right vertices and $G$ is the set of
edges, so that:

\begin{itemize}
\item $L \eqdef [\F_q]^k$
\item $R \eqdef [\F_q]$
\item $G \eqdef \{(x,y) | \exists i \st E(x)_i = y\}$, i.e., for any
element $x \in L$, $x$ is connected to the $n$ elements $E(x)_1,
E(x)_2,...,E(x)_n$ in $R$.
\end{itemize}

\par
\centerline{\psfig{figure=codegraph.eps}}
\par


Ta-Shma and Zuckerman \cite{TaShmaZ2001} observed that this graph is a
good extractor exactly when the corresponding code is a good list
decodable code! In fact, when the graph is a good extractor, the
corresponding code has a property that is slightly stronger than just
being list decodable, it is \emph{soft decision decodable}. To
motivate this concept, let us review what it means to decode words in
the classical sense. 

In the classical decoding problem, we start with a received vector $r
\in \F_q^n$ and we try to find the closest (in terms of Hamming
distance) codeword to this received word. However, in the real world
we usually have a little more information than just a received
word. For instance we can imagine using some electrical device to
transmit information over a noisy channel, where a $+1$ volt signal
means a $1$ and a $-1$ volt signal means a $0$. If we receive a signal
of $+.99$ volts, it was probably transmitted as a $1$. On the other
hand, if we receive $0.10$ volts, then it is less likely that the
transmitted signal was a $1$. Soft decision decoding is one way to
take advantage of this extra information.

In soft decision decoding, instead of a received word, we are given a
weight function $w:[n] \times \F_q \rightarrow [0,1]$ and the goal is
to find a codeword $C$ that maximizes $\sum_{i=1}^n w(i,C_i)$. So for
instance if we are very sure that the first symbol of the codeword is
a $0$, we can factor that in by making $w(1,0)$ very high. It is easy
to see that classical decoding is just a special case of soft decision
decoding. For any weight function $w$, we define the relative weight
of the function as: \[\rho(w) \eqdef \frac{\sum_{i \in [n], y \in
\F_q}w(i,y)}{nq}\] Note that the expected weight of a random word is
exactly $\rho(w)n$.

As usual, we can now look at the corresponding list decoding problem:

\begin{definition}
A code $C$ is said to be \emph{soft list decodable} from $(1-\e)$
fraction errors if: $\forall$ weight functions $w$, $\exists$ at most
$\poly(n)$ codewords in $C$ that have weight $\geq n(\rho(w) + \e)$.
\end{definition}

Ta-Shma and Zuckerman showed that good extractors correspond to codes
which are soft list decodable. Now we will sketch the arguments for
the connection. 

\begin{theorem} $C = (L,R,G)$ is the graph of a code that is soft list
decodable from $(1-\e)$ fraction errors with list size less than $K$
$\Rightarrow$ $C$ is the graph of a strong $(k-\log(\e),2\e)$
extractor.
\end{theorem}
\begin{proof-sketch}

For simplicity, we will just prove that the graph obtained is the
graph of a $(k-\log(\e),2\e)$ extractor. To show that it's a strong
extractor we need to make some minor modifications to the same proof. 

Suppose the graph is not that of a good extractor. Then $\exists S
\subset L, T \subset R$, with $|S| \geq K/\e$, so that the number of
edges between $S$ and $T$ is $2\e n |S|$ more (or less) than the
expected number of edges. wlog we will assume that the number of edges
is more than the expected number (if it's less than the expected
number we can take the complement of $T$ instead). Now consider the
following weight function

\[
w(i,y) = \begin{cases}
0 & y \notin T \\
1/|T| & y \in T\\
\end{cases}
\]

Note that $\rho(w) = 1/q$. The average weight of a codeword in $S$ is
at least $2 \e n$ more than the expected weight of a random word. On
the other hand each word in $S$ has a weight of at most $n/|T|$. If
there are $x$ words in $S$ which have a weight greater than $(1/q +
\e)n$, we have that

\begin{align*}
 xn/|T| + (|S|-x)(1/q + \e )n &\geq (1/q + 2\e) n |S| \\
\Rightarrow x(1/|T| - 1/q - \e) &\geq \e |S| \\
\Rightarrow x &\geq \e |S| \\
\Rightarrow x &\geq K\\
\end{align*}

This contradicts the soft list decodability of the code. 

\end{proof-sketch}

Using very similar arguments, we can argue the other direction, i.e.,
that an extractor graph gives a soft list decodable code.

\begin{theorem} $C$ is the graph of a strong $(k,\e)$
extractor $\Rightarrow$ $C = (L,R,G)$ is the graph of a code that is
soft list decodable from $(1-\e)$ fraction errors with list size less
than $K$.
\end{theorem}

\begin{proof-sketch}
Suppose not. Then there is a weight function $w$ for which the list
size is larger than $K$. Consider the behavior of the extractor when
it is given the uniform distribution over this list as the input
distribution. The weight function $w$ then gives a way to distinguish
the output of the extractor from the uniform distribution with
probability $\e$, contradicting the extractor property.
\end{proof-sketch}

\bibliographystyle{alphabetic}

\bibliography{lect20}

\end{document}