\documentclass[oneside,normalheadings]{scrartcl}
\usepackage{amsfonts,graphicx}

\newcommand{\coRP}{\textrm{co-RP}}

\begin{document}
\input preamble

\lecture{10: Randomized Computation}{March 12, 2007}{Madhu
  Sudan}{Punyashloka Biswal}

\begin{abstract}
  We will motivate and define the complexity classes used to describe
  randomized computation, and discuss a few of their basic properties.
\end{abstract}

\section{Motivation}

There are several natural problems that we know how to solve
probabilistically (i.e., using randomness in some way), but not
deterministically. Here are a few examples:

\paragraph{Prime in interval} Given an $n$-bit integer $N$, we wish to
find a prime $p \in [N + 1, 2N]$. A variant of this problem asks for a
prime in the interval $[N + 1, 2^{(\log N)^\epsilon}]$.

To solve this probabilistically, pick $p$ uniformly from $[N + 1,
2N]$. Test if it is prime (this can be done deterministically), and if
so, output it. If not, repeat. By the Prime Number Theorem, such a $p$
is prime with probability $\geq \Omega(1/n)$, so that we have a
polynomial number of trials in expectation.

\paragraph{Square roots in $\Z_p$} Given an $n$-bit prime $p$ and an
integer $a \in \Z_p$, we wish to find $\alpha \in \Z_p$ such that
$\alpha^2 = a$. This is a special case of the problem of factoring
polynomials over finite fields.

This problem is trivial for $p = 2$. When $p \cong 3 \pmod 4$, it can
be solved deterministically: $\alpha = \pm a^{(p + 1)/4}$. The key
step in the case when $p \cong 1 \pmod 4$ is finding a quadratic
non-residue $q$. The quadratic residues form a cyclic subgroup of
index $2$ in $\Z_p$, so there are $(p - 1)/2$ such $q$'s. This means
that they are very easy to find by randomized trials. Unfortunately,
we know of no efficient way to find them deterministically.

\paragraph{Nonsingular matrix sum} Given $k$ matrices $M_1, \ldots,
M_k \in \Z^{n \times n}$, we wish to find $\lambda_1, \ldots,
\lambda_k \in \Z$ such that 
\[
\det \Bigl( \sum_{i = 1}^k \lambda_i M_i \Bigr) \neq 0.
\]
To solve this deterministically, pick $\lambda_i \in_U S$, where $S =
[2n]$, and test if the determinant is zero in this case. If not,
output these $\lambda_i$s, and if not, then repeat.

To understand why this is an efficient algorithm, consider the
degree-$n$ polynomial $p(x_1, \ldots, x_n) := \det ( \sum_{i = 1}^k
M_i x_i )$. By the Schwartz lemma, $\Pr[p(\lambda_1, \ldots,
\lambda_k) = 0] \leq n/\abs S = 1/2$.

\paragraph{Algebraic Circuit Identity Testing} We are given an
arithmetic circuit $C$ over the integers with gates for addition,
subtraction, and multiplication and inputs which are either variables
or constants. We wish to test whether $C$ computes the zero function.

\textbf{Exercise:} Give a randomized $\coRP$ algorithm for
ACIT. \footnote{This is not entirely trivial, as demonstrated by the
  following example: let $S$ be the circuit that maps $x \mapsto x^2$,
  and define $C = S \circ \cdots \circ S = S^{\circ n}$. Observe that
  $C$ is a small circuit with small constant inputs, but it outputs a
  polynomial of degree exponential in $n$. This makes it impossible to
  determine whether it computes zero simply by sampling from a small
  set of values of $x$.}

\section{Computational assumptions}

The notion of probabilistic computation seems `realistic' in the sense
that nature exhibits unpredictable behavior, and we should be able to
extract some of this unpredictability in the form of random inputs to
algorithms. However, the previous section shows that we can pose
natural problems that we do not know how to solve without the aid of
randomness. This conflicts with the strong form of the Turing-Church
Hypothesis:

\begin{description}
\item [Strong Turing-Church Hypothesis] Every physically
  realizable form of computation is simulatable on a Turing machine
  with only a polynomial time slowdown.
\end{description}

The two key features of this hypothesis are
\begin{enumerate}
\item Locality: A Turing machine has a finite-state controller. As a
  result, even though it has an infinite tape, its actions at any one
  moment are controlled by a finite, localized set of inputs.
\item Determinism: A Turing machine's actions are fully determined by
  its past.
\end{enumerate}
Randomized computation preserves the former property, but not the
latter. Guided by our sense that the physical world contains
randomness, that modern-day computers can simulate randomized
algorithms, and that we are unable to do without randomization for
several natural problems\footnote{If $\textrm{BPP} = \textrm{P}$, as
  some people believe, this addition is unnecessary and the original
  hypothesis stands.} we are led to redefine the Turing-Church
hypothesis as
\begin{description}
\item[Strong Turing-Church Hypothesis, probabilistic version] Every
  physically realizable form of computation is simulatable on a
  \emph{probabilistic} Turing machine with only a polynomial time
  slowdown.
\end{description}
It is instructive to contrast this with the situation in quantum
computing. Quantum computation is another resource that nature
provides us, which lets us perform computatations that are not known
to be in P (such as factoring integers). However, present-day
computers are unable to simulate quantum computation, so we are wary
of making a corresponding change to the Turing-Church hypothesis.

\section{Complexity classes}

We could define complexity classes for randomized computation by
extending classical Turing machines with randomized primitives (such
as ``toss a fair coin''). It turns out that it is better to adapt the
two-tape definition of NP, which we repeat below for reference:

\begin{definition}[Non-deterministic polynomial time, NP]
  A language $L \in {\rm NP}$ if there exists a $2$-input Turing
  machine $M(\cdot, \cdot)$ with worst-case running time polynomial in
  the length of the first input $x$, such that

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \exists y, M(x, y)$
    accepts, and \\
    (Soundness) & $x \not\in L \Rightarrow \forall y, M(x, y)$ rejects.
  \end{tabular}
\end{definition}

Similarly, we have

\begin{definition}[Randomized polynomial time, RP]
  A language $L \in {\rm RP}$ if there exists a $2$-input Turing
  machine $M(\cdot, \cdot)$ and an efficiently sampleable distribution
  on $y$ such that $M$ has expected running time polynomial in the
  length of the first input $x$ and

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] \geq 2/3$, and \\
    (Soundness) & $x \not\in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] = 0$.
  \end{tabular}
\end{definition}

\begin{definition}[Complements of RP, co-RP]
  A language $L \in {\mbox{\rm co-RP}}$ if there exists a $2$-input
  Turing machine $M(\cdot, \cdot)$ and an efficiently sampleable
  distribution on $y$ such that $M$ has expected running time
  polynomial in the length of the first input $x$ and

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] = 1$, and \\
    (Soundness) & $x \not\in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] \leq 1/3$.
  \end{tabular}
\end{definition}

\begin{definition}[Bounded-error probabilistic polynomial time, BPP]
  A language $L \in {\rm BPP}$ if there exists a $2$-input Turing
  machine $M(\cdot, \cdot)$ and an efficiently sampleable distribution
  on $y$ such that $M$ has expected running time polynomial in the
  length of the first input $x$ and

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] \geq 2/3$, and \\
    (Soundness) & $x \not\in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] \leq 1/3$.
  \end{tabular}
\end{definition}

\begin{definition}[Zero-error probabilistic polynomial time, ZPP]
  A language $L \in {\rm ZPP}$ if there exists a $2$-input Turing
  machine $M(\cdot, \cdot)$ and an efficiently sampleable distribution
  on $y$ such that $M$ has \textbf{expected} running time polynomial
  in the length of the first input $x$ and 

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] = 1$, and \\
    (Soundness) & $x \not\in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] = 0$.
  \end{tabular}
\end{definition}

\paragraph{A note on expected running time} In the first three of these
definitions, it is possible to replace ``expected running time'' with
``worst-case running time'' if we are willing to relax the error
parameters slightly; this also lets us compare BPP/RP/co-RP algorithms
with deterministic ones on a more equal footing. As we shall see in
the next section, all these definitions are robust across a wide range
of values of the error parameters, so we do not lose any generality.

\begin{definition}[RP, alternate]
  A language $L \in {\rm RP}$ if there exists a $2$-input Turing
  machine $M(\cdot, \cdot)$ and an efficiently sampleable distribution
  on $y$ such that $M$ has worst-case running time polynomial in the
  length of the first input $x$ and

  \begin{tabular}{rl}
    (Completeness) & $x \in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] \geq 5/8$, and \\
    (Soundness) & $x \not\in L \Rightarrow \Pr_y[M(x, y)\textrm{
      accepts}] = 0$.
  \end{tabular}  
\end{definition}

\begin{proof}
  To see why this definition implies the old one, suppose $L \in {\rm
    BPP}$ and let $M$ be a BPP Turing machine that takes a random tape
  in addition to $x$ and decides $L$ in expected time $p(\abs x)$. We
  shall construct another Turing machine $M'$ for $L$ as follows: on
  input $x$, we run $M$ on $x$ and a random tape for $10p(\abs x)$
  time steps. If $M$ does not terminate by this time, then reject.

  By the Markov inequality, $M$ fails to terminate in $10 p(\abs x)$
  steps with probability $\leq 1/10$. Therefore, the probability that
  an $x \in L$ is not accepted is at most $2/3 - 1/10 \geq 5/8$.

  The converse is trivially true after performing an amplification.
\end{proof}

Unlike for BPP, RP, and co-RP, expected running time is crucial to the
definition of ZPP and cannot be eliminated. For example, consider the
the language 
\[
L = \{ (p, a, b, c) \mid \textrm{$p$ is a prime, $a, b, c \in \Z_p,
  \exists \alpha \in [b, c]: \alpha^2 = a$} \}.
\]
This is a decision version of the square root problem, and is easily
shown to be of equivalent hardness. It has a ZPP algorithm but no
known deterministic one.

However, we have a different characterization of ZPP in terms of other
classes defined using worst-case time:
\begin{proposition}
  ${\rm ZPP} = {\rm RP} \cap {\mbox{\rm co-RP}}$
\end{proposition}
\begin{proof}
  Let $M$ and $M'$ be RP and co-RP algorithms, respectively, for a
  language $L$ that lies in the intersection. Run them in parallel. If
  $M$ accepts first, then accept. If $M'$ rejects first, then
  reject. One of these two must eventually happen, and in expected
  polynomial time.

  The converse is trivial.
\end{proof}

\paragraph{Inclusions among randomized classes} These definitions make
it clear that ${\rm RP} \subseteq {\rm NP}$ and ${\mbox {\rm co-RP}}
\subseteq {\mbox {\rm co-NP}}$. We shall see in a later lecture that
RP and co-RP lie within $\Sigma_2^{\rm P}$. This gives us the picture
below (Figure~\ref{fig:inclusions}).
\begin{figure}[h]
  \centering
  \caption{Known inclusions}
  \label{fig:inclusions}
  \includegraphics[scale=0.7]{rand-classes}
\end{figure}
Some complexity theorists believe that ${\rm BPP} = {\rm P} \neq {\rm
  NP} \neq {\mbox {\rm co-NP}}$.

\section{Amplification}

The error parameters used in the definitions from the previous section
can actually be replaced with a wide range of different values without
altering the complexity classes involved.

For example, we can change ``$\geq 2/3$'' in the definition of RP to
``$\geq 1/p(n)$'' or ``$\geq 1 - 2^{-p(n)}$'' for any polynomial
function $p$ of the input length: take an RP machine $M$ with error
probability $1 - 1/p(n)$, and run it $t$ times on independently chosen
random $y$'s. Accept if any of these runs accept. The error
probability of this new Turing machine is $(1 - 1/p)^t$, which can be
made $\leq e^{-q(n)}$ by setting $t = \Omega(p(n) q(n))$.

For BPP, we can replace the correctness statement ``$\geq 1/2 +
1/p(n)$'' with ``$\geq 1 - 2^{-q(n)}$'' and the soundness statement
``$\leq 1/2$'' with ``$\leq 2^{-q(n)}$.'' The amplification argument
is conceptually similar, but uses Chernoff bounds.
\end{document}