\documentclass[runningheads, 11pt]{article}
\usepackage{graphicx,verbatim,xspace,color}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fullpage}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{claim}[theorem]{Claim}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{remark}[theorem]{Remark}
\setlength{\oddsidemargin}{.25in}
\setlength{\evensidemargin}{.25in}
\setlength{\textwidth}{6in}
\setlength{\topmargin}{-0.4in}
\setlength{\textheight}{8.5in}
\newcommand{\heading}[6]{
\renewcommand{\thepage}{\arabic{page}}
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { \textbf{#2} \hfill #3 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #6 \hfill} }
\vspace{2mm}
\hbox to 5.78in { \textit{Instructor: #4 \hfill #5} }
}
}
\end{center}
\vspace*{4mm}
}
%
%\newenvironment{proof}{\noindent{\bf Proof:} \hspace*{1mm}}{
% \hspace*{\fill} $\Box$ }
%\newenvironment{proof_of}[1]{\noindent {\bf Proof of #1:}
% \hspace*{1mm}}{\hspace*{\fill} $\Box$ }
%\newenvironment{proof_claim}{\begin{quotation} \noindent}{
% \hspace*{\fill} $\diamond$ \end{quotation}}
\newcommand{\lecturenotes}[4]{\heading{#1}{6.S897 Algebra and Computation}{#2}{Madhu Sudan}{Scribe: #3}{#4}}
\newcommand{\lecturenum}{21} % lecture number
\newcommand{\lecturetitle}{Applications of Algebra in Computing: Codes and Decoding} % topic of lecture
\newcommand{\lecturedate}{April 30, 2012} % date of lecture
\newcommand{\studentname}{Michael Forbes} % full name of student (i.e., you)
\newcommand{\F}{\mathbb{F}}
\newcommand{\K}{\mathbb{K}}
\newcommand{\la}{\langle}
\newcommand{\ra}{\rangle}
\DeclareMathOperator{\mdeg}{mdeg}
\newcommand{\from}{\leftarrow}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\lecturenotes{\lecturenum}{\lecturedate}{\studentname}{\lecturetitle}
\section{Overview}
Today we will cover an application of algebra to computing: the construction of error-correcting codes. Specifically, we aim to cover
\begin{itemize}
\item Reed-Solomon Codes
\item List decoding of Reed-Solomon codes
\item Ideal error correction
\end{itemize}
We will mention the frontier of this work, encapsulated in folded Reed-Solomon codes, and how this frontier improves on what we show today.
\section{Coding Theory}
We first motivate coding theory as a topic. Consider the problem of storing data on a CD-ROM (or a modern variant). Such physical devices can be corrupted over time through mistreatment or decay of their physical structure. That is, \textit{errors} will occur in our stored message, so if we want to recover the original message in the future we need to do something to cope with them. The main idea is to assume that the error rate is somehow bounded, so that errors cannot be arbitrarily complicated. In such a situation, we hope to add a small amount of redundancy to our data that still allows recovery in the face of errors.
Formally, we have some universe of messages, denoted $U$, that we wish to store. As is typical, we encode the messages as strings over some finite alphabet $\Sigma$, so our encoding map will be $Enc:U\to\Sigma^n$. Typically we will want $\Sigma$ to be $\{0,1\}$. However, other choices of $\Sigma$ can be useful if we have a physical medium (such as a CD-ROM) that naturally has a larger $\Sigma$. CD-ROMs are this way because their errors come in \textit{bursts} (think of scratches on a CD), so even though the CD is actually encoded in bits, the errors are better modeled as errors on chunks of bits. There are other reasons to take $\Sigma$ larger. One is that it allows interesting families of codes to be defined, such as the Reed-Solomon codes we shall soon see. Another is that one can often convert codes over large $\Sigma$ into codes over $\{0,1\}$ while retaining interesting properties.
The notion of error we will use is the \textit{Hamming metric}, given by $\Delta(x,y)=|\{i:x_i\ne y_i\}|$ for $x,y\in\Sigma^n$. We denote by $\mathcal{C}\subseteq\Sigma^n$ the image of the encoding function $Enc$, and call $\mathcal{C}$ the code. The \textit{minimum distance} of the code is defined as $\Delta(\mathcal{C})=\min_{x\ne y\in \mathcal{C}}\Delta(x,y)$. Typically a code will have $\Sigma$ as some finite field, and oftentimes the universe $U$ can be identified with some $\Sigma^k$. In such a situation, we speak of an $[n,k,d]_q$ code: a code $\mathcal{C}\subseteq\Sigma^n$ with $|\mathcal{C}|=q^k$ and minimum distance $d$, over the alphabet $\Sigma=\mathbb{F}_q$. A large minimum distance means that the codewords are well-separated in the Hamming metric, and thus it takes many errors to confuse one codeword with another. Putting such reasoning together, we get
\begin{lemma}[Hamming]
If the minimum distance of a code $\mathcal{C}$ is $d$ then one can (information theoretically) correct up to $\lfloor (d-1)/2\rfloor$ errors.
\end{lemma}
Note that this decoding guarantee is only information theoretic. That is, if there are at most $\lfloor (d-1)/2\rfloor$ errors then there is a unique codeword that we can decode to. However, the above lemma, which is geometric, does not say how to find that codeword efficiently. One of the challenges of coding theory is to develop codes with efficient decoding algorithms. Another is to balance the various parameters of interest: we want to maximize the rate $k/n$ of a code for any given minimum distance $d$. Intuitively, there must be some limit to this rate for any such $d$, as we must add some redundancy to ensure the distance property. There are many results showing this to be the case. We prove one here, called the Singleton bound (named after an individual named Singleton).
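To make the Hamming metric and the unique-decoding guarantee of Hamming's lemma concrete, here is a small Python sketch. The brute-force search and the toy repetition code are illustrative choices, not an efficient decoder:

```python
from itertools import combinations

def hamming(x, y):
    """Delta(x, y): the number of coordinates where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def unique_decode(received, code, d):
    """Return the unique codeword within floor((d-1)/2) errors of the
    received word, if one exists.  This brute-force search realizes the
    information-theoretic guarantee; it is not an efficient decoder."""
    radius = (d - 1) // 2
    close = [c for c in code if hamming(received, c) <= radius]
    return close[0] if len(close) == 1 else None

# Toy [3, 1, 3]_2 repetition code: up to floor((3-1)/2) = 1 error is correctable.
rep3 = [(0, 0, 0), (1, 1, 1)]
print(min(hamming(x, y) for x, y in combinations(rep3, 2)))  # minimum distance: 3
print(unique_decode((1, 0, 1), rep3, d=3))                   # (1, 1, 1)
```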
\begin{lemma}[Singleton Bound]
In an $[n,k,d]_q$ code, we have that $n\ge k+d-1$.
\end{lemma}
\begin{proof}
Let $\mathcal{C}\subseteq \Sigma^n$ be an $[n,k,d]_q$ code, so that $|\mathcal{C}|=q^k$. Consider the projection map $\pi:\Sigma^n\to\Sigma^{k-1}$ that takes a codeword and drops the last $n-(k-1)$ symbols. Since $|\mathcal{C}|=q^k>q^{k-1}=|\Sigma^{k-1}|$, by the pigeonhole principle there must be $x\ne y\in\mathcal{C}$ such that $\pi(x)=\pi(y)$. Thus $x$ and $y$ agree on the first $k-1$ coordinates, so that $\Delta(x,y)\le n-(k-1)$. However, $d\le \Delta(x,y)$, so $d\le n-k+1$, giving the bound.
\end{proof}
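The pigeonhole step of this proof can be checked exhaustively on a toy code. The following Python sketch (the $[3,2,2]_2$ parity-check code is an illustrative choice) projects codewords onto their first $k-1$ coordinates, finds a colliding pair, and verifies the resulting bound:

```python
from itertools import product, combinations

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

# Toy [3, 2, 2]_2 parity-check code: (a, b) -> (a, b, a + b mod 2).
code = [(a, b, (a + b) % 2) for a, b in product(range(2), repeat=2)]
n, k = 3, 2
d = min(hamming(x, y) for x, y in combinations(code, 2))

# Pigeonhole step of the proof: 2^k codewords project onto only 2^(k-1)
# prefixes of length k - 1, so some pair x != y shares a prefix, and
# that pair satisfies Delta(x, y) <= n - (k - 1).
x, y = next((x, y) for x, y in combinations(code, 2) if x[:k-1] == y[:k-1])
print(d, hamming(x, y) <= n - (k - 1), n >= k + d - 1)  # 2 True True
```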
Perhaps somewhat surprisingly, this bound is tight, in that Reed-Solomon codes meet it, as we shall now see.
\section{Reed-Solomon Codes}
Reed-Solomon codes, defined in 1960, are one of the simplest applications of algebra to coding theory. Here we take $\Sigma=\mathbb{F}_q$, for $n\le q$, and take $\alpha_1,\ldots,\alpha_n\in\mathbb{F}_q$ to be distinct. A message $(m_0,\ldots,m_{k-1})\in\mathbb{F}_q^k$ will be encoded as the evaluations $(p(\alpha_1),\ldots,p(\alpha_n))$, where $p$ is the polynomial $p(x)=\sum_{i=0}^{k-1}m_ix^i$.
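As a concrete illustration (the tiny prime field and the toy message below are illustrative choices), the encoding map and the resulting minimum distance can be computed directly in Python:

```python
from itertools import combinations, product

def rs_encode(msg, alphas, q):
    """Evaluate p(x) = sum_i msg[i] * x^i at each alpha in F_q (q prime)."""
    return [sum(m * pow(a, i, q) for i, m in enumerate(msg)) % q for a in alphas]

# A [5, 2]_7 Reed-Solomon code: degree-<2 polynomials evaluated at 0..4.
q, n, k = 7, 5, 2
alphas = list(range(n))
print(rs_encode([3, 2], alphas, q))  # p(x) = 3 + 2x gives [3, 5, 0, 2, 4]

# Exhaustively check the minimum distance: it is n - k + 1 = 4, so this
# code meets the Singleton bound with equality.
code = [tuple(rs_encode(msg, alphas, q)) for msg in product(range(q), repeat=k)]
d = min(sum(a != b for a, b in zip(x, y)) for x, y in combinations(code, 2))
print(d)  # 4
```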
To analyze the distance of this code, note that two distinct polynomials of degree at most $k-1$ can agree on at most $k-1$ of the points $\alpha_1,\ldots,\alpha_n$, as their difference is a non-zero polynomial of degree at most $k-1$ and thus has at most $k-1$ roots. Hence any two codewords differ in at least $n-(k-1)$ coordinates, so the Reed-Solomon code is an $[n,k,n-k+1]_q$ code, meeting the Singleton bound.

We now turn to decoding. Given a received word $(\beta_1,\ldots,\beta_n)\in\F_q^n$, we ask how many polynomials $p$ of degree less than $k$ can have agreement $a=|\{i:p(\alpha_i)=\beta_i\}|$ with the received word, for various values of $a$. First consider the regime where $a\ge\sqrt{2kn}$. In this regime, an inclusion-exclusion type argument (known as the Johnson bound) shows that there are at most $2\sqrt{n/k}$ such polynomials with this level of agreement. The argument relies on the fact that we are asking to fit many polynomials into a smallish Hamming ball, but that each polynomial must still be well-separated from the others.
In the regime where $\sqrt{kn}\le a<\sqrt{2kn}$, current techniques can show that there are at most $n^2$ such polynomials.
We note that the above regimes essentially constitute all of our knowledge about decoding Reed-Solomon codes.
We now comment on why one would want to actually perform list-decoding instead of just unique-decoding. Unfortunately there is no quick black-box motivation indicating why list-decoding is the correct idea. However, there are black-box applications known. Typically, they use the following idea: list-decoding can still function in the presence of more errors than unique decoding can. Thus, when there are many errors one wants to still be able to do something, and list decoding is that something. Once one gets a list of candidate messages, it is sometimes, but not generically, possible to prune down the candidates to get the correct message. Often, the applications of list-decoding have some auxiliary problem structure that allows this pruning to be done (e.g.\ the messages are all English sentences).
The paradigm of the decoding algorithms is to find a nice algebraic ``explanation'' for the data. One explanation would be a polynomial $\hat{p}$ such that $\hat{p}(\alpha_i)=\beta_i$ for each $i$. However, this is not robust to error: if we start from a degree $<k$ polynomial and corrupt even a single evaluation, then in general no degree $<k$ polynomial fits all of the data. Instead, we seek a non-zero \textit{bivariate} polynomial $Q(x,y)$ of total degree at most $D$ such that $Q(\alpha_i,\beta_i)=0$ for all $i$. Finding such a $Q$ amounts to solving a homogeneous linear system in its coefficients: there are $\binom{D+2}{2}>D^2/2$ monomials of total degree at most $D$ but only $n$ constraints, so if we take $D\ge\sqrt{2n}$ then there is such a non-zero $Q$. We now need to argue correctness, that is, that all polynomials $p$ with sufficient agreement will appear as a factor $y-p(x)$ of any such $Q$ (note that there could be many such $Q$ and it is not clear which to choose; the following lemma says that any such $Q$ will work). This is given by the following lemma.
\begin{lemma}
If $|\{i:p(\alpha_i)=\beta_i\}|\ge a> Dk$ then $(y-p(x))\mid Q$, for any non-zero $Q$ of total degree at most $D$ such that $Q(\alpha_i,\beta_i)=0$ for all $i$.
\end{lemma}
\begin{proof}
There are two ways to prove this. The first is to observe that $Q(x,p(x))$ is a univariate polynomial of degree at most $Dk$ with more than $Dk$ roots (one for each agreement point), and thus it must be identically zero. In other words, $p(x)$ is a root of $Q(x,y)$ viewed as a polynomial in $y$ over $\mathbb{F}_q[x]$, and since $y-p(x)$ is monic in $y$, the division algorithm shows that $y-p(x)$ divides $Q(x,y)$.
Another way is to use B\'ezout's theorem in the plane: we have two curves, $y-p(x)$ and $Q(x,y)$, with more than $Dk$ common zeroes, and $Dk$ is at least the product of their degrees. By B\'ezout's theorem they must then share a common factor, and as $y-p(x)$ is irreducible, the claim follows.
\end{proof}
Note that this allows us to correct from agreement $a> k\sqrt{2n}$, and get a list of at most $\sqrt{2n}$ polynomials, as this is the degree of $Q$ and each factor $y-p(x)$ has degree $1$ in $y$. One can do better by using a notion of ``weighted degree'': when we take the polynomial $Q(x,y)$ and substitute $p(x)$ for $y$, we are not using the degrees of $x$ and $y$ in a balanced way. Thus, in constructing $Q$, one can define the weighted degree of $x^iy^j$ to be $i+kj$ and then ask for $Q$ to be of weighted degree $\le D$. When doing this, we can correct from $O(\sqrt{kn})$ agreement, which is stronger than what we derived above.
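The interpolation step and the divisibility lemma can be checked concretely for tiny parameters. The Python sketch below finds a non-zero $Q$ by Gaussian elimination and confirms that $Q(x,p(x))$ vanishes identically; the field size, degree bound, and error positions are illustrative choices, and a real decoder would go on to factor $Q$ rather than merely verify that the transmitted polynomial explains it:

```python
def nullspace_vector(rows, q):
    """Return a non-zero v with A v = 0 over F_q (q prime), via Gaussian
    elimination to reduced row-echelon form; such a v exists whenever the
    matrix has more columns than rows."""
    rows = [r[:] for r in rows]
    ncols = len(rows[0])
    pivots = {}  # column index -> pivot row index
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], q - 2, q)  # inverse via Fermat's little theorem
        rows[r] = [x * inv % q for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % q for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1
    for c, i in pivots.items():
        v[c] = -rows[i][free] % q
    return v

q, D = 13, 4
# Monomials x^i y^j of total degree at most D; (D+1)(D+2)/2 = 15 > 13 = n,
# so a non-zero interpolating Q is guaranteed to exist.
monomials = [(i, j) for i in range(D + 1) for j in range(D + 1 - i)]

p_true = lambda x: (2 * x + 3) % q           # the transmitted codeword's polynomial
points = [(a, p_true(a)) for a in range(q)]  # evaluations at all of F_13
points[0] = (0, 7)                           # corrupt two coordinates
points[5] = (5, 1)

# One homogeneous linear constraint Q(alpha_i, beta_i) = 0 per point.
A = [[pow(a, i, q) * pow(b, j, q) % q for (i, j) in monomials] for (a, b) in points]
coeffs = nullspace_vector(A, q)

def Q(x, y):
    return sum(c * pow(x, i, q) * pow(y, j, q) for c, (i, j) in zip(coeffs, monomials)) % q

# Q vanishes on all the data, and Q(x, p(x)) has more roots (11 agreements)
# than its degree (at most D), so it is identically zero: y - p(x) divides Q.
print(all(Q(a, b) == 0 for a, b in points), all(Q(x, p_true(x)) == 0 for x in range(q)))
```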
\section{Ideal Error-Correcting Codes}
We now abstract the decoding process we just performed, to understand the essence of what was done. We begin by noting that for a polynomial $p(x)$, the residue of $p$ after taking it modulo $x-\alpha$ is just the value $p(\alpha)$. So the Reed-Solomon code can be viewed as taking a polynomial and encoding it by taking it modulo several ideals. We now abstract this.
Let $R$ be a ``nice'' ring. Let $M\subseteq R$ be a finite subset of $R$, which will be our message space. Note that $R$ itself might be infinite. Let $I_1,\ldots, I_n$ be ideals of $R$ such that the quotient rings $R/I_j$ are all finite; that is, the number of distinct equivalence classes $r+I_j$, over $r\in R$, is finite for each $j$. Further, we shall assume each such quotient ring is ``small'', as they will form our alphabet symbols. We shall encode $m\in M$ by the residues $(m+I_1,\ldots,m+I_n)$.
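The observation that reduction modulo $(x-\alpha)$ is just evaluation at $\alpha$ can be checked by synthetic division (Horner's rule); a small Python sketch with an illustrative polynomial:

```python
def poly_mod_linear(coeffs, alpha, q):
    """Remainder of p(x) = sum_i coeffs[i] * x^i modulo (x - alpha) over
    F_q, computed by Horner's rule / synthetic division."""
    r = 0
    for c in reversed(coeffs):
        r = (r * alpha + c) % q
    return r

# p(x) = 3 + 2x over F_7: the residue mod (x - 4) is just p(4) = 11 mod 7 = 4.
print(poly_mod_linear([3, 2], 4, 7))  # 4
```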
We now apply this to get a new family of codes, called the Chinese Remainder Code. Recall that the rings $\mathbb{F}_q[x]$ and $\mathbb{Z}$ share many properties, and things true in one are usually true in the other. As Reed-Solomon codes operate in the former ring, we seek their analogue over $\mathbb{Z}$. The message space over $\mathbb{F}_q[x]$ is the space of all polynomials of degree less than $k$; the natural analogue over $\mathbb{Z}$ is a set of integers of bounded size. Thus, taking distinct primes $p_1,\ldots,p_n$ and the ideals $I_j=(p_j)$, we encode an integer message $m$ by its residues $(m \bmod p_1,\ldots,m\bmod p_n)$.
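A minimal Python sketch of Chinese Remainder encoding in the error-free case (the function names and the prime list are illustrative; decoding under errors is the interesting part, analogous to Reed-Solomon decoding):

```python
from math import prod

def crt_encode(m, primes):
    """Encode integer m by its residues modulo n distinct primes -- the
    Z-analogue of evaluating a low-degree polynomial at n distinct points."""
    return [m % p for p in primes]

def crt_decode(residues, primes):
    """Recover m < prod(primes) from all n residues (no errors) via the
    Chinese Remainder Theorem."""
    N = prod(primes)
    m = 0
    for r, p in zip(residues, primes):
        Np = N // p
        m = (m + r * Np * pow(Np, -1, p)) % N
    return m

# Restricting m below the product of fewer than n primes leaves the
# remaining residues as redundancy, just as n > k does for Reed-Solomon.
primes = [2, 3, 5, 7, 11]
print(crt_encode(1000, primes))                      # [0, 1, 0, 6, 10]
print(crt_decode(crt_encode(1000, primes), primes))  # 1000
```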