Publications 
Badih Ghazi, Madhu Sudan
The Power of Shared Randomness in Uncertain Communication
International Colloquium on Automata, Languages and Programming (ICALP) 2017.
[Abstract]
[PDF]
In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of \emph{communication with contextual uncertainty}, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function $f$ and an input string $x$, and Bob is given a function $g$ and an input string $y$.
The pair $(x,y)$ comes from a known distribution $\mu$ and $f$ and $g$ are
guaranteed to be close under this distribution.
Alice and Bob wish to compute $g(x,y)$ with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with oneway communication complexity $k$ in the standard model (i.e., without uncertainty\footnote{in other words, under the promise that $f=g$.}) has \emph{publiccoin} communication at most $O(k(1+I))$ bits in the uncertain case, where $I$ is the mutual information between $x$ and $y$. Moreover, a lower bound of $\Omega(\sqrt{I})$ bits on the publiccoin uncertain communication was also shown.
However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are publiccoin protocols in overcoming uncertainty? Motivated by these two questions:
 We prove the first separation between privatecoin uncertain communication and publiccoin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the publiccoin uncertain communication are $O(1)$ while the privatecoin uncertain communication is a growing function of $n$ (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of publiccoin uncertain communication which was shown by the previous work to be within a constant factor from the certain communication. This lower bound also implies the first separation between publiccoin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob \emph{imperfectly share} a sequence of random bits (a setup weaker than public randomness), then achieving a constant blowup in communication is still possible.
 We improve the lowerbound of the previous work on publiccoin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information $I \approx n$) for which the oneway certain communication is $k$ bits but the oneway publiccoin uncertain communication is at least $\Omega(\sqrt{k} \cdot \sqrt{I})$ bits.
Our proofs introduce new problems in the standard communication complexity model and
prove lower bounds for these problems. Both the problems and the lower bound
techniques may be of general interest.
Badih Ghazi, Elad Haramaty, Pritish Kamath, Madhu Sudan
Compression in a Distributed Setting
Innovations in Theoretical Computer Science (ITCS) 2017.
[Abstract]
[PDF]
Motivated by an attempt to understand the formation and development of (human)
language, we introduce a ``distributed compression'' problem. In our problem a sequence of pairs of players from a set of
$K$ players are chosen and tasked
to communicate messages drawn from an unknown distribution
$Q$.
Arguably languages are created and evolve to compress frequently occurring messages, and we focus on this aspect.
The only knowledge that players have about the distribution $Q$
is from previously drawn samples, but these samples differ from player to player.
The only {\em common} knowledge between the players is restricted to a common prior distribution $P$ and some constant number
of bits of information (such as a learning algorithm).
Letting $T_\epsilon$ denote the number of iterations it would take for a typical player
to obtain an $\epsilon$approximation to $Q$ in total variation distance, we ask
whether $T_\epsilon$ iterations suffice to compress the messages down roughly to their
entropy and give a partial positive answer.
We show that a natural
uniform algorithm can compress the communication down to an average cost per
message of $O(H(Q) + \log (D(P  Q))$
in $\tilde{O}(T_\epsilon)$ iterations
while allowing for $O(\epsilon)$error,
where $D(\cdot  \cdot)$ denotes the KLdivergence between distributions.
For large divergences
this compares favorably with the static algorithm that ignores all samples and
compresses down to $H(Q) + D(P  Q)$ bits, while not requiring $T_\epsilon\cdot
K$ iterations that it would take players to develop optimal but separate compressions for
each pair of players.
Along the way we introduce a ``datastructural'' view of the task of
communicating with a natural language and show that our natural algorithm can also be
implemented by an efficient data structure, whose storage is comparable to the storage requirements of $Q$ and whose query complexity is comparable to the lengths of the message to be
compressed.
Our results give a plausible mathematical analogy to the mechanisms by which
human languages get created and evolve, and in particular highlights the
possibility of coordination towards a joint task (agreeing on a language)
while engaging in distributed learning.
Vitaly Feldman, Badih Ghazi
On the Power of Learning from kWise Queries
Innovations in Theoretical Computer Science (ITCS) 2017.
[Abstract]
[PDF]
Several wellstudied models of access to data samples, including statistical queries, local differential privacy and lowcommunication algorithms rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of $\E_{x\sim D}[q(x)]$ for any choice of the query function $q:X\rightarrow \R$, where $D$ is an unknown data distribution.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using $k$wise queries each of which is specified by a function $q:X^k\rightarrow \R$. Hence it is natural to ask whether algorithms using $k$wise queries can solve learning problems more efficiently and by how much.
Blum, Kalai, Wasserman~\cite{blum2003noise} showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with $k$wise SQs is smaller than the (unary) SQ complexity by a factor of at most $2^k$. We show that for more general problems over distributions the picture is substantially richer. For every $k$, the complexity of distributionindependent PAC learning with $k$wise queries can be exponentially larger than learning with $(k+1)$wise queries. We then give two approaches for simulating a $k$wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum \etal \cite{blum2003noise}. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using $k$wise queries. The second approach exploits the $k$party communication complexity of the $k$wise query function.
Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L. Rivest, Madhu Sudan
Optimality of Correlated Sampling
Manuscript 2016.
[Abstract]
[PDF]
In the \emph{correlated sampling} problem, two players, say Alice and Bob,
are given two distributions, say $P$ and $Q$ respectively,
over the same universe and access to shared randomness.
The two players are required to output two elements, without any interaction,
sampled according to their respective distributions, while trying to minimize the
probability that their outputs disagree.
A wellknown protocol due to Holenstein, with close variants (for similar problems) due to Broder, and to Kleinberg and Tardos, solves this task with disagreement probability at most $2 \delta/(1+\delta)$, where $\delta$ is the total variation distance between $P$ and $Q$. This protocol has been used in several different contexts including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition and cryptography.
In this note, we give a surprisingly simple proof that this protocol is in fact tight. Specifically, for every $\delta \in (0,1)$, we show that any correlated sampling scheme should have disagreement probability at least $2\delta/(1+\delta)$. This partially answers a recent question of Rivest.
Our proof is based on studying a new problem we call \emph{constrained agreement}. Here, Alice is given a subset $A \subseteq [n]$ and is required to output an element $i \in A$, Bob is given a subset $B \subseteq [n]$ and is required to output an element $j \in B$, and the goal is to minimize the probability that $i \neq j$. We prove tight bounds on this question, which turn out to imply tight bounds for correlated sampling.
Though we settle basic questions about the two problems, our formulation also leads
to several questions that remain open.}
Badih Ghazi, Pritish Kamath, Madhu Sudan
Decidability of NonInteractive Simulation of Joint Distributions
IEEE Symposium on Foundations of Computer Science (FOCS) 2016.
[Abstract]
[PDF]
We present decidability results for a subclass of ``noninteractive'' simulation
problems, a wellstudied class of problems in information theory.
A {\em noninteractive simulation} problem is specified by two distributions $P(x,y)$ and $Q(u,v)$: The goal is to determine if two players,
Alice and Bob, that observe sequences $X^n$ and $Y^n$ respectively where $\set{(X_i, Y_i)}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$ can generate pairs $U$ and $V$ respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to $Q(u,v)$.
Even when $P$ and $Q$ are extremely simple: e.g., $P$ is uniform on the triples
$\{(0,0), (0,1), (1,0)\}$ and $Q$ is a ``doubly symmetric binary source'', i.e., $U$ and $V$ are uniform $\pm 1$ variables with correlation say $0.49$, it is open if $P$ can simulate $Q$.
In this work, we show that whenever $P$ is a distribution on a finite domain and
$Q$ is a $2 \times 2$ distribution, then the noninteractive simulation
problem is {\em decidable}: specifically, given $\delta > 0$ the algorithm runs in time bounded by some function of $P$ and $\delta$ and either gives a noninteractive simulation
protocol that is $\delta$close to $Q$ or asserts that no protocol gets
$O(\delta)$close to $Q$. The main challenge to such a result is determining explicit (computable) convergence bounds on the number $n$ of samples that need to be drawn from $P(x,y)$ to get $\delta$close to $Q$. We invoke contemporary results
from the analysis of Boolean functions such as the invariance principle and a regularity lemma to obtain such explicit bounds.
Venkata Gandikota, Badih Ghazi, Elena Grigorescu
NPHardness of ReedSolomon Decoding
and the ProuhetTarryEscott Problem
IEEE Symposium on Foundations of Computer Science (FOCS) 2016.
[Abstract]
[PDF]
Establishing the complexity of {\em Bounded Distance Decoding} for ReedSolomon codes is a fundamental open problem in coding theory, explicitly asked by Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005). The problem is motivated by the large current gap between the regime when it is NPhard, and the regime when it is efficiently solvable (i.e., the Johnson radius).
We show the first NPhardness results for asymptotically smaller decoding radii than the maximum likelihood decoding radius of Guruswami and Vardy. Specifically, for ReedSolomon codes of length $N$ and dimension $K=O(N)$, we show that it is NPhard to decode more than $ NK c\frac{\log N}{\log\log N}$ errors (with $c>0$ an absolute constant). Moreover, we show that the problem is NPhard under quasipolynomialtime reductions for an error amount $> NK c\log{N}$ (with $c>0$ an absolute constant).
An alternative natural reformulation of the Bounded Distance Decoding problem for ReedSolomon codes is as a {\em Polynomial Reconstruction} problem. In this view, our results show that it is NPhard to decide whether there exists a degree $K$ polynomial passing through $K+ c\frac{\log N}{\log\log N}$ points from a given set of points $(a_1, b_1), (a_2, b_2)\ldots, (a_N, b_N)$. Furthermore, it is NPhard under quasipolynomialtime reductions to decide whether there is a degree $K$ polynomial passing through $K+c\log{N}$ many points.
These results follow from the NPhardness of a generalization of the classical Subset Sum problem to higher moments, called {\em Moments Subset Sum}, which has been a known open problem, and which may be of independent interest.
We further reveal a strong connection with the wellstudied ProuhetTarryEscott problem in Number Theory, which turns out to capture a main barrier in extending our techniques. We believe the ProuhetTarryEscott problem deserves further study in the theoretical computer science community.
Badih Ghazi, Ilan Komargodski, Pravesh Kothari, Madhu Sudan
Communication with Contextual Uncertainty
ACMSIAM Symposium on Discrete Algorithms (SODA) 2016.
[Abstract]
[PDF]
We introduce a simple model illustrating the role of context in communication
and the challenge posed by uncertainty of knowledge of context. We consider a
variant of distributional communication complexity where Alice gets some
information $x$ and Bob gets $y$, where $(x,y)$ is drawn from a known
distribution, and Bob wishes to compute some function $g(x,y)$ (with
high probability over $(x,y)$). In our variant, Alice does not know $g$, but only knows some
function $f$ which is an approximation of $g$. Thus, the function being computed
forms the context for the communication, and knowing it imperfectly models
(mild) uncertainty in this context.
A naive solution would be for Alice and Bob to first agree on some common
function $h$ that is close to both $f$ and $g$ and then use a protocol for $h$
to compute $h(x,y)$. We show that any such agreement leads to a large overhead
in communication ruling out such a universal solution.
In contrast, we show
that if $g$ has a oneway communication protocol with complexity $k$ in the
standard setting, then it has a communication protocol with complexity $O(k \cdot (1+I))$ in the uncertain setting, where $I$ denotes the mutual information between $x$ and $y$. In the particular case where the input distribution is a product distribution, the protocol in the uncertain setting only incurs a constant factor blowup in communication and error.
Furthermore, we show that the dependence on the mutual information $I$ is required. Namely, we construct a class of functions along with a nonproduct distribution over $(x,y)$ for which the communication complexity is a single bit in the standard setting but at least $\Omega(\sqrt{n})$ bits in the uncertain setting.
Badih Ghazi, Pritish Kamath, Madhu Sudan
Communication Complexity of PermutationInvariant Functions
ACMSIAM Symposium on Discrete Algorithms (SODA) 2016.
[Abstract]
[PDF]
Motivated by the quest for a broader understanding of upper bounds in communication complexity, at least for simple functions, we introduce the class of ``permutationinvariant'' functions. A partial function $f:\bit^n \times \bit^n\to \set{0,1,?}$ is permutationinvariant if for every bijection $\pi:\set{1,\ldots,n} \to \set{1,\ldots,n}$ and every $\bx, \by \in \bit^n$, it is the case that $f(\bx, \by) = f(\bx^{\pi}, \by^{\pi})$. Most of the commonly studied functions in communication complexity are permutationinvariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions. Our work highlights the role of the wellknown lower bounds of functions such as {\sc SetDisjointness} and {\sc Indexing}, while complementing them with the relatively lesserknown upper bounds for {\sc GapInnerProduct} (from the sketching literature) and {\sc SparseGapInnerProduct} (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutationinvariant functions, imperfectly shared randomness results in only a polynomial blowup in communication complexity after an additive $O(\log \log n)$ overhead.
Venkata Gandikota, Badih Ghazi, Elena Grigorescu
On the NPhardness of Bounded Distance Decoding of ReedSolomon Codes
IEEE International Symposium on Information Theory (ISIT) 2015.
[Abstract]
[PDF]
Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005) show that given a ReedSolomon code over a finite field $\F$, of length $n$ and dimension $t$, and given a target vector $v\in\F^n$, it is NPhard to decide if there is a codeword that disagrees with $v$ on at most $nt1$ coordinates. Understanding the complexity of this Bounded Distance Decoding problem as the amount of error in the target decreases is an important open problem in the study of ReedSolomon codes.
In this work we generalize this result by proving that it is NPhard to decide the existence of a codeword that disagrees with $v$ on $nt2$ and on $nt3$ coordinates.
No other NPhardness results were known before for an amount of error less than $nt1$. The core of our proof is showing the NPhardness of a parameterized generalization of the SubsetSum problem to higher degrees (called Moments SubsetSum) that may be of independent interest.
Badih Ghazi, Euiwoong Lee
LP/SDP Hierarchy Lower Bounds for Decoding Random LDPC Codes
ACMSIAM Symposium on Discrete Algorithms (SODA) 2015.
IEEE Transactions on Information Theory 2017 (to appear).
[Abstract]
[PDF]
Random $(d_v,d_c)$\emph{regular} LDPC codes (where each variable is involved in $d_v$ parity checks and each parity check involves $d_c$ variables) are wellknown to achieve the Shannon capacity of the binary symmetric channel (for sufficiently large $d_v$ and $d_c$) under exponential time decoding. However, polynomial time algorithms are only known to correct a much smaller fraction of errors. One of the most powerful polynomialtime algorithms with a formal analysis is the LP decoding algorithm of Feldman et al. which is known to correct an $\Omega(1/d_c)$ fraction of errors. In this work, we show that fairly powerful extensions of LP decoding, based on the SheraliAdams and Lasserre hierarchies, fail to correct much more errors than the basic LPdecoder. In particular, we show that:
\begin{itemize}
\item For any values of $d_v$ and $d_c$, a linear number of rounds of the SheraliAdams LP hierarchy cannot correct more than an $O(1/d_c)$ fraction of errors on a random $(d_v,d_c)$regular LDPC code.
\item For any value of $d_v$ and infinitely many values of $d_c$, a linear number of rounds of the Lasserre SDP hierarchy cannot correct more than an $O(1/d_c)$ fraction of errors on a random $(d_v,d_c)$regular LDPC code.
\end{itemize}
Our proofs use a new \emph{streching} and \emph{collapsing} technique that allows us to leverage recent progress in the study of the limitations of LP/SDP hierarchies for Maximum Constraint Satisfaction Problems (MaxCSPs). The problem then reduces to the construction of special \emph{balanced pairwise independent distributions} for SheraliAdams and special \emph{cosets of balanced pairwise independent subgroups} for Lasserre. Our (algebraic) construction for the Lasserre hierarchy is based on designing sets of points in $\F_q^d$ (for $q$ any power of $2$ and $d = 2,3$) with special hyperplaneincidence properties  constructions that may be of independent interest. An intriguing consequence of our work is that \emph{expansion} seems to be both the \emph{strength} and the \emph{weakness} of random regular LDPC codes.
Our techniques are more generally applicable to a large class of Boolean CSPs called MinOnes. In particular, for $k$Hypergraph Vertex Cover, we obtain an improved integrality gap of $k1\epsilon$ that holds after a \emph{linear} number of rounds of the Lasserre hierarchy, for any $k = q+1$ with $q$ an arbitrary prime power. The best previous gap for a linear number of rounds was equal to $2\epsilon$ and due to Schoenebeck.
Eric Blais, Joshua Brody, Badih Ghazi
The Information Complexity of Hamming Distance
International Workshop on Randomization and Computation (RANDOM) 2014.
[Abstract]
[PDF]
The Hamming distance function $\HamDist{n,d}$ returns $1$ on all pairs of inputs
$x$ and $y$ that differ in at most $d$ coordinates and returns 0
otherwise.
We initiate the study of the information complexity of the Hamming distance function.
We give a new optimal lower bound for the information complexity of the
$\HamDist{n,d}$ function in the smallerror regime where the protocol is required
to err with probability at most $\epsilon < d/n$.
We also give a new conditional lower bound for the information complexity of
$\HamDist{n,d}$ that is optimal in all regimes.
These results imply the first new lower bounds on the communication complexity
of the Hamming distance function for the shared randomness twoway communication model
since Pang and ElGamal (1986).
These results also imply new lower
bounds in the areas of property testing and parity decision tree complexity.
Badih Ghazi, Haitham Hassanieh, Piotr Indyk, Dina Katabi, Eric Price, Lixin Shi
SampleOptimal AverageCase Sparse Fourier Transform in Two Dimensions
Allerton Conference on Communication, Control, and Computing
(Allerton) 2013.
[Abstract]
[PDF]
We present the first sampleoptimal sublinear time algorithms for the sparse Discrete Fourier Transform over a twodimensional sqrt{n} x sqrt{n} grid. Our algorithms are analyzed for /average case/ signals. For signals whose spectrum is exactly sparse, our algorithms use O(k) samples and run in O(k log k) time, where k is the expected sparsity of the signal. For signals whose spectrum is approximately sparse, our algorithm uses O(k log n) samples and runs in O(k log^2 n) time; the latter algorithm works for k=Theta(sqrt{n}). The number of samples used by our algorithms matches the known lower bounds for the respective signal models.
By a known reduction, our algorithms give similar results for the onedimensional sparse Discrete Fourier Transform when n is a power of a small composite number (e.g., n = 6^t).
Louay Bazzi, Badih Ghazi, Rudiger Urbanke
Linear Programming Decoding of Spatially Coupled Codes
IEEE International Symposium on Information Theory (ISIT) 2013.
IEEE Transactions on Information Theory 2014.
[Abstract]
[PDF]
For a given family of spatially coupled codes, we prove that the Linear Programming (LP) threshold on the BSC of the tailbiting graph cover ensemble is the same as the LP threshold on the BSC of the derived spatially coupled ensemble. This result is in contrast with the fact that spatial coupling significantly increases the Belief Propagation (BP) threshold as shown by [Kudekar, Richardson and Urbanke  IEEE trans. on Information Theory 2011] and [Kudekar, Richardson and Urbanke  ISIT 2012]. To prove this, we establish some properties related to the dual witness for LP decoding which was introduced by [Feldman et al.  IEEE trans. on Information Theory 2007] and simplified by [Daskalakis et al.  IEEE trans. on Information Theory 2008]. More precisely, we prove that the existence of a dual witness which was previously known to be sufficient for LP decoding success is also necessary and is equivalent to the existence of certain acyclic hyperflows. We also derive a sublinear (in the block length) upper bound on the weight of any edge in such hyperflows, both for regular LPDC codes and for spatially coupled codes and we prove that the bound is asymptotically tight for regular LDPC codes. Moreover, we show how to trade crossover probability for ``LP excess'' on all the variable nodes, for any binary linear code.
