Harvard/MIT/MSR Theory Reading Group (v2.0)
About: This is a revival of the MSR/MIT Theory Reading Group.
Goal: The goal of this reading group is to learn in significant depth about (new/exciting) developments in theoretical computer science and related areas.
Formula: Each meeting consists of a threehourlong whiteboard talk given by the speaker. Talks tend to be informal in nature and quite interactive. In particular, questions from the audience are strongly encouraged. There is a break in the middle – it is not impolite to leave during this break, or indeed any time earlier or later. Ah...and we have pizza!
Time and place: The meetings will alternate between Harvard and MIT during terms, and will be held at MSR in the summer. Unless otherwise noted, the meetings will be held
on
Fridays 1:15 pm–4:15 pm (with pizza at 1pm).
Organizers:
Boaz Barak,
Yael Kalai,
Aleksander Mądry,
Ankur Moitra,
Madhu Sudan.
Mailing list: There is a mailing list where the announcements of upcoming talks, as well as, other information related to the reading group appear. To subscribe to this list, visit this
page. Please note that the mailing list is intended only for local participants. So please subscribe using an email that lets us know you are local, or send email to madhu letting him know who, and where, you are.
Upcoming Talks:
 No upcoming talks
Past Talks:

Friday, March 9, 2018: Speaker:
Shafi Goldwasser and Adam Sealfon
(MIT).
Topic:
Population stability: regulating size in the presence of an adversary.
We introduce a new coordination problem in distributed computing that we call the population stability problem. A system of agents each with limited memory and communication, as well as the ability to replicate and selfdestruct, is subjected to attacks by a worstcase adversary that can at a bounded rate (1) delete agents chosen arbitrarily and (2) insert additional agents with arbitrary initial state into the system. The goal is perpetually to maintain a population whose size is within a constant factor of the target size N. The problem is inspired by the ability of complex biological systems composed of a multitude of memorylimited individual cells to maintain a stable population size in an adverse environment. Such biological mechanisms allow organisms to heal after trauma or to recover from excessive cell proliferation caused by inflammation, disease, or normal development.
We present a population stability protocol in a communication model that is a synchronous variant of the population model of Angluin, Aspnes, Diamadi, Fischer, and Peralta. In each round, pairs of agents selected at random meet and exchange messages, where at least a constant fraction of agents is matched in each round. Our protocol uses threebit messages and ?(log^2 N) states per agent. We emphasize that our protocol can handle an adversary that can both insert and delete agents, a setting in which existing approximate counting techniques do not seem to apply. The protocol relies on a novel coloring strategy in which the population size is encoded in the variance of the distribution of colors. Individual agents can locally obtain a weak estimate of the population size by sampling from the distribution, and make individual decisions that robustly maintain a stable global population size.
Joint work with Alessandra Scafuro and Rafail Ostrovsky. 
Friday, March 2, 2018: Speaker:
Boaz Barak
(Harvard).
Topic:
An exposition of Dinur et al's proof of the 2 to 2 conjecture  Part II.
(Continuation of last week.)
In a recent manuscript, Khot, Minzer and Safra proved a combinatorial conjecture which, combined with a work of Dinur, Khot, Kindler, Minzer and Safra (STOC 2018) and me, Kothari and Steurer (unpublished), completes the proof of Khot's 2 to 2 conjecture (up to imperfect completeness).
In this talk I will give an overview of this result, focusing on the reduction. My exposition will differ quite a lot from the works above. We will not be talking about the Grassman graphs but rather our main object of study will be the graph on matrices over GF(2) where A is a neighbor of B if AB has rank one.
While this is a matter of taste, I believe this exposition can serve to highlight both the similarities and differences between the works of Dinur et al and prior hardness of approximation results, as well as what are the challenges that need to be overcome to extend these techniques to prove the unique games conjecture.
The talk will not assume knowledge of unique games or PCPs and I will define what we need. However, it might be easier to follow if you are at least slightly familiar with Hastad's Theorem on the NP hardness of approximating 3XOR, the label cover problem, and error correcting codes.
I will try to go slowly (especially if I'm asked questions!) and so may need a second lecture to fully cover the result, but should be able to convey a lot of the interesting new ideas already in the first lecture. 
Friday, February 23, 2018: Speaker:
Boaz Barak
(Harvard).
Topic:
An exposition of Dinur et al's proof of the 2 to 2 conjecture.
In a recent manuscript, Khot, Minzer and Safra proved a combinatorial conjecture which, combined with a work of Dinur, Khot, Kindler, Minzer and Safra (STOC 2018) and me, Kothari and Steurer (unpublished), completes the proof of Khot's 2 to 2 conjecture (up to imperfect completeness).
In this talk I will give an overview of this result, focusing on the reduction. My exposition will differ quite a lot from the works above. We will not be talking about the Grassman graphs but rather our main object of study will be the graph on matrices over GF(2) where A is a neighbor of B if AB has rank one.
While this is a matter of taste, I believe this exposition can serve to highlight both the similarities and differences between the works of Dinur et al and prior hardness of approximation results, as well as what are the challenges that need to be overcome to extend these techniques to prove the unique games conjecture.
The talk will not assume knowledge of unique games or PCPs and I will define what we need. However, it might be easier to follow if you are at least slightly familiar with Hastad's Theorem on the NP hardness of approximating 3XOR, the label cover problem, and error correcting codes.
I will try to go slowly (especially if I'm asked questions!) and so may need a second lecture to fully cover the result, but should be able to convey a lot of the interesting new ideas already in the first lecture. 
Friday, February 16, 2018: Speaker:
Benjamin Sudakov
(ETH, Zurich).
Topic:
Properties of Ramsey graphs.
An nvertex graph is called CRamsey if it has no clique or independent set of size C log n. All known constructions of Ramsey graphs involve randomness in an essential way, and there is an ongoing line of research towards showing that in fact all Ramsey graphs must obey certain richness conditions characteristic of random graphs. In this talk we survey several results which show that Ramsey graphs indeed have randomlike properties, focusing on the recent progress on two old problems of P. Erdos and his coauthors.
Joint work with Matthew Kwan.

Monday, November 20, 2017: Speaker:
Bernhard Haeupler
(CMU).
Topic:
Synchronization Strings and Coding for Insertions and Deletions.
The talk will give an introduction to synchronization strings which are a novel way of efficiently reducing synchronization errors, such as, insertions and deletions, to much more benign and better understood Hamming errors. Synchronization strings have many applications. The talk will focus on using synchronization strings as a new way to generate efficient error correcting block codes for insertions and deletions. In particular, codes that approach the Singleton bound, i.e., for any 0 < delta < 1 and any eps > 0 these codes achieve a rate of 1  delta  eps while being able to efficiently decode from a delta fraction of insertions and deletions.
Joint work with Amirbehshad Shahrasbi

Friday, October 27, 2017: Speaker:
Justin Thaler
(Georgetown).
Topic:
Recent Advances in Approximate Degree and Its Applications.
The epsapproximate degree of a Boolean function is the minimum degree of a real polynomial that pointwise approximates f to error eps. Approximate degree has wideranging applications in theoretical computer science. As one example, the approximate degree of a function is a lower bound on its quantum query complexity. Despite its importance, the approximate degree of many basic functions has resisted characterization.
In this talk, I will describe recent (and upcoming) advances in proving both upper and lower bounds on the approximate degree of specific functions.
These advances yield tight (or nearly tight) bounds on the approximate degree of AC^0, as well as a variety of specific functions. Our approximate degree bounds settle (or nearly settle) the quantum query complexity of many of these functions, including junta testing, approximating the statistical distance of two distributions, entropy approximation, and kdistinctness.
Based on joint work with Mark Bun and Robin Kothari.

Friday, October 20, 2017: Speaker:
Ravi Boppana
(MIT).
Topic:
Tomaszewski's Problem on Randomly Signed Sums.
Let v_1, v_2, ..., v_n be real numbers whose squares add up to 1. Consider the 2^n signed sums of the form S = sum +/ v_i. In 1986, Tomaszewski asked whether it is always true that at least 1/2 of these sums satisfy S le 1. His question has been conjectured to have an affirmative answer, but is still open more than 30 years later. 1n 1992, Holzman and Kleitman proved that at least 3/8 of the sums satisfy S le 1. This 3/8 bound is provably the best their method can achieve. In this talk, using a different method, we will improve the bound to 13/32 = 0.40625, thus breaking the 3/8 barrier. Actually, we will improve the bound further to 0.406259. Some connections to algorithmic problems will be discussed.
Joint work with Ron Holzman (Technion, Israel). 
Friday, October 13, 2017: Speaker:
Timothy Gowers
(Cambridge).
Topic:
A quantitative inverse theorem for the U^4 norm in the finitefields case.
The inverse theorem for the U^k norm, due to Green, Tao and Ziegler, is a remarkable and central result in additive combinatorics with important consequences in number theory. A few years before it was proved, an analogue of the statement was proved by Bergelson, Tao and Ziegler for functions defined on F_p^n (for p fixed and n tending to infinity). This says that any bounded function with a large U^k norm has to correlate with a polynomial phase function, with the polynomial of degree at most k1. (Familiarity with all this terminology will be not be assumed in the talk.)
The proofs of these results did not give reasonable bounds when k was greater than 3. In this talk I will discuss recent work with Luka Milicevic, in which we have obtained a quantitative and potentially generalizable argument for k=4 in the F_p^n case. It will not be possible to explain the whole proof, so instead I shall give an extensive sketch, and then try to convince you that the details can be filled in by demonstrating the techniques we use in simpler contexts. In recent years there has been an interesting interplay between additive combinatorics and theoretical computer science. I hope that some of these techniques may contribute to that. (In fact, some of them are already closely related to results that have been proved by theoretical computer scientists.)
I shall be giving a shorter and less detailed account of this result in Harvard the day before, where I shall discuss other results of a similar flavour. It will not be necessary to have attended that talk, but it will provide a bit more context. 
Friday, September 29, 2017: Speaker:
Noga Alon
(Tel Aviv University).
Topic:
Ramsey graphs, nearly orthogonal vectors and list decodable codes.
What is the maximum possible Shannon capacity of a graph on n vertices with independence number 2 ?
What is the maximum possible value of the Lovasz Theta function of an n vertex graph with independence number 2 ?
What is the maximum possible (Euclidean) norm of a sum of n unit vectors so that any 3 of them contain an orthogonal pair ?
What is the maximum number of words in a binary code of length n so that there is no Hamming ball of radius (1/4+epsilon)n containing more than two words ?
I will describe a construction that settles three of these four problems up to a constant factor, and will discuss several related open problems. The part dealing with list decodable codes is based on a recent work in progress with Bukh and Polyanskiy. 
Friday, September 22, 2017: Speaker:
Umesh Vazirani
(Berkeley).
Topic:
Rigorous RG algorithms for simulating 1D quantum systems.
Computing low energy states of quantum many body systems is a central challenge in condensed matter physics. From a computational complexity viewpoint these are near optimal solutions to (quantum) constraint satisfaction problems. Mathematically, the problem is specified by a succinctly described Hamiltonian  an exponentially large matrix, and the challenge is to find the eigenstates with small eigenvalues. A priori it is not even clear whether such eigenstates can be succinctly described, let alone computed efficiently. In this talk I will describe novel combinatorial arguments showing that low energy states for systems of particles with nearest neighbor interactions in 1D can be described succinctly, leading to an efficient classical algorithm for computing them. The algorithm provides a new perspective on the well known Renormalization Group (RG) formalism within condensed matter physics.
Over the last quarter century the famous Density Matrix Renormalization Group (DMRG) algorithm has been widely used as a numerical heuristic for identifying ground and low energy states of 1D quantum systems. A recent implementation of our algorithm shows promise for outperforming DMRG in hard cases with high ground space degeneracy or near criticality.
The talk will be aimed at a broad audience of computer scientists and physicists, and I will not assume a background in quantum computing.
Based on joint work with Itai Arad, Zeph Landau and Thomas Vidick 
Friday, September 15, 2017: Speaker:
Jaroslaw Blasiok, Preetum Nakkiran, Madhu Sudan
(Harvard).
Topic:
General Strong Polarization.
A martingale is a sequence of random variables that maintain their future expected value conditioned on the past. A $[0,1]$bounded martingale (where the random variables are always in $[0,1]$) is said to polarize if it converges in the limit to either $0$ or $1$ with probability $1$. A martingale is said to polarize strongly, if in $ steps it is subexponentially close to its limit with all but exponentially small probability.
Martingale polarization came to the fore in 2008, when Arikan built a powerful class of errorcorrecting codes called Polar codes based on this idea. The essence of his theory associates a martingale with every invertible square matrix over a field (and a channel) and showed that polarization of the martingale leads to a construction of codes that converge to Shannon capacity. In 2013, Guruswami and Xia, and independently Hassani et al. showed that the martingale associated with the simplest $2x2$ matrix considered in Arikan's work polarizes strongly, and this led to the first construction of codes that converge to Shannon capacity at finite block lengths, specifically at lengths that are inverse polynomial in the gap to capacity, thereby resolving a major mathematical challenge associated with the attainment of Shannon capacity.
Prior to our work strong polarization was poorly understood and the only matrix that was known to polarize strongly was the 2x2 matrix alluded to above. We show that a simple necessary condition for an invertible matrix to polarize over any nontrivial channel is also sufficient for strong polarization over all symmetric channels over all prime fields. In addition to the generality of our result, it also leads to arguably simpler proofs. The essence of our proof is a local definition of polarization which only restricts the evolution of the martingale in a single step, and a general theorem showing the local polarization suffices for strong polarization.
In this talk we will introduce polarization and polar codes and present a full proof of our main theorem. No prior background on polar codes will be necessary.
Based on joint work with Venkatesan Guruswami (CMU), and Atri Rudra (Buffalo).

Friday, September 8, 2017: Speaker:
Dor Minzer
(Tel Aviv University).
Topic:
Playing 2to1 Games with Grassmann.
In this talk we discuss a recent line of attacks towards proving the 2to1 Games conjecture, a variant of Khot's wellknown UniqueGames Conjecture. A key ingredient in this line of attack is the Grassmann Graph, and the approach relies on a certain unproven combinatorial hypothesis on it. We will describe the main ideas in the reductions. If time allows, we will discuss recent developments towards resolving the unproven combinatorial hypothesis.

Friday, November 18, 2016: Speaker:
Gil Cohen
(Princeton).
Topic:
Explicit Ramsey graphs.
In his 1947 paper that inaugurated the probabilistic method, Erdos proved the existence of 2log?(n)Ramsey graphs on n vertices. Matching Erdos' result with a constructive proof is a central problem in combinatorics that has gained significant attention in the literature. In this talk, we will present recent progress towards this goal.

Friday, November 4, 2016: Speaker:
Mika Göös
(Harvard).
Topic:
QuerytoCommunication Lifting Theorems.
I will discuss new lower bound methods in communication complexity that "lift" lower bounds from decision tree complexity. These methods have recently enabled progress on core questions in communication complexity (logrank conjecture, classicalquantum separations) and related areas (circuit complexity, proof complexity, LP/SDP formulations). In the 1st half, I will prove a simple lifting theorem for NP (nondeterministic query/communication complexity). In the 2nd half, I will discuss some ongoing work on finding lifting theorems for P^NP.

Friday, October 28, 2016: Speaker:
Larry Guth
(MIT).
Topic:
Decoupling theorems in Fourier analysis.
In the last few years, Jean Bourgain and Ciprian Demeter have proven a variety of striking ``decoupling'' theorems in Fourier analysis. I think this is an important development in the subject. I don't know any applications to the theory of computation, but there are applications in partial differential equations and in number theory. I will explain what decoupling theorems say and why I think they're important, I'll sketch an application in number theory, and I'll give a detailed sketch of the proof of the simplest decoupling theorem.

Friday, October 21, 2016: Speaker:
Sébastien Bubeck
(MSR Redmond).
Topic:
Kernelbased methods for bandit convex optimization.
In bandit optimization one wants to optimize some unknown function using an approximate function value oracle. In the stochastic approximation literature one usually assumes that the oracle reports the correct value up to some zeromean noise term, and that the noise is independent for each query. The bandit framework goes much beyond this independence assumption by saying that for each query there is a different function (potentially chosen adversarially given all the past choices) and what one cares about is to optimize the average function. In this three hours lecture I will present the first polynomialtime method with poly(dimension) sqrt(#queries)regret for bandit convex optimization. This new algorithm is based on three ideas which will be described in details: (i) kernel methods, (ii) a generalization of Bernoulli convolutions (this a selfsimilar process that has been studied since the 1930's, most notably by Erdos), and (iii) a new annealing schedule for exponential weights (with increasing learning rate).
Joint work with Ronen Eldan and Yin Tat Lee. 
Friday, September 16, 2016: Speaker:
Oded Regev
(NYU).
Topic:
A Reverse Minkowski Theorem.
Informally, Minkowski's first theorem states that lattices that are globally dense (have small determinant) are also locally dense (have lots of points in a small ball around the origin). This fundamental result dates back to 1891 and has a very wide range of applications.
I will present a proof of a reverse form of Minkowski's theorem, conjectured by Daniel Dadush in 2012, showing that locally dense lattice are also globally dense (in the appropriate sense).
The talk will be selfcontained, and I will not assume familiarity with lattices. The first hour will contain a high level description of the problem and the proof, with the rest of the talk going into more detail, as well as describing some of the applications.
Based on joint works with Daniel Dadush and Noah StephensDavidowitz.

Friday, July 29, 2016: Speaker:
Ankit Garg
(MSR).
Topic:
Parallel repetition of 2prover games.
Parallel repetition theorems state that playing multiple copies of a 2prover game in parallel reduces its success probability exponentially fast. These have a rich history and many connections to areas of TCS and mathematics such as PCPs, UGC, foams etc. While very intuitive, these theorems are notoriously hard to prove and many a times, the strongest bounds are not even true. I will review the known parallel repetition theorems (and counterexamples) and then describe a proof of parallel repetition for small value games (that also recovers most of the previously known parallel repetition theorems). The proof is information theoretic and elementary in the sense that it only uses basic facts from information theory and probability. This will be based on joint work with Mark Braverman.
If time permits, I will also review the state of the art in related direct product theorems and xor lemmas in communication complexity and end up with some open questions. 
Friday, July 22, 2016: Speaker:
Nati Linial
(Hebrew U.).
Topic:
HighDimensional Permutations.
This is part of our ongoing effort to develop what we call "highdimensional combinatorics". In this lecture I concentrate on permutations. It's a long presentation and I expect to have time to cover a lot of ground. Let us start with the definition: A permutation can be equated with a permutation matrix, i.e., an nxn array of zeros and ones where every line (row or column) contains a unique 1. The ddimensional counterpart of this concept suggests itself very naturally. For example, it is easily verified that a 2dimensional permutation is synonymous with a Latin square. I intend to cover several aspects of the theory:
1. Enumeration of highdimensional permutations. I will be brief here since I had already spoken about this in previous visits to this area.
2. Discrepancy problems: I will explain this fundamental notion and will present our recent conjecture and some partial results about it that we have. In this context I will mention groundbreaking work of P. Keevash and its relevance to this subject.
3. By the Birkhoff vonNeumann theorem, the convex hull of all permutation matrices is the polytope of bistochastic matrices. The highdimensional picture turns out to be quite different.
4. Erdos and Szekeres have famously proved that every permutation in S_n has a monotone subsequence of length sqrt n and the bound is tight. The analogous statement holds in higher dimensions as well. Also, a deep and fascinating theory deals with the length of long monotone subsequences in random permutations. We have something to say on this in higher dimensions, but the picture is still not clear.
5. How to generate random highdimensional permutations. We have some preliminary results here, too.
If time and energy of the speaker and audience allow it, I may comment as well on the highdimensional counterpart of the StanleyWilf conjecture.
My work on items 1,2,3 is joint with Zur Luria (now postdoc in ETH). On 4,5  with my PhD student Michael Simkin. On StanleyWilf with MSc student Ruth Malka. 
Tuesday, June 28, 2016: Speaker:
Raghu Meka
(UCLA).
Topic:
Approximating CSPs requires subexponential size linear programs.
We show that linear programming relaxations need subexponential size to beat trivial random guessing for approximately satisfying constraint satisfaction problems. In fact, we show that for such problems subexponential size relaxations are as powerful as n^(Omega(1))rounds of SheraliAdams hierarchy. Previously, only superpolynomial (~ n^(Omega(log n)) bounds were known any CSP [CLRS13].
Our bounds are obtained by exploiting and extending the recent progress in communication complexity for "lifting" query lower bounds to communication problems. The core of our results is a new decomposition theorem for "highentropy rectangles" into structurally nice distributions and may be of independent interest in communication complexity.
Joint work with Pravesh Kothari and Prasad Raghavendra. 
Friday, June 3, 2016: Speaker:
Bobby Kleinberg
(MSR & Cornell).
Topic:
Progressionfree sets, the polynomial method, and matrix multiplication.
Letting C_m denote the cyclic group Z/mZ, what is the largest subset of (C_m)^n having no three distinct elements in arithmetic progression? Up until a couple of weeks ago, the best known upper bounds were barely better than the trivial upper bound of m^n, even for the case of m=3, when the question is known as the "cap set" problem. In a breakthrough just this month, Croot, Lev, and Pach found a way to prove a nontrivial upper bound for m=4 using the polynomial method. This was soon extended by Ellenberg and (independently) Gijswijt to derive an upper bound of (0.95*m)^n for all m>2. The proof is simple, beautiful, and unbelievably short.
The CrootLevPach lemma has subsequently found other applications: for example, Blasiak, Church, Cohn, Grochow, and Umans have shown that it refutes many (but not all) of the conjectured approaches for applying the grouptheoretic fast matrix multiplication paradigm of CohnUmans to prove that the exponent of matrix multiplication equals 2. In this talk, I will give some historical background on these problems, present the proofs of the new upper bounds for progressionfree sets, and discuss the TCS applications that are known at present. 
Friday, April 15, 2016: Speaker:
David Zuckerman
(UT Austin).
Topic:
Explicit TwoSource Extractors and Resilient Functions.
We explicitly construct an extractor for two independent sources on n bits, each with minentropy at least log^C n for a large enough constant C. Our extractor outputs one bit and has error n^{Omega(1)}. The best previous extractor, by Bourgain, required each source to have minentropy .499n.
A key ingredient in our construction is an explicit construction of a monotone, almostbalanced boolean function on n bits that is resilient to coalitions of size n^{1delta}, for any delta>0.
In fact, our construction is stronger in that it gives an explicit extractor for a generalization of nonoblivious bitfixing sources on n bits, where some unknown nq bits are chosen almost polylog(n)wise independently, and the remaining q=n^{1delta} bits are chosen by an adversary as an arbitrary function of the nq bits. The best previous construction, by Viola, achieved q=n^{1/2  delta}.
Our explicit twosource extractor directly implies an explicit construction
of a 2^{(log log N)^{O(1)}}Ramsey graph over N vertices, improving bounds
obtained by Barak et al. and matching independent work by Cohen.
This talk will be a more indepth exposition of the results presented in the ToC Colloquium on April 12th. (Attending the colloquium talk is not required.)
Joint work with Eshan Chattopadhyay. 
Friday, April 8, 2016: Speaker:
Ron Rothblum
(MIT).
Topic:
ConstantRound Interactive Proofs for Delegating Computation.
Interactive proofs, introduced by Goldwasser, Micali and Rackoff, have had a dramatic impact on Complexity Theory and Cryptography. In particular, the celebrated IP=PSPACE Theorem [LFKN92,Shamir92] allows an allpowerful but untrusted prover to convince a polynomialtime verifier of the validity of extremely complicated statements (as long as they can be evaluated using polynomial space). The interactive proof system designed for this purpose requires a large number of communication rounds and heavy computation for generating the proof.
We introduce new interactive proofs that are very efficient in the number of rounds and computation time, that are particularly well suited for delegating boundedspace computations (e.g., in the context of cloud computing). Our main result is that for every statement that can be evaluated in polynomial time and boundedpolynomial space there exists an interactive proof that satisfies the following strict efficiency requirements:
(1) the honest prover runs in polynomial time, (2) the verifier is almost linear time (and under some conditions even sub linear), and (3) the interaction consists of only a constant number of communication rounds. Prior to this work, very little was known about the power of efficient, constantround interactive proofs.
Joint work with Omer Reingold and Guy Rothblum 
Friday, April 1, 2016: Speaker:
Sasha Rakhlin
(U. Penn.).
Topic:
A few peculiar connections between deterministic worstcase prediction algorithms and probabilistic inequalities.
In this talk, we will outline several equivalences between deterministic and probabilistic statements. First, we will argue that *existence* of deterministic online algorithms can be certified via certain probabilistic inequalities. Conversely, simple worstcase prediction algorithms yield probabilistic inequalities that are otherwise difficult to prove. As an example, we will see that the number of steps required by any firstorder deterministic optimization method (e.g. mirror descent) can be understood by studying lengths of random walks. The equivalence, however, extends beyond situations where we know how to construct algorithms. Time permitting, we will outline results that point to the equivalence of sample complexity in online regression with worstcase sequences, statistical learning with i.i.d. data, and nonparametric estimation in statistics. The nature of this latter phenomenon is not very clear.

Friday, March 11, 2016: Speaker:
JM Landsberg
(Texas A&M).
Topic:
Geometry and the complexity of matrix multiplication.
In 1969 Strassen discovered the standard algorithm for multiplying nxn matrices, that takes O(n^3) arithmetic operations, is not the best. Since then, after considerable research, it is generally conjectured in the computer science community that as n goes to infinity, it becomes nearly as easy to multiply matrices as it is to add them (i.e., that O(n^{2+epsilon}) arithmetic operations is possible). I will survey the contributions of geometry to this problem, namely lower complexity bounds obtained by finding polynomials that distinguish between the set S of bilinear maps that can be computed with a small number of arithmetic operations from the bilinear map corresponding to matrix multiplication. The polynomials are found by exploiting the symmetries of the set S and using general results from algebraic geometry. I will also discuss very recent work indicating that geometry may also have a role to play in determining upper bounds and finding algorithms.

Friday, March 4, 2016: Speaker:
Bobby Kleinberg
(MSR & Cornell).
Topic:
InferenceBased Privacy Guarantees for Differentially Private Mechanisms, or The Physics of Differential Privacy.
How much information can an attacker learn about an individual by observing the outcome of a computation? If the computation is differentially private, the answer is: "not much more than if the individual's data had been excluded from the input to the computation." A stronger notion of privacy, originally propounded by Dalenius in the 1970's, requires instead that the attacker should not learn much more about an individual than if the outcome of the computation had never been revealed. Simple examples, as well as a general impossibility theorem of Dwork and Naor, preclude the possibility of supplying this stronger "inferencebased" type of guarantee against attackers with arbitrary side information, assuming the computation is at least minimally useful.
In this talk we revisit the notion of inferencebased privacy (henceforth, IBP) and ask: under what limitations on the adversary's side information can we deduce an IBP guarantee from a differential one? We model the adversary's side information as a prior distribution over datasets (or, more generally, a set of possible priors) and we prove two main results. The first result pertains to distributions that satisfy a natural positiveaffiliation condition, and gives an upper bound on the IBP parameter for any differentially private mechanism. This upper bound is matched by a simple mechanism that adds Laplace noise to the sum of the data. The second result pertains to distributions that have weak correlations, defined in terms of a suitable "influence matrix". The result provides an upper bound for IBP in terms of the differential privacy parameter and the spectral norm of this matrix.
Statistical mechanics constitutes a leitmotif that runs through the talk, influencing both the proofs and the interpretation of the results. In brief, application of a differentially private mechanism to correlated data is analogous to application of an external field to a spin system. Techniques developed by physicists for bounding the change in magnetization due to an external field provide a blueprint for proving our main results. Physical phenomena such as phase transitions have analogues in the setting of the privacy questions we address. The talk will be selfcontained and, in particular, will not assume any prior knowledge of physics.
This is joint work with Arpita Ghosh. 
Friday, February 26, 2016: Speaker:
Julia Chuzhoy
(TTI Chicago).
Topic:
Excluded Grid Theorem: Improved and Simplified.
One of the key results in Robertson and Seymour's seminal work on graph minors is the Excluded Grid Theorem. The theorem states that there is a function f, such that for every positive integer g, every graph whose treewidth is at least f(g) contains the (gxg)grid as a minor. This theorem has found many applications in graph theory and algorithms. An important open question is establishing tight bounds on f(g) for which the theorem holds. Robertson and Seymour showed that f(g)geq Omega(g^2 log g), and this remains the best current lower bound on f(g). Until recently, the best upper bound was superexponential in g. In this talk, we will give an overview of a recent sequence of results, that has lead to the best current upper bound of f(g)=O(g^{19}polylog(g)). The main emphasis will be on presenting a simple proof of the theorem that achieves a bound that is polynomial in g. We will also outline some techniques that allow us to achieve the best current bounds.

Friday, February 12, 2016: Speaker:
Avi Wigderson
(IAS).
Topic:
The singularity of symbolic matrices.
The main object of study of this talk are matrices whose entries are linear forms in a set of formal variables (over some field). The main problem is determining if a given such matrix is invertible or singular (over the appropriate field of rational functions).
As it happens, this problem has a dual life; when the underlying variables commute, and when they do not. Most of the talk will be devoted to explaining the many origins, motivations and interrelations of these two problems, in computational complexity, noncommutative algebra, (commutative) invariant theory, quantum information theory, optimization, approximating the permanent and more. I will provide an introduction to the relevant theory in each of these areas which will assume no prior knowledge.
I will then describe a recent joint work with Garg, Gurvits and Olivera, giving a deterministic polynomial time algorithm for the noncommutative version (over the rationals), which uses many of the connections above. Strangely, while the problem is completely algebraic, the algorithm is analytic!
This algorithm actually computes the noncommutative rank of the symbolic matrix, which turns out to give a factor2 approximation to the commutative rank. This creates a different natural challenge for deterministic polynomial identity testing (PIT)  obtain a better approximation ratio for the rank. 
Friday, February 5, 2016: Speaker:
Nikhil Srivastava
(Berkeley).
Topic:
Ramanujan Graphs from Finite Free Convolutions.
We show that a random dregular bipartite graph is an optimal (i.e., Ramanujan) expander with nonzero probability. Notably, we use tools inspired by asymptotic (i.e., large n limit) random matrix theory to prove statements about finite dimensional matrices. The mediating role is be played by the expected characteristic polynomials of the random matrices in question, exploiting in particular their realrootedness, interlacing, and invariance properties. Our analysis of the roots of these polynomials is based on finite analogues of tools from Free Probability Theory.
Joint work with Adam Marcus and Daniel Spielman. 
Wednesday, February 3, 2016: Speaker:
László Babai
(U. Chicago).
Topic:
Graph Isomorphism in Quasipolynomial Time.
The algorithm indicated in the title builds on Luks's classical framework and introduces new group theoretic and combinatorial tools.
The first part of the talk will give an overview of the algorithm, discuss the core group theoretic routine ("Local Certificates") in detail, and sketch the aggregation of the local certificates.
After a break, we shall discuss the combinatorial partitioning techniques ("Design Lemma" and "SplitorJohnson") required for the grouptheoretic recurrence.
Elements of undergraduatelevel group theory such as facility with the concepts involved in the JordanHolder Theorem and basic concepts of permutation groups (such as orbits) will be assumed.
Helpful reading:
Eugene M. Luks:
Isomorphism of graphs of bounded valence can be tested in polynomial time. JCSS 25(1) (1982) 4265.
Please note that the preceding day, Tuesday, February 2, 4:055:00pm, I will present an outline of the algorithm at the MIT Theory Colloquium. Attendance of that talk is not a prerequisite for this seminar, but it may be helpful.

Friday, December 18, 2015: Speaker:
Srikanth Srinivasan
(IIT Bombay).
Topic:
Noncommutative arithmetic circuits.
I will talk about some results on computing multivariate polynomials in the *noncommutative* setting. There are many points of contrast from the commutative setting: specifically, we have exponential lower bounds for computing formulas (Nisan 1991), efficient deterministic Polynomial Identity Testing algorithms for formulas (Raz, Shpilka 2005), and reductions from the permanent *to* the determinant (Arvind, S. 2010). I will discuss some of these results.

Friday, December 11, 2015: Speaker:
Noga Alon
(Tel Aviv & IAS).
Topic:
Nonconstructive combinatorics.
I will describe several old and new applications of topological and algebraic methods in the derivation of combinatorial results. In all of them the proofs provide no efficient solutions for the corresponding algorithmic problems. Finding such solutions is an intriguing challenge.

Friday, December 4, 2015: Speaker:
Gillat Kol
(IAS).
Topic:
Interactive Compression.
In profoundly influential works, Shannon and Huffman showed that if Alice wants to send a message X to Bob, it's sufficient for her to send roughly H(X) bits, where H denotes Shannon's entropy function. This means that every message can be compressed to its information content. Can one prove similar results in the interactive setting, where Alice and Bob engage in an interactive communication protocol? The standard way to formalize this question is by asking whether the communication complexity of a communication task is always (roughly) bounded by its (internal or external) information complexity.
In this talk, we will survey known compression protocols and see that for some interesting special cases, interactive compression is possible (e.g. [Kol15]). We then sketch the proof of [GKR15], showing that interactive compression is impossible in general, by presenting a communication task whose communication complexity is exponentially large in its (external) information cost. 
Friday, November 13, 2015: Speaker:
Philippe Rigollet
(MIT).
Topic:
Statistical and computational tradeoffs in highdimensional learning.
Statistical learning developments are driven by two forces: extracting the most information about the data on the one hand and designing fast algorithms on the other. In some cases, these forces push in opposite directions calling for a certain tradeoff between statistical accuracy and computational efficiency. While this tradeoff can be exhibited for specific families of methods, it natural to ask whether it is inherent to the problem itself in the sense that no method can exhibit simultaneously good sample and computational complexity.
We will study this gap primarily in the context of sparse Principal Component Analysis (PCA) and conclude by describing a problem where a similar gap may exist. The following points will be covered:
1) Optimal sample complexity of sparse PCA: standard information theoretic tools employed in high dimensional learning will be reviewed.
2) Computationally efficient methods: SDP, MDP and the diagonal method.
3) Planted clique and computational lower bounds: average case reduction.
4) Average case complexity, open problems and random ideas.

Friday, November 6, 2015: Speaker:
Zeev Dvir
(Princeton).
Topic:
A wonderful theorem of Barthe and its applications in discrete geometry and TCS.
In his work on the BrescampLeib inequality, Barthe proved a remarkably simple and useful theorem on finite sets of points in R^d. Roughly speaking, as long as the points are not `too linearly dependent' (say, no large subset belongs to a lowdimensional space) then there exists a change of basis that puts them in radial (or projective) isotropic position: meaning that their normalized unit vectors (after the basis change) are in isotropic position. A special case of this result (for points in general position) was used by Forster to prove strong communication complexity lower bounds by bounding the signrank of an explicit matrix. More recently, the full result was used to derive new bounds for 3query locally correctable codes over the reals (D. Saraf and Wigderson) and high dimensional generalizations of the SylvesterGallai theorem for arrangements of subspaces (D. and Hu). In this talk I will describe the theorem and try to explain how it is used in each of these results. I will also show the proof of the theorem, which uses ideas from convex optimization.

Friday, October 30, 2015: Speaker:
Rüdiger Urbanke
(EPFL).
Topic:
EXIT Functions and the Area Theorem in Coding Theory.
An EXIT function measures the "response" of a code to a particular channel. The area theorem refers to the fact that if you integrate this function over a family of channels (from perfect channel to useless channel) then this integral is equal to the rate of the code, irrespective of what code you pick. Therefore, no code can universally be ``better'' than any other code (of the same rate). Two codes of the same rate simply will have different characteristics.
EXIT functions started out as convenient engineering tool to approximate the behaviour of iterative coding systems. But they are much more powerful.
I will discuss two applications. The first concerns spatiallycoupled codes. These are codes that use the ideas of nucleation and crystallisation, familiar from heat packs, hail, and supercooled water, in order to build lowcomplexity universal capacityachieving codes.
[In case you are curious, see this for a super cool supercooled water experiment.]
The second one concerns the questions whether classical families of codes, such as ReedMuller codes and extended BCH codes are capacityachieving.

Friday, October 23, 2015: Speaker:
Cris Moore
(Santa Fe Institute).
Topic:
Statistical inference, statistical physics, and the community detection problem.
There is a deep analogy between statistical inference — where we try to fit a model to data, or (even better) understand the posterior distribution of models given the data — and statistical physics, where we define a probability distribution in terms of some energy function. Many concepts like energy landscapes, partition functions, free energy, the cavity method, and phase transitions can be usefully carried over from physics to machine learning and computer science. At the very least, these techniques are a source of conjectures that have stimulated new work in machine learning, computer science, and probability theory; at their best, they offer strong intuitions about the structure of the problem and its possible solutions.
One recent success of this flow of ideas is the discovery of a sharp phase transition in community detection in sparse graphs. This appeared in Decelle et al. in the physics literature, and was then made rigorous in beautiful work by Mossel, Neeman, and Sly, and Massoulie. Analogous transitions exist in many other inference problems, where our ability to find patterns in data jumps suddenly as a function of how noisy or how sparse they are.
I will discuss why and how the detectability transition occurs, review what is now known rigorously, and present a number of open questions that cry out for proofs. Perhaps more importantly, I will take full advantage of the excellent format of this seminar to dwell on the analogy between physics and inference, and discuss how a thermodynamic point of view helps us find structure in data, test whether this structure is statistically significant, and choose between competing models.
This is based on joint work with many people, including Aurelien Decelle, Florent Krzakala, Lenka Zdeborova, and Pan Zhang. 
Friday, October 9, 2015: Speaker:
Ryan O'Donnell
(CMU).
Topic:
Lots of fun things ... (And Quantum
PCA).
Uncompressed Title: Lots of fun things, like longest increasing subsequences in random strings, property testing of probability distributions, the RobinsonSchenstedKnuth algorithm, some interesting symmetric polynomials, and basic representation theory of S_n. (And quantum PCA.)
Suppose we want to learn something about an unknown probability distribution p = (p_1, p_2, ..., p_d) on [d] = {1, 2, ..., d}. Assume we get to see n independent draws from p, forming a string w in [d]^n. If we want to learn p to accuracy eps, having n ~ d/eps^2 is necessary and sufficient. (This is one of the most ancient results in statistics: the algorithm is to just output the 'empirical histogram' of w.) If we only want to distinguish whether p is the uniform distribution or epsfar from it, then having n ~ sqrt(d)/eps^2 is necessary and sufficient. (This was only determined in 2008!)
We investigate the problem of learning an unknown ddimensional *quantum state*. This is a strict generalization in which, roughly speaking, p is a probability distribution over d unknown orthonormal *vectors*. By virtue of some basic results in representation theory (which we will explain), a large portion of the job reduces to the following: Suppose we want to learn about an unknown probability distribution p, but instead of getting to see the random string w we only get to see the data (LIS_1(w), LIS_2(w), ..., LIS_d(w)), where LIS_k(w) denotes the total length of the longest k disjoint increasing (=nondecreasing) subsequences in w. Now how large does n need to be to understand various properties of p? (For example, is it true that p_max ~ E[LIS_1(w)]?)
This is joint work with John Wright of Carnegie Mellon. A onehour overview of this material will be presented at the Charles River Lectures on Probability on Oct. 2 but attending that event is not a prerequisite for this talk.
Previous Version: MSR/MIT Reading Group .