Harvard/MIT/MSR Theory Reading Group (v2.0)
About: This is a revival of the MSR/MIT Theory Reading Group.
Goal: The goal of this reading group is to learn in significant depth about (new/exciting) developments in theoretical computer science and related areas.
Formula: Each meeting consists of a three-hour-long whiteboard talk given by the speaker. Talks tend to be informal in nature and quite interactive. In particular, questions from the audience are strongly encouraged. There is a break in the middle – it is not impolite to leave during this break, or indeed any time earlier or later. Ah...and we have pizza!
Time and place: The meetings will alternate between Harvard and MIT during terms, and will be held at MSR in the summer. Unless otherwise noted, the meetings will be held on Fridays, 1:30 pm–4:30 pm (with pizza at 1:15).
Organizers:
Boaz Barak,
Bobby Kleinberg,
Aleksander Mądry,
Ankur Moitra,
Madhu Sudan.
Mailing list: There is a mailing list where announcements of upcoming talks, as well as other information related to the reading group, appear. To subscribe to this list, visit this
page. Please note that the mailing list is intended only for local participants. So please subscribe using an email that lets us know you are local, or send email to madhu letting him know who and where you are.
Upcoming Talks:

Friday, December 4, 2015 at MIT, room TBD: Speaker:
Gillat Kol
(IAS)
Topic:
Interactive Compression.
In profoundly influential works, Shannon and Huffman showed that if Alice wants to send a message X to Bob, it's sufficient for her to send roughly H(X) bits, where H denotes Shannon's entropy function. This means that every message can be compressed to its information content. Can one prove similar results in the interactive setting, where Alice and Bob engage in an interactive communication protocol? The standard way to formalize this question is by asking whether the communication complexity of a communication task is always (roughly) bounded by its (internal or external) information complexity.
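As a concrete (purely illustrative) instance of the one-way setting, the sketch below builds a Huffman code for a small distribution and checks that its expected codeword length matches H(X); the code and names are ours, not from the talk.

```python
import heapq
import math

def huffman_lengths(probs):
    """Build a Huffman code; return {symbol: codeword length}."""
    # Heap entries: (probability, tiebreak id, {symbol: depth so far}).
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(probs[s] * l for s, l in lengths.items())
# For dyadic probabilities the Huffman code is exactly optimal: avg_len == entropy == 1.75
```

For general distributions Huffman's expected length lies between H(X) and H(X)+1 per symbol, which is the sense in which a one-way message compresses to its information content.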
In this talk, we will survey known compression protocols and see that for some interesting special cases, interactive compression is possible (e.g. [Kol15]). We then sketch the proof of [GKR15], showing that interactive compression is impossible in general, by presenting a communication task whose communication complexity is exponentially large in its (external) information cost. 
Friday, December 11, 2015 at Harvard, MD G115: Speaker:
Noga Alon
(Tel Aviv & IAS)
Topic:
Nonconstructive Combinatorics.
TBD

Friday, December 18, 2015 at Harvard, room TBD: Speaker:
Srikanth Srinivasan
(IIT Bombay)
Topic:
TBD.
TBD

Wednesday, February 3, 2016 at Harvard, room TBD, from 4:30–7:30 pm (Note the unusual date and time!): Speaker:
László Babai
(U. Chicago)
Topic:
TBD.
TBD

Friday, February 5, 2016 at MIT, room TBD: Speaker:
Nikhil Srivastava
(Berkeley)
Topic:
TBD.
TBD

Friday, February 12, 2016 at Harvard, room TBD: Speaker:
Avi Wigderson
(IAS)
Topic:
TBD.
TBD

Friday, February 26, 2016 at Harvard, room TBD: Speaker:
Julia Chuzhoy
(TTI Chicago)
Topic:
TBD.
TBD

Friday, March 4, 2016 at MIT, room TBD: Speaker:
Bobby Kleinberg
(MSR & Cornell)
Topic:
Inference-Based Privacy Guarantees for Differentially Private Mechanisms, or The Physics of Differential Privacy.
How much information can an attacker learn about an individual by observing the outcome of a computation? If the computation is differentially private, the answer is: "not much more than if the individual's data had been excluded from the input to the computation." A stronger notion of privacy, originally propounded by Dalenius in the 1970s, requires instead that the attacker should not learn much more about an individual than if the outcome of the computation had never been revealed. Simple examples, as well as a general impossibility theorem of Dwork and Naor, preclude the possibility of supplying this stronger "inference-based" type of guarantee against attackers with arbitrary side information, assuming the computation is at least minimally useful.
In this talk we revisit the notion of inference-based privacy (henceforth, IBP) and ask: under what limitations on the adversary's side information can we deduce an IBP guarantee from a differential one? We model the adversary's side information as a prior distribution over datasets (or, more generally, a set of possible priors) and we prove two main results. The first result pertains to distributions that satisfy a natural positive-affiliation condition, and gives an upper bound on the IBP parameter for any differentially private mechanism. This upper bound is matched by a simple mechanism that adds Laplace noise to the sum of the data. The second result pertains to distributions that have weak correlations, defined in terms of a suitable "influence matrix". The result provides an upper bound for IBP in terms of the differential privacy parameter and the spectral norm of this matrix.
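The "Laplace noise on the sum" mechanism mentioned above is the classical Laplace mechanism of differential privacy. A minimal sketch (our own illustration of that standard mechanism, not the talk's analysis for correlated data):

```python
import math
import random

def laplace_sum(data, epsilon):
    """Release sum(data) with Laplace(0, 1/epsilon) noise.
    Assumes each entry lies in [0, 1], so the sum has sensitivity 1
    and the release is epsilon-differentially private."""
    scale = 1.0 / epsilon
    u = random.random() - 0.5            # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return sum(data) + noise

# One noisy release of a sum of 10 values:
print(laplace_sum([0.5] * 10, epsilon=1.0))
```

The noise is unbiased, so averaging many releases concentrates around the true sum, while any single release hides each individual's contribution up to the epsilon budget.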
Statistical mechanics constitutes a leitmotif that runs through the talk, influencing both the proofs and the interpretation of the results. In brief, application of a differentially private mechanism to correlated data is analogous to application of an external field to a spin system. Techniques developed by physicists for bounding the change in magnetization due to an external field provide a blueprint for proving our main results. Physical phenomena such as phase transitions have analogues in the setting of the privacy questions we address. The talk will be selfcontained and, in particular, will not assume any prior knowledge of physics.
This is joint work with Arpita Ghosh. 
Friday, March 11, 2016 at Harvard, room TBD: Speaker:
JM Landsberg
(Texas A&M)
Topic:
TBD.
TBD
Past Talks:

Friday, November 13, 2015: Speaker:
Philippe Rigollet
(MIT).
Topic:
Statistical and computational tradeoffs in high-dimensional learning.
Statistical learning developments are driven by two forces: extracting the most information from the data on the one hand, and designing fast algorithms on the other. In some cases, these forces push in opposite directions, calling for a certain tradeoff between statistical accuracy and computational efficiency. While this tradeoff can be exhibited for specific families of methods, it is natural to ask whether it is inherent to the problem itself, in the sense that no method can simultaneously achieve good sample and computational complexity.
We will study this gap primarily in the context of sparse Principal Component Analysis (PCA) and conclude by describing a problem where a similar gap may exist. The following points will be covered:
1) Optimal sample complexity of sparse PCA: standard information-theoretic tools employed in high-dimensional learning will be reviewed.
2) Computationally efficient methods: SDP, MDP and the diagonal method.
3) Planted clique and computational lower bounds: average case reduction.
4) Average case complexity, open problems and random ideas.
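As an illustration of the "diagonal method" in point 2, here is a hypothetical sketch of diagonal thresholding on a spiked sparse-PCA model; the model, parameter values, and function names are our own assumptions, not material from the talk.

```python
import numpy as np

def diagonal_sparse_pca(X, k):
    """Diagonal-thresholding sketch for sparse PCA: keep the k coordinates
    with largest sample variance, then run ordinary PCA on that submatrix."""
    Sigma = np.cov(X, rowvar=False)
    support = np.argsort(np.diag(Sigma))[-k:]         # top-k sample variances
    _, eigvecs = np.linalg.eigh(Sigma[np.ix_(support, support)])
    v = np.zeros(X.shape[1])
    v[support] = eigvecs[:, -1]                       # leading eigenvector
    return v

# Spiked model: a planted k-sparse direction u plus isotropic Gaussian noise.
rng = np.random.default_rng(0)
d, n, k = 50, 2000, 5
u = np.zeros(d); u[:k] = 1 / np.sqrt(k)
X = rng.standard_normal((n, 1)) * 2.0 @ u[None, :] + rng.standard_normal((n, d))
v = diagonal_sparse_pca(X, k)
# |<u, v>| should be close to 1: the planted direction is recovered.
```

In this easy regime the support coordinates have inflated variance and the method succeeds; the tradeoff discussed in the talk concerns regimes where such computationally cheap methods need far more samples than the information-theoretic optimum.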

Friday, November 6, 2015: Speaker:
Zeev Dvir
(Princeton).
Topic:
A wonderful theorem of Barthe and its applications in discrete geometry and TCS.
In his work on the Brascamp-Lieb inequality, Barthe proved a remarkably simple and useful theorem on finite sets of points in R^d. Roughly speaking, as long as the points are not 'too linearly dependent' (say, no large subset belongs to a low-dimensional space), there exists a change of basis that puts them in radial (or projective) isotropic position: meaning that their normalized unit vectors (after the basis change) are in isotropic position. A special case of this result (for points in general position) was used by Forster to prove strong communication complexity lower bounds by bounding the sign-rank of an explicit matrix. More recently, the full result was used to derive new bounds for 3-query locally correctable codes over the reals (D., Saraf, and Wigderson) and high-dimensional generalizations of the Sylvester-Gallai theorem for arrangements of subspaces (D. and Hu). In this talk I will describe the theorem and try to explain how it is used in each of these results. I will also show the proof of the theorem, which uses ideas from convex optimization.
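To make "isotropic position" concrete, here is a small numerical check (our own illustration): n equally spaced unit vectors in the plane satisfy sum_i u_i u_i^T = (n/d) I with d = 2. Barthe's theorem asserts that a basis change achieving such a position exists under far weaker hypotheses than this symmetric example.

```python
import numpy as np

# n equally spaced unit vectors on the circle.
n, d = 7, 2
angles = 2 * np.pi * np.arange(n) / n
U = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # n x 2, unit rows

# Isotropic position: the second-moment matrix is a multiple of the identity.
M = U.T @ U
# M should equal (n / d) * I, i.e. every direction carries the same mass.
```

The point of the theorem is that for a generic point set one first applies a suitable invertible linear map and then normalizes to unit vectors; only after that change of basis does the identity above hold.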

Friday, October 30, 2015: Speaker:
Rüdiger Urbanke
(EPFL).
Topic:
EXIT Functions and the Area Theorem in Coding Theory.
An EXIT function measures the "response" of a code to a particular channel. The area theorem refers to the fact that if you integrate this function over a family of channels (from perfect channel to useless channel), then this integral is equal to the rate of the code, irrespective of which code you pick. Therefore, no code can universally be "better" than any other code (of the same rate). Two codes of the same rate will simply have different characteristics.
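A minimal sanity check of the area theorem (our own illustration, not from the talk): for the length-n repetition code over the binary erasure channel BEC(eps), the EXIT value of a bit is eps^(n-1), since the bit stays unknown exactly when the other n-1 copies are all erased, and integrating this curve over eps in [0, 1] gives 1/n, the rate of the code.

```python
def exit_repetition(eps, n):
    """EXIT value of one bit of the length-n repetition code on BEC(eps)."""
    return eps ** (n - 1)

n = 5
N = 100_000
# Midpoint Riemann sum of the EXIT curve over the channel family eps in [0, 1]:
area = sum(exit_repetition((i + 0.5) / N, n) for i in range(N)) / N
rate = 1 / n
# area ≈ rate, as the area theorem demands.
```

Repeating the computation for any n gives the same match, which is the "irrespective of which code you pick" phenomenon in the simplest possible case.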
EXIT functions started out as a convenient engineering tool to approximate the behaviour of iterative coding systems. But they are much more powerful.
I will discuss two applications. The first concerns spatially coupled codes. These are codes that use the ideas of nucleation and crystallisation, familiar from heat packs, hail, and supercooled water, in order to build low-complexity universal capacity-achieving codes.
[In case you are curious, see this for a super cool supercooled water experiment.]
The second concerns the question of whether classical families of codes, such as Reed-Muller codes and extended BCH codes, are capacity-achieving.

Friday, October 23, 2015: Speaker:
Cris Moore
(Santa Fe Institute).
Topic:
Statistical inference, statistical physics, and the community detection problem.
There is a deep analogy between statistical inference — where we try to fit a model to data, or (even better) understand the posterior distribution of models given the data — and statistical physics, where we define a probability distribution in terms of some energy function. Many concepts like energy landscapes, partition functions, free energy, the cavity method, and phase transitions can be usefully carried over from physics to machine learning and computer science. At the very least, these techniques are a source of conjectures that have stimulated new work in machine learning, computer science, and probability theory; at their best, they offer strong intuitions about the structure of the problem and its possible solutions.
One recent success of this flow of ideas is the discovery of a sharp phase transition in community detection in sparse graphs. This appeared in Decelle et al. in the physics literature, and was then made rigorous in beautiful work by Mossel, Neeman, and Sly, and by Massoulié. Analogous transitions exist in many other inference problems, where our ability to find patterns in data jumps suddenly as a function of how noisy or how sparse they are.
I will discuss why and how the detectability transition occurs, review what is now known rigorously, and present a number of open questions that cry out for proofs. Perhaps more importantly, I will take full advantage of the excellent format of this seminar to dwell on the analogy between physics and inference, and discuss how a thermodynamic point of view helps us find structure in data, test whether this structure is statistically significant, and choose between competing models.
This is based on joint work with many people, including Aurelien Decelle, Florent Krzakala, Lenka Zdeborova, and Pan Zhang. 
Friday, October 9, 2015: Speaker:
Ryan O'Donnell
(CMU).
Topic:
Lots of fun things... (And Quantum PCA).
Uncompressed Title: Lots of fun things, like longest increasing subsequences in random strings, property testing of probability distributions, the Robinson-Schensted-Knuth algorithm, some interesting symmetric polynomials, and basic representation theory of S_n. (And quantum PCA.)
Suppose we want to learn something about an unknown probability distribution p = (p_1, p_2, ..., p_d) on [d] = {1, 2, ..., d}. Assume we get to see n independent draws from p, forming a string w in [d]^n. If we want to learn p to accuracy eps, having n ~ d/eps^2 is necessary and sufficient. (This is one of the most ancient results in statistics: the algorithm is to just output the 'empirical histogram' of w.) If we only want to distinguish whether p is the uniform distribution or eps-far from it, then having n ~ sqrt(d)/eps^2 is necessary and sufficient. (This was only determined in 2008!)
We investigate the problem of learning an unknown d-dimensional *quantum state*. This is a strict generalization in which, roughly speaking, p is a probability distribution over d unknown orthonormal *vectors*. By virtue of some basic results in representation theory (which we will explain), a large portion of the job reduces to the following: Suppose we want to learn about an unknown probability distribution p, but instead of getting to see the random string w we only get to see the data (LIS_1(w), LIS_2(w), ..., LIS_d(w)), where LIS_k(w) denotes the total length of the longest k disjoint increasing (=nondecreasing) subsequences in w. Now how large does n need to be to understand various properties of p? (For example, is it true that p_max ~ E[LIS_1(w)]?)
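The quantities LIS_k(w) can be read off the shape of the RSK insertion tableau: by Greene's theorem, the sum of the first k row lengths equals LIS_k(w). A minimal sketch of row insertion (our own illustration), using the nondecreasing convention from the abstract:

```python
import bisect

def rsk_shape(w):
    """Row-insert the word w; return the row lengths of the tableau.
    By Greene's theorem, the sum of the first k row lengths equals
    LIS_k(w): the longest k disjoint nondecreasing subsequences."""
    rows = []
    for x in w:
        for row in rows:
            # Bump the first entry strictly greater than x; this keeps
            # rows weakly increasing (the nondecreasing-LIS convention).
            j = bisect.bisect_right(row, x)
            if j == len(row):
                row.append(x)
                break
            x, row[j] = row[j], x
        else:
            rows.append([x])           # bumped all the way out: new row
    return [len(r) for r in rows]

w = [3, 1, 2, 2, 4, 1]
shape = rsk_shape(w)
# LIS_1(w) = shape[0], LIS_2(w) = shape[0] + shape[1], and so on.
```

For this w the shape is [4, 1, 1]: the longest nondecreasing subsequence (1, 2, 2, 4) has length 4, and two disjoint ones can cover at most 5 symbols.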
This is joint work with John Wright of Carnegie Mellon. A one-hour overview of this material will be presented at the Charles River Lectures on Probability on Oct. 2, but attending that event is not a prerequisite for this talk.
Previous Version: MSR/MIT Reading Group.
Website designed by Aleksander Mądry, and currently maintained by Madhu Sudan.