18.S996: Algorithmic Aspects of Machine Learning

Fall 2013

Modern machine learning systems are often built on top of algorithms that do not have provable guarantees, and it is the subject of debate when and why they work. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as: spectral clustering, learning mixture models, matrix completion, tensor decomposition, nonnegative matrix factorization, sparse dictionary learning, inference in graphical models and deep learning. Almost all of these problems are computationally hard in the worst-case and so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry or algebra) in order to analyze the performance of an algorithm.

Announcement 1: Info about Final Projects is available here. Please send me your final report by December 14th. Here is a sample final project by Yufei Zhao

Announcement 2: Here are the lecture notes for the course!

Course Information

Instructor: Ankur Moitra
Office Hours: send me an email.

Lecture: Mondays and Wednesdays 9:30-11:00, E17-122

Class room update: We will have to stay in the current room, despite my best efforts. Unfortunately there are no big classrooms in the new math building, and the old math building is being renovated.

Lecture Notes: Lecture notes and/or presentations will be provided.

Prerequisites: An advanced course in algorithms (6.854/18.415 or equivalent) and probability (6.041/18.440 or equivalent)
Assessment: Students will be expected to scribe one or two lectures, solve one or two problem sets, and complete a research-oriented final project. This could be either a survey, or original research; students will be encouraged to find connections between the course material and their own research interests.

Course Outline

Here is a tentative outline for the course:

Nonnegative Matrix Factorization
- Qualitative Comparisons to SVD
- New Algorithms from Algebra
- New Algorithms from Geometry
- Applications to Topic Models
Tensor Decompositions
- Basic Definitions and Uniqueness
- Perturbation Bounds for Eigendecompositions
- Phylogenetic Reconstruction/HMMs
- Topic Models and Community Discovery
- Independent Component Analysis
Sparse Representations
- Uncertainty Principles and Uniqueness
- Pursuit Algorithms and their Guarantees
- Compressed Sensing I: Prony's Method
- Compressed Sensing I: Stable Recovery
- Dictionary Learning I: Full-Rank
- Dictionary Learning II: Incoherent and Overcomplete
Mixture Models and Clustering
- Gaussian Mixture Models I: Well-Separated
- Gaussian Mixture Models II: Beyond Clustering
- Gaussian Mixture Models III: Tools from Algebraic Geometry
Matrix Completion
- Exact Recovery I: Nuclear Norm
- Exact Recovery II: Quantum Golfing
- Alternating Minimization
- Sparse PCA
Miscellaneous
- Open Questions about Graphical Models and Deep Learning

References

A. Anandkumar, D. Foster, D. Hsu, S. Kakade, Y. Liu A Spectral Algorithm for Latent Dirichlet Allocation
S. Arora, R. Ge, R. Kannan, A. Moitra Computing a Nonnegative Matrix Factorization -- Provably
S. Arora, R. Ge, A. Moitra Learning Topic Models -- Going Beyond SVD
S. Arora, R. Ge, A. Moitra New Algorithms for Learning Incoherent and Overcomplete Dictionaries
M. Belkin, K. Sinha Polynomial Learning of Distribution Families
A. Bhaskara, M. Charikar, A. Vijayaraghavan Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability
E. Candes, X. Li, Y. Ma, J. Wright Robust Principal Component Analysis?
E. Candes, B. Recht Exact Matrix Completion via Convex Optimization
V. Chandrasekaran, P. Parrilo, A. Willsky Latent Variable Graphical Model Selection via Convex Optimization
V. Chandrasekaran, S. Sanghavi, P. Parrilo, A. Willsky Rank-Sparsity Incoherence for Matrix Decomposition
U. Feige, J. Kilian Heuristics for Semirandom Graph Problems
A. Frieze, M. Jerrum, R. Kannan Learning Linear Transformations
D. Hsu, S. Kakade Learning Mixtures of Spherical Gaussians: Moment Methods and Spectral Decompositions
P. Jain, P. Netrapalli, S. Sanghavi Low-rank Matrix Completion using Alternating Minimization
A. Kalai, A. Moitra, G. Valiant Efficiently Learning Mixtures of Gaussians
A. Kumar, R. Kannan Clustering with Spectral Norm and the k-Means Algorithm
F. McSherry Spectral Partitioning of Random Graphs
A. Moitra, G. Valiant Settling the Polynomial Learnability of Mixtures of Gaussians
A. Moitra, M. Saks A Polynomial Time Algorithm for Lossy Population Recovery
E. Mossel, S. Roch Learning Nonsingular Phylogenies and Hidden Markov Models
R. Ostrovsky, Y. Rabani, L. Schulman, C. Swamy The Effectiveness of Lloyd-Type Methods for the k-Means Problem
B. Recht A Simpler Approach to Matrix Completion
D. Spielman, H. Wang, J. Wright Exact Recovery of Sparsely-Used Dictionaries
A. Wigderson, A. Yehudayoff Population Recovery and Partial Identification