18.408: Algorithmic Aspects of Machine Learning
Fall 2017
Modern machine learning systems are often built on top of algorithms that do not have provable guarantees, and when and why they work is a subject of debate. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as: nonnegative matrix factorization, tensor decomposition, sparse coding, learning mixture models, matrix completion and inference in graphical models. Almost all of these problems are computationally hard in the worst case, so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry or algebra) in order to rigorously analyze existing heuristics, or to design fundamentally new algorithms.
Announcement: Course evaluations are here; please take a few minutes to fill them out before they close on Monday, December 18th at 9am!
Announcement: The final project description is posted here; final projects are due on December 13th, by email.
Course Information
- Instructor: Ankur Moitra
- Lectures: Monday and Wednesday 1:00-2:30; note the room change (!) to 4-237
- Teaching Assistant: Alex Wein
- Prerequisites: A course in algorithms (6.046/18.410 or equivalent) and probability (6.041/18.440 or equivalent). You will need a strong background in algorithms, probability and linear algebra.
- Textbook: We will use this monograph. Lecture notes and/or presentations covering new topics will be provided.
- Office Hours: Monday and Wednesday 2:30-3:30 (meet after class)
- Assessment: Students will be expected to complete a handful of problem sets and a research-oriented final project. The project can be either a survey or original research; students are encouraged to find connections between the course material and their own research interests.
Problem Sets
Additional Notes
- Alex Wein Guest Lecture: Message Passing and State Evolution [pdf]
- Lecture 21: Matrix Completion and Rademacher Complexity [pdf]
- Lecture 22: More Matrix Completion [pdf]
- Lecture 23: Nonconvex Optimization and the Strict Saddle Property [pdf]
- Lecture 24: No Spurious Local Minima [pdf]
Course Outline
Here is a tentative outline for the course:
Nonnegative Matrix Factorization [slides]
- Qualitative Comparisons to SVD
- New Algorithms via Separability
- Applications to Topic Models
Discussion: When does well-posedness lead to better algorithms?
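As a concrete companion to this unit, here is a minimal numpy sketch of the classic multiplicative-update heuristic for NMF (Lee and Seung); it has no worst-case guarantees, which is exactly the gap the separability-based algorithms from lecture are meant to close. The function name and parameters are illustrative choices, not part of the course material.

```python
import numpy as np

def nmf_multiplicative(M, r, iters=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for M ~ A @ W with A, W >= 0.

    A widely used heuristic with no worst-case guarantees; the
    separability-based algorithms from lecture are provable alternatives.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r))
    W = rng.random((r, m))
    for _ in range(iters):
        # Each update preserves nonnegativity and does not increase ||M - AW||_F.
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W
```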
Tensor Decompositions and Applications [slides]
- Tensor Rank, Border Rank and the Rotation Problem
- Jennrich's Algorithm and the Generalized Eigenvalue Problem
- Learning HMMs
- Mixed Membership Models and Community Detection
Discussion: When do algorithms rely (too much) on a distributional model?
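Jennrich's algorithm from this unit can be sketched in a few lines of numpy. This is a bare-bones illustration assuming an exact rank-r tensor whose factor matrices have full column rank, not a noise-robust implementation; the function name and signature are illustrative.

```python
import numpy as np

def jennrich(T, r, seed=0):
    """Recover the first-mode factors of T = sum_i a_i (x) b_i (x) c_i.

    Assumes an exact rank-r tensor with full-column-rank factor matrices
    and generic c_i. Contract the third mode with two random vectors;
    the eigenvectors of Tx pinv(Ty) are the a_i, up to permutation and
    scaling. The b_i and c_i can be recovered symmetrically.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T.shape[2])
    y = rng.standard_normal(T.shape[2])
    Tx = np.einsum('ijk,k->ij', T, x)  # = A diag(<c_i, x>) B^T
    Ty = np.einsum('ijk,k->ij', T, y)  # = A diag(<c_i, y>) B^T
    # Tx pinv(Ty) = A diag(<c_i,x>/<c_i,y>) A^+, so eigendecompose it.
    vals, vecs = np.linalg.eig(Tx @ np.linalg.pinv(Ty))
    top = np.argsort(-np.abs(vals))[:r]  # the r nonzero eigenvalues, generically
    return vecs[:, top].real             # imaginary parts are numerical noise
```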
Sparse Recovery and Sparse Coding
- Incoherence and Uncertainty Principles
- Orthogonal Matching Pursuit
- Compressed Sensing and RIP
- Alternating Minimization via Approximate Gradient Descent [slides]
Discussion: When does belief propagation (provably) work?
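Orthogonal Matching Pursuit from this unit admits a short, self-contained sketch: greedily pick the column most correlated with the residual, then re-fit on the support. Under incoherence or RIP conditions on A (covered in lecture), these greedy choices provably recover the true support; the names below are illustrative.

```python
import numpy as np

def omp(A, b, k):
    """Orthogonal Matching Pursuit: greedily build a k-sparse x with Ax ~ b."""
    residual = b.astype(float).copy()
    support = []
    for _ in range(k):
        # Choose the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-solve the least-squares problem on the chosen support.
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```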
Graphical Models, Robustness and Nonconvexity
- Learning Graphical Models
- Agnostically Learning a Gaussian
- The Landscape of Nonconvex Optimization
Readings:
- G. Bresler. Efficiently Learning Ising Models on Arbitrary Graphs, STOC 2015.
- A. Klivans and R. Meka. Learning Graphical Models Using Multiplicative Weights, FOCS 2017.
- I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra and A. Stewart. Robust Estimators in High Dimensions without the Computational Intractability, FOCS 2016.
- K. Lai, A. Rao and S. Vempala. Agnostic Estimation of Mean and Covariance, FOCS 2016.
- M. Charikar, J. Steinhardt and G. Valiant. Learning from Untrusted Data, STOC 2017.
- R. Ge, C. Jin and Y. Zheng. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, ICML 2017.
Discussion: Do we have enough average-case assumptions?
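To make "no spurious local minima" concrete: for exact low-rank PSD matrix factorization, results in the vein of the Ge-Jin-Zheng reading imply that plain gradient descent from random initialization finds a global minimum. A minimal sketch of that setup; the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def factorize_gd(M, r, lr=0.01, iters=2000, seed=0):
    """Gradient descent on f(U) = (1/4) * ||M - U U^T||_F^2.

    For M symmetric PSD of rank at most r, this objective has no
    spurious local minima, so random initialization plus gradient
    descent converges to a global minimum (U U^T ~ M).
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((M.shape[0], r))
    for _ in range(iters):
        U -= lr * (U @ U.T - M) @ U  # gradient of f is (U U^T - M) U
    return U
```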
Learning Mixture Models
- Expectation Maximization
- Clustering in High-Dimensions
- Method of Moments and Systems of Polynomial Equations [slides]
Discussion: Is nature an adversary? And if not, how can we model and exploit that?
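Expectation Maximization, the first topic of this unit, is local ascent on the likelihood and can get stuck at bad fixed points; the method of moments gives the kind of guarantees it lacks. A minimal EM sketch for a univariate mixture of two unit-variance Gaussians follows; the initialization and names are illustrative.

```python
import numpy as np

def em_two_gaussians(x, iters=100):
    """EM for a univariate mixture of two unit-variance Gaussians.

    Local ascent on the likelihood: each iteration cannot decrease it,
    but the fixed point reached depends on the initialization.
    """
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initialization
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior probability each point came from each component.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and means in closed form.
        w = post.mean(axis=0)
        mu = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)
    return w, mu
```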