18.408: Algorithmic Aspects of Machine Learning
Spring 2023
Modern machine learning systems are often built on top of algorithms that lack provable guarantees, and when and why they work remains a subject of debate. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as nonnegative matrix factorization, tensor decomposition, learning mixture models, graphical models, community detection, matrix completion and robust statistics. Almost all of these problems are computationally hard in the worst case, so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry or algebra) in order to rigorously analyze existing heuristics, or to design fundamentally new algorithms.
Announcement: Project guidelines are posted here; the project is due May 16th, after class.
Announcement: PSET 2 is posted here and is due April 6th, after class.
Announcement: PSET 1 is posted here and is due March 7th, after class.
Course Information
- Instructor: Ankur Moitra
- Lectures: Tuesdays and Thursdays 11:00-12:30, 2-190
- Prerequisites: A course in algorithms (6.046/18.410 or equivalent), probability (6.041/18.440 or equivalent) and linear algebra (18.06/18.C06 or equivalent). A strong background in all three will be needed.
- Textbook: We will use this monograph. Lecture notes and/or presentations covering new topics will be provided.
- Office Hours: Wednesdays 4-5 in 2-472
- Assessment: Students will be expected to solve a handful of problem sets and complete a research-oriented final project. The project can be either a survey or original research; students will be encouraged to find connections between the course material and their own research interests.
Instructor Notes
- Latent Semantic Indexing and Nonnegative Matrix Factorization [pdf]
- Intermediate Polytope Problem and the Anchorwords Algorithm [pdf]
- Applications to Topic Models [pdf]
- Interlude: Clustering and Approximation Stability [pdf]
- Tensors: The Rotation Problem and Computational Hardness [pdf]
- Tensors II: Jennrich's Algorithm [pdf]
- Tensors III: Applications to Phylogenetic Reconstruction, HMMs, GMMs, Community Detection [pdf]
- Tensors IV: Perturbation Bounds [pdf]
- Interlude: ICA and Cumulants [pdf]
- GMM: Clustering and some High-Dimensional Geometry [pdf]
- GMM II: Reconstruction from Projections [pdf]
- GMM III: Six Moments Suffice, and Hilbert's Basis Theorem [pdf]
- SBM: Spectral Algorithms and Random Matrix Theory [pdf]
- SBM II: Semi-Random Models [pdf]
- SBM III: Partial Recovery, Sharp Thresholds and Non-backtracking Walks [pdf]
- Planted Clique and Computational vs. Statistical Tradeoffs [pdf]
- The Low Degree Method and Friends (SOS, OGP, SQ) [pdf]
- Graphical Models: Phase Transitions and Hardness [pdf]
- Graphical Models II: Learning via Logistic Regression [pdf]
- Superresolution and Extremal Functions [pdf]
- Learning Linear Dynamical Systems [pdf]
- Orbit Recovery and Invariant Theory [pdf]
Course Outline
Here is a tentative outline for the course:
Nonnegative Matrix Factorization [slides]
- The SVD and Latent Semantic Indexing
- Probabilistic Interpretations and Anchorwords
- Applications to Topic Models
Discussion: When does well-posedness lead to better algorithms?
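To give a concrete feel for this first unit, here is a minimal sketch of latent semantic indexing via a truncated SVD of a term-document matrix; the toy matrix and the choice of rank are illustrative assumptions, not course materials.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); the counts are
# illustrative, not real data.
M = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

# Latent semantic indexing: keep the top-k singular vectors of M.
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)

# Documents are then compared in the k-dimensional latent space.
doc_embeddings = np.diag(s[:k]) @ Vt[:k, :]   # one column per document
print(np.round(M_k, 2))
print(np.round(doc_embeddings, 2))
```

Note that the SVD factors can have negative entries; wanting factors that are themselves interpretable as (nonnegative) topics is what motivates nonnegative matrix factorization and the anchor-words approach.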
Tensor Decompositions and their Applications [slides]
- Tensor Rank, Border Rank and the Rotation Problem
- Jennrich's Algorithm and the Generalized Eigenvalue Problem
- Learning HMMs
- Mixed Membership Models and Community Detection
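The Jennrich's Algorithm item above admits a short, self-contained sketch: simultaneously diagonalize two random slices of a third-order tensor whose factors are linearly independent. The random test tensor, dimensions and rank below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 4   # dimensions and rank are illustrative

# Build a rank-r tensor T = sum_i a_i (x) b_i (x) c_i with generic factors.
A = rng.standard_normal((n, r))
B = rng.standard_normal((n, r))
C = rng.standard_normal((n, r))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Jennrich's algorithm: contract the third mode with two random vectors.
x, y = rng.standard_normal(n), rng.standard_normal(n)
Mx = np.einsum('ijk,k->ij', T, x)   # = A diag(C^T x) B^T
My = np.einsum('ijk,k->ij', T, y)   # = A diag(C^T y) B^T

# Eigenvectors of Mx My^+ with nonzero eigenvalue are the columns of A
# (up to order and scale), since Mx My^+ = A diag((C^T x)/(C^T y)) A^+.
eigvals, eigvecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
idx = np.argsort(-np.abs(eigvals))[:r]     # keep the r nontrivial eigenvalues
A_hat = np.real(eigvecs[:, idx])

# Check recovery up to permutation/scaling by correlating columns.
corr = np.abs(A_hat.T @ A) / (np.linalg.norm(A_hat, axis=0)[:, None]
                              * np.linalg.norm(A, axis=0)[None, :])
print(np.round(corr.max(axis=1), 3))       # should be close to 1
```

Unlike the matrix case, the low-rank factors here are identifiable up to permutation and scaling under the conditions of Jennrich's theorem, which is what makes tensor methods useful in the applications listed above.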
Learning Mixture Models
- Expectation Maximization
- Clustering in High-Dimensions
- Method of Moments and Systems of Polynomial Equations [slides]
Discussion: Is nature an adversary? And if not, how can we model and exploit that?
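For the Expectation Maximization item above, here is a minimal sketch of EM for a mixture of two univariate Gaussians; the synthetic data, initialization and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from the mixture 0.5*N(-2,1) + 0.5*N(+2,1) (illustrative choice).
n = 2000
z = rng.random(n) < 0.5
x = np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))

# EM for two univariate Gaussians with unknown weights, means and variances.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances from the responsibilities.
    Nk = resp.sum(axis=0)
    w = Nk / n
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(np.round(w, 2), np.round(mu, 2), np.round(var, 2))
```

EM only converges to a local optimum of the likelihood; the method-of-moments item above is one route to algorithms with provable guarantees.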
The Stochastic Block Model (NEW)
- Spectral Partitioning
- Semirandom Models
- The Kesten-Stigum Bound and Nonbacktracking Walks
Discussion: When are there computational vs. statistical tradeoffs?
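Here is a minimal sketch of spectral partitioning on a two-community stochastic block model, the first item in the block above; the graph size and edge probabilities are illustrative assumptions, chosen well above the recovery threshold.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-community stochastic block model; sizes and edge probabilities are illustrative.
n, p_in, p_out = 200, 0.30, 0.05
labels = np.array([0] * (n // 2) + [1] * (n // 2))
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                      # symmetric adjacency matrix, no self-loops

# Spectral partitioning: split according to the sign of the second eigenvector.
eigvals, eigvecs = np.linalg.eigh(A)
v2 = eigvecs[:, -2]              # eigenvector of the second-largest eigenvalue
guess = (v2 > 0).astype(int)

# Agreement with the planted partition, up to relabeling the two communities.
acc = max(np.mean(guess == labels), np.mean(guess != labels))
print(round(acc, 3))
```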
Graphical Models and Sparse Regression (NEW)
- Markov Properties
- Pseudolikelihood Methods
- Latent Variable Models
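The Pseudolikelihood Methods item above has a compact illustration: regress each node on all the others with an L1-penalized logistic regression and read off its estimated neighborhood from the nonzero coefficients. The chain Ising model, coupling strength, penalty and threshold below are illustrative assumptions; this is a sketch, not the general algorithm analyzed in lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Samples from a chain Ising model on p nodes with coupling beta on consecutive
# nodes; a chain is a Markov chain, so exact sampling is easy. Sizes are illustrative.
p, n, beta = 10, 5000, 0.6
stay = np.exp(beta) / (np.exp(beta) + np.exp(-beta))   # P(x_{i+1} = x_i | x_i)
X = np.empty((n, p))
X[:, 0] = rng.choice([-1.0, 1.0], size=n)
for i in range(1, p):
    keep = rng.random(n) < stay
    X[:, i] = np.where(keep, X[:, i - 1], -X[:, i - 1])

# Neighborhood selection: for each node, fit an L1-penalized logistic regression
# of that node on all the others; coefficients above a small threshold estimate
# its neighbors in the underlying graph.
for i in range(p):
    others = np.delete(np.arange(p), i)
    clf = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
    clf.fit(X[:, others], X[:, i])
    nbrs = others[np.abs(clf.coef_[0]) > 0.1]
    print(i, nbrs)   # interior nodes should pick up i-1 and i+1 on the chain
```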
Super Resolution and Linear Dynamical Systems (NEW)
- The Matrix Pencil Method
- Extremal Functions
- The Ho-Kalman Algorithm
- Learning the Markov Parameters
Discussion: When is reinforcement learning algorithmically tractable?
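Here is a minimal sketch of the matrix pencil method, the first item in the block above, on a toy superresolution instance: a signal that is a sum of a few complex exponentials, whose frequencies are recovered from the eigenvalues of a pencil of Hankel matrices. The frequencies, amplitudes and number of samples are illustrative assumptions.

```python
import numpy as np

# Toy superresolution instance: x_t = sum_j c_j * exp(2*pi*i*f_j*t).
freqs = np.array([0.11, 0.13, 0.47])   # illustrative frequencies
amps = np.array([1.0, 0.8, 0.5])       # illustrative amplitudes
k, N = len(freqs), 2 * len(freqs)
t = np.arange(N)
x = (amps * np.exp(2j * np.pi * np.outer(t, freqs))).sum(axis=1)

# Matrix pencil method: form two k x k Hankel matrices from consecutive samples;
# the eigenvalues of H0^{-1} H1 are the poles z_j = exp(2*pi*i*f_j).
H0 = np.array([[x[i + j] for j in range(k)] for i in range(k)])
H1 = np.array([[x[i + j + 1] for j in range(k)] for i in range(k)])
poles = np.linalg.eigvals(np.linalg.solve(H0, H1))

recovered = np.sort(np.mod(np.angle(poles) / (2 * np.pi), 1.0))
print(np.round(recovered, 4))   # should match [0.11, 0.13, 0.47]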
Robust Statistics
- Robustly Estimating the Mean and Covariance
- Subgaussian Mean Estimation
I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra and A. Stewart. Robust Estimators in High Dimensions without the Computational Intractability, FOCS 2016.
K. Lai, A. Rao and S. Vempala. Agnostic Estimation of Mean and Covariance, FOCS 2016.
M. Charikar, J. Steinhardt and G. Valiant. Learning from Untrusted Data, STOC 2017.
S. Hopkins. Mean Estimation with Sub-Gaussian Rates in Polynomial Time, Annals of Statistics 2020.
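The readings above give polynomial-time robust estimators in high dimensions. The following is a rough, hedged sketch of the filtering idea for robustly estimating the mean of a contaminated Gaussian; the contamination model, thresholds and constants are illustrative assumptions rather than the precise algorithms from these papers.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic contaminated data: inliers from N(mu, I) in d dimensions plus an
# eps-fraction of clustered outliers (all sizes here are illustrative).
d, n, eps = 20, 5000, 0.1
mu = np.ones(d)
inliers = rng.standard_normal((int((1 - eps) * n), d)) + mu
outliers = 8.0 + 0.1 * rng.standard_normal((int(eps * n), d))
X0 = np.vstack([inliers, outliers])

# Filtering sketch: while the empirical covariance has an eigenvalue much larger
# than 1, remove the points with the largest projections onto the top
# eigendirection, then re-estimate.
X = X0.copy()
for _ in range(30):
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    if eigvals[-1] < 1.5:                          # stopping threshold is heuristic
        break
    scores = np.abs((X - mean) @ eigvecs[:, -1])   # projections onto top direction
    X = X[scores <= np.quantile(scores, 0.95)]     # drop the most extreme 5%

print(round(float(np.linalg.norm(X.mean(axis=0) - mu)), 3))   # filtered estimate
print(round(float(np.linalg.norm(X0.mean(axis=0) - mu)), 3))  # naive sample mean
```

For this toy attack the naive sample mean is pulled far from the true mean (and the error grows with the dimension), while the filtered estimate stays close.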