18.408: Algorithmic Aspects of Machine Learning
Fall 2017
Modern machine learning systems are often built on top of algorithms that do not have provable guarantees, and when and why they work is a subject of debate. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as: nonnegative matrix factorization, tensor decomposition, sparse coding, learning mixture models, matrix completion and inference in graphical models. Almost all of these problems are computationally hard in the worst case, so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry or algebra) in order to rigorously analyze existing heuristics, or to design fundamentally new algorithms.
Announcement: Course evaluations are here; please take a few minutes to fill them out before they close on Monday, December 18th at 9am!
Announcement: The final project description is posted here; final projects are due on December 13th, by email.
Course Information
- Instructor: Ankur Moitra
- Lectures: Monday and Wednesday 1:00-2:30; note the room change (!) to 4-237
- Teaching Assistant: Alex Wein
- Prerequisites: A course in algorithms (6.046/18.410 or equivalent) and probability (6.041/18.440 or equivalent). You will need a strong background in algorithms, probability and linear algebra.
- Textbook: We will use this monograph. Lecture notes and/or presentations covering new topics will be provided.
- Office Hours: Monday and Wednesday 2:30-3:30 (meet after class)
- Assessment: Students will be expected to complete a handful of problem sets and a research-oriented final project. The project can be either a survey or original research; students are encouraged to find connections between the course material and their own research interests.
Problem Sets
Additional Notes
- Alex Wein Guest Lecture: Message Passing and State Evolution [pdf]
- Lecture 21: Matrix Completion and Rademacher Complexity [pdf]
- Lecture 22: More Matrix Completion [pdf]
- Lecture 23: Nonconvex Optimization and the Strict Saddle Property [pdf]
- Lecture 24: No Spurious Local Minima [pdf]
Course Outline
Here is a tentative outline for the course:
Nonnegative Matrix Factorization [slides]
- Qualitative Comparisons to SVD
- New Algorithms via Separability
- Applications to Topic Models
Discussion: When does well-posedness lead to better algorithms?
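As a concrete companion to this unit, here is a minimal numpy sketch of the classic multiplicative-update heuristic for NMF (Lee and Seung); it has no worst-case guarantees, which is exactly the gap the separability-based algorithms from lecture are meant to close. The function name and parameters are illustrative choices, not part of the course material.

```python
import numpy as np

def nmf_multiplicative(M, r, iters=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for M ~ A @ W with A, W >= 0.

    A widely used heuristic with no worst-case guarantees; the
    separability-based algorithms from lecture are provable alternatives.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r))
    W = rng.random((r, m))
    for _ in range(iters):
        # Each update preserves nonnegativity and does not increase ||M - AW||_F.
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W
```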
Tensor Decompositions and Applications [slides]
- Tensor Rank, Border Rank and the Rotation Problem
- Jennrich's Algorithm and the Generalized Eigenvalue Problem
- Learning HMMs
- Mixed Membership Models and Community Detection
Discussion: When do algorithms rely (too much) on a distributional model?
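Jennrich's algorithm from this unit can be sketched in a few lines of numpy. This is a bare-bones illustration assuming an exact rank-r tensor whose factor matrices have full column rank, not a noise-robust implementation; the function name and signature are illustrative.

```python
import numpy as np

def jennrich(T, r, seed=0):
    """Recover the first-mode factors of T = sum_i a_i (x) b_i (x) c_i.

    Assumes an exact rank-r tensor with full-column-rank factor matrices
    and generic c_i. Contract the third mode with two random vectors;
    the eigenvectors of Tx pinv(Ty) are the a_i, up to permutation and
    scaling. The b_i and c_i can be recovered symmetrically.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T.shape[2])
    y = rng.standard_normal(T.shape[2])
    Tx = np.einsum('ijk,k->ij', T, x)  # = A diag(<c_i, x>) B^T
    Ty = np.einsum('ijk,k->ij', T, y)  # = A diag(<c_i, y>) B^T
    # Tx pinv(Ty) = A diag(<c_i,x>/<c_i,y>) A^+, so eigendecompose it.
    vals, vecs = np.linalg.eig(Tx @ np.linalg.pinv(Ty))
    top = np.argsort(-np.abs(vals))[:r]  # the r nonzero eigenvalues, generically
    return vecs[:, top].real             # imaginary parts are numerical noise
```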
Sparse Recovery and Sparse Coding
- Incoherence and Uncertainty Principles
- Orthogonal Matching Pursuit
- Compressed Sensing and RIP
- Alternating Minimization via Approximate Gradient Descent [slides]
Discussion: When does belief propagation (provably) work?
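Orthogonal Matching Pursuit from this unit admits a short, self-contained sketch: greedily pick the column most correlated with the residual, then re-fit on the support. Under incoherence or RIP conditions on A (covered in lecture), these greedy choices provably recover the true support; the names below are illustrative.

```python
import numpy as np

def omp(A, b, k):
    """Orthogonal Matching Pursuit: greedily build a k-sparse x with Ax ~ b."""
    residual = b.astype(float).copy()
    support = []
    for _ in range(k):
        # Choose the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-solve the least-squares problem on the chosen support.
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```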
Graphical Models, Robustness and Nonconvexity
- Learning Graphical Models
- Agnostically Learning a Gaussian
- The Landscape of Nonconvex Optimization
Readings:
- G. Bresler. Efficiently Learning Ising Models on Arbitrary Graphs, STOC 2015.
- A. Klivans and R. Meka. Learning Graphical Models Using Multiplicative Weights, FOCS 2017.
- I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra and A. Stewart. Robust Estimators in High Dimensions without the Computational Intractability, FOCS 2016.
- K. Lai, A. Rao and S. Vempala. Agnostic Estimation of Mean and Covariance, FOCS 2016.
- M. Charikar, J. Steinhardt and G. Valiant. Learning from Untrusted Data, STOC 2017.
- R. Ge, C. Jin and Y. Zheng. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, ICML 2017.
Discussion: Do we have enough average-case assumptions?
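To make "no spurious local minima" concrete: for exact low-rank PSD matrix factorization, results in the vein of the Ge-Jin-Zheng reading imply that plain gradient descent from random initialization finds a global minimum. A minimal sketch of that setup; the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def factorize_gd(M, r, lr=0.01, iters=2000, seed=0):
    """Gradient descent on f(U) = (1/4) * ||M - U U^T||_F^2.

    For M symmetric PSD of rank at most r, this objective has no
    spurious local minima, so random initialization plus gradient
    descent converges to a global minimum (U U^T ~ M).
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((M.shape[0], r))
    for _ in range(iters):
        U -= lr * (U @ U.T - M) @ U  # gradient of f is (U U^T - M) U
    return U
```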
Learning Mixture Models
- Expectation Maximization
- Clustering in High-Dimensions
- Method of Moments and Systems of Polynomial Equations [slides]
Discussion: Is nature an adversary? And if not, how can we model and exploit that?
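Expectation Maximization, the first topic of this unit, is local ascent on the likelihood and can get stuck at bad fixed points; the method of moments gives the kind of guarantees it lacks. A minimal EM sketch for a univariate mixture of two unit-variance Gaussians follows; the initialization and names are illustrative.

```python
import numpy as np

def em_two_gaussians(x, iters=100):
    """EM for a univariate mixture of two unit-variance Gaussians.

    Local ascent on the likelihood: each iteration cannot decrease it,
    but the fixed point reached depends on the initialization.
    """
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initialization
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior probability each point came from each component.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and means in closed form.
        w = post.mean(axis=0)
        mu = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)
    return w, mu
```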