6.883 Advanced Machine Learning — Learning with Combinatorial Structure

Real-world machine learning tasks frequently involve combinatorial structure. How model, infer or predict with graphs, matchings, hierarchies, informative subsets or other discrete structure underlying the data? This graduate course explores mathematical models, overarching concepts and algorithmic techniques for solving such problems efficiently. A focus will lie on understanding connections between machine learning, suitable representations, and convex and combinatorial optimization.

Generics

Time: MW 11-12:30
Location: 2-105
Instructor: Prof. Stefanie Jegelka
Room 32-G472
stefje@mit.edu
Office hours: Wed 1.30-2.30pm
TA: Zi Wang

Tentative List of Topics

Applications

structured prediction
combinatorial norms, (structured) sparse and low-rank representations
modeling diversity, informativeness and influence
selected graph and network problems

Overarching concepts and representations

polyhedral representations: convex hulls and combinatorial polyhedra
submodular functions and matroids
determinantal representations: diversity, repulsion and random trees

Algorithms

submodular optimization: relaxations, convexity, greedy methods
non-smooth convex optimization
conditional gradients everywhere
scaling to larger problems, splitting methods, online and stochastic methods
statistical and computational tradeoffs

Lectures

Disclaimer: these lecture notes have been checked only loosely, and may contain typos or errors. If you find any, please let me know. If you find those notes useful, I'd be happy to hear about that too. Not all lectures were scribed, so the below only covers parts of the course.

Lecture 1: overview
Lecture 2: convex analysis recap
Lecture 3: submodularity definition, structured prediction
Lecture 4: structured prediction, subgradients and subgradient method
Lecture 5: structured prediction, subgradient method, cutting planes
Lecture 6: graphical model inference and perfect graphs
Lecture 7: extensions of submodular functions, log-submodularity in graphical models
Lecture 8: base polytopes and matroids, submodular minimization problem and its dual
Lecture 9: parametric submodular minimization, minimumm norm point problem
Lecture 10: applications of the minimum norm point problem; dual decomposition and splitting
Lecture 11: sparse regression
Lecture 12: structured sparsity and convex relaxation
Lecture 13: proximal gradient, atomic norms
Lecture 14: Frank-Wolfe algorithm
Lecture 15: submodular maximization: applications, greedy algorithm
Lecture 16: submodular maximization: continuous greedy
Lecture 17: maximizing non-monotone submodular functions
Lecture 18: scaling the greedy algorithm
Lecture 19: (combinatorial) online learning
Lecture 20: online mirror descent, multi-armed bandit problem
Lecture 21: determinantal point processes
Lecture 22: determinantal point processes