18.408 Theoretical Foundations for Deep Learning
Spring 2021
Deep learning has sparked a revolution across machine learning. It has led to major advancements in vision, speech, playing strategic games, and the sciences. And yet it remains largely a mystery. We do not understanding why the algorithms that we use work so well in practice.
In this class we will explore theoretical foundations for deep learning, emphasizing the following themes: (1) Approximation: What sorts of functions can be represented by deep networks, and does depth provably increase the expressive power? (2) Optimization: Essentially all optimization problems we want to solve in practice are non-convex. What frameworks can be used to analyze such problems? (3) Beyond-Worst Case Analysis: Deep networks can memorize worst-case data, so why do they generalize well on real-world data? For this and related questions, our starting point will often be natural data-generative models. The theory of deep learning is still very much a work-in-progress. Our goal in this course is merely to explain some of the key questions that drive the this area, and take a critical look at where the existing theory falls short.
We will cover topics such as: Barron's theorem, depth separations, landscape analysis, implicit regularization, neural tangent kernels, generalization bounds, data poisoning attacks and frameworks for proving lower bounds against deep learning.
Announcement: You should join the course slack where we will post all lectures, notes, homeworks, etc.
Course Information
- Instructor: Ankur Moitra
- Harvard Sister Seminar: CS 229br: Advanced Topics in the Theory of Machine Learning, by Boaz Barak
- Lectures: Wednesday 12-3, on Zoom
- Prerequisites: A course in algorithms (6.046/18.410 or equivalent) and probability (6.041/18.440 or equivalent). You will need a strong background in algorithms, probability and linear algebra.
- Assessment: Students will be expected to solve two problem sets, and complete a research-oriented final project. This could be either a survey, or original research; students will be encouraged to find connections between the course material and their own research interests.
Course Outline
Here is a tentative outline for the course:
Approximation Theory
- Universal Approximation and the Fourier Transform
- Barron's Theorem
- Depth Separations and Connections to Circuit Complexity
Optimization Theory
- Convex Optimization and Acceleration
- Escaping Local Minima
- Implicit Regularization
- Neural Tangent Kernels
Statistical Theory
- Data-independent and Data-dependent Generalization Bounds
- Kernel Ridge Regression
- The Interpolation Regime and Double Descent
Representation Learning
- Dictionary Learning
- Student-Teacher Networks
- Lower Bounds from Cryptography
Advanced Optimization Theory
- The Mean Field View
- Landscape Analysis and Matrix Completion
- Langevin Dynamics and Metastability
Instructor Notes
- Lecture 1: Universal Approximation and Barron's Theorem [pdf]
- Lecture 2: Barron's Theorem (continued) and Depth Separations [pdf]
- Lecture 3: Neural Tangent Kernels and Least Squares [pdf]
Materials and Links
Here are links to some other resources you might find helpful: