Inference and Representation
DS-GA 1005, Fall 2015
Overview
This course covers how to think about and model data. We
introduce the tools of probabilistic graphical models as a
means of representing and manipulating data, modeling
uncertainty, and discovering new insights from data. We
will particularly emphasize latent variable models,
examples of which include latent Dirichlet allocation (for
topic modeling), factor analysis, and Gaussian processes.
The class will also discuss modeling temporal data (e.g.,
hidden Markov models), hierarchical models, deep
generative models, and structured prediction. On
completing this course, students should be able to take a
new problem or data set, formulate an appropriate model,
learn the model from data, and ultimately answer their
original question using inference in the model. 
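The workflow described above (formulate a model, learn it from data, then answer the original question via inference) can be sketched with a toy example. The following beta-binomial coin model with a grid-approximated posterior is purely illustrative and not part of the course materials (the labs use PyMC3):

```python
# Beta-binomial coin model: a minimal sketch of the
# model -> learn -> infer workflow described above.
# (Illustrative only; the course labs use PyMC3.)

def posterior(heads, tails, grid_size=1000):
    """Grid-approximate posterior over the coin bias theta,
    starting from a uniform (Beta(1,1)) prior."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Likelihood of the observed flips under each candidate theta.
    weights = [t**heads * (1 - t)**tails for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]

def posterior_mean(heads, tails):
    """Answer the question "how biased is the coin?" by inference:
    take the expectation of theta under the posterior."""
    thetas, probs = posterior(heads, tails)
    return sum(t * p for t, p in zip(thetas, probs))

# After observing 7 heads and 3 tails, the posterior mean is close
# to the exact Beta(8, 4) mean of 8/12.
print(round(posterior_mean(7, 3), 3))
```

The same three steps (choosing a likelihood and prior, conditioning on data, querying the posterior) reappear in every model covered in the course, just with richer structure and approximate inference in place of this exhaustive grid.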

General information
Lecture: Tuesdays, 5:10–7:00pm, in Warren Weaver Hall 102
Office hours: Thursdays, 3:30–4:30pm. Location: 715 Broadway (intersection with Washington Place), 12th Floor, room 1204
Grading: [not finalized] problem sets (55%) + midterm exam (20%) + final exam (20%) + participation (5%). Problem Set policy
Book (required): Kevin Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012. I recommend the latest (4th) printing, as the earlier printings had many typos. You can tell which printing you have as follows: check the inside cover, below the "Library of Congress" information. If it says "10 9 8 ... 4", you've got the (correct) fourth printing.
Piazza: We will use Piazza
to answer questions and post announcements about the
course. Please sign up here.
Students' use of Piazza, particularly for answering other students' questions well, will contribute toward their participation grade.
Online recordings: Most of the lecture and lab videos will be posted to techtalks.tv. Note, however, that class attendance is required.
Schedule (Readings refer to the Murphy book)
Week  Date  Topic  Readings  Assignments 
1  Sept 8 
Introduction, Bayesian networks [slides] Lab: Probability review, Bayesian network basics, PyMC3 tutorial 
Chapters 1, 2, and 3.5–3.5.3 (should be review for most people) Chapter 10 (Bayesian networks), except for 10.2.5 and 10.6 Probabilistic Programming and Bayesian Methods, Chapter 1 Algorithm for d-separation, from Koller & Friedman (optional) 

2  Sept 15 
Case studies #1–2: Probabilistic modeling in neuroscience (Neil Rabinowitz) [slides] and political science (Pablo Barberá) [slides] Lab: Review of case studies and BN structure learning 
Sec. 17.3, 17.4, 17.5.1 (HMMs) Introduction to Probabilistic Topic Models (optional) Rabinowitz paper (optional) 

3 
Sept 22 
Undirected graphical models [slides] Conditional random fields, Gaussian MRFs Case study #3: Astronomy (Dan Foreman-Mackey) [slides] Lab: Some subtleties on BNs, MRF review, CRF intro 
Sec. 9–9.2.5 (exponential family) Sec. 19–19.4.4, 19.6 (MRFs, CRFs) An Introduction to Conditional Random Fields (section 2) Original paper introducing CRFs (optional) Application to camera lens blur (optional) 
ps2 [data] due Oct. 2 at 5pm [Solutions] 
4 
Sept 29 
Exact inference [slides] Variable elimination, treewidth, belief propagation Lab: Graph separation in MRFs, revisiting CRFs, BP, pruning barren nodes 
Chapter 20 Sec. 22.2 (loopy belief propagation) Using loopy BP for stereo vision (optional) 

5 
Oct 6 (no lab this week) 
Unsupervised learning [slides] Expectation Maximization Case study #4: Education (Kevin Wilson, Knewton) 
Sec. 11–11.4.4, 11.4.7 (EM) Sec. 17.5.2 (EM for HMMs) The Expectation Maximization Algorithm: A short tutorial Measuring student knowledge (see Section 3; optional) 
ps3 due Oct. 14 at 4pm 
6 
Oct 14 (Weds, 6:10–8pm in 719 Broadway, 12th floor, room 1221) 
Monte Carlo methods [slides 1] [slides 2] Gibbs sampling 
Sec. 23.1–23.4 Sec. 24.1–24.4.1 Sec. 17.2.3 
ps4 [data] due Oct. 23 at 5pm [Solutions] 
7 
Oct 20 
Causal inference & Bayesian additive regression trees [slides] Lab: Midterm review 
Hill & Gelman Ch. 9 (pp. 167–175, 181–188) Chipman et al.'s BART paper and bartMachine (optional) Hill's Bayesian Nonparametric Modeling for Causal Inference (optional) Particle filtering for BART (optional) Sec. 26.6 (learning causal DAGs; optional) 

Oct 27 
Midterm exam (in class) 

8 
Nov 3 
Topic modeling [slides] Case study #5: Musical influence via dynamic topic models (Uri Shalit) [slides] Lab: Group discussion on real-world modeling problems 
Chapter 27 (except 27.3.6, 27.3.6.7, 27.6.1, and 27.7) Introduction to Probabilistic Topic Models Modeling musical influence with topic models (optional) 

9 
Nov 10 
Gaussian processes [slides] 
Chapter 15 Application to predicting wind flow (optional) 

10 
Nov 17 
Learning Markov random fields [slides] Moment matching, Chow–Liu algorithm, pseudolikelihood Case study #6: Cognitive science (Brenden Lake) [slides] Lab: Exponential families, learning MRFs, and GPs [iPython notebook] 
Sec. 19.5, 26.7–26.8 Notes on pseudolikelihood An Introduction to Conditional Random Fields (section 4; optional) Approximate maximum entropy learning in MRFs (optional) 
ps5 [data] due Dec. 1 at 3pm 
11 
Nov 24 (no lab this week) 
Variational inference [slides] Mean-field approximation 
Sec. 21.1–21.6 Sec. 22.3–22.4 Graphical models, exponential families, and variational inference (optional) 

12  Dec 1 
Learning deep generative models [slides] Stochastic variational inference, Variational autoencoders 
Stochastic Variational Inference (optional) Talk for an ICML 2014 paper on deep generative models (optional) 
ps6 [data] due Dec. 8 at 3pm 
13  Dec 8 
Structured prediction [slides] Lab: Overview of structured prediction, parametrizing CRFs 
Sec. 19.7 (also, review 19.6 again) Tutorial on Structured Prediction (optional) Original paper introducing the structured perceptron (optional) Cutting-Plane Training of Structural SVMs (optional) Block-Coordinate Frank–Wolfe Optimization for Structural SVMs (optional) 
ps7 [data] due Dec. 18 at 5pm 
14  Dec 15 
Integer linear programming [slides] MAP inference, linear programming relaxations, dual decomposition Lab: Review for final 
Sec. 22.6 (MAP inference) Derivation relating dual decomposition & LP relaxations Introduction to Dual Decomposition for Inference (optional) Integer Programming for Bayesian Network Structure Learning (optional) 

Dec 22 (regular class time) 
Final exam 
Reference materials

Problem Set policy
I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules: