Inference and Representation

DS-GA-1005, Fall 2015


This course covers how to think about and model data. We introduce the tools of probabilistic graphical models as a means of representing and manipulating data, modeling uncertainty, and discovering new insights from data. We will particularly emphasize latent variable models, examples of which include latent Dirichlet allocation (for topic modeling), factor analysis, and Gaussian processes. The class will also discuss modeling temporal data (e.g., hidden Markov models), hierarchical models, deep generative models, and structured prediction. On completing this course, students should be able to take a new problem or data set, formulate an appropriate model, learn the model from data, and ultimately answer their original question using inference in the model.

Prerequisite: DS-GA-1003 (Machine Learning and Computational Statistics), or permission of instructor.

General information

Lecture: Tuesdays, 5:10-7:00pm, in Warren Weaver Hall 102
Recitation/Laboratory (required for all students): Wednesdays, 7:10-8:00pm in Warren Weaver Hall 102

Instructor: David Sontag (dsontag {@ | at})
Lab instructor: Rachel Hodos (hodos {@ | at})
Prasoon Goyal (pg1338 {@ | at}), Justin Chiu (jchiu {@ | at})

Office hours: Thursdays, 3:30-4:30pm. Location: 715 Broadway (intersection with Washington Place), 12th Floor, room 1204

Grading: [not finalized] problem sets (55%) + midterm exam (20%) + final exam (20%) + participation (5%). Problem Set policy

Book (required): Kevin Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012. I recommend the latest (4th) printing, as the earlier printings had many typos. You can tell which printing you have as follows: check the inside cover, below the "Library of Congress" information. If it says "10 9 8 ... 4", you have the (correct) fourth printing.

Piazza: We will use Piazza to answer questions and post announcements about the course. Please sign up here. Students' use of Piazza, particularly for adequately answering other students' questions, will contribute toward their participation grade.

Online recordings: Videos of most lectures and labs will be posted online. Note, however, that class attendance is required.

Schedule (readings refer to the Murphy book)

Week Date Topic Readings Assignments
1 Sept 8
Introduction, Bayesian networks [slides]

Lab: Probability review, Bayesian network basics, PyMC3 tutorial
Chapters 1, 2, and 3.5-3.5.3 (should be review for most people)

Chapter 10 (Bayesian networks), except for 10.2.5 and 10.6

Probabilistic Programming and Bayesian Methods -- Chapter 1

Algorithm for d-separation, from Koller & Friedman (optional)

ps1 due Sept. 22 at 3pm
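As a toy illustration of the Bayesian network basics covered in the first lab, here is a minimal sketch in plain Python of ancestral sampling and a rejection-sampling query on the classic "sprinkler" network (the structure and probabilities below are invented for illustration, not taken from the course materials):

```python
import random

random.seed(0)

def sample_bn():
    """Draw one joint sample from a toy 'sprinkler' Bayesian network:
    Cloudy -> Sprinkler, Cloudy -> Rain, {Sprinkler, Rain} -> WetGrass."""
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.9, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet

# Estimate P(Rain = true | WetGrass = true) by rejection sampling:
samples = [sample_bn() for _ in range(100_000)]
wet_samples = [s for s in samples if s[3]]
p_rain_given_wet = sum(s[2] for s in wet_samples) / len(wet_samples)
print(round(p_rain_given_wet, 2))
```

Sampling each variable after its parents (a topological order) is what makes ancestral sampling correct; conditioning by discarding samples is wasteful but needs nothing beyond the forward sampler.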

2 Sept 15
Case studies #1-2

Probabilistic modeling in neuroscience (Neil Rabinowitz) [slides] and political science (Pablo Barberá) [slides]

Lab: Review of case studies and BN structure learning

Sec. 17.3, 17.4, 17.5.1 (HMMs)

Introduction to Probabilistic Topic Models (optional)

Rabinowitz paper (optional)

3 Sept 22
Undirected graphical models [slides]

Conditional random fields, Gaussian MRFs

Case study #3: Astronomy (Dan Foreman-Mackey) [slides]

Lab: Some subtleties on BNs, MRF review, CRF intro
Sec. 9-9.2.5 (exponential family)
Sec. 19-19.4.4, 19.6 (MRFs, CRFs)

An Introduction to Conditional Random Fields (section 2)

Original paper introducing CRFs (optional) 

Application to camera lens blur (optional)
ps2 [data] due Oct. 2 at 5pm
4 Sept 29
Exact inference [slides]

Variable elimination, treewidth, belief propagation

Lab: Graph separation in MRFs, revisiting CRFs, BP, pruning barren nodes
Chapter 20
Sec. 22.2 (loopy belief propagation)

Using loopy BP for stereo vision (optional)
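Variable elimination exploits the model's factorization to avoid summing over all joint assignments; on a chain it reduces to repeated matrix-vector products. A minimal numpy sketch (the binary chain and its potentials are made up for illustration):

```python
import numpy as np

# Chain MRF x1 - x2 - x3 over binary variables with pairwise potentials
# psi(x_i, x_{i+1}); the partition function Z sums over all assignments.
psi = np.array([[2.0, 1.0],
                [1.0, 3.0]])   # same potential on both edges, for brevity

# Brute force: sum psi(x1,x2) * psi(x2,x3) over all 2^3 assignments.
Z_brute = sum(psi[x1, x2] * psi[x2, x3]
              for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1))

# Variable elimination: eliminate x3, then x2 (each step is a
# matrix-vector product, so cost grows linearly with chain length).
m3 = psi.sum(axis=1)        # message from x3 to x2
m2 = psi @ m3               # message from x2 to x1
Z_ve = m2.sum()

print(Z_brute, Z_ve)        # the two computations agree
```

The brute-force sum is exponential in the number of variables, while the elimination order keeps each intermediate factor small; this gap is exactly what treewidth quantifies.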


5 Oct 6

(no lab this week)

Unsupervised learning

Expectation Maximization

Case study #4: Education (Kevin Wilson, Knewton)
Sec. 11-11.4.4, 11.4.7 (EM)
Sec. 17.5.2 (EM for HMMs)

The Expectation Maximization Algorithm: A short tutorial

Measuring student knowledge (see Section 3; optional)

ps3 due Oct. 14 at 4pm
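As a tiny companion to the EM readings, here is a sketch of EM for a two-component 1-D Gaussian mixture in numpy (synthetic data; the variances and mixing weights are fixed so that only the means are estimated, which keeps the sketch short):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from two unit-variance Gaussians at -2 and +2.
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])

mu = np.array([-1.0, 1.0])            # initial guesses for the means
for _ in range(50):
    # E-step: responsibilities r[i, k] proportional to N(x_i; mu_k, 1)
    # (the shared 1/sqrt(2*pi) constant cancels in the normalization).
    log_lik = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(log_lik)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: each mean becomes a responsibility-weighted average.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(np.round(mu, 1))   # recovers roughly (-2, 2)
```

Each iteration provably does not decrease the data log-likelihood, which is the key property developed in the Sec. 11 readings.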

6 Oct 14

(Weds, 6:10-8pm, in 719 Broadway, 12th floor, room 1221)
Monte-Carlo methods
[slides 1] [slides 2]

Gibbs sampling
Sec. 23.1-23.4
Sec. 24.1-24.4.1
Sec. 17.2.3
ps4 [data] due Oct. 23 at 5pm
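A minimal Gibbs-sampling sketch in numpy, using a toy target where the answer is checkable: a zero-mean bivariate Gaussian with correlation rho, whose full conditionals are themselves Gaussian (the target and all constants are chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# For a standard bivariate Gaussian with correlation rho,
# x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
rho = 0.8
x1, x2 = 0.0, 0.0
samples = []
for t in range(20_000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
    if t >= 1_000:                      # discard burn-in
        samples.append((x1, x2))

samples = np.array(samples)
print(np.round(np.corrcoef(samples.T)[0, 1], 2))   # close to rho
```

Resampling one variable at a time from its full conditional is the whole algorithm; the burn-in discard and the serial correlation between successive draws are the practical costs covered in Sec. 24.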
7 Oct 20
Causal inference & Bayesian additive regression trees [slides]

Lab: Midterm review
Hill & Gelman Ch. 9 (pp. 167-175, 181-188)

Chipman et al.'s BART paper and bartMachine (optional)

Hill's Bayesian Nonparametric Modeling for Causal Inference (optional)

Particle filtering for BART (optional)

Sec. 26.6 (learning causal DAGS; optional)

Oct 27

Midterm exam
(in class)

8 Nov 3

Topic modeling [slides]

Case study #5: Musical influence via dynamic topic models (Uri Shalit) [slides]

Lab: Group discussion on real-world modeling problems
Chapter 27 (except 27.3.6, 27.6.1, and 27.7)

Introduction to Probabilistic Topic Models

Modeling musical influence with topic models (optional)

9 Nov 10

Gaussian processes [slides]

Chapter 15

Application to predicting wind flow (optional)
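A short numpy sketch of Gaussian-process regression to accompany the Chapter 15 reading (the RBF kernel, length scale, noise level, and synthetic sine data are all chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# GP regression posterior mean at test inputs X*:
#   mean = K(X*, X) [K(X, X) + sigma^2 I]^{-1} y
def rbf(a, b, length=1.0):
    """Squared-exponential kernel matrix between 1-D input vectors."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * rng.normal(size=X.size)   # noisy observations
Xs = np.linspace(0, 5, 100)                     # test inputs

sigma2 = 0.01
K = rbf(X, X) + sigma2 * np.eye(X.size)
alpha = np.linalg.solve(K, y)
mean = rbf(Xs, X) @ alpha                       # posterior mean

# The posterior mean should track sin(x) closely on [0, 5].
print(np.round(np.max(np.abs(mean - np.sin(Xs))), 2))
```

The posterior mean is a weighted combination of kernel functions centered at the training inputs; the noise term sigma2 on the diagonal is what keeps the linear solve well conditioned.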

10 Nov 17

Learning Markov random fields


Moment matching, Chow-Liu algorithm, pseudo-likelihood

Case study #6: Cognitive science (Brenden Lake) [slides]

Lab: Exponential families, learning MRFs, and GPs [iPython notebook]

Sec. 19.5, 26.7-26.8

Notes on pseudo-likelihood

An Introduction to Conditional Random Fields (section 4; optional)

Approximate maximum entropy learning in MRFs (optional)
ps5 [data] due Dec. 1 at 3pm
11 Nov 24

(no lab this week)

Variational inference

Mean-field approximation
Sec. 21.1-21.6
Sec. 22.3-22.4

Graphical models, exponential families, and variational inference (optional)

12 Dec 1

Learning deep generative models


Stochastic variational inference, Variational auto-encoders
Stochastic Variational Inference (optional)

Talk for an ICML 2014 paper on deep generative models (optional)
ps6 [data] due Dec. 8 at 3pm
13 Dec 8
Structured prediction [slides]

Overview of structured prediction, parametrizing CRFs
Sec. 19.7 (also, review 19.6 again)

Tutorial on Structured Prediction (optional)

Original paper introducing structured perceptron (optional)

Cutting-Plane Training of Structural SVMs (optional)

Block-Coordinate Frank-Wolfe Optimization for Structural SVMs (optional)
ps7 [data] due Dec. 18 at 5pm
14 Dec 15
Integer linear programming [slides]

MAP inference, linear programming relaxations, dual decomposition

Lab: Review for final
Sec. 22.6 (MAP inference)

Derivation relating dual decomposition & LP relaxations

Introduction to Dual Decomposition for Inference (optional)

Integer Programming for Bayesian Network Structure Learning (optional)

Dec 22

(regular class time)

Final exam

Acknowledgements: Many thanks to TTI-C, CMU, Hebrew University, UC Berkeley, and Stanford University for sharing material used in slides and homeworks.

Reference materials

Problem Set policy

I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

  1. You may discuss a problem with any student in this class and work together on solving it. This can involve brainstorming, verbally discussing the problem, and going through possible solutions together, but it should not involve one student telling another a complete solution.

  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

  3. In your solution for each problem, you must write down the name of each person with whom you discussed it. This will not affect your grade.

  4. Do not consult solution manuals or other people's solutions from similar courses.
Late submission policy: During the semester you are allowed at most two extensions on homework assignments. Each extension is for at most 48 hours and carries a penalty of 25% off your assignment grade.