Machine Learning and Computational Statistics, Spring 2014

DS-GA-1003 and CSCI-GA.2567, Spring 2014

Overview

Machine learning is an exciting and fast-moving field at the intersection of computer science, statistics, and optimization, with many recent consumer applications (e.g., Microsoft Kinect, Google Translate, iPhone's Siri, digital camera face detection, Netflix recommendations, Google News). Machine learning and computational statistics also play a central role in data science. In this graduate-level class, students will learn about the theoretical foundations of machine learning and computational statistics and how to apply these to solve new problems. This is a required course for the MS in Data Science and should be taken in the first year of study; it is also suitable for MS and Ph.D. students in Computer Science and related fields (see pre-requisites below).

This course is part of a two-course series, although it can be taken individually. The second course, DS-GA-1005: Inference and Representation, will be offered in Fall 2014 (see an approximate list of topics).

For registration information, please contact Varsha Tiger <varsha.tiger@nyu.edu> or Katie Laugel <laugel@cs.nyu.edu>.

General information

Lecture: Tuesdays, 5:10-7pm, in Warren Weaver Hall 109.
Recitation/Laboratory (required for all students): Thursdays, 8:10-9pm in Warren Weaver Hall 109.

Instructor:
Prof. David Sontag
dsontag {@ | at} cs.nyu.edu

Office hours: Tuesdays, 10-11am in 715 Broadway, 12th floor, Room 1204. [still subject to change]

Lab instructor:

Yoni Halpern

halpern {@ | at} cs.nyu.edu

Office hours: Thursdays, 9:10-9:40pm in Warren Weaver Hall 109 (right after lab)

Project advisers:

Dr. David Rosenberg
Dr. Kurt Miller
Dr. Alex Simma

Graders:

Akshay Kumar, Mick Jermsurawong

akshaykumar, jj1192 {@ | at} nyu.edu

Pre-requisites: There are two different sets of pre-requisites to accommodate both Computer Science and Data Science MS students. Students are required to have taken either:

Fundamental Algorithms (CSCI-GA.1170) and Mathematical Techniques for Computer Science Applications (CSCI-GA.1180), or
Intro to Data Science (DS-GA-1001) and Statistical and Mathematical Methods (DS-GA-1002).

Students should be familiar with linear algebra, probability and statistics, and multi-variable calculus, in addition to having good programming skills.

Grading: problem sets (45%) + midterm exam (25%) + project (25%) + participation (5%). See the Problem Set policy below.

Books: No textbook is required (readings will come from freely available online material). If an additional reference is desired, a good option is the following book by Kevin Murphy: Machine Learning: A Probabilistic Perspective (2012). A good reference on linear algebra and probability is Ernest Davis's Linear Algebra and Probability for Computer Science Applications.

Mailing list: To subscribe to the class list, follow instructions here.

Project information

Schedule

Lecture 1, Jan 28: Introduction to learning [Slides]
Required reading:
Chapter 1 of Murphy's book
Notes on perceptron mistake bound (just section 1)
Assignments: ps1 (data) due Feb 6 at 8pm.

Lecture 2, Feb 4: Support vector machines (SVMs) [Slides]
Required reading:
Notes on support vector machines
Optional: Second reference on SVM dual and kernel methods (sec. 3-8)
Optional: For more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435). For more on cross-validation see Hastie, Section 7.10 (pg. 250).
Assignments: ps2 due Feb 14 at 5pm. [Solutions]

Lecture 3, Feb 11: Kernel methods [Slides]
Topics: Optimization, Mercer's theorem
Required reading:
Notes on linear algebra, convexity, kernels, and Mercer's theorem
Optional: For more advanced kernel methods, see chapter 3 of this book (free online from NYU libraries)
Assignments: ps3 (data) due Feb 25 at 3pm.

Lecture 4, Feb 18: Learning theory [Slides]
Required reading:
Notes on learning theory
Notes on gap-tolerant classifiers (section 7.1, pg. 29-31)
Pedro Domingos's A Few Useful Things to Know About Machine Learning

Lecture 5, Feb 25: Decision trees [Slides]
Topics: Ensemble methods, random forests
Required reading:
Mitchell Ch. 3
Hastie et al., Section 8.7 (bagging)
Optional: Rudin's lecture notes (on decision trees)
Optional: Hastie et al. Chapter 15 (on random forests)
Assignments: ps4 (data) due Mar 7 at 5pm.

Lecture 6, March 4: Midterm review
Lab: deep learning (guest lecture by Yann LeCun)

Lecture 7, March 11: Midterm exam
Lab: project advisers
(No class, office hours, or lab March 18/20: Spring break)
Assignments: Project proposal, due March 27 at 3pm.

Lecture 8, March 25: Clustering [Slides]
Topics: K-means, hierarchical, spectral
Required reading:
Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9, 14.3.12, 14.5.3
Optional: Tutorial on spectral clustering

Lecture 9, April 1: Dimensionality reduction [Slides]
Required reading:
Notes on PCA
More notes on PCA
Optional: Barber, Chapter 15
Optional: Roweis and Saul, Science 2000, Tenenbaum et al., Science 2000, van der Maaten and Hinton, JMLR '08
Assignments: ps5 (data) due Apr 15 at 3pm.

Lecture 10, April 8: Bayesian methods [Slides]
Topics: Maximum likelihood estimation, naive Bayes
Required reading:
Notes on naive Bayes and logistic regression
Optional: Notes on probability and statistics

Lecture 11, April 15: Graphical models [Slides]
Required reading:
Tutorial on HMMs
Introduction to Bayesian networks
Assignments: ps6 due Apr 28 at 5pm.

Lecture 12, April 22: Unsupervised learning [Slides]
Required reading:
Notes on mixture models

Lecture 13, April 29: EM algorithm [Slides 1, Slides 2]
Topics: Mixture models, topic models, latent Dirichlet allocation
Required reading:
Notes on Expectation Maximization
The Expectation Maximization Algorithm: A short tutorial
Review article on topic modeling
Explore topic models of: state-of-the-union addresses, literary studies (see also this blog), evolution of science, Wikipedia

Lecture 14, May 6: Advanced topics
(No class Tuesday May 13)
Optional reading:
Introduction to learning to rank
Joachims' Training Linear SVMs in Linear Time
Slides on collaborative filtering
Slides on victim identification using Bayesian networks (Video)

Lecture 15, Thu. May 15, 7:10-9:40pm: Project presentations (WWH 13th floor)

Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homeworks.

Reference materials

Machine learning books

Trevor Hastie, Rob Tibshirani, and Jerry Friedman, Elements of Statistical Learning, Second Edition, Springer, 2009. (Can be downloaded as PDF file.)

David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012. (Can be downloaded as PDF file.)

Probability

Chapter 2 of either Murphy or Bishop (see also Bishop Appendix B)

Review notes from Stanford's machine learning class

Sam Roweis's probability review

Linear algebra

Bishop Appendix C

Online class from MIT

Review notes from Stanford's machine learning class

Sam Roweis's linear algebra review

Calculus

Online calculus textbook (MIT Open Courseware)

Bishop Appendix D and E (Lagrange multipliers)

Notes from MIT on Lagrange multipliers

Dan Klein's Lagrange Multipliers without Permanent Scarring

Optimization

Convex Optimization by Stephen Boyd and Lieven Vandenberghe. (Can be downloaded as PDF file.)

Problem Set policy

I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

You may discuss a problem with any student in this class, and work together on solving it. This can involve brainstorming, verbally discussing the problem, and going through possible solutions together, but should not involve one student telling another a complete solution.

Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

In your solution for each problem, you must write down the names of everyone with whom you discussed it. This will not affect your grade.

Do not consult solution manuals or other people's solutions from similar courses.

Late submission policy: During the semester you are allowed at most two extensions on homework assignments. Each extension is for at most 48 hours and carries a penalty of 25% off the assignment's grade.