Machine Learning and Computational Statistics
DS-GA-1003 and CSCI-GA.2567, Spring 2014
Machine learning is an exciting and fast-moving field at
the intersection of computer science, statistics, and
optimization, with many recent consumer
applications (e.g., Microsoft Kinect, Google Translate,
iPhone's Siri, digital camera face detection, Netflix
recommendations, Google News). Machine learning and
computational statistics also play a central role in data
science. In this graduate-level class, students will learn
about the theoretical foundations of machine learning and
computational statistics and how to apply these to solve
new problems. This is a required course for the MS in Data
Science and should be taken in the first year of study; it
is also suitable for MS and Ph.D. students in Computer
Science and related fields (see pre-requisites below).
For registration information, please contact Varsha
Tiger <email@example.com> or Katie Laugel
Lecture: Tuesdays, 5:10-7pm, in Warren
Weaver Hall 109.
Pre-requisites: There are two different sets of pre-requisites to accommodate both Computer Science and Data Science MS students. Students are required to have taken either:
Students should be familiar with linear algebra,
probability and statistics, and multi-variable calculus,
in addition to having good programming skills.
Grading: problem sets (45%) + midterm
exam (25%) + project (25%) + participation (5%). See the problem set policy below.
Books: No textbook is required (readings will come from freely available online material). If an additional reference is desired, a good option is the following book by Kevin Murphy: Machine Learning: A Probabilistic Perspective (2012). A good reference on linear algebra and probability is Ernest Davis's Linear Algebra and Probability for Computer Science Applications.
Introduction to learning [Slides]
Chapter 1 of Murphy's book
Notes on perceptron mistake bound (just section 1)
ps1 (data) due Feb 6 at 8pm.
Support vector machines (SVMs) [Slides]
Notes on support vector machines
Optional: Second reference on SVM dual and kernel methods (sec. 3-8)
Optional: For more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435). For more on cross-validation see Hastie, Section 7.10 (pg. 250).
ps2 due Feb 14 at 5pm. [Solutions]
Kernel methods [Slides]
Optimization, Mercer's theorem
Notes on linear algebra, convexity, kernels, and Mercer's theorem
Optional: For more advanced kernel methods, see chapter 3 of this book (free online from NYU libraries)
ps3 (data) due Feb 25 at 3pm.
Learning theory [Slides]
Notes on learning theory
Notes on gap-tolerant classifiers (section 7.1, pg. 29-31)
Pedro Domingos's A Few Useful Things to Know About Machine Learning
Decision trees [Slides]
Ensemble methods, Random forests
Mitchell Ch. 3
Hastie et al., Section 8.7 (bagging)
Optional: Rudin's lecture notes (on decision trees)
Optional: Hastie et al. Chapter 15 (on random forests)
ps4 (data) due Mar 7 at 5pm.
Lab: deep learning (guest lecture by Yann LeCun)
(no class, office hours, or lab March 18/20, Spring break)
Lab: project advisers
Project proposal, due March 27 at 3pm.
K-means, hierarchical, spectral
Hastie et al., Sections 14.3.6,
14.3.8, 14.3.9, 14.3.12, 14.5.3
Optional: Tutorial on spectral clustering
Dimensionality reduction [Slides]
More notes on PCA
Optional: Barber, Chapter 15
Optional: Roweis and Saul, Science 2000, Tenenbaum et al., Science 2000, van der Maaten and Hinton, JMLR '08
ps5 (data) due Apr 15 at 3pm.
Bayesian methods [Slides]
Maximum likelihood estimation, naive Bayes
Notes on naive Bayes and logistic regression
Optional: Notes on probability and statistics
Graphical models [Slides]
Introduction to Bayesian networks
ps6 due Apr 28 at 5pm
Unsupervised learning [Slides]
Notes on mixture models
EM algorithm [Slides 1, Slides 2]
Mixture models, topic models, latent Dirichlet allocation
Notes on Expectation Maximization
The Expectation Maximization Algorithm: A short tutorial
Review article on topic modeling
Explore topic models of: state-of-the-union addresses, literary studies (see also this blog), evolution of science, Wikipedia
(no class Tuesday May 13)
Introduction to learning to rank
Joachims' Training Linear SVMs in Linear Time
Slides on collaborative filtering
Slides on victim identification using Bayesian networks (Video)
Thu. May 15, 7:10-9:40pm
Project presentations (WWH 13th floor)
Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homeworks.
I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules: