
Introduction To Machine Learning

Spring 2016

Overview

Machine learning is an exciting and fast-moving field of computer science with many recent consumer applications (e.g., Microsoft Kinect, Google Translate, the iPhone's Siri, digital camera face detection, Netflix recommendations, Google News) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.

General information

Lectures: Tuesday and Thursday, 2pm-3:15pm
Room: Warren Weaver Hall 312

Instructor: Prof. David Sontag, dsontag {@ | at} cs.nyu.edu
TA: Kevin Jiao, jj1745 {@ | at} nyu.edu
Graders:
  Alexandre Sablayrolles, sla382 {@ | at} nyu.edu
  Yijun Xiao, ryjxiao {@ | at} nyu.edu

Office hours (David): Tuesdays 4:30-5:30pm. Location: 715 Broadway, 12th floor, Room 1204
Office hours (Kevin): Wednesdays 6:30-8:00pm. Location: KMC 8-150

Grading: problem sets (50%) + midterm exam (25%) + project (20%) + participation (5%). See the problem set policy below.

Pre-requisites: Students must either have taken Basic Algorithms (CSCI-UA.0310) or be taking it concurrently. Linear algebra (MATH-UA 140) is strongly recommended as a pre-requisite, and knowledge of multivariable calculus will be helpful. Students should also have good programming skills.

Books: No textbook is required (readings will come from freely available online material). If an additional reference is desired, Bishop's Pattern Recognition and Machine Learning and Murphy's Machine Learning: A Probabilistic Perspective are both good options. Bishop's book is easier to read, whereas Murphy's has more depth and coverage (and is more up to date).

Piazza: We will use Piazza to answer questions and post announcements about the course; please sign up. Students' use of Piazza, particularly for adequately answering other students' questions, will contribute toward their participation grade.

Project information


Schedule

Lecture Date Topic Required reading Assignments
1
Jan 26 (Tues)
Overview [Slides]
Chapter 1 of Murphy's book

 

2
Jan 28 (Thurs)
Introduction to learning [Slides]

Loss functions, Perceptron algorithm, proof of perceptron mistake bound
Barber 17.1 on least-squares regression, A.1.1-4 (review of vector algebra)

Notes on perceptron mistake bound (just section 1)
ps1 (data), due Feb 5 at 6pm
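
For concreteness, a minimal numpy sketch of the perceptron algorithm from this lecture (illustrative only, not course material; the data layout, epoch cap, and function name are assumptions):

    import numpy as np

    def perceptron(X, y, max_epochs=100):
        """Perceptron: X is an (n, d) array, y has entries in {-1, +1}.
        Sweeps the data repeatedly, updating w on every mistake."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for i in range(len(y)):
                if y[i] * (w @ X[i]) <= 0:    # mistake (or on the boundary)
                    w += y[i] * X[i]          # the perceptron update
                    mistakes += 1
            if mistakes == 0:                 # training data separated
                break
        return w

Per the mistake-bound notes, the update fires at most (R/gamma)^2 times when the data are linearly separable with margin gamma inside a ball of radius R.
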
3
Feb 2 (Tues)
Linear classifiers [Slides]

Introduction to Support vector machines


4
Feb 5 (Thurs)
Support vector machines [Slides]
[iPython notebook, html]

Introduction to convex optimization,  gradient descent
Notes on support vector machines (sections 1-4)

Additional notes on SVMs (sec. 1 & 2)

Notes on optimization
ps2, due Feb 15 at 10pm. [Solutions]
5
Feb 9 (Tues)
Stochastic gradient descent [Slides]

Pegasos algorithm (stochastic subgradient descent for SVMs)
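
A minimal sketch of the Pegasos update from this lecture, i.e., stochastic subgradient descent on the regularized hinge loss (illustrative only; it omits the optional projection step, and the parameter names are assumptions):

    import numpy as np

    def pegasos(X, y, lam=0.1, epochs=20, seed=0):
        """SGD on  lam/2 ||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i),
        with step size 1/(lam * t) at step t. y has entries in {-1, +1}."""
        w = np.zeros(X.shape[1])
        rng = np.random.default_rng(seed)
        t = 0
        for _ in range(epochs):
            for i in rng.permutation(len(y)):
                t += 1
                eta = 1.0 / (lam * t)
                margin = y[i] * (w @ X[i])
                w *= (1 - eta * lam)          # shrinkage from the regularizer
                if margin < 1:                # hinge loss is active
                    w += eta * y[i] * X[i]
        return w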


6
Feb 11 (Thurs)
Kernel methods [Slides]

Kernel methods for SVMs, multi-class classification
Notes on kernels (section 7)

Lecture notes

Optional: Shalev-Shwartz & Ben-David Chapter 16 on kernel methods

Optional: Shalev-Shwartz & Ben-David Sections 17.1 & 17.2 on multi-class
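
As a small illustration of the kernel trick covered here, a numpy sketch of the Gaussian (RBF) kernel and a kernelized predictor (illustrative; alpha stands in for dual coefficients from an SVM solver, and gamma is an assumed bandwidth parameter):

    import numpy as np

    def rbf_kernel(X1, X2, gamma=1.0):
        """K[i, j] = exp(-gamma * ||x_i - z_j||^2), computed without
        ever forming the (infinite-dimensional) feature map."""
        sq = (np.sum(X1 ** 2, axis=1)[:, None]
              + np.sum(X2 ** 2, axis=1)[None, :]
              - 2 * X1 @ X2.T)
        return np.exp(-gamma * sq)

    def kernel_predict(alpha, y_train, X_train, X_test, gamma=1.0):
        """f(x) = sign(sum_i alpha_i y_i k(x_i, x)): training points
        enter only through kernel evaluations."""
        K = rbf_kernel(X_test, X_train, gamma)
        return np.sign(K @ (alpha * y_train))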

7
Feb 16 (Tues)
Kernel methods (continued)
Python demo shown in class
ps3 (data), due Feb 24 at 10pm
8
Feb 18 (Thurs)
L1-regularization + Diabetes case study [Slides]

Intro to learning theory
Optional: Diabetes paper

9
Feb 23 (Tues)
Learning theory [Slides]

Finite hypothesis classes
Notes on learning theory (sections 1-3)
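
For a flavor of the results in these sections, the standard bound for finite hypothesis classes in the realizable case (it follows from a union bound over $H$ plus a Chernoff/Hoeffding-style argument; the exact constants in the course notes may differ):

    With probability at least $1 - \delta$, every $h \in H$ consistent with
    $m$ i.i.d. training examples satisfies
    $$\operatorname{err}(h) \le \frac{1}{m}\Bigl(\ln|H| + \ln\tfrac{1}{\delta}\Bigr),$$
    so $m \ge \frac{1}{\epsilon}\bigl(\ln|H| + \ln\tfrac{1}{\delta}\bigr)$
    examples suffice to guarantee error at most $\epsilon$.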

10
Feb 25 (Thurs)
Learning theory [Slides]

VC-dimension
Notes on learning theory (section 4)

Optional: Notes on gap-tolerant classifiers (section 7.1, pp. 29-31)
ps4 (data), due Mar 6 at 10pm
11
Mar 1 (Tues)
Decision trees [Slides]

Ensemble methods
Mitchell Ch. 3

Optional: Hastie et al., Section 8.7 (bagging)
Optional: Rudin's lecture notes (on decision trees)
Optional: Hastie et al. Chapter 15 (on random forests)
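
A small numpy sketch of the entropy and information-gain computations that drive decision-tree splitting (illustrative; binary labels and a boolean split feature are assumptions):

    import numpy as np

    def entropy(y):
        """Entropy in bits of a label vector y with values in {0, 1}."""
        p = np.bincount(y, minlength=2) / len(y)
        p = p[p > 0]                      # 0 log 0 = 0 by convention
        return -(p * np.log2(p)).sum()

    def information_gain(y, split):
        """Entropy reduction from splitting on a boolean feature:
        H(y) - sum_v P(split = v) * H(y | split = v)."""
        gain = entropy(y)
        for v in (True, False):
            mask = split == v
            if mask.any():
                gain -= mask.mean() * entropy(y[mask])
        return gain

A greedy tree builder picks, at each node, the feature with the largest information gain.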

12
Mar 3 (Thurs)
K-means clustering [Slides]
Shalev-Shwartz & Ben-David Chapter 22 intro and Section 22.2

Optional: Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9
ps5 (data) due Mar 21 at 10pm
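
A minimal numpy sketch of Lloyd's algorithm for k-means (illustrative; the random initialization from the data points and the stopping rule are assumptions):

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Alternate between assigning each point to its nearest centroid
        and recomputing each centroid as the mean of its cluster."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # assignment step: nearest center in squared Euclidean distance
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # update step (an empty cluster keeps its old center)
            new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):   # converged
                break
            centers = new
        return centers, labels

Each iteration can only decrease the k-means objective, so the algorithm always converges, though only to a local optimum.
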
13
Mar 8 (Tues)
Hierarchical & spectral clustering [Slides]
Shalev-Shwartz & Ben-David Sections 22.1, 22.3
Hastie et al., Sections 14.3.12, 14.5.3

Optional: Tutorial on spectral clustering

14
Mar 10 (Thurs)

No office hours during spring break
Introduction to Bayesian inference [Slides]

Bayes rule, decision theory
The Go Files: AI computer wins first match against master Go player

Optional: Silver et al. Nature article (NYU access)

15
Mar 22 (Tues)
Midterm review



Mar 24 (Thurs)
Midterm exam (in class)
Project proposal due Mar 28 at 10pm
16
Mar 29 (Tues)

No office hours Mar 29
Naive Bayes [Slides]

Maximum likelihood estimation
Notes on naive Bayes (Sections 1 & 2)

Shalev-Shwartz & Ben-David Chapter 24 (except 24.4)
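
A small sketch of maximum-likelihood estimation for naive Bayes, matching the counting arguments in the notes (illustrative; binary features, binary labels, and the Laplace smoothing parameter alpha are assumptions):

    import numpy as np

    def nb_fit(X, y, alpha=1.0):
        """MLE with Laplace smoothing: class priors by counting labels,
        per-class feature probabilities by counting feature occurrences."""
        priors = np.array([(y == c).mean() for c in (0, 1)])
        probs = np.array([(X[y == c].sum(axis=0) + alpha)
                          / ((y == c).sum() + 2 * alpha) for c in (0, 1)])
        return priors, probs

    def nb_predict(priors, probs, X):
        """Compare log-posteriors; summing log-probabilities over
        features uses the naive (conditional independence) assumption."""
        log_post = (np.log(priors)
                    + X @ np.log(probs).T
                    + (1 - X) @ np.log(1 - probs).T)
        return log_post.argmax(axis=1)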

17
Mar 31 (Thurs)
Logistic regression [Slides]
Notes on logistic regression (Sections 3-5)
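
A minimal sketch of maximum-likelihood logistic regression by gradient ascent (illustrative; the learning rate and iteration count are assumptions):

    import numpy as np

    def logistic_regression(X, y, lr=0.1, iters=1000):
        """y has entries in {0, 1}. The gradient of the average
        log-likelihood is X^T (y - sigmoid(Xw)) / n."""
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
            w += lr * X.T @ (y - p) / len(y)     # ascend the log-likelihood
        return w

Unlike naive Bayes, the MLE here has no closed form, which is why an iterative method is needed.
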
18
Apr 5 (Tues)
Graphical models [Slides]

Modeling temporal data (e.g., hidden Markov models)
Tutorial on HMMs
Introduction to Bayesian networks

Optional: An introduction to graphical models
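
As one concrete piece of the HMM machinery covered here, a numpy sketch of the forward algorithm for computing the likelihood of an observation sequence (illustrative; the matrix conventions are assumptions):

    import numpy as np

    def forward(pi, A, B, obs):
        """pi: (k,) initial state distribution; A: (k, k) transitions with
        A[i, j] = P(z' = j | z = i); B: (k, m) emission probabilities;
        obs: list of observation indices. Returns P(obs)."""
        alpha = pi * B[:, obs[0]]             # alpha[i] = P(o_1, z_1 = i)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]     # propagate, then emit
        return alpha.sum()

Dynamic programming makes this O(T k^2) rather than the O(k^T) cost of summing over all state paths.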

19
Apr 7 (Thurs)
Unsupervised learning I [Slides]

Expectation maximization
Notes on mixture models
Notes on Expectation Maximization

Shalev-Shwartz & Ben-David Section 24.4
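
A compact sketch of EM for a one-dimensional mixture of Gaussians (illustrative; no safeguards against degenerate clusters, and the initialization is an assumption):

    import numpy as np

    def em_gmm_1d(x, k, iters=50, seed=0):
        """E-step: soft assignments (responsibilities); M-step: weighted
        maximum-likelihood updates for weights, means, and variances."""
        rng = np.random.default_rng(seed)
        pi = np.full(k, 1.0 / k)
        mu = rng.choice(x, size=k, replace=False)
        var = np.full(k, x.var())
        for _ in range(iters):
            # E-step: r[i, j] proportional to pi_j * N(x_i | mu_j, var_j)
            r = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(var)
            r /= r.sum(axis=1, keepdims=True)   # shared constants cancel
            # M-step: weighted MLE
            nj = r.sum(axis=0)
            pi = nj / len(x)
            mu = (r * x[:, None]).sum(axis=0) / nj
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        return pi, mu, var

Each EM iteration can only increase the data log-likelihood, but, as with k-means, only a local optimum is guaranteed.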

20
Apr 12 (Tues)

No office hours Apr 12
History of artificial intelligence [Slides]

Guest lecture by Prof. Zaid Harchaoui
A (Very) Brief History of Artificial Intelligence

Optional: Chapter 1 of Russell & Norvig (available at NYU libraries)

21
Apr 14 (Thurs)
Unsupervised learning II [Slides]

Topic models (e.g., latent Dirichlet allocation)
Review article on topic modeling

Explore topic models of: politics over time, state-of-the-union addresses, literary studies (see also this blog), Wikipedia
ps6 due Apr 25 at 10pm
22
Apr 19 (Tues)
Topic modeling (continued) [Slides]


23
Apr 21 (Thurs)
Dimensionality reduction [Slides]

Principal components analysis
Shalev-Shwartz & Ben-David Chapter 23 intro and Section 23.1

Optional: Barber, Chapter 15
Optional: Notes on PCA
Optional: More notes on PCA
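
A minimal sketch of PCA via the SVD of the centered data matrix (illustrative; returning both the projected data and the components is a design choice):

    import numpy as np

    def pca(X, k):
        """The top-k right singular vectors of the centered data are the
        principal components (directions of maximum variance)."""
        Xc = X - X.mean(axis=0)                        # center each feature
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:k]                            # (k, d) directions
        return Xc @ components.T, components           # (n, k) projection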

24
Apr 26 (Tues)
Introduction to neural networks [Slides]

Backpropagation, convolution
Notes on backpropagation (extra)

Optional: Neural network playground
Optional: Nature article on deep learning
ps7 (data for q2, data for q3) due May 9 at 10pm
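
To make the backpropagation discussion concrete, a numpy sketch of a one-hidden-layer network trained with manual backprop on squared loss (illustrative; the ReLU activation, architecture, and all hyperparameters are assumptions):

    import numpy as np

    def train_two_layer(X, y, hidden=16, lr=0.1, iters=500, seed=0):
        """Forward pass, then gradients by the chain rule, layer by layer."""
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(0, 0.1, (hidden, 1)); b2 = np.zeros(1)
        n = len(y)
        for _ in range(iters):
            # forward pass
            h = np.maximum(0, X @ W1 + b1)             # ReLU hidden layer
            pred = h @ W2 + b2                         # linear output
            # backward pass: mean-squared-error gradients
            g_pred = 2 * (pred - y[:, None]) / n
            g_W2, g_b2 = h.T @ g_pred, g_pred.sum(axis=0)
            g_h = g_pred @ W2.T
            g_h[h <= 0] = 0                            # ReLU gradient mask
            g_W1, g_b1 = X.T @ g_h, g_h.sum(axis=0)
            # gradient descent step
            W1 -= lr * g_W1; b1 -= lr * g_b1
            W2 -= lr * g_W2; b2 -= lr * g_b2
        return W1, b1, W2, b2
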
25
Apr 28 (Thurs)
Lab on deep learning

TensorFlow tutorial by Yijun Xiao


26
May 3 (Tues)
Project presentations (group 1)


27
May 5 (Thurs)
Project presentations (group 2)

Final project writeup due May 17 at 12pm (noon), via NYU Classes

Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homework assignments.


Problem Set policy

I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

  1. You may discuss a problem with any student in this class and work together on solving it. This can involve brainstorming, verbally discussing the problem, and walking through possible solutions together, but it should not involve one student telling another a complete solution.

  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

  3. In your solution to each problem, you must write down the names of everyone with whom you discussed it. This will not affect your grade.

  4. Do not consult solution manuals or other people's solutions from similar courses.
Late submission policy: During the semester you are allowed at most two extensions on homework assignments. Each extension is for at most 48 hours and carries a penalty of 25% off your assignment grade.

