Introduction To Machine Learning

Fall 2012

Overview

Machine learning is an exciting and fast-moving field of Computer Science with many recent consumer applications (e.g., Kinect, Google Translate, Siri, digital camera face detection, Netflix recommendations) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.

General information

Lectures: Tuesday and Thursday, 11am-12:15pm
Room: Warren Weaver Hall 317

Instructor:
Prof. David Sontag   
dsontag {@ | at} cs.nyu.edu
Grader:
Jinglun Dong
jinglundong {@ | at} gmail.com

Office hours: Tuesday 5-6pm and by appointment. Location: 715 Broadway, 12th floor, Room 1204

Grading: problem sets (50%) + midterm exam (25%) + project (20%) + participation (5%). See the Problem Set policy below.

Books: No textbook is required for this class, but students may find it helpful to purchase one of the recommended books by Bishop or Murphy. Bishop's book is much easier to read, whereas Murphy's book has substantially more depth and coverage (and is more up to date).

Mailing list: To subscribe to the class list, follow instructions here.

Project information

Schedule

Note: the Bishop and Murphy readings suggested below are optional.

Lecture | Date | Topic | Required reading | Assignments
1
Sept 4 (Tues)
Overview [Slides]
Chapter 1 of Murphy's book

Bishop, Chapter 1 (optional)

2
Sept 6 (Th)
Introduction to learning [Slides]

Loss functions, Perceptron algorithm
Review notes on probability, linear algebra, and calculus

3
Sept 11 (Tues)
Linear classifiers [Slides]

Proof of the perceptron mistake bound, introduction to support vector machines
Notes on perceptron mistake bound (just section 1)

Notes on support vector machines (sections 1-4)

Bishop, Section 4.1.1 (pg. 181-182) and Chapter 7 (pg. 325-328)

Murphy, Section 14.5.2 (pg. 498-501)
ps1 (data) due Sept 25 at 11am [Solutions]
4
Sept 13 (Th)
Support vector machines [Slides]
See above. Also:

Bishop, Sections 7.1.1 and 7.1.3

5
Sept 18 (Tues)
Support vector machines (continued) [Slides]

Derivation of SVM dual, introduction to kernels
Optional: Hastie (book online for free - see below), Section 4.5 (pg. 129)

6
Sept 20 (Th)
Kernel methods [Slides]
Notes on kernel methods (sec. 3-8)

Bishop, Section 6.2, Section 7.1 (except for 7.1.4), and Appendix E

Murphy, Chapter 14 (except 14.4 and 14.7)

Optional: For more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435). For more on cross-validation, see Hastie, Section 7.10 (pg. 250).

Optional: For more advanced kernel methods, see chapter 3 of this book (free online from MIT libraries)

7
Sept 25 (Tues)
Kernel methods & optimization

Mercer's theorem, convexity
Lecture notes

8
Sept 27 (Th)
Learning theory [Slides]

Generalization bounds for finite hypothesis spaces
The following readings give only high-level overviews:
 - Murphy, Section 6.5.4 (pg. 209)
 - Bishop, Section 7.1.5 (pg. 344)

9
Oct 2 (Tues)
Learning theory (continued) [Slides]

VC-dimension
Notes on learning theory
ps2, due Oct 8 by 4pm [Solutions]
10
Oct 4 (Th)
Nearest neighbor methods [Slides]

Also margin-based generalization
Notes on gap-tolerant classifiers (section 7.1, pg. 29-31)

Hastie et al., Sections 13.3-13.5 (on nearest neighbor methods)

11
Oct 9 (Tues)
Decision trees [Slides]

Bishop, Section 14.4 (pg. 663)

Murphy, Section 16.2

12
Oct 11 (Th)
(No class on Oct 16: Fall recess)
Ensemble methods, Boosting [Slides]
Hastie et al., Section 8.7 (bagging) and Sections 10.1-10.6 (boosting)

Boosting overview (except sections 6 and 8)

Bishop, Sections 14.2 & 14.3
Murphy, Section 16.4

Optional: Hastie et al. Chapter 15 (on random forests)

13
Oct 18 (Th)
Boosting (continued) [Slides]


A Few Useful Things to Know About Machine Learning

ps3, due Oct 25 at 11am

Oct 23 (Tues)
Midterm exam



14
Oct 25 (Th)
Clustering [Slides]

K-means, Agglomerative clustering
Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9, 14.3.12

Murphy, Sections 25.1, 25.5-25.5.3

Bishop, Section 9.1 (pg. 424)
ps4 (2-page project proposal), due Nov 6 at 5pm by e-mail

Oct 30 (Tues)
Class cancelled due to Hurricane Sandy

Nov 1 (Th)
Class cancelled due to Hurricane Sandy

15
Nov 6 (Tues)
Clustering (continued) [Slides]

Spectral clustering
Hastie et al., Section 14.5.3

Optional: Tutorial on spectral clustering

Murphy, Section 25.4

16
Nov 8 (Th)
Introduction to Bayesian methods [Slides]

Probability, decision theory
Murphy, Sections 3-3.3

Bishop, Sections 2-2.3.4

17
Nov 13 (Tues)
Naive Bayes [Slides]

Murphy, Sections 3.4, 3.5 (naive Bayes), 5.7 (decision theory)

Bishop, Section 1.5 (decision theory)

18
Nov 15 (Th)
Logistic regression [Slides]

Notes on naive Bayes and logistic regression

Murphy, Sections 8-8.3 (logistic regression), 8.6 (generative vs. discriminative)

Bishop, Sections 4.2-4.3.4 (logistic regression)
ps5, due Nov 27 at 11am [Solutions]
19
Nov 20 (Tues)
(No class on Nov 22: Thanksgiving)
Logistic regression (continued) [Slides]


20
Nov 27 (Tues)
Linear regression [Slides]
Hastie et al., Chapter 3 (pg. 43-47)
Notes on linear regression (see Part 1)

Murphy, Section 7-7.5
Bishop, Sections 3-3.2

21
Nov 29 (Th)
Mixture models, EM algorithm [Slides]

Notes on mixture models
Notes on Expectation Maximization

Murphy, Sections 11-11.4.2.5 and 11.4.7
Bishop, Sections 9.2, 9.3, 9.4

22
Dec 4 (Tues)
EM algorithm (continued) [Slides]


ps6, due Dec 11 at 11am
23
Dec 6 (Th)
Hidden Markov models [Slides]

Notes on HMMs
Tutorial on HMMs
Introduction to Bayesian networks

Murphy, Chapter 17
Bishop, Sections 8.4.1, 13.1-2


24
Dec 11 (Tues)
Factor analysis & dimensionality reduction (PCA) [Slides]
Notes on PCA
More notes on PCA

Bishop, Sections 12.1 (PCA), 12.4.1 (ICA)

Optional: Barber, Chapter 15

25
Dec 13 (Th)
Dimensionality reduction (continued) [Slides]

Latent Dirichlet allocation
Review article on topic modeling


Dec 18 (Tues)
Project presentations

10 - 11:50am, WWH 317

Final projects due Dec 16 at 5pm (electronically)

Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homeworks.

Reference materials

Prerequisites

This is an undergraduate-level course. Students should be very comfortable with basic mathematics and have good programming skills. Some knowledge of probability theory and statistics, linear algebra, and multivariable calculus will be helpful. Basic Algorithms (CSCI-UA.0310) is a prerequisite, although well-qualified students taking it during Fall 2012 will be accepted with permission of the instructor. Please contact the professor with any additional questions.

Problem Set policy

I expect you to try solving each problem set on your own. However, when you are stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

  1. You may discuss a problem with any student in this class and work together on solving it. This can involve brainstorming, verbally discussing the problem, and working through possible solutions together, but it should not involve one student telling another a complete solution.

  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

  3. In your solution to each problem, you must write down the name of every person with whom you discussed it. This will not affect your grade.

  4. Do not consult solution manuals or other people's solutions from similar courses.