Introduction to Machine Learning
Fall 2012
Overview

Machine learning is an exciting and fast-moving field of computer science with many recent consumer applications (e.g., Kinect, Google Translate, Siri, digital camera face detection, Netflix recommendations) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.
General information

Lectures: Tuesday and Thursday, 11am-12:15pm
Office hours: Tuesday 5-6pm and by appointment
Location: 715 Broadway, 12th floor, Room 1204
Grading: problem sets (50%) + midterm exam (25%) + project (20%) + participation (5%). See the Problem Set policy below.
Books: No textbook is required for this class, but students may find it helpful to purchase one of the following books: Bishop's Pattern Recognition and Machine Learning or Murphy's Machine Learning: A Probabilistic Perspective. Bishop's book is much easier to read, whereas Murphy's book has substantially more depth and coverage (and is up to date).
Mailing list: To subscribe to the class list, follow instructions here.
Schedule

Note: the Bishop and Murphy readings suggested below are optional.
Lecture | Date | Topic | Required reading | Assignments |
1 | Sept 4 (Tues) | Overview [Slides] | Murphy, Chapter 1; Bishop, Chapter 1 (optional) | |
2 | Sept 6 (Th) | Introduction to learning [Slides]. Loss functions, perceptron algorithm (see the code sketch after the schedule) | Review notes on probability, linear algebra, and calculus | |
3 | Sept 11 (Tues) | Linear classifiers [Slides]. Proof of perceptron mistake bound, introduction to support vector machines | Notes on perceptron mistake bound (just section 1); Notes on support vector machines (sections 1-4); Bishop, Section 4.1.1 (pg. 181-182) and Chapter 7 (pg. 325-328); Murphy, Section 14.5.2 (pg. 498-501) | ps1 (data) due Sept 25 at 11am [Solutions] |
4 | Sept 13 (Th) | Support vector machines [Slides] | See above. Also: Bishop, Sections 7.1.1 and 7.1.3 | |
5 | Sept 18 (Tues) | Support vector machines (continued) [Slides]. Derivation of SVM dual, introduction to kernels | Optional: Hastie (book available free online; see below), Section 4.5 (pg. 129) | |
6 | Sept 20 (Th) | Kernel methods [Slides] | Notes on kernel methods (sec. 3-8); Bishop, Section 6.2, Section 7.1 (except for 7.1.4), and Appendix E; Murphy, Chapter 14 (except 14.4 and 14.7). Optional: for more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435); for more on cross-validation, see Hastie, Section 7.10 (pg. 250); for more advanced kernel methods, see chapter 3 of this book (free online from MIT libraries) | |
7 | Sept 25 (Tues) | Kernel methods & optimization. Mercer's theorem, convexity | Lecture notes | |
8 | Sept 27 (Th) | Learning theory [Slides]. Generalization for finite hypothesis spaces | These give only high-level overviews: Murphy, Section 6.5.4 (pg. 209); Bishop, Section 7.1.5 (pg. 344) | |
9 | Oct 2 (Tues) | Learning theory (continued) [Slides]. VC-dimension | Notes on learning theory | ps2, due Oct 8 by 4pm [Solutions] |
10 | Oct 4 (Th) | Nearest neighbor methods [Slides]. Also margin-based generalization | Notes on gap-tolerant classifiers (section 7.1, pg. 29-31); Hastie et al., Sections 13.3-13.5 (on nearest neighbor methods) | |
11 | Oct 9 (Tues) | Decision trees [Slides] | Bishop, Section 14.4 (pg. 663); Murphy, Section 16.2 | |
12 | Oct 11 (Th); no class on Oct 16 (Fall recess) | Ensemble methods, boosting [Slides] | Hastie et al., Section 8.7 (bagging) and Sections 10.1-10.6 (boosting); Boosting overview (except sections 6 and 8); Bishop, Sections 14.2 & 14.3; Murphy, Section 16.4. Optional: Hastie et al., Chapter 15 (on random forests) | |
13 | Oct 18 (Th) | Boosting (continued) [Slides] | "A Few Useful Things to Know About Machine Learning" | ps3, due Oct 25 at 11am |
| Oct 23 (Tues) | Midterm exam | | |
14 | Oct 25 (Th) | Clustering [Slides]. K-means (see the code sketch after the schedule), agglomerative clustering | Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9, 14.3.12; Murphy, Sections 25.1, 25.5-25.5.3; Bishop, Section 9.1 (pg. 424) | ps4 (2-page project proposal), due Nov. 6 at 5pm by e-mail |
| Oct 30 (Tues) | Class cancelled due to Hurricane Sandy | | |
| Nov 1 (Th) | Class cancelled due to Hurricane Sandy | | |
15 | Nov 6 (Tues) | Clustering (continued) [Slides]. Spectral clustering | Hastie et al., Section 14.5.3. Optional: Tutorial on spectral clustering. Murphy, Section 25.4 | |
16 | Nov 8 (Th) | Introduction to Bayesian methods [Slides]. Probability, decision theory | Murphy, Sections 3-3.3; Bishop, Sections 2-2.3.4 | |
17 | Nov 13 (Tues) | Naive Bayes [Slides] | Murphy, Sections 3.4, 3.5 (naive Bayes) and 5.7 (decision theory); Bishop, Section 1.5 (decision theory) | |
18 | Nov 15 (Th) | Logistic regression [Slides] (see the code sketch after the schedule) | Notes on naive Bayes and logistic regression; Murphy, 8-8.3 (logistic reg.) and 8.6 (generative vs. discriminative); Bishop, 4.2-4.3.4 (logistic reg.) | ps5, due Nov 27 at 11am [Solutions] |
19 | Nov 20 (Tues); no class on Nov 22 (Thanksgiving) | Logistic regression (continued) [Slides] | | |
20 | Nov 27 (Tues) | Linear regression [Slides] | Hastie et al., Chapter 3 (pages 43-47); Notes on linear regression (see Part 1); Murphy, Sections 7-7.5; Bishop, Sections 3-3.2 | |
21 | Nov 29 (Th) | Mixture models, EM algorithm [Slides] | Notes on mixture models; Notes on Expectation Maximization; Murphy, Sections 11-11.4.2.5 and 11.4.7; Bishop, Sections 9.2, 9.3, 9.4 | |
22 | Dec 4 (Tues) | EM algorithm (continued) [Slides] | | ps6, due Dec 11 at 11am |
23 | Dec 6 (Th) | Hidden Markov models [Slides] | Notes on HMMs; Tutorial on HMMs; Introduction to Bayesian networks; Murphy, Chapter 17; Bishop, Sections 8.4.1 and 13.1-13.2 | |
24 | Dec 11 (Tues) | Factor analysis & dimensionality reduction (PCA) [Slides] | Notes on PCA; More notes on PCA; Bishop, Sections 12.1 (PCA) and 12.4.1 (ICA). Optional: Barber, Chapter 15 | |
25 | Dec 13 (Th) | Dimensionality reduction (continued) [Slides]. Latent Dirichlet allocation | Review article on topic modeling | |
| Dec 18 (Tues) | Project presentations, 10-11:50am, WWH 317 | | Final projects due Dec 16 at 5pm (electronically) |
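Code sketches

A few rows of the schedule above point to short code sketches. These are illustrative sketches only, not course-provided code; all function and variable names are our own. First, the perceptron algorithm from Lectures 2-3: a mistake-driven online learner that updates its weight vector only on misclassified points.

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Train a perceptron on features X (n x d) and labels y in {-1, +1}.

    Returns weights w and bias b; predictions are sign(x @ w + b).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            # Update only on a mistake, i.e. when y_i * (w . x_i + b) <= 0.
            if y[i] * (X[i] @ w + b) <= 0:
                w += y[i] * X[i]
                b += y[i]
    return w, b
```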
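Next, a minimal sketch of Lloyd's algorithm for k-means, the first clustering method of Lecture 14. It alternates between assigning each point to its nearest center and recomputing each center as the mean of its assigned points.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means on features X (n x d)."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster
        # (empty clusters keep their previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```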
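Finally, a sketch of logistic regression (Lectures 18-19) trained by batch gradient descent on the average negative log-likelihood. The learning rate and epoch count here are arbitrary placeholder choices.

```python
import numpy as np

def logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit logistic regression on features X (n x d) and labels y in {0, 1}.

    Returns weights w and bias b; sigmoid(x @ w + b) estimates P(y=1 | x).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        # Gradient of the average negative log-likelihood.
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```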
Reference materials

Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning (the "Hastie" readings above; available for free online).
Prerequisites

This is an undergraduate-level course. Students should be very comfortable with basic mathematical skills in addition to good programming skills. Some knowledge of probability theory and statistics, linear algebra, and multivariable calculus will be helpful. Basic Algorithms (CSCI-UA.0310) is a prerequisite, although well-qualified students taking it during Fall 2012 will be accepted with permission of the instructor. Please contact the professor with any additional questions.
Problem Set policy

I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules: