Project information
Due date: Dec. 16 at 5pm (electronically, as a PDF file, by e-mail to Professor Sontag)
Instructions for final project writeup
Suggested data sets (or better: find your own!):
Kaggle
Identify patients diagnosed with Type 2 Diabetes (see interview with winners).
Exploratory data.
Score essays
Face recognition, collaborative filtering, web ranking (see bottom, under "Projects")
See here for more collaborative filtering data
20 Newsgroups
Blogs (with spam labels)
Enron e-mail data set (see also here)
Congress voting records
Twitter, Slashdot, etc.
Large network datasets
NYTimes news articles
Useful links:
Python for data scientists
scikit.learn: Python machine learning modules (very good!)
SVMlight software (also very good)
Matlab/Octave resources (see bottom of page)
Examples of how to write up your project