Project groups must consist of 2 or 3 people.
A 2-page PDF of your project proposal (one per group) is due by Monday Mar. 28 at 10pm, submitted via NYU Classes. The course staff will send you feedback ~1.5 weeks later. Please note that your project proposal will factor into your overall project grade, so make sure that it is well written and follows the guidelines below.
I have posted links to a few publicly available data sets here (by no means comprehensive!), and may post more over the next few weeks.
You are strongly encouraged to think outside the box: come up with new problems that you can tackle using machine learning, and figure out where you can get the data. Also, please do post to Piazza to discuss ideas or to share pointers to data sets that might interest other students.
Your project proposal must detail the data that you plan to use, how you will pre-process it, and a precise plan of action. The plan should cover the questions you would like to ask or the problems you want to solve, the machine learning algorithm(s) you hope to apply, how you will perform your evaluation, a timeline for your work, and an explanation of what you expect to learn from your project. For the evaluation of a supervised prediction task, for example, you might use cross-validation and report accuracy, then analyze your false positives and false negatives to understand where and why the algorithms succeed or fail. I strongly encourage you to download the data and explore it carefully prior to submitting your project proposal.
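To make the evaluation example concrete, here is a minimal sketch of k-fold cross-validation that reports accuracy and collects the false positives and false negatives for later inspection. Everything in it is an illustrative assumption rather than a required method: the data is synthetic, the classifier is a toy one-dimensional threshold rule, and the names `k_fold_indices`, `fit_threshold`, and `cross_validate` are hypothetical helpers written for this example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy 1-D classifier: a threshold halfway between the two class means.

    Assumes both classes appear in the training data and that class 1
    tends to have larger feature values than class 0.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(xs, ys, k=5, seed=0):
    """Return overall accuracy plus the indices of false positives/negatives."""
    correct, false_pos, false_neg = 0, [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        # Fit only on the training folds, never on the held-out fold.
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            pred = 1 if xs[i] > t else 0
            if pred == ys[i]:
                correct += 1
            elif pred == 1:
                false_pos.append(i)  # predicted 1, true label 0
            else:
                false_neg.append(i)  # predicted 0, true label 1
    return correct / len(xs), false_pos, false_neg


# Synthetic data (an assumption for illustration): two overlapping Gaussians.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(2, 1) for _ in range(50)]
ys = [0] * 50 + [1] * 50

accuracy, false_pos, false_neg = cross_validate(xs, ys)
print(f"accuracy: {accuracy:.2f}")
print(f"false positives: {len(false_pos)}, false negatives: {len(false_neg)}")
```

The returned index lists let you pull up the misclassified examples and look for patterns in them, which is the error analysis step described above. In a real project you would likely swap the toy classifier for the algorithm(s) named in your proposal.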
This is meant to be open-ended, and I don't expect any two projects to be similar (that said, it is permissible for two groups to use the same data set). The goal here is for you to spend time thinking deeply about machine learning. To give you an idea of the scope, I am expecting you to spend ~40 hours (per person) on the project between now and the end of the semester. Do not forget that you will also have 2 more homework assignments, so you'll have to split your time between the project and regular homework.