6.864: Projects


Overview

It's up to you to choose a topic for your final project. The topic should be clearly related to the class material, but could be in any area of NLP. The project will most likely involve implementing some algorithm or technique and applying it to an NLP problem.

Group projects, with a maximum of 4 people, are allowed. The amount of work involved should scale linearly with the number of people, and all people in a group should contribute an equal amount of effort. The amount of work per person should be equivalent to around 2 problem sets.

Some example projects:

A one-person group should write a 4-page project report. Two-, three-, and four-person groups should submit 6-, 8-, and 10-page reports respectively. Each group should submit a single write-up. Reports should be written in 11 pt font; the page limits will be enforced. The report should describe the project's goals, the methods used, and the experimental results.

Due Dates

The final report is due on Wednesday, December 5th. This is a hard deadline; there are no "late days" like those we've had for the homeworks.

In addition, a preliminary report is due on Wednesday, November 28th. This should be 1-2 pages in length. It should describe results that you've obtained at that point, and should convince us that you are on track to complete the project by December 5th.

Finally, please submit a project proposal by Friday, November 2nd, to mcollins@csail.mit.edu and igorm@csail.mit.edu. Class on November 8th will be cancelled. Instead, I'll arrange to meet with each group for 30 minutes to discuss their project. The aim is for all meetings to take place on November 7th and November 8th.

The proposal should contain the following information:

Corpora

Most projects will need access to corpora. Many datasets are now available: treebanks exist in many languages, including English; bilingual data (for machine translation projects) is freely available; and there are many other sources of data. Please don't hesitate to ask us for pointers to suitable corpora for your project.

For some initial pointers, Chris Manning has a webpage which lists many useful resources for work in NLP, including both corpora and software.