6.881 Homework #2

Due: 11/2/2004

In this homework, you will explore corpus-based approaches to lexical semantics. More concretely, you will implement and analyze a method for clustering words based on their distributional properties. By evaluating the resultant clustering on two disambiguation tasks, you will explore the merits of different representations and study the properties of the learning method.

To train your method, you will use two sources: the lecture transcript corpus from the first homework and a 6.001 textbook source file.
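As a rough illustration of what "distributional properties" can mean, the sketch below represents each word by counts of the words that appear near it and compares two words by cosine similarity. This is only one possible representation, not the one the assignment prescribes. The window size, the function names `context_vectors` and `cosine`, and the toy sentence are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy text for illustration only; it is not drawn from the course corpora.
tokens = "the cat drinks milk the dog drinks water the cat eats fish".split()
vecs = context_vectors(tokens, window=1)  # window=1 is an arbitrary choice

sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_drinks = cosine(vecs["cat"], vecs["drinks"])
print(sim_cat_dog, sim_cat_drinks)
```

In this toy text, "cat" and "dog" share contexts and score higher than "cat" and "drinks", which share none. A clustering method could group words using a similarity of this kind, though the choice of context and similarity measure is exactly the kind of representation decision the homework asks you to compare.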

What to do?

What to submit?

You must submit a writeup that clearly explains the parameters of your models, presents the results, and analyzes the models' performance. You must also submit your code, the output of your model, and a README file that clearly specifies how to run your program.