6.085 Statistics for Research Projects

IAP 2010

Finale Doshi-Velez

( finale at mit dot edu )

This class is a practical introduction to statistical modeling and experimental design, intended to provide essential skills for doing research. We will cover basic techniques--such as hypothesis-testing and regression models--for both traditional experiments and newer paradigms such as evaluating simulations. Students with research projects will be encouraged to share their experiences and project-specific questions.

Students are expected to attend class and participate in discussions. Coursework will consist of three “practicals” (analyzing simple datasets to solidify core concepts) and two “case studies” (critical reading assignments of actual articles). Each assignment should take on the order of one hour; some class time may be devoted to working on practicals as time allows. Students are welcome to work in groups, but each student must submit an individual write-up in his or her own words. If you do work in a group, please also indicate with whom you worked. To pass, students must get a check/check+ on all assignments.

Finally, as this class is meant to be practical, I welcome any suggestions on topics and teaching style that will help you gain more from this course.



Topics and Assignments

January 6

Introduction to statistics terminology. Begin exploratory data analysis and hypothesis testing on a single variable.

January 8

Hypothesis testing on a single variable (continued). Begin exploratory data analysis and regression for multiple quantitative variables.

Due: paragraph on your research interests

January 11

Regression for multiple quantitative variables (continued).

Due: practical one

January 13

Advanced regression techniques.

January 15

Discussion on model fitting. Exploratory data analysis and inference for categorical variables with a count response.

Due: practical two

January 20

Exploratory data analysis and inference for categorical variables with a quantitative response.

Due: case study one

January 22

Experimental Design

January 25

Testing for distributions and rank

Due: practical three

January 27

TBD – more advanced topics/discussion of student projects

Due: case study two


Each of the practicals involves carrying out statistical analyses on small real-world datasets. You may use any software to complete the assignments; all the data is in comma-separated format, which should be readable by most software packages. If you do not already have a favorite, I encourage you to try out R. I'm also familiar with Matlab and (slightly less) Excel/OpenOffice. Outside of those, I'll do my best to help, but I can't promise to get you unstuck. Finally, keep in mind that in most cases, each analysis will be a single line of R code. Rarely will it be more than three. Please contact me if you find yourself getting bogged down in trying to run the analyses.
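For anyone working outside R, the following is a minimal sketch of the kind of workflow a practical involves: read a comma-separated file, then run a simple test on one variable. It uses only Python's standard library. The column name `minutes`, the inline data, and the null value of 30 are invented for illustration and are not taken from the course datasets; the helper names `read_column` and `one_sample_t` are likewise hypothetical.

```python
import csv
import io
import math
import statistics

# Hypothetical stand-in for a comma-separated data file;
# the column name and values are invented for illustration.
CSV_TEXT = """minutes
28
35
31
40
26
33
37
30
"""


def read_column(text, column):
    """Return one named column of a CSV document as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


values = read_column(CSV_TEXT, "minutes")
t_stat, df = one_sample_t(values, mu0=30)
print(f"n = {len(values)}, mean = {statistics.mean(values):.2f}")
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

To use a real file, replace the `io.StringIO(text)` wrapper with an opened file handle. In R, the same steps would roughly be a `read.csv` call followed by `t.test(x, mu = 30)`, which also reports the p-value and confidence interval that this sketch leaves out.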

In your write-up (feel free to use bullet points/keep it brief), make sure you explain your reasoning for the tests that you ran and the parameter settings that you used. Also explain and interpret the results of any exploratory data analysis and statistical inference. Include relevant plots and output to back up your claims; however, I don't want to see loads of print-outs! Your job is to provide succinct summaries of your analysis, not the computer spew.

Assignment One: Hypothesis Testing, dataset: cellphones

Assignment Two: Regression, dataset: bats

Assignment Three: Multiple Variables, dataset: cellphones (see above)

Additional pointers for those using R: This short reference card contains a quick-lookup list of a lot of common functions. If you need more extensive data manipulation, this card is also a good reference. I've also listed the key commands/syntax you'll need for the assignments here.

Case Studies

Review each of the two articles below. Each review should be no more than one page. Lists, bullet points, etc. are fine as long as your writing is clear. Reviews should consist of:

Case Study One: Gauges

Case Study Two: Mangosteen


Acknowledgements: Thanks to Micheal Bernstein (and Missy Cummings) for their syllabus suggestions and Bobby Gramacy for his regression notes.