6.S085 Statistics for Research Projects

IAP 2012

Finale Doshi-Velez

( finale at mit dot edu )

Ramesh Sridharan

( rameshvs at mit dot edu )

Instructor List

( iap-stats at mit dot edu )

This class is a practical introduction to statistical modeling and experimental design, intended to provide essential skills for doing research. We will cover basic techniques, such as hypothesis testing and regression models, for both traditional experiments and newer paradigms such as evaluating simulations. Students with research projects will be encouraged to share their experiences and project-specific questions.

Students are expected to attend class and participate in discussions. Coursework will consist of two “practicals” (analyzing simple datasets to solidify core concepts) and two “case studies” (critical reading assignments of actual articles). Each assignment should take on the order of one hour; some class time may be devoted to working on practicals as time allows. Students are welcome to work in groups, but each student must submit an individual write-up in his or her own words. If you do work in a group, please also indicate with whom you worked. To pass, students must get a check or check+ on all assignments.

Finally, as this class is meant to be practical, we welcome any suggestions on topics and teaching style that will help you gain more from this course.

Schedule

Date

Topics and Assignments

January 23

Introduction to statistics terminology. Begin exploratory data analysis and hypothesis testing on a single variable.

January 24

Hypothesis testing on a single variable (continued). Begin exploratory data analysis and regression for multiple quantitative variables.

Due: paragraph on your research interests

January 25

Regression for multiple quantitative variables (continued).

January 26

Advanced regression techniques.

Due: practical one

January 30

Discussion on model fitting. Exploratory data analysis and inference for categorical variables with a count response.

Due: practical two

January 31

Exploratory data analysis and inference for categorical variables with a quantitative response.

Due: case study one

February 1

Experimental design.


February 2

TBA, based on student interests

Due: case study two



Practicals

Each of the practicals involves carrying out some statistical analysis on small real-world datasets. You may use any software to complete the assignments; all the data is in comma-separated format, which should be readable by most software packages. If you do not already have a favorite, we encourage you to try out R. We're also familiar with Matlab and (slightly less) Excel/OpenOffice. Outside of those, we'll do our best to help, but we can't promise to get you unstuck. Finally, keep in mind that in most cases, each analysis will be a single line of R code. Rarely will it be more than three. Please contact us if you find yourself getting bogged down in trying to run the analyses.

In your write-up (feel free to use bullet points and keep it brief), make sure you explain your reasoning for the tests that you ran and the parameter settings that you used. Also explain and interpret the results of any exploratory data analysis and statistical inference. Include relevant plots and output to back up your claims; however, we don't want to see loads of print-outs! Your job is to provide succinct summaries of your analysis, not the computer spew.

Assignment One: Hypothesis Testing, dataset: cellphones

Assignment Two: Regression, dataset: bats

Additional pointers for those using R: This short reference card contains a quick-lookup list of a lot of common functions. If you need more extensive data manipulation, this card is also a good reference. We've also listed the key commands/syntax you'll need for the assignments here.
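To give a sense of the "single line of R code" scale mentioned above, here is a minimal sketch of loading comma-separated data, taking a quick exploratory look, and running one test. The data, the column names `group` and `score`, and the choice of a two-sample t-test are all invented for illustration; they are not taken from the course datasets or assignments.

```r
# Hypothetical stand-in for a small comma-separated dataset.
# The column names and values are made up for this sketch.
csv_text <- "group,score
a,5.1
a,4.8
a,5.6
a,5.0
b,6.2
b,5.9
b,6.5
b,6.1"

# Load the data. For a real file, pass its path instead of text = ...
d <- read.csv(text = csv_text)

# Exploratory look: numeric summary and a side-by-side boxplot.
print(summary(d))
boxplot(score ~ group, data = d)

# The analysis itself is one line: a two-sample (Welch) t-test
# comparing mean score between the two groups.
result <- t.test(score ~ group, data = d)
print(result)
```

In a write-up, the boxplot and the reported test statistic, confidence interval, and p-value would be summarized in a sentence or two rather than pasted in full.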

Case Studies

Review each of the two articles below. Each review should be no more than one page. Lists, bullet points, etc. are fine as long as your writing is clear. Reviews should consist of:

Case Study One: Gauges

Case Study Two: Mangosteen

Links

Finally, here are the text files of the demos used in class: Basic Regression, Robust Regression, QQ plots, and Chi-Squared Tests.

Acknowledgements: Thanks to Micheal Bernstein (and Missy Cummings) for their syllabus suggestions and Bobby Gramacy for his regression notes.