6.085 Statistics for Research Projects
IAP 2010
Finale Doshi-Velez
(finale at mit dot edu)
This class is a practical introduction to statistical modeling and experimental design, intended to provide essential skills for doing research. We will cover basic techniques, such as hypothesis testing and regression models, for both traditional experiments and newer paradigms such as evaluating simulations. Students with research projects will be encouraged to share their experiences and project-specific questions.
Students are expected to attend class and participate in discussions. Coursework will consist of three “practicals” (analyzing simple datasets to solidify core concepts) and two “case studies” (critical reading assignments of actual articles). Each assignment should take on the order of one hour; some class time may be devoted to doing practicals in class as time allows. Students are welcome to work in groups, but each student must submit an individual write-up in his or her own words. If you do work in a group, please also indicate with whom you worked. To pass, students must get a check/check+ on all assignments.
Finally, as this class is meant to be practical, I welcome any suggestions on topics and teaching style that will help you gain more from this course.
Schedule
Date | Topics and Assignments
January 6 | Introduction to statistics terminology. Begin exploratory data analysis and hypothesis testing on a single variable.
January 8 | Hypothesis testing on a single variable (continued). Begin exploratory data analysis and regression for multiple quantitative variables. Due: paragraph on your research interests
January 11 | Regression for multiple quantitative variables (continued). Due: practical one
January 13 | Advanced regression techniques.
January 15 | Discussion of model fitting. Exploratory data analysis and inference for categorical variables with a count response. Due: practical two
January 20 | Exploratory data analysis and inference for categorical variables with a quantitative response. Due: case study one
January 22 | Experimental design.
January 25 | Testing for distributions and rank tests. Due: practical three
January 27 | TBD: more advanced topics/discussion of student projects. Due: case study two
Practicals
Each of the practicals involves carrying out some statistical analysis on small real-world datasets. You may use any software to complete the assignments; all the data are in comma-separated (CSV) format, which most software packages can read. If you do not already have a favorite, I encourage you to try out R. I'm also familiar with Matlab and, to a lesser extent, Excel/OpenOffice. Outside of those, I'll do my best to help, but I can't promise to get you unstuck. Finally, keep in mind that in most cases each analysis will be a single line of R code, and rarely more than three. Please contact me if you find yourself getting bogged down in trying to run the analyses.
In your write-up (feel free to use bullet points and keep it brief), make sure you explain your reasoning for the tests that you ran and the parameter settings that you used. Also explain and interpret the results of any exploratory data analysis and statistical inference. Include relevant plots and output to back up your claims; however, I don't want to see loads of print-outs! Your job is to provide succinct summaries of your analysis, not the computer spew.
Assignment One: Hypothesis Testing; dataset: cellphones
Assignment Two: Regression; dataset: bats
Assignment Three: Multiple Variables; dataset: cellphones (see above)
Additional pointers for those using R: This short reference card contains a quick-lookup list of a lot of common functions. If you need more extensive data manipulation, this card is also a good reference. I've also listed the key commands/syntax you'll need for the assignments here.
Case Studies
Review each of the two articles below. Each review should be no more than one page. Lists, bullet points, etc. are fine as long as your writing is clear. Reviews should consist of:
Summary: What was the objective of the study? Summarize the hypothesis, design methodology, analysis approach, and major findings. (This is to check whether you understood the study.)
Experimental Design: Was the experimental design appropriate for the study? Provide your reasoning for both sound and unsound aspects.
Statistical Analysis: Was the statistical analysis sound? Provide your reasoning for both sound and unsound aspects.
Case Study One: Gauges
Case Study Two: Mangosteen
Links
GraphPad: Table of which statistical test to run for different data types.
Onlinestatsbook: Contains many of the demos used in class.
Statnotes: Nice overview of many common methods (more methods than we will cover in class).
18.443 OCW Website: Contains course notes and references (Probability and Statistics, DeGroot) for applied statistics at MIT. Good if you want a little more of the math and derivations behind the tests.
Introduction to the Practice of Statistics, Moore and McCabe: Clear introduction to the key concepts without a lot of math (good for intuition; you'll probably need something more detailed for your actual analysis).
Also, note that Wikipedia is often a good starting point if you want to look up a summary of a test or distribution (the descriptions are generally quite accurate, and the references will get you more information).
Acknowledgements: Thanks to Michael Bernstein (and Missy Cummings) for their syllabus suggestions and Bobby Gramacy for his regression notes.