6.S085 Statistics for Research Projects
IAP 2012
Finale Doshi-Velez
( finale at mit dot edu )
Ramesh Sridharan
( rameshvs at mit dot edu )
Instructor List
( iapstats at mit dot edu )
This class is a practical introduction to statistical modeling and experimental design, intended to provide essential skills for doing research. We will cover basic techniques, such as hypothesis testing and regression models, for both traditional experiments and newer paradigms such as evaluating simulations. Students with research projects will be encouraged to share their experiences and project-specific questions.
Students are expected to attend class and participate in discussions. Coursework will consist of two “practicals” (analyzing simple datasets to solidify core concepts) and two “case studies” (critical reading assignments of actual articles). Each assignment should take on the order of one hour; as time allows, some class time may be devoted to working on the practicals. Students are welcome to work in groups, but each student must submit an individual writeup in his or her own words. If you do work in a group, please also indicate with whom you worked. To pass, students must get a check/check+ on all assignments.
Finally, as this class is meant to be practical, I welcome any suggestions on topics and teaching style that will help you gain more from this course.
Schedule
January 23: Introduction to statistics terminology. Begin exploratory data analysis and hypothesis testing on a single variable.
January 24: Hypothesis testing on a single variable (continued). Begin exploratory data analysis and regression for multiple quantitative variables. Due: paragraph on your research interests.
January 25: Regression for multiple quantitative variables (continued).
January 26: Advanced regression techniques. Due: practical one.
January 30: Discussion on model fitting. Exploratory data analysis and inference for categorical variables with a count response. Due: practical two.
January 31: Exploratory data analysis and inference for categorical variables with a quantitative response. Due: case study one.
February 1: Experimental design.
February 2: TBA, based on student interests. Due: case study two.
Practicals
Each of the practicals involves carrying out some statistical analysis on small real-world datasets. You may use any software to complete the assignments; all the data is in comma-separated format, which should be readable by most software packages. If you do not already have a favorite, we encourage you to try out R. We're also familiar with Matlab and (slightly less) Excel/OpenOffice. Outside of those, we'll do our best to help, but we can't promise to get you unstuck. Finally, keep in mind that in most cases each analysis will be a single line of R code, and rarely more than three. Please contact us if you find yourself getting bogged down in trying to run the analyses.
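As a rough illustration of how small each analysis is, here is a minimal Python sketch of a one-sample t-test on data read from comma-separated text. The column name `minutes`, the inline data, and the null mean of 100 are hypothetical placeholders, not the course's cellphones dataset; in R the equivalent would be roughly `t.test(x, mu = 100)`. Only the t statistic and its degrees of freedom are computed, since the Python standard library has no t distribution for the p-value.

```python
import csv
import io
import math
import statistics

# Hypothetical CSV contents; a real practical would open the course data file instead.
CSV_TEXT = """id,minutes
1,102
2,98
3,110
4,105
5,99
6,107
"""

def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Parse the comma-separated text and pull out the (hypothetical) numeric column.
rows = csv.DictReader(io.StringIO(CSV_TEXT))
minutes = [float(row["minutes"]) for row in rows]

t_stat, df = one_sample_t(minutes, mu0=100.0)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic is then compared against a t distribution with n - 1 degrees of freedom, which R's `t.test` does for you.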
In your writeup (feel free to use bullet points and keep it brief), make sure you explain your reasoning for the tests that you ran and the parameter settings that you used. Also explain and interpret the results of any exploratory data analysis and statistical inference. Include relevant plots and output to back up your claims; however, we don't want to see loads of printouts! Your job is to provide succinct summaries of your analysis, not the computer spew.
Assignment One: Hypothesis Testing, dataset: cellphones
Assignment Two: Regression, dataset: bats
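The core computation behind simple linear regression is the least-squares fit of a line. The sketch below computes the slope, intercept, and R² from scratch in Python; the x and y values are made-up numbers, not the bats data, and the function name `least_squares_fit` is hypothetical. In R, `lm(y ~ x)` performs the same fit, and `summary()` on the result also reports standard errors and p-values.

```python
def least_squares_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared). Assumes the x values are not all equal.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical data: roughly y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept, r2 = least_squares_fit(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

A high R² alone does not validate the model; plotting the residuals against x is still worth doing before interpreting the slope.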
Additional pointers for those using R: this short reference card contains a quick-lookup list of many common functions. If you need more extensive data manipulation, this card is also a good reference. We've also listed the key commands/syntax you'll need for the assignments here.
Case Studies
Review each of the two articles below. Each review should be no more than one page. Lists, bullet points, etc. are fine as long as your writing is clear. Reviews should consist of:
Summary: What was the objective of the study? Summarize the hypothesis, design methodology, analysis approach, and major findings. (This is to check whether you understood the study.)
Experimental Design: Was the experimental design appropriate for the study? Provide your reasoning for both sound and unsound aspects.
Statistical Analysis: Was the statistical analysis sound? Provide your reasoning for both sound and unsound aspects.
Case Study One: Gauges
Case Study Two: Mangosteen
Links
GraphPad: table of which statistical test to run for different data types.
Onlinestatsbook: contains many of the demos used in class.
Statnotes: nice overview of many common methods (more methods than we will cover in class).
18.443 OCW website: contains course notes and references (Probability and Statistics, DeGroot) for applied statistics at MIT. Good if you want a little more of the math and derivations behind the tests.
Introduction to the Practice of Statistics, Moore and McCabe: a clear introduction to the key concepts without a lot of math (good for intuition; you'll probably need something more detailed for your actual analysis).
Also, note that Wikipedia is often a good starting point for looking up a summary of a test or distribution (the descriptions are generally quite accurate, and the references will get you more information).
Finally, here are the text files of the demos used in class: Basic Regression, Robust Regression, QQ Plots, and Chi-Squared Tests.
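The chi-squared demo concerns testing whether two categorical variables are independent. As a minimal Python sketch (not the class demo file), the code below computes Pearson's chi-squared statistic for a table of counts, using expected counts of (row total × column total) / grand total. The table counts and function names are invented for illustration. For a 2×2 table there is 1 degree of freedom, so the p-value can be computed with the standard library via P(χ²₁ > x) = erfc(√(x/2)); larger tables need a chi-squared CDF, such as the one R's `chisq.test` uses. Note that `chisq.test` applies Yates' continuity correction to 2×2 tables by default, so its p-value will differ from this sketch unless you pass `correct = FALSE`.

```python
import math

def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts.

    The usual rule of thumb is that expected counts should be at least about 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical 2x2 table: rows = group A / group B, columns = outcome yes / no.
counts = [[30, 20],
          [15, 35]]
stat, df = chi_squared_independence(counts)
if df == 1:
    print(f"chi-squared = {stat:.3f}, df = {df}, p = {p_value_df1(stat):.4f}")
else:
    print(f"chi-squared = {stat:.3f}, df = {df} (p-value needs a chi-squared CDF)")
```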