Statistics

Breiman (2001): "Statistical Modeling: The Two Cultures" pdf
summary
This paper criticizes 1990s statistics culture from a machine learning perspective. Breiman claims statisticians need to focus more on "algorithmic" models, which set aside the data generating process and simply attempt to make predictions. Breiman also argues that predictive power and model comparison should be the fundamental test of any model, generative or algorithmic. One of my take-aways from this article is the interesting difference between the machine learning and statistics cultures. Machine learning, which has more of an engineering philosophy, emphasized prediction and model comparison in order to make models actually work. Statistics, which has more of a theoretical (scientific and mathematical) history, emphasized elegant models and binary decisions on hypotheses.

Gelman (2013): "P-values and Statistical Practice" pdf
Lyon (2013): "Why are Normal Distributions Normal?" pdf
summary
The author argues, first, that Normal distributions may not be as common as we think and that Log-Normal distributions may be more common, and second, that the standard explanation that Normal distributions arise from the CLT may usually be incorrect. Instead, the author proposes that Normal distributions may often arise because of their maximum entropy property. If you fix the mean and variance of a product, e.g. for quality control, then as you increase the entropy by adding steps in its manufacturing, you will arrive at a Normal distribution.
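
For reference, the maximum entropy property mentioned here can be stated as follows. This is the standard textbook formulation, not notation taken from Lyon's paper; the symbols h, p, mu, sigma and the multipliers lambda_i are my own choices.

```latex
% Differential entropy of a density p on the real line
h(p) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx

% Maximize h(p) subject to normalization, fixed mean, and fixed variance
\int p(x) \, dx = 1, \qquad
\int x \, p(x) \, dx = \mu, \qquad
\int (x - \mu)^2 \, p(x) \, dx = \sigma^2

% Stationarity of the Lagrangian makes log p quadratic in x
\log p(x) = \lambda_0 + \lambda_1 x + \lambda_2 (x - \mu)^2

% Imposing the three constraints gives the Normal density and its entropy
p^*(x) = \frac{1}{\sqrt{2 \pi \sigma^2}}
         \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right),
\qquad
h(p^*) = \tfrac{1}{2} \log\!\left( 2 \pi e \sigma^2 \right)
```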

Reinhart: "Statistics Done Wrong" link
summary
This website provides a clear and concise overview of many of the most common errors practitioners make in data analysis and ways to avoid those errors.

Shalizi: "Advanced Undergraduate Data Analysis" link

Approximate Bayesian Computation

Chiachio et al. (2014): "Approximate Bayesian Computation by Subset Simulation" link
Marin et al. (2013): "Relevant statistics for Bayesian model choice" pdf
Meeds and Welling (2014): "GPS-ABC: Gaussian Process Surrogate Approximate Bayesian Computation" link
Ratmann et al. (2013): "Statistical modelling of summary values leads to accurate Approximate Bayesian Computations" link
Wilkinson (2014): "Accelerating ABC methods using Gaussian processes" link

Bayesian Statistics

Michael Jordan (2013): "Hierarchical Models, Nested Models and Completely Random Measures" pdf
Schervish and Seidenfeld (1990): "An Approach to Consensus and Certainty with Increasing Evidence" pdf

Causality

*Pearl (2009): "Causal inference in statistics: An overview" pdf

Computational Statistics

Venkat Chandrasekaran and Michael I. Jordan (2012): "Computational and Statistical Tradeoffs via Convex Relaxation" link

Markov Chain Monte Carlo

Beskos (2014): "A Stable Manifold MCMC Method for High Dimensions" pdf
Diaconis (2013): "Some things we've learned (about Markov chain Monte Carlo)" pdf
Doucet et al. (2014): "Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator" link
DuBois et al. (2014): "Approximate Slice Sampling for Bayesian Posterior Inference" pdf
summary
Applies the hypothesis-testing-to-choose-sample-size trick from Korattikara et al. to slice sampling. When choosing the slice width while slice sampling from a continuous distribution, instead of doing full likelihood calculations to determine whether a point is in the slice, take a sample of the data and do a hypothesis test.

Korattikara et al. (2014): "Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget" pdf
summary
The authors suggest using samples of the data to compute approximate MH likelihood ratios, and using hypothesis testing to decide whether to include more data in the decision-making process for each proposal.
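
A rough sketch of this idea, as I understand it, is below. It is not the authors' code. The function name `approx_mh_accept`, its parameters, and the toy Gaussian example are all hypothetical, and I use a Normal approximation to the test statistic where the paper uses a Student-t test. The accept decision is phrased as "is the mean per-datum log-likelihood difference above a threshold mu0?", and data are added in batches until a test is confident or the data run out.

```python
import math
import random


def approx_mh_accept(log_lik_diff, n_data, mu0, batch_size=50, eps=0.05, rng=random):
    """Sequential-test MH decision (illustrative sketch, names are hypothetical).

    log_lik_diff(i) returns log p(x_i | proposed) - log p(x_i | current).
    mu0 is the per-datum threshold, e.g. log(u) / n_data for a symmetric
    proposal and a flat prior, with u ~ Uniform(0, 1).
    batch_size must be at least 2 so a sample variance can be computed.
    Returns (accept, number_of_data_points_used).
    """
    order = list(range(n_data))
    rng.shuffle(order)  # subsample without replacement
    diffs = []
    while True:
        start = len(diffs)
        diffs.extend(log_lik_diff(i) for i in order[start:start + batch_size])
        n = len(diffs)
        mean = sum(diffs) / n
        if n >= n_data:
            return mean > mu0, n  # all data used: this is the exact decision
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        # Standard error of the mean with a finite-population correction.
        se = math.sqrt(var / n * (1.0 - (n - 1) / (n_data - 1)))
        if se == 0.0:
            continue  # no spread in the sample yet, so draw more data
        z = abs(mean - mu0) / se
        # Assumption: Normal tail probability instead of the Student-t tail.
        delta = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        if delta < eps:
            return mean > mu0, n  # confident enough to decide early


# Toy example: unit-variance Gaussian likelihood, data centered at 1.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
current, proposed = 0.0, 1.0


def diff(i):
    x = data[i]
    return -0.5 * (x - proposed) ** 2 + 0.5 * (x - current) ** 2


u = 0.5  # fixed here for illustration; normally drawn from Uniform(0, 1)
mu0 = math.log(u) / len(data)
accept, used = approx_mh_accept(diff, len(data), mu0, rng=rng)
print(accept, used)
```

Setting `eps=0` disables early stopping, so the sketch falls back to the exact decision on all of the data; larger `eps` trades accuracy of the decision for fewer likelihood evaluations.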

van de Meent et al. (2014): "Tempering by Subsampling" pdf
summary
Perform annealing in MCMC by subsampling data instead of increasing a temperature parameter on the likelihood. This makes annealing faster by allowing high-temperature sampling to go quickly.

van Dyk and Jiao (2014): "Metropolis-Hastings within Partially Collapsed Gibbs Samplers" pdf
summary
This paper explores some of the undesirable properties of partially collapsed Gibbs samplers. For example, as you might expect, partially collapsed Gibbs samplers might not have the correct stationary distributions.

Yang and Dunson (2013): "Sequential Markov Chain Monte Carlo" link

Misspecified Models

Shalizi (2009): "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models" pdf
Wainwright (2006): "Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting" pdf

Networks

Hillar and Wibisono (2013): "Maximum entropy distributions on graphs" link

Statistical Tests

Borgwardt and Ghahramani (2009): "Bayesian two-sample tests" link

Peter M Krafft Last modified: Mon May 12 11:46:08 EDT 2014