Machine Learning

Grosse and Duvenaud (2014): "Testing MCMC code" pdf
summary
This is a very useful practical paper for anyone who has to implement MCMC methods but is not familiar with best practices for debugging them. The paper discusses using a modular architecture, unit testing conditional probability calculations, and using Geweke tests for integration testing. However, the paper does not seem to provide a way to unit test individual samples from conditional distributions, and I don't know a good way of doing that either.
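
The Geweke-style integration test the paper recommends can be sketched on a toy conjugate model (everything below is illustrative, not from the paper): sample the joint p(theta, y) two ways, once by forward simulation alone and once by alternating data draws with the sampler's transition operator, then check that the two theta streams have the same distribution.

```python
import random, statistics

# Geweke-style check for a toy model: theta ~ N(0, 1), y ~ N(theta, 1).
# The exact posterior given one observation y is N(y/2, 1/2), so a
# correct "MCMC" transition here is a Gibbs draw from that posterior.

random.seed(0)
N = 20000

def prior():
    return random.gauss(0.0, 1.0)

def likelihood(theta):
    return random.gauss(theta, 1.0)

def posterior_step(theta, y):
    # Exact Gibbs update; a buggy sampler here would break the test.
    return random.gauss(y / 2.0, 0.5 ** 0.5)

# (a) Marginal-conditional samples: forward simulation only.
forward = [prior() for _ in range(N)]

# (b) Successive-conditional samples: alternate data and sampler steps.
theta = prior()
chain = []
for _ in range(N):
    y = likelihood(theta)
    theta = posterior_step(theta, y)
    chain.append(theta)

# If the transition targets the right posterior, both theta streams
# share the same marginal; compare their first two moments.
print(statistics.mean(forward), statistics.mean(chain))
print(statistics.stdev(forward), statistics.stdev(chain))
```

In practice one would compare the two streams with a quantile-quantile plot or a hypothesis test on several test statistics rather than just two moments.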

Active Learning

Golovin et al. (2010): "Near-Optimal Bayesian Active Learning with Noisy Observations" pdf

Bayesian Nonparametrics

Orbanz and Roy (2013): "Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures" link
Tamara Broderick, Michael I. Jordan, and Jim Pitman (2013): "Cluster and Feature Modeling from Combinatorial Stochastic Processes" link

Causality

Claassen et al. (2013): "Learning Sparse Causal Models is not NP-hard" link
Janzing et al. (2014): "Justifying Information-Geometric Causal Inference" pdf
Mooij et al. (2013): "From Ordinary Differential Equations to Structural Causal Models: the deterministic case" pdf

Clustering

Ver Steeg et al. (2013): "Demystifying Information-Theoretic Clustering" link

Collective Graphical Models

Duong, Wellman, and Singh (2012): "Knowledge Combination in Graphical Multiagent Model" link
Duong et al. (2012): "Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models" pdf
Kumar et al. (2013): "Collective Diffusion Over Networks: Models and Inference" pdf

Computational Considerations

Michael Jordan (2013): "On statistics, computation and scalability" pdf

Control

Ortega and Braun (2013): "Generalized Thompson sampling for sequential decision-making and causal inference" pdf

Convex Optimization

Bach (2013): "Learning with Submodular Functions: A Convex Optimization Perspective" pdf
John Duchi (2009): "Introduction to Convex Optimization for Machine Learning" link
Freund et al. (2013): "AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods" link

Distributed Machine Learning

Baruch Awerbuch and Robert Kleinberg (2005): "Competitive Collaborative Learning" pdf (In which a system of agents, some of whom may be malicious and others of whom belong to coalitions, solve a bandit problem together)
Broderick et al. (2013): "Streaming Variational Bayes" pdf
summary
Although stochastic variational inference works for large data sets, it requires knowing the number of documents ahead of time. Streaming variational Bayes allows the number of documents to grow dynamically, and the algorithm can also be implemented in a distributed, asynchronous setting. The approach uses the "classical" streaming interpretation of Bayes' rule, in which new data updates the previous posterior into the new posterior. Assuming IID data, the posterior can be further decomposed into a number of component likelihood functions, each of which can be transformed into a mini-posterior and approximated in parallel using a variational approximation algorithm. To make the algorithm asynchronous, the authors allow worker threads to evaluate multiple mini-posteriors at once.
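
The streaming decomposition can be sketched with a conjugate Beta-Bernoulli toy model, where each mini-posterior is exact rather than variational (the model and names here are illustrative, not the paper's):

```python
# Sketch of the streaming update rule, assuming a conjugate
# Beta-Bernoulli model so each mini-posterior is exact (a toy stand-in
# for the paper's variational approximations).

def mini_posterior(batch):
    # Summarize a batch's likelihood by its (heads, tails) counts.
    return (sum(batch), len(batch) - sum(batch))

def combine(prior, minis):
    # Posterior = prior + sum of per-batch sufficient statistics; the
    # sum is order-independent, so batches may arrive asynchronously
    # from different workers.
    a, b = prior
    for da, db in minis:
        a, b = a + da, b + db
    return (a, b)

stream = [[1, 0, 1], [1, 1, 1, 0], [0, 1]]   # data arrives in batches
posterior = combine((1, 1), map(mini_posterior, stream))
print(posterior)  # (1 + 6 heads, 1 + 3 tails) -> (7, 4)
```

The order-independence of the combination step is what makes the asynchronous, distributed variant possible.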

Campbell and How (2014): "Approximate Decentralized Bayesian Inference" pdf
summary
Label switching is always a problem for approximate inference in mixture models, but it is more severe in decentralized inference because different processors may settle on different permutations of the labels, which are then combined inappropriately. Characterizing the entire posterior, including all permutations of the parameters, would solve this problem but is intractable, so the paper instead suggests finding the single permutation that is best represented across the different agents, using an approximate discrete optimization technique.
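
A brute-force toy version of the matching step (the paper uses an approximate discrete optimizer; everything here is illustrative):

```python
from itertools import permutations

# Toy label-matching problem: two agents cluster the same points but
# may use permuted cluster ids. Brute-force search over permutations
# stands in for the paper's approximate discrete optimization.

def best_relabeling(labels_a, labels_b, k):
    def agreement(perm):
        return sum(perm[b] == a for a, b in zip(labels_a, labels_b))
    return max(permutations(range(k)), key=agreement)

a = [0, 0, 1, 1, 2, 2]
b = [2, 2, 0, 0, 1, 1]          # same clustering, labels rotated
perm = best_relabeling(a, b, 3)
relabeled = [perm[x] for x in b]
print(relabeled)                # recovers labels_a exactly
```

Brute force is factorial in the number of clusters, which is exactly why an approximate optimizer is needed at realistic sizes.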

Lee et al. (2013): "More Effective Distributed ML via a Structure-Aware Dynamic Scheduler" link
Pan et al. (2013): "Optimistic Concurrency Control for Distributed Unsupervised Learning" pdf
Pearl (1982): "Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach" pdf
Shamir (2013): "Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation" link
Silver et al. (2013): "Concurrent Reinforcement Learning from Customer Interactions" pdf
Wei et al. (2013): "Consistent Bounded-Asynchronous Parameter Servers for Distributed ML" link

Hawkes Processes

Filimonov and Sornette (2013): "Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data" link

Inference

Chang et al. (2013): "A path-integral approach to Bayesian inference for inverse problems" pdf
Gogate and Domingos (2013): "Structured Message Passing" pdf
Lindsten et al. (2014): "Particle Gibbs with Ancestor Sampling" pdf
Steinhardt and Liang (2014): "Filtering with Abstract Particles" pdf
summary
This paper suggests approximating a distribution by fitting local variational approximations over a partition of the distribution's support, and improving the approximation by optimizing the partition via hierarchical decomposition.
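
A rough illustration of the partition-and-refine idea on a one-dimensional density (a toy stand-in with constant local approximations and greedy splitting, not the paper's algorithm):

```python
import math

# Approximate a density by one local piece per cell of a partition of
# its support, then repeatedly split the worst cell (an illustrative
# stand-in for the paper's hierarchical decomposition).

def density(x):                      # unnormalized target density
    return math.exp(-x * x / 2.0)

def piece_error(lo, hi, n=50):
    # Error of a constant (midpoint) approximation on [lo, hi].
    mid = density((lo + hi) / 2.0)
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return max(abs(density(x) - mid) for x in xs) * (hi - lo)

partition = [(-4.0, 4.0)]
for _ in range(8):                   # greedily refine the worst cell
    worst = max(partition, key=lambda p: piece_error(*p))
    lo, hi = worst
    partition.remove(worst)
    partition += [(lo, (lo + hi) / 2), ((lo + hi) / 2, hi)]

print(sorted(partition))
print(max(piece_error(*p) for p in partition))
```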

Tarlow et al. (2012): "Fast Exact Inference for Recursive Cardinality Models" pdf

Information Geometry

Arvind Agarwal and Hal Daume III (2013): "A Geometric View of Conjugate Priors" pdf
Raskutti and Mukherjee (2013): "The information geometry of mirror descent" pdf

Interpretable Machine Learning

Lloyd et al. (2013): "Automatic Construction and Natural-language Description of Additive Nonparametric Models" pdf
summary
The title explains this paper pretty well. The authors use Gaussian processes to flexibly model bivariate relationships. They use a fixed set of kernels, each associated with a particular English description, combine those kernels within the Gaussian process models, perform model selection to choose the best kernels to keep, and generate paragraphs describing the components used. Major components are described with noun phrases, and multiplicative factors are described with adjective phrases.
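
A toy sketch of the kernel-to-phrase mapping (the phrase table and kernel names below are made up for illustration, not taken from the paper):

```python
# Map each additive component (a product of base kernels) to a
# sentence: first factor -> noun phrase, remaining factors -> adjective
# phrases. Kernel names and phrases are illustrative only.

NOUN = {"SE": "a smoothly varying function",
        "PER": "a periodic function",
        "LIN": "a linearly increasing function"}
ADJ = {"SE": "whose shape changes smoothly",
       "PER": "modulated periodically",
       "LIN": "with linearly growing amplitude"}

def describe(additive_model):
    sentences = []
    for component in additive_model:
        head, *rest = component      # head kernel plus multiplicative factors
        phrase = NOUN[head]
        for factor in rest:
            phrase += " " + ADJ[factor]
        sentences.append("The data contains " + phrase + ".")
    return " ".join(sentences)

print(describe([["PER", "LIN"], ["SE"]]))
```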

Modeling

Peter Grunwald and John Langford (2007): "Suboptimal behavior of Bayes and MDL in classification under misspecification" pdf

Models of Human Behavior

Scholkopf et al. (2013): "Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models" link

Nonnegative Matrix Factorization

Lee and Seung (2001): "Algorithms for Non-negative Matrix Factorization" pdf

Probabilistic Programming

Mansinghka et al. (2014): "Venture: a higher-order probabilistic programming platform with programmable inference" link

Supervised Learning

Boots et al. (2013): "Hilbert Space Embeddings of Predictive State Representations" pdf

Theory

Denil et al. (2013): "Narrowing the Gap: Random Forests In Theory and In Practice" link

Topic Models

Fox and Jordan (2013): "Mixed Membership Models for Time Series" link
McFarland et al. (2013): "Differentiating language usage through topic models" link

Peter M Krafft Last modified: Sun Dec 28 12:32:37 EST 2014