Tamara Broderick

Selected Presentations


Research presentations

  • Automated Scalable Bayesian Inference via Data Summarization. [abstract]
    The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical relationships, uncertainty quantification, and prior specification these methods provide. Many standard Bayesian inference algorithms are often computationally expensive, however, so their direct application to large datasets can be difficult or infeasible. Other standard algorithms sacrifice accuracy in the pursuit of scalability. We take a new approach. Namely, we leverage the insight that data often exhibit approximate redundancies to instead obtain a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. We can then use this small coreset in existing Bayesian inference algorithms without modification. We provide theoretical guarantees on the size and approximation quality of the coreset. In particular, we show that our method provides geometric decay in posterior approximation error as a function of coreset size. We validate on both synthetic and real datasets, demonstrating that our method reduces posterior approximation error by orders of magnitude relative to uniform random subsampling.
  • Fast Quantification of Uncertainty and Robustness with Variational Bayes. [abstract]
    In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice. We demonstrate that variational Bayes (VB) techniques are readily amenable to robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. A potential problem with MFVB is that it has a well-known major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).
  • Posteriors, conjugacy, and exponential families for completely random measures. [abstract]
    We demonstrate how to calculate posteriors for general Bayesian nonparametric priors and likelihoods based on completely random measures (CRMs).We further show how to represent Bayesian nonparametric priors as a sequence of finite draws using a size-biasing approach---and how to represent full Bayesian nonparametric models via finite marginals. Motivated by conjugate priors based on exponential family representations of likelihoods, we introduce a notion of exponential families for CRMs, which we call exponential CRMs. This construction allows us to specify automatic Bayesian nonparametric conjugate priors for exponential CRM likelihoods. Wedemonstrate that our exponential CRMs allow particularly straightforward recipes for size-biased and marginal representations of Bayesian nonparametric models. Along the way, we prove that the gamma process is a conjugate prior for the Poisson likelihood process and the beta prime process is a conjugate prior for a process we call the odds Bernoulli process. We deliver a size-biased representation of the gamma process and a marginal representation of the gamma process coupled with a Poisson likelihood process.
  • Feature allocations, probability functions, and paintboxes. [abstract]
    The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an "exchangeable feature probability function" (EFPF)---analogous to the EPPF in the clustering setting---for certain types of feature models. Moreover, we introduce a "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations.
  • Streaming variational Bayes. [abstract]
    We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.
  • Clusters and features from combinatorial stochastic processes. [abstract]
    In partitioning---a.k.a. clustering---data, we associate each data point with one and only one of some collection of groups called clusters or partition blocks. Here, we formally establish an analogous problem, called feature allocation, for associating data points with arbitrary non-negative integer numbers of groups, now called features or topics. Just as the exchangeable partition probability function (EPPF) can be used to describe the distribution of cluster membership under an exchangeable clustering model, we examine an analogous "exchangeable feature probability function" (EFPF) for certain types of feature models. Moreover, recalling Kingman's paintbox theorem as a characterization of the class of exchangeable clustering models, we develop a similar "feature paintbox" characterization of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations. We examine models such as the Bayesian nonparametric Indian buffet process as examples within these broader classes.
    • [video: 2012 September 20]. Bayesian Nonparametrics, ICERM Semester Program on Computational Challenges in Probability, Brown University, Providence, Rhode Island, USA.
  • MAD-Bayes: MAP-based asymptotic derivations from Bayes [abstract]
    The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
  • User-friendly conjugacy for completely random measures
    • [slides pdf] Joint Statistical Meetings (JSM) 2013, Montreal, Canada.
  • Fast and flexible selection with a single switch.
    • [video: 2009 December 10]. Mini-Symposia on Assistive Machine Learning for People with Disabilities, Neural Information Processing Systems (NIPS) 2009, Vancouver, British Columbia, Canada.
    • [video] Nomon keyboard tutorial.
    • [video] Example sentence written using Nomon.