I am a final-year PhD student in the Geometric Data Processing Group at MIT. My research focuses on discrete optimal transport with applications to Bayesian inference, robust learning, clustering, and data summarization. During my time in GDP I’ve been lucky to collaborate with many incredible people, among them Edward Chien, Matthew Staib, Hugo Lavenant, Pierre Monteiller, Charlie Frogner, Mikhail Yurochkin, Farzaneh Mirzazadeh, and Justin Solomon.
Prior to joining Justin’s group, I was a Master’s student with Daniela Rus in the Distributed Robotics Lab, where I worked on modular robotics and path planning. Before coming to MIT, I received my BSc from the University of Southampton in the United Kingdom under the supervision of Klaus-Peter Zauner.
I have interned on interpretability of deep learning at Google, on semantic segmentation at the Bosch Research and Technology Center, and on user experience at EPFL in Lausanne.
In my spare time, I enjoy hiking, climbing, and reviewing books I’ve read.
PhD in Computer Science, 2020
Massachusetts Institute of Technology
SM in Computer Science, 2016
Massachusetts Institute of Technology
BSc in Computer Science, 2014
University of Southampton
Label switching is a phenomenon arising in mixture model posterior inference that prevents one from meaningfully assessing posterior statistics using standard Monte Carlo procedures. The issue stems from invariance of the posterior under the action of a group; for example, permuting the ordering of mixture components has no effect on the posterior likelihood. We propose a resolution to label switching that leverages machinery from optimal transport. Our algorithm efficiently computes posterior statistics in the quotient space of the symmetry group. We give conditions under which there is a meaningful solution to label switching and demonstrate advantages over alternative approaches on simulated and real data.
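To make the connection to optimal transport concrete, here is a minimal relabeling sketch of my own (an illustration, not the algorithm from the paper): each posterior sample's components are matched to a reference by solving a small assignment problem, i.e. an optimal transport problem between uniform discrete measures. The array shapes and helper name below are assumptions.

```python
# Illustrative sketch only: align mixture-component labels of MCMC samples
# to a reference by solving one assignment (discrete OT) problem per sample.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(samples, reference):
    """samples: (S, K, D) array of per-sample component parameters.
    reference: (K, D) array, e.g. a running mean of already-aligned samples."""
    aligned = np.empty_like(samples)
    for s, theta in enumerate(samples):
        # cost[i, j]: squared distance between sample component i and reference slot j
        cost = ((theta[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
        rows, cols = linear_sum_assignment(cost)  # optimal permutation
        aligned[s, cols] = theta[rows]
    return aligned  # posterior means, intervals, etc. are now label-consistent
```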
The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. Existing document distances suffer either from an inability to incorporate semantic similarities between words or from scalability issues. As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which themselves are modeled as distributions over words. We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover’s distance. We evaluate our technique for k-NN classification and show better interpretability and scalability with comparable performance to current methods at a fraction of the cost.
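A rough sketch of the two-level idea, assuming the POT library (`ot.emd2`) and precomputed ingredients (topic-word distributions, word-embedding distances, and per-document topic proportions); this illustrates the construction rather than reproducing the paper's implementation:

```python
# Sketch of a hierarchical OT document distance (illustrative, not the
# reference implementation). Assumes the POT library: pip install pot.
import numpy as np
import ot  # Python Optimal Transport

def topic_ground_cost(topics, embed_dist):
    """Pairwise OT distances between topics viewed as word distributions.
    topics: (K, V) rows summing to 1; embed_dist: (V, V) word-embedding distances."""
    K = topics.shape[0]
    C = np.zeros((K, K))
    for i in range(K):
        for j in range(i + 1, K):
            C[i, j] = C[j, i] = ot.emd2(topics[i], topics[j], embed_dist)
    return C

def document_distance(doc_a, doc_b, C):
    """doc_a, doc_b: (K,) topic proportions; C: topic-level ground cost."""
    return ot.emd2(doc_a, doc_b, C)  # OT solved on the small topic space
```

Because each document-to-document comparison is an OT problem over only K topics rather than the full vocabulary, the meta-distance stays cheap relative to word-level alternatives such as the word mover’s distance.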
We propose a technique for interpolating between probability distributions on discrete surfaces, based on the theory of optimal transport. Unlike previous attempts that use linear programming, our method is based on a dynamical formulation of quadratic optimal transport proposed for flat domains by Benamou and Brenier [2000], adapted to discrete surfaces. Our structure-preserving construction yields a Riemannian metric on the (finite-dimensional) space of probability distributions on a discrete surface, which translates the so-called Otto calculus to discrete language. From a practical perspective, our technique provides a smooth interpolation between distributions on discrete surfaces with less diffusion than state-of-the-art algorithms involving entropic regularization. Beyond interpolation, we show how our discrete notion of optimal transport extends to other tasks, such as distribution-valued Dirichlet problems and time integration of gradient flows.
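For reference, the flat-domain dynamical formulation of Benamou and Brenier that the construction adapts reads (standard notation, not reproduced from the paper):

$$
W_2^2(\mu_0, \mu_1) \;=\; \min_{\rho,\,v} \int_0^1\!\!\int_\Omega \rho_t(x)\,\lvert v_t(x)\rvert^2 \, dx\, dt
\quad \text{subject to} \quad \partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0, \qquad \rho_0 = \mu_0, \;\; \rho_1 = \mu_1.
$$

The optimal density path $\rho_t$ is the displacement interpolation; the discrete construction carries this variational structure over to distributions on discrete surfaces.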
We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration. We tackle the problem without regularization, allowing us to recover a much sharper output. We give examples where our algorithm recovers a more meaningful barycenter than previous work. Our method is versatile and can be extended to applications such as generating super samples from a given distribution and recovering blue noise approximations.
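The alternating flavor of a free-support, sampling-based barycenter can be sketched as follows (a simplification under my own assumptions, not the paper's exact update): fix uniform weights on a free support, match fresh samples from each input distribution to the support points, and move each support point toward the mean of its matches.

```python
# Simplified free-support barycenter sketch (illustrative assumptions only):
# uniform weights, one assignment problem per input distribution per round.
import numpy as np
from scipy.optimize import linear_sum_assignment

def barycenter_sketch(samplers, n=64, d=2, iters=50, seed=0):
    """samplers: list of callables, each returning (n, d) samples from one input."""
    rng = np.random.default_rng(seed)
    support = rng.standard_normal((n, d))      # free support, adjusted every iteration
    for _ in range(iters):
        targets = np.zeros_like(support)
        for sample in samplers:
            X = sample(n)                      # fresh stochastic samples
            cost = ((support[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
            rows, cols = linear_sum_assignment(cost)   # unregularized matching
            targets[rows] += X[cols]
        support = targets / len(samplers)      # move support toward matched points
    return support
```

For instance, `barycenter_sketch([lambda n: rng1.normal(0, 1, (n, 2)), lambda n: rng2.normal(3, 1, (n, 2))])` would place the support roughly midway between the two Gaussians, with no entropic blurring because the matching is exact.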
This paper presents a new preconditioning technique for large-scale geometric optimization problems, inspired by applications in mesh parameterization. Our positive (semi-)definite preconditioner acts on the gradients of optimization problems whose variables are positions of the vertices of a triangle mesh in $\mathbb{R}^2$ or of a tetrahedral mesh in $\mathbb{R}^3$, converting localized distortion gradients into the velocity of a globally near-rigid motion via a linear solve. We pose our preconditioning tool in terms of the Killing energy of a deformation field and provide new efficient formulas for constructing Killing operators on triangle and tetrahedral meshes. We demonstrate that our method is competitive with state-of-the-art algorithms for locally injective parameterization using a variety of optimization objectives and show applications to two- and three-dimensional mesh deformation.
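As a point of reference (standard notation, mine rather than the paper's), the Killing energy of a deformation field $u$ on a flat domain $\Omega$ is

$$
E_K(u) \;=\; \tfrac{1}{2}\int_\Omega \bigl\| \nabla u + (\nabla u)^{\top} \bigr\|_F^2 \, dx,
$$

which vanishes exactly on infinitesimally rigid (Killing) fields; intuitively, a linear solve against a discretization of this quadratic form is what lets the preconditioner turn localized distortion gradients into near-rigid update velocities.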