David Alvarez-Melis

PhD Candidate, MIT Computer Science and Artificial Intelligence Lab

Stata Center, Bldg 32-G496, Cambridge MA 02139

d_alv_mel_[at]_mit_[dot]_edu (humans: remove underscores)

Optimal Transport with Local and Global Structure

Generalizing the OT problem to include local structure (or ignore global invariances).

TL;DR: Vanilla OT is not always directly applicable, and if it is, it might not fully capture available structural information. We extend it to (i) directly incorporate structural information or (ii) be invariant to global transformations.


I. Injecting Structure into the OT Cost Objective

An important appeal of optimal transport distances is that they reflect the metric of the underlying space in the transport cost. Yet, in a number of settings, there is further important structure that remains uncaptured. This structure can be intrinsic if the distributions correspond to structured objects (e.g., images with segments, or sequences) or extrinsic if there is side information that induces structure (e.g., groupings). A concrete example arises when applying optimal transport to do- main adaptation, where a subset of the source points to be matched have known class labels. In this case, we may desire source points with the same label to be matched coherently to the same compact region of the target space, preserving compact classes, and not be split into disjoint, distant locations (Courty et al., 2017). When pairing control and treatment units in observational studies of treatment effects, it is beneficial to compare treated and control subjects from the same “natural block” (e.g., family, hospital) so as to minimize the difference between unmeasured covariates (Pimentel et al., 2015). In all these examples, the additional structure essentially seeks correlations in the mappings of “similar” source points. Such dependencies, however, cannot be induced by standard formulations of optimal transport whose cost is separable in the mapping variables; they require nonlinear interactions.

In the motivating application of Domain Adaptation, we intuitively want the cost of mapping points the source distribution with the same label together to be lower than splitting them apart. Our structured OT framework formalizes (and generalizes) this intuitive desideratum.
In the motivating application of Domain Adaptation, we intuitively want the cost of mapping points the source distribution with the same label together to be lower than splitting them apart. Our structured OT framework formalizes (and generalizes) this intuitive desideratum.

In this work, we develop a framework to incorporate such structural information directly into the OT problem. This novel formulation opens avenues to a much richer class of (nonlinear) cost functions, allowing us to encode known or desired interactions of mappings, such as grouping constraints, correlations, and explicitly modeling topological information that is present, for instance, in sequences and graphs. The tractability of this nonlinear formulation arises from polytopes induced by submodular set functions. Submodular functions possess two highly desirable properties for our problem: (1) they naturally encode combinatorial structure, via diminishing returns and as combinatorial rank functions; and (2) their geometry leads to efficient algorithms.

The resulting combination of the geometries of transportation and submodularity leads to a problem with rich, favorable polyhedral structure and connections to game theory and saddle point optimization. We leverage this structure to solve the submodular optimal transport problem via a saddle-point mirror-prox algorithm involving alternating projections onto the polytope defined by the transportation constraints and the base polytope associated with the submodular cost function.

Image
Our Structured Optimal Transport objective can be interepreted as a zero-sum two-player game between the 'cost minimizer' and the 'adversary'. Both players choose strategies over polytopes: the former of the transportation polytope and the latter over the submodular base polytope. The adversary seeks a realization of cost matrix $\kappa$, consitent with the cost function, that maximally penalizes the current coupling $\gamma$, and its strength (~flexibility) depends on how submodular the cost function is. In the extreme case, the base polytope collapses to a single point, a fixed-cost matrix, yielding a single-player minimization game, i.e., recovering the original OT formulation.
Progression of the transport coupling $\Gamma$ (left) and cost matrix $\kappa$ (right) during optimization for a pair of distributions with two clusters each. The latter, chosen adversarially, adapts to the coupling's mass allocation.

Via various applications and experiments, we show how different submodular functions yield solutions that interpolate between strictly structure-aware transportation plans and structure-agnostic regularized versions of the problem. Besides these synthetic experiments, we evaluate our framework in two real-life applications: domain adaption for digit classification and sentence similarity prediction. In both cases, introducing structure leads to better empirical results.

Image
Beyond cluster information, the SOT framework can be used to encode more general structures, such as those arising from sequences or trees. In the application to sentence similarity, we enhance the Word Mover's Distance (Kusner et al., 2015) to penalize more when ngrams from one sentence are split when mapping to the other sentence.


II. Making OT invariant to global transformations

A key limitation of classic OT is that it implicitly assumes that the two sets of objects in question are represented in the same space, or at least that meaningful pairwise distances between them can be computed. This is not always the case, especially when the objects are represented by learned feature vectors. For example, word embedding algorithms operate at the level of inner products or distances between word vectors, so the representations they produce can be arbitrarily rotated, sometimes even for different runs of the same algorithm on the same data. Such global degrees of freedom in the vector representations render direct pairwise distances between objects across the sets meaningless. Indeed, OT is locally greedy as it focuses on minimizing individual movement of mass, oblivious to global transformations.

Endowing the optimal transport problem with invariance to global transformations (rotations in this case) allows it to recover correspondences in cases where the traditional formulation would not.
Endowing the optimal transport problem with invariance to global transformations (rotations in this case) allows it to recover correspondences in cases where the traditional formulation would not.

In this work, we propose a general framework for optimal transport in the presence of latent global transformations. We cast the problem as a joint optimization over transport couplings and transformations chosen from a flexible class of invariances (defined as Schatten $\ell_p$-norm balls), propose algorithms to solve it, and show promising results in various tasks, including a popular unsupervised word translation benchmark.

Image
Schatten-norm invariance classes. The depicted $\ell_p$-norm balls in singular value space correspond to matrix invariance classes $\mathcal{F}_p$. The radius is chosen so as to include the identity matrix ($\sigma = [1,1]$). For linear objectives, solutions when optimizing over $\mathcal{F}_{\infty}$ and $\mathcal{F}_{1}$ can be found on the extreme points of their respective constraint spaces: orthogonal matrices for the former and rank-one matrices for the latter.

Our framework is very general and allows various types of invariances to be encoded. Moreover, our approach unifies previous methods for fusing OT with global transformations such as Procrustes mappings (Rangarajan et al., 1997; Zhang et al., 2017b; Grave et al., 2018), and reveals unexpected connections to the Gromov-Wasserstein distance (Mémoli, 2011).

Relevant Publications:

  1. Alvarez-Melis, Jaakkola and Jegelka. “Structured Optimal Transport”, AISTATS 2018.
  2. Alvarez-Melis, Jegelka and Jaakkola. “Towards Optimal Transport with Global Invariances”, Preprint.