Variational Analysis in the Wasserstein Space.
Nicolas Lanzetti*, Antonio Terpin*, Florian Dörfler
Project description.
We study optimization problems whereby the optimization variable is a probability measure. Since the probability space is not a vector space, many classical and powerful methods for optimization (e.g., gradients) are of little help. Thus, one typically resorts to the abstract machinery of infinite-dimensional analysis or other ad-hoc methodologies — not tailored to the probability space — which however involve projections or rely on convexity-type assumptions. We believe instead that these problems call for a comprehensive methodological framework for calculus in probability spaces. In this work, we combine ideas from optimal transport, variational analysis, and Wasserstein gradient flows to equip the Wasserstein space (i.e., the space of probability measures endowed with the Wasserstein distance) with a variational structure, both by combining and extending existing results and introducing novel tools. Our theoretical analysis culminates in very general necessary optimality conditions for optimality. Notably, our conditions (i) resemble the rationales of Euclidean spaces, such as the Karush–Kuhn–Tucker and Lagrange conditions, (ii) are intuitive, informative, and easy to study, and (iii) they provide either closed-form solutions or computationally attractive algorithms. We believe this framework lays the foundation for new algorithmic and theoretical advancements in the study of optimization problems in probability spaces, which we exemplify with numerous case studies and applications to machine learning, drug discovery, and distributionally robust optimization.
Problem formulation.
It is quite simple: this work considers optimization problems of the form
$$ \newcommand{\varGeneric}{\mu} \newcommand{\feasibleSet}{\mathcal{C}} \newcommand{\reals}{\mathbb{R}} \newcommand{\realsBar}{\bar{\mathbb{R}}} \newcommand{\Pp}[2]{\mathcal{P}_{#1}(#2)} \newcommand{\objective}{\mathcal{J}} \inf_{\varGeneric \in \feasibleSet}\, \objective(\varGeneric), $$
where \(\mathcal{C} \subseteq \mathcal{P}(\mathbb{R}^d)\) is a set of admissible probability measures and \(\mathcal{J}: \mathcal{P}(\mathbb{R}^d) \to \bar{\mathbb{R}}\) is a functional to minimize. This abstract problem setting stems from the observation that numerous fields, including machine learning, robust optimization, and biology, tackle their own version of this problem, but with ad-hoc methods that often cease to be effective as soon as the problem structure deviates from idealized assumptions.
Main result.
Gradients are aligned with the constraints at optimality. Our main result is general first-order optimality conditions for the optimization problem above:
If \(\mu^\ast \in \mathcal{P}(\mathbb{R}^d)\) is an optimal solution with finite second moment and provided that a constraint qualification holds, then the "Wasserstein subgradients" are "aligned" with the constraints at "optimality", i.e.,
$$ 0_{\mu^\ast} \in \partial \mathcal{J}(\mu^\ast) + \mathrm{N}_{\mathcal{C}}(\mu^\ast), $$
where \(\partial\mathcal{J}(\mu^\ast)\) is the Wasserstein subgradient of \(\mathcal{J}\) at \(\mu^\ast\), \(\mathrm{N}_{\mathcal{C}}(\mu^\ast)\) is the Wasserstein normal cone of \(\mathcal{C}\) at \(\mu^\ast\), and \(0_{\mu^\ast}\) is a null Wasserstein tangent vector at \(\mu^\ast\).
For more examples and applications, check out the paper.
Cite.
@article{lanzetti2024variational,
author = {Lanzetti, Nicolas and Terpin, Antonio and D\"orfler, Florian},
title = {Variational Analysis in the Wasserstein Space},
journal = {arXiv preprint arXiv:2406.10676},
year = {2024}
}