Meeting


Optimal transport and its applications in machine learning and data analysis

Date: 17 February 2025
Time: 09:30 - 17:00
Location: Ecole normale supérieure de Lyon

Scientific themes:
  • Machine learning

Organizers:

We remind you that, in order to guarantee room access for all registered participants, registration for meetings is free but mandatory.

Registration

30 GdR IASIS members and 38 non-members are registered for this meeting.

Room capacity: 120 people. 52 places remaining.

Announcement

Requests for travel support will be processed starting 27 January and are accepted until 3 February.

Owing to its ability to compare probability distributions, optimal transport has attracted the interest of the machine learning community. It is now a widely adopted tool in many ML applications, such as classification, graph or cellular data analysis, and generative models, and is today a key component of many high-performing models.
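
To make this concrete, here is a minimal Python sketch (our illustration, not material from the meeting) that compares two empirical distributions with an optimal transport cost, assuming the open-source POT library is available; the sample data and parameter values are purely illustrative.

    # Minimal sketch: OT cost between two empirical distributions with the POT library.
    import numpy as np
    import ot  # POT: Python Optimal Transport, https://pythonot.github.io

    rng = np.random.default_rng(0)
    xs = rng.normal(loc=0.0, scale=1.0, size=(50, 2))   # source samples
    xt = rng.normal(loc=3.0, scale=1.0, size=(60, 2))   # target samples
    a = np.full(50, 1.0 / 50)                           # uniform source weights
    b = np.full(60, 1.0 / 60)                           # uniform target weights

    M = ot.dist(xs, xt)                                 # squared Euclidean cost matrix
    cost_exact = ot.emd2(a, b, M)                       # exact Kantorovich cost
    cost_entropic = ot.sinkhorn2(a, b, M, reg=0.1)      # entropic (Sinkhorn) approximation
    print(f"exact: {cost_exact:.3f}  entropic: {cost_entropic:.3f}")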

This meeting aims to present the latest advances in optimal transport: its theory, its variants (Schrödinger bridges, unbalanced transport, the Gromov-Wasserstein distance, optimal transport on non-Euclidean geometries) and its applications (generative models, flow matching, graphs, etc.).

This one-day meeting will be held at the Ecole normale supérieure de Lyon. It aims to take stock of ongoing work on these fundamental problems and calls for contributions on the following (non-exhaustive) topics:

  • Fundamental results in optimal transport (convergence, convexity)
  • Algorithms and resolution of OT and regularized OT problems
  • Multi-task learning, transfer learning, domain adaptation and graphs
  • Wasserstein barycenters, distribution modeling
  • Wasserstein distance, Gromov-Wasserstein, unbalanced OT, Sliced Wasserstein, OT between distributions on non-Euclidean spaces
  • Applications to generative models: Schrödinger bridges, flow matching
  • Applications to machine learning on structured data
  • Applications to signal processing
  • Applications to computer vision
  • Applications to time series

The program is still being finalized (see the call for contributions below), but we are pleased to announce the following speakers:

Keynote speakers:

  • Filippo Santambrogio (Prof, Institut Camille Jordan):
    • An introduction to optimal transport and its applications in PDEs: The talk will begin by recalling the main notions concerning the Monge and Kantorovich problems, as well as Kantorovich duality. I will then recall what the Wasserstein distances are and present the link between curves in Wasserstein space and the continuity equation. I will then move on to a very active topic in transport: its applications to partial differential equations with a gradient-flow structure. I will explain how to obtain the PDE from the chosen functional and show how several well-known diffusion equations fit into this framework. Finally, I will work through the example of the Sliced Wasserstein Flow (briefly mentioning the corresponding distance), which is of applied interest while at the same time raising fascinating mathematical questions. (The basic formulations mentioned here are briefly recalled after this speaker list.)
  • Bruno Levy (DR, Inria):
    • Semi-discrete L2 optimal transport with 10^9 points and beyond: why and how: I will present a set of algorithms for computing the L2 optimal transport from the Lebesgue measure to an empirical measure in 3D for very large problems (10^9 points and more), as well as the application context in numerical cosmology that motivates this kind of computation. This is joint work with Nicolas Ray (PIXEL-Inria Nancy), Quentin Mérigot and Hugo Leclerc (ParMA-Inria Saclay), as part of a collaboration with Roya Mohayaee (Institut d'Astrophysique de Paris).
  • Laetitia Chapel (Prof, Institut Agro Rennes-Angers):
    • Unbalanced optimal transport: formulation and efficient computational solutions: Optimal Transport (OT) offers a versatile framework for quantifying the discrepancy between two probability distributions by extending a distance metric defined on their respective supports. In practice, it operates on empirical distributions comprising acquisition artefacts, such as outliers or noise, which hinders the reliable calculation of the OT plan. Additionally, OT necessitates equal total mass between the two distributions, a condition that can be overly restrictive in many machine learning and computer vision applications. Unbalanced Optimal Transport (UOT) addresses the issue of rebalancing or removing some mass from the problem by relaxing the marginal conditions. Consequently, it is often deemed to be more robust, to some extent, to these artefacts than its standard balanced counterpart. In this talk, I will review the formulation of UOT and discuss its ability to deal with outliers or noisy samples. I will inspect several divergences for relaxing the marginal constraints, ranging from vertical divergences such as the Kullback-Leibler or L2-norm, which permit the removal of some mass, to horizontal ones, like OT, that enable rebalancing mass. Finally, I will discuss efficient algorithms that do not necessitate additional regularization on the OT plan.
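
For readers less familiar with the objects mentioned in the abstracts above, here is a brief, informal recap of the standard formulations (our notation, not specific to any particular talk). For probability measures μ and ν and a cost function c:

    \[ \text{(Monge)} \qquad \inf_{T :\, T_\# \mu = \nu} \int c(x, T(x)) \, d\mu(x) \]
    \[ \text{(Kantorovich)} \qquad \mathrm{OT}_c(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y) \, d\pi(x,y) \]
    \[ \text{(Kantorovich duality)} \qquad \mathrm{OT}_c(\mu,\nu) = \sup_{\varphi(x) + \psi(y) \le c(x,y)} \int \varphi \, d\mu + \int \psi \, d\nu \]
    \[ \text{(Wasserstein distance)} \qquad W_p(\mu,\nu) = \mathrm{OT}_c(\mu,\nu)^{1/p} \quad \text{with } c(x,y) = \|x-y\|^p \]
    \[ \text{(Unbalanced OT)} \qquad \inf_{\pi \ge 0} \int c \, d\pi + \lambda_1 D(\pi_1 \,\|\, \mu) + \lambda_2 D(\pi_2 \,\|\, \nu) \]

where Π(μ,ν) is the set of couplings with marginals μ and ν, π₁ and π₂ are the marginals of π, and D is a divergence (e.g. Kullback-Leibler or an L2 penalty), as discussed in L. Chapel's talk.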

Invited speakers:

  • Nicolas Bonneel & Julie Digne (CNRS, LIRIS):
    • A survey of Optimal Transport for Computer Graphics and Computer Vision: Optimal transport is a long-standing theory that has been studied in depth from both theoretical and numerical points of view. Starting from the 1950s, this theory has also found a lot of applications in operational research. Over the last 30 years it has spread to computer vision and computer graphics and is now becoming hard to ignore. Still, its mathematical complexity can make it difficult to comprehend, and as such, computer vision and computer graphics researchers may find it hard to follow recent developments in their field related to optimal transport. This survey first briefly introduces the theory of optimal transport in layman's terms as well as the most common numerical techniques to solve it. More importantly, it presents applications of these numerical techniques to various computer graphics and vision related problems. This involves applications ranging from image processing, geometry processing, rendering, fluid simulation, to computational optics, and many more.
  • Eloi Tanguy (PhD student, MAP5):
    • Constrained Approximate Optimal Transport Maps: We investigate finding a map g within a function class G that minimizes an Optimal Transport (OT) cost between a target measure ν and the image by g of a source measure μ. This is relevant when an OT map from μ to ν does not exist or does not satisfy the desired constraints of G. We address existence and uniqueness for generic subclasses of L-Lipschitz functions, including gradients of (strongly) convex functions and typical Neural Networks. We explore a variant that approaches a transport plan, showing equivalence to a map problem in some cases. For the squared Euclidean cost, we propose alternating minimization over a transport plan π and a map g, with the optimization over g being the L2 projection onto G of the barycentric mapping g_π. In dimension one, this global problem amounts to the L2 projection of g_π* onto G for an OT plan π* between μ and ν, but this does not extend to higher dimensions. We introduce a simple kernel method to find g within a Reproducing Kernel Hilbert Space in the discrete case. Finally, we present numerical methods for L-Lipschitz gradients of ℓ-strongly convex potentials. (A minimal numerical sketch of the barycentric mapping appears after this speaker list.)
  • Theo Lacombe (MCF, LIGM):
    • On the Existence of Monge-map for the Gromov–Wasserstein problem: The Gromov–Wasserstein (GW) problem is a non-convex quadratic optimization problem defined on the space of transport plans between two measures. Analogous to its linear counterpart—the standard optimal transport problem—a natural question arises: do the solutions of the GW problem concentrate on deterministic plans, a.k.a. Monge maps, when the source measure admits a density? In this talk, we will present a general affirmative answer for matching costs defined by inner products. For matching costs defined by squared Euclidean norms, we provide a positive answer under specific assumptions and give numerical counterexamples (cases where the existence of a Monge map fails) when these assumptions are not satisfied. Joint work with T. Dumont and F.-X. Vialard.
  • Jules Samaran (PhD student, ENS):
    • Bridging the gap between cellular modalities with Inverse Optimal Transport: Single-cell transcriptomics has revolutionized biology and medicine by unraveling the diversity of the cells constituting human tissues. Single-cell technological development has now shifted to the measurement of other modalities (e.g., chromatin accessibility, proteomics). Different single-cell modalities provide complementary information on the molecular identity of a cell; their integration is thus expected to allow a more comprehensive view of cellular identities. Achieving this integration, however, requires a bridge—a way to connect these different molecular spaces from which one can learn how to align them effectively.
      In this talk, I will present two novel approaches, each tailored to a distinct type of bridge for integrating single-cell modalities and both relying on Optimal Transport (OT). Indeed, OT has proven particularly effective in analysing single-cell data given its ability to compare probability distributions in a geometrically meaningful way. First, I introduce scConfluence, a framework that embeds cells from different modalities into a low-dimensional shared latent space. By leveraging weakly connected features—biological prior knowledge of relationships across modalities—scConfluence employs autoencoders supervised by an Inverse Optimal Transport loss to align cells effectively, enabling meaningful integration across modalities. Second, I address the scenario where a small subset of cells with paired measurements across modalities is available. Here, the goal shifts from learning embeddings to only matching cells between modalities. We leverage paired samples to simultaneously learn a ground metric between the two spaces and a transport plan matching unpaired measurements. Our combined use of scalable machine learning models, biological prior knowledge and unbalanced OT allows us to tackle challenging applications on which most existing methods struggle. Our method has been applied to refine previous classifications of cell types and impute missing features on large and diverse datasets, serving as a stepping stone toward a deeper understanding of cancer heterogeneity.
  • Quentin Bouniot (Post-doc, Telecom Paris):
    • From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport: In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult as different DNN architectures of comparable depth and width — common factors associated with their expressive power — may exhibit a drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of DNNs, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on computer vision tasks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for long-reaching implications.
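
As a small numerical illustration of the barycentric mapping mentioned in E. Tanguy's abstract, here is a hedged Python sketch (again using the POT library and synthetic data); it shows only the barycentric projection of a discrete OT plan, not the full alternating scheme or the constrained class G.

    # Sketch: barycentric mapping g_pi induced by a discrete OT plan pi.
    # For the squared Euclidean cost, g_pi(x_i) = (1 / a_i) * sum_j pi_ij * y_j.
    import numpy as np
    import ot

    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 2))            # source samples (measure mu)
    Y = rng.normal(loc=2.0, size=(40, 2))   # target samples (measure nu)
    a = np.full(40, 1.0 / 40)
    b = np.full(40, 1.0 / 40)

    P = ot.emd(a, b, ot.dist(X, Y))         # optimal transport plan (40 x 40 matrix)
    g_pi = (P @ Y) / a[:, None]             # barycentric image of each source point

    # In the alternating scheme described in the abstract, g would then be the
    # L2 projection of this mapping onto the constrained class G (e.g. maps that
    # are gradients of strongly convex potentials); that step is omitted here.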

Call for contributions:

Those wishing to present their work are invited to notify the organizers before 27/01/2025 by sending them an email containing a title, an abstract and the list of authors, with the subject GDR IASIS OTML, to the following addresses: titouan.vayer@inria.fr, remi.flamary@polytechnique.edu



