Meeting
Bilevel Optimization and Hyperparameter Learning
Scientific axes:
- Theory and methods
Organizers:
- Jordan Frecon Patracone (LaHC)
- Mathurin Massias (LIP - Lyon)
We remind you that, in order to guarantee room access for all registered participants, registration for meetings is free but mandatory.
Registration
21 GdR IASIS members and 42 non-members of the GdR are registered for this meeting.
Room capacity: 70 people. 7 places remaining.
Announcement
Bilevel optimization has become a crucial framework in machine learning for addressing hierarchical problems, where one optimization task depends on the outcome of another. This approach plays a pivotal role in hyperparameter learning, meta-learning, domain adaptation, and advanced regularization techniques.
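Schematically, and with notation chosen here purely for illustration, such hierarchical problems take the following bilevel form, where an outer objective is evaluated at the solution of an inner (training) problem; hyperparameter learning is recovered when the outer variable collects regularization parameters and the outer objective is a validation loss.

```latex
% Requires amsmath. Generic bilevel (hyperparameter-learning) problem;
% the notation below is illustrative only.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
\begin{aligned}
\min_{\lambda \in \Lambda}\;\; & F\bigl(\lambda, \theta^\star(\lambda)\bigr)
    && \text{(outer problem, e.g.\ a validation loss)}\\
\text{s.t.}\;\; & \theta^\star(\lambda) \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^p} G(\lambda, \theta)
    && \text{(inner problem, e.g.\ a regularized training loss)}
\end{aligned}
\]
\end{document}
```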
The event will highlight recent developments in bilevel optimization and hyperparameter learning, exploring their theoretical foundations, numerical strategies, and diverse applications across machine learning and related fields. Topics of interest include (but are not limited to):
- Advances in bilevel optimization theory (convergence, complexity, and stability)
- Algorithms for solving bilevel problems efficiently
- Meta-learning, domain adaptation and transfer learning using bilevel frameworks
- Hyperparameter optimization and automated model selection
- Regularization and sparsity-inducing techniques in hyperparameter learning
- Applications to neural architecture search and optimization pipelines
- Robust hyperparameter learning for adversarial and noisy environments
This event, hosted at the École normale supérieure de Lyon (Site Monod, Room Condorcet) on March 25, 2025, invites contributions showcasing fundamental research, novel algorithms, or innovative applications that leverage bilevel optimization and hyperparameter learning for advancing the state of the art in machine learning.
Schedule
9:00 – Welcome
9:10 – General introduction
9:20 – Luce Brotcorne: Introduction to Bilevel Optimization and Applications in Pricing
10:10 – Léo Davy: Restart Strategies for Learning Hyperparameters of Proximal Neural Networks via Automatic Differentiation
10:35 – Coffee break
10:55 – Julien Mairal: Functional Bilevel Optimization for Machine Learning
11:45 – Barbara Pascal: Bilevel optimization for automated data-driven inverse problem resolution
12:10 – Lunch break
13:30 – Tony Silvetti-Falls: Nonsmooth Implicit Differentiation for Machine Learning
14:20 – Christian Daniele: Deep Equilibrium Models for Poisson Inverse Problems via Mirror Descent
14:45 – Coffee break
15:10 – Saverio Salzo: Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
16:00 – Samuel Vaiter: Successes and pitfalls of bilevel optimization in machine learning
16:50 – Conclusion
Keynote Speakers:
- Luce BROTCORNE (Researcher, Inria)
Introduction to Bilevel Optimization and Applications in Pricing
(TBC)
- Julien MAIRAL (Researcher, Inria)
Functional Bilevel Optimization for Machine Learning
In this talk, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks. This is a joint work with Ieva Petrulionyte and Michael Arbel.
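Schematically, the functional viewpoint sketched in this abstract can be written as follows (notation chosen here for illustration, not necessarily that of the authors), with the inner minimization carried out over a function space rather than over the parameters of a prediction model.

```latex
% Requires amsmath. Schematic functional bilevel problem; the inner minimization
% runs over a function space H. Notation is illustrative, not the authors'.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
\begin{aligned}
\min_{\lambda}\;\; & F\bigl(\lambda, h^\star_\lambda\bigr)
    && \text{(outer objective)}\\
\text{s.t.}\;\; & h^\star_\lambda \in \operatorname*{arg\,min}_{h \in \mathcal{H}} G(\lambda, h)
    && \text{(inner problem over the function space } \mathcal{H}\text{)}
\end{aligned}
\]
\end{document}
```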
- Saverio SALZO (Associate Professor, Sapienza Università di Roma)
Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
I will address the problem of efficiently computing a generalized derivative of the fixed point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning and data poisoning attacks. Two popular approaches are analyzed: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge in the nonsmooth setting is that the chain rule no longer holds. Building upon the recent work by Bolte et al. (2022), who proved linear convergence of nondifferentiable ITD, I will show an improved linear rate for ITD and a slightly better rate for AID, both in the deterministic case. I will also introduce NSID, a new stochastic method to compute the implicit derivative when the fixed point is defined as the composition of an outer map and an inner map that is accessible only through a stochastic unbiased estimator. Convergence rates for this stochastic method will be presented.
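For readers unfamiliar with the two approaches, the toy sketch below contrasts ITD and AID hypergradients on a smooth, strongly convex ridge inner problem; it only illustrates the general mechanism and does not cover the nonsmooth setting analyzed in the talk.

```python
# Toy comparison of ITD and AID hypergradients on a smooth ridge inner problem.
# Illustrative sketch only; the talk addresses the nonsmooth case.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 10)), rng.standard_normal(50)      # training data
Xv, yv = rng.standard_normal((20, 10)), rng.standard_normal(20)    # validation data
lam = 0.5                                                          # hyperparameter
H = X.T @ X                                                        # inner Hessian minus lam * I
eta = 1.0 / (np.linalg.norm(X, ord=2) ** 2 + lam)                  # 1/L step size
K = 500                                                            # inner iterations

def inner_step(theta):
    """One gradient step on G(lam, theta) = 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    return theta - eta * (X.T @ (X @ theta - y) + lam * theta)

# ITD: forward-mode differentiation through the unrolled inner iterations.
theta, dtheta = np.zeros(10), np.zeros(10)
for _ in range(K):
    # d theta_{k+1}/d lam = (I - eta*(H + lam*I)) d theta_k/d lam - eta * theta_k
    dtheta = dtheta - eta * (H @ dtheta + lam * dtheta + theta)
    theta = inner_step(theta)
grad_val = Xv.T @ (Xv @ theta - yv)        # gradient of the validation loss at theta_K
hypergrad_itd = dtheta @ grad_val

# AID: implicit differentiation at the (approximate) fixed point, via one linear
# system with the inner Hessian: d theta*/d lam = -(H + lam*I)^{-1} theta*.
v = np.linalg.solve(H + lam * np.eye(10), grad_val)
hypergrad_aid = -theta @ v

print(hypergrad_itd, hypergrad_aid)        # the two estimates nearly coincide
```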
- Antonio SILVETI-FALLS (Associate Professor, CVN, CentraleSupélec)
Nonsmooth Implicit Differentiation for Machine Learning
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover, this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
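As a concrete instance of the Lasso-type application mentioned above, the sketch below (written for this page with a plain ISTA solver, not taken from the speaker's material) forms the hypergradient of a validation loss with respect to the Lasso regularization parameter using the standard support-based implicit relation.

```python
# Hypergradient of a validation loss w.r.t. the Lasso regularization parameter,
# via the implicit relation on the active support. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(1)
X, Xv = rng.standard_normal((60, 30)), rng.standard_normal((40, 30))
beta_true = np.zeros(30)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(60)
yv = Xv @ beta_true + 0.1 * rng.standard_normal(40)
lam = 5.0

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5*||y - X beta||^2 + lam*||beta||_1 with ISTA (proximal gradient)."""
    L = np.linalg.norm(X, ord=2) ** 2            # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - (1.0 / L) * X.T @ (X @ beta - y)
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    return beta

beta = lasso_ista(X, y, lam)
S = np.abs(beta) > 1e-8                           # active support of the Lasso solution

# On the support, X_S^T X_S beta_S = X_S^T y - lam * sign(beta_S), hence
# d beta_S / d lam = -(X_S^T X_S)^{-1} sign(beta_S), and zero off the support.
dbeta_dlam = np.zeros_like(beta)
dbeta_dlam[S] = -np.linalg.solve(X[:, S].T @ X[:, S], np.sign(beta[S]))

grad_val = Xv.T @ (Xv @ beta - yv)                # gradient of 0.5*||yv - Xv beta||^2
hypergrad = dbeta_dlam @ grad_val
print(hypergrad)
```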
- Samuel VAITER (Researcher, CNRS, LJAD, Université Côte d’Azur)
Successes and pitfalls of bilevel optimization in machine learning
In this talk, I will introduce bilevel optimization (BO) as a powerful framework to address several machine learning-related problems, including hyperparameter tuning, meta-learning, and data cleaning. Based on this formulation, I will describe some successes of BO, particularly in a strongly convex setting, where strong guarantees can be provided along with efficient stochastic algorithms. I will also discuss the outstanding issues of this framework, presenting geometrical and computational complexity results that show the potential difficulties in going beyond convexity, at least from a theoretical perspective.
- Léo DAVY: Restart Strategies for Learning Hyperparameters of Proximal Neural Networks via Automatic Differentiation
Bi-level optimization problems involving variational formulations have been widely studied, particularly in cases where the inner problem is solved by proximal iterations and where the bi-level strategy aims to estimate the hyperparameters involved. These inner iterations are referred to as Proximal Neural Networks (PNNs). While PNNs enjoy convergence guarantees to the solution of a specific minimization problem when the number of layers is arbitrarily large, in practice, truncation to a finite number of layers is required for the learning stage, leading to approximate solutions. To overcome this limitation, we investigate restart strategies that leverage the contractivity properties of PNNs to ensure convergence to the exact inner solution. This work is closely related to the Deep Equilibrium Learning framework, where the outer problem is optimized under the assumption that the inner solver converges to a fixed point. While this approach is theoretically well-founded, it presents significant computational challenges. However, under the Jacobian-Free Backpropagation (JFB) assumption, standard automatic differentiation techniques can be applied. We examine how structural properties of PNNs and variational formulations, such as convergence rates and strong convexity, can be utilized to satisfy the JFB assumption. Numerical experiments on several image restoration and analysis tasks will illustrate the benefits of restart strategies.
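The following toy sketch (not taken from the speaker's material) illustrates the JFB idea on a linear contractive fixed-point map, comparing the exact implicit hypergradient with the Jacobian-free approximation that simply drops the inverse of I minus the Jacobian.

```python
# Jacobian-Free Backpropagation (JFB) on a toy contractive fixed-point map:
# exact implicit hypergradient vs. the approximation that drops (I - A)^{-1}.
# Illustrative sketch only; the talk applies this idea to proximal neural networks.
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, ord=2)      # enforce a contraction: ||A|| = 0.5
B = rng.standard_normal((n, 3))          # how the hyperparameters enter the map
lam = rng.standard_normal(3)             # hyperparameters
c = rng.standard_normal(n)               # target for the outer loss

# Inner map T(theta, lam) = A theta + B lam; its fixed point solves (I - A) theta = B lam.
theta_star = np.linalg.solve(np.eye(n) - A, B @ lam)
grad_outer = theta_star - c              # gradient of F(theta) = 0.5*||theta - c||^2

# Exact implicit hypergradient: (d theta*/d lam)^T grad_outer with d theta*/d lam = (I - A)^{-1} B.
hypergrad_exact = B.T @ np.linalg.solve((np.eye(n) - A).T, grad_outer)

# JFB approximation: backpropagate through a single application of T at the fixed
# point, i.e. replace (I - A)^{-T} by the identity.
hypergrad_jfb = B.T @ grad_outer

print(hypergrad_exact)
print(hypergrad_jfb)                     # the two differ by the Neumann-series factor (I - A)^{-1}
```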
- Christian DANIELE: Deep Equilibrium Models for Poisson Inverse Problems via Mirror Descent
Inverse problems in imaging arise in a wide range of scientific and engineering applications, including medical imaging, astrophysics, and microscopy. These problems are inherently ill-posed, requiring advanced regularization techniques and optimization strategies to achieve stable and accurate reconstructions. In recent years, hybrid approaches that combine deep learning and variational methods have gained increasing attention. Well-established techniques include Algorithmic Unrolling, Plug-and-Play methods, and Deep Equilibrium Models. These models are networks with fixed points, which are trained to match data samples from a training dataset. In this work, we focus on the latter approach to learn a data-driven regularization function for Poisson inverse problems, using the Kullback-Leibler divergence as the data fidelity term. To effectively handle this fidelity term, we employ Mirror Descent as the underlying solver. We discuss theoretical guarantees of convergence, even in non-convex settings, incorporating a backtracking strategy, along with key aspects of training this class of models. To validate our approach, we evaluate its performance on a deblurring task with different kernels and varying levels of Poisson noise.
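As a minimal illustration of the underlying solver (without the learned regularizer or the backtracking strategy discussed in the talk), the sketch below runs mirror descent with an entropy mirror map on the Kullback-Leibler data fidelity of a toy Poisson inverse problem.

```python
# Mirror descent with an entropy mirror map on the Kullback-Leibler data fidelity
# of a toy Poisson inverse problem y ~ Poisson(A x). No learned regularizer or
# backtracking here; this only sketches the underlying solver mentioned above.
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 20
A = rng.uniform(0.1, 1.0, size=(m, n))            # nonnegative forward operator
x_true = rng.uniform(0.5, 2.0, size=n)
y = rng.poisson(A @ x_true).astype(float)         # Poisson observations

def kl_grad(x):
    """Gradient of the KL fidelity  sum_i (A x)_i - y_i log (A x)_i ."""
    Ax = A @ x
    return A.T @ (1.0 - y / np.maximum(Ax, 1e-12))

x = np.ones(n)                                    # strictly positive initialization
step = 0.01
for _ in range(3000):
    # Mirror step with mirror map phi(x) = sum x log x - x (grad phi = log x):
    # log x_{k+1} = log x_k - step * grad f(x_k), i.e. a multiplicative update.
    x = x * np.exp(-step * kl_grad(x))

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # relative reconstruction error
```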
- Barbara PASCAL: Bilevel optimization for automated data-driven inverse problem resolution
Most inverse problems in signal and image processing are ill-posed. To remove the ambiguity about the solution and design noise-robust estimators, a priori properties, e.g., smoothness or sparsity, can be imposed on the solution through regularization. The main bottleneck in using the resulting variational regularized estimators in practice, i.e., without access to ground truth, is that the quality of the estimates strongly depends on the fine-tuning of the level of regularization. A classical approach to automated, data-driven selection of the regularization parameters consists in designing a data-dependent unbiased estimator of the error, whose minimization provides an approximation of the optimal parameters. The resulting overall procedure can be formulated as a bilevel optimization problem, the inner loop computing the variational regularized estimator and the outer loop selecting the hyperparameters. The design of a fully automated, data-driven procedure adapted to inverse problems corrupted with highly correlated noise will be described in detail and exemplified on a texture segmentation problem. Its applicability to other inverse problems will be demonstrated through numerical simulations on both synthetic and real-world data.
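The following toy sketch (white Gaussian noise and a simple soft-thresholding estimator, i.e., a much simpler setting than the correlated-noise problems of the talk) illustrates the outer step of such a procedure: selecting the regularization level by minimizing a data-dependent unbiased estimate of the error (here SURE), the inner step being the regularized estimator itself.

```python
# Toy outer loop of the bilevel procedure: choose the soft-thresholding level by
# minimizing SURE, an unbiased estimate of the quadratic risk under i.i.d.
# Gaussian noise. Grid search stands in for the outer optimization; the talk
# addresses correlated noise and texture segmentation, not this toy setting.
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 1000, 0.5
x_true = np.zeros(n)
x_true[:50] = rng.uniform(1.0, 3.0, size=50)       # sparse ground-truth signal
y = x_true + sigma * rng.standard_normal(n)        # noisy observation

def soft_threshold(y, lam):
    """Inner estimator: prox of lam*||.||_1, i.e. soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, lam, sigma):
    """Stein Unbiased Risk Estimate of E||x_hat - x_true||^2 for soft-thresholding."""
    x_hat = soft_threshold(y, lam)
    df = np.sum(np.abs(y) > lam)                   # degrees of freedom of the estimator
    return np.sum((x_hat - y) ** 2) - n * sigma ** 2 + 2 * sigma ** 2 * df

# Outer loop: data-driven selection of the threshold by minimizing SURE on a grid.
lams = np.linspace(0.0, 3.0, 121)
sure_vals = [sure(y, lam, sigma) for lam in lams]
idx = int(np.argmin(sure_vals))
lam_best = lams[idx]
true_err = np.sum((soft_threshold(y, lam_best) - x_true) ** 2)   # for reference only
print(lam_best, sure_vals[idx], true_err)
```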
Organizing Team:
- Jordan FRECON PATRACONE (Associate Professor, Inria, LabHC)
- Quentin BERTRAND (Researcher, Inria, LabHC)
- Mathurin MASSIAS (Researcher, Inria, LIP)
The event benefits from the logistical support of the Institut Rhônalpin des Systèmes Complexes (IXXI) and the Fédération Informatique de Lyon.