Announcement


[PhD] Learning poorly known and observed large scale complex systems

24 April 2026


Category: PhD positions


Context and Objectives

Governing is forecasting. This proverbial saying is relevant to many situations of engineering interest where decisions must be taken based on predictions, or where devising a suitable sequence of actions to achieve some goal requires good knowledge of the effect of these actions on the system under consideration. Such predictions usually rely on the simulation of a model of the system at hand and/or on observations collected over time. A reliable model may, however, not be available, or may be too computationally costly to be useful. Observations, on the other hand, are often scarce and do not provide a complete picture of the state of the system.

In this thesis, we aim to derive a principled approach to predict the time-evolution of quantities of interest associated with a system observed only via a few noisy sensors active at unpredictable times. To this end, we leverage the history of the information one can collect. This paradigm of predicting the future from whatever knowledge is available over a past horizon is rigorously justified by the Mori-Zwanzig framework developed in the statistical physics community in the late 1960s.
A particular focus will be on developing scalable approaches, suited for large-scale systems, such as those encountered in haemodynamics.

Methodology and Approach

Purely data-driven approaches do not explicitly exploit the physical structure of the underlying system. We instead aim to leverage a theoretically grounded approach to efficiently predict quantities of interest or (an approximation of) the state of a system. We rely on the Mori-Zwanzig framework from statistical physics. In a nutshell, it formalizes the time-evolution of a set of variables x(t) related to the system as a function of their history, without requiring knowledge of the other variables describing the system.
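
For readers unfamiliar with the framework, this dependence on the history is commonly written as a generalized Langevin equation. The form below is a standard textbook statement, given here only as an illustration and not necessarily the exact formulation used in this project; the symbols M, K and F are notation assumed for this sketch.

```latex
\frac{\mathrm{d}x}{\mathrm{d}t}(t)
  = \underbrace{M\big(x(t)\big)}_{\text{Markovian term}}
  + \underbrace{\int_0^t K\big(x(t-s),\, s\big)\,\mathrm{d}s}_{\text{memory term}}
  + \underbrace{F(t)}_{\text{orthogonal dynamics (noise)}}
```

The memory term is the history-dependent contribution: it accounts for the effect of the unobserved variables on x(t) through the past values of x itself.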

Accounting for the past essentially makes it possible to isolate the dynamics of these observables. This framework is general and applies widely. For instance, when the whole state of the system is not accessible, the dynamics of the observables can be described with a non-Markovian model via this framework. It similarly provides a principled closure for coarse models, which can be effectively complemented with a history-based term.

In this thesis, we will explore the potential of Signatures to efficiently approximate the history of the observations. The Signature transform has recently been used in several areas, including rough path theory, finance, stochastic control, and machine learning. It has proven to be an effective tool to summarize the information of paths and dependencies across different dimensions, with high computational efficiency. Signatures consist of iterated integrals of the history of their inputs and are interpretable. They provide a way to linearize functions of their input and exhibit nice theoretical properties. In particular, owing to tensor algebra, they can be efficiently updated when new observations become available, without recomputing the whole object.
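
As an illustration of the incremental update mentioned above, here is a minimal sketch of a Signature truncated at order 2 for a piecewise-linear path, updated one observation at a time via Chen's identity. It is not the project's implementation: the function names (`segment_signature`, `chen_update`, `signature_depth2`) are hypothetical, and the truncation at order 2 and the linear interpolation between samples are assumptions made to keep the example short.

```python
import numpy as np


def segment_signature(delta):
    """Order-2 signature of a straight segment with increment `delta`."""
    delta = np.asarray(delta, dtype=float)
    # Level 1 is the increment; level 2 of a linear segment is delta (x) delta / 2.
    return delta, 0.5 * np.outer(delta, delta)


def chen_update(sig, delta):
    """Append one segment to an existing order-2 signature (Chen's identity)."""
    s1, s2 = sig
    d1, d2 = segment_signature(delta)
    # Level 2 of the concatenated path: old + new + cross term s1 (x) d1.
    return s1 + d1, s2 + np.outer(s1, d1) + d2


def signature_depth2(path):
    """Order-2 signature of the piecewise-linear path through the rows of `path`."""
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    sig = (np.zeros(dim), np.zeros((dim, dim)))
    for delta in np.diff(path, axis=0):
        sig = chen_update(sig, delta)
    return sig


# Hypothetical toy path in 2D: move right, then up.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
level1, level2 = signature_depth2(path)
levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
print(level1, level2, levy_area)
```

Each call to `chen_update` costs a fixed number of operations in the path dimension, regardless of how many observations have already been processed, which is the property the paragraph above refers to.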

Challenges

Many open questions remain, however, and will be the focus of this thesis. In particular, how are the different time scales of the physical system preserved across the Signature of its observations? What are the properties of the time series to retain in order to allow for a reliable and efficient prediction based on Signatures? How large should the truncation order be for a given performance? How frugal can the Signature-based term in the Mori-Zwanzig framework be in terms of training data, a critical point in many situations? Does the Mori-Zwanzig solution have a structure that can be exploited, such as low rank, sparsity or multi-dependence which can be captured with tensor formats, etc.?
These methodological developments will first be illustrated on low-dimensional dynamical systems before, if time allows, being demonstrated on large-scale real data from geophysics.
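
To give a sense of why the truncation order matters, the short sketch below counts the coefficients of a truncated Signature for a path with `dim` channels. The helper name `signature_size` is hypothetical, and the count excludes the constant zeroth-order term.

```python
def signature_size(dim, order):
    """Number of coefficients in levels 1..order of a signature in `dim` dimensions."""
    # Level k is a tensor with dim**k entries.
    return sum(dim ** k for k in range(1, order + 1))


for order in range(1, 6):
    print(order, signature_size(3, order))
```

The count grows geometrically with the order, which is one reason low-rank or sparse tensor formats are of interest.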

Team and research environment

The work will take place at the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN, https://www.lisn.upsaclay.fr) on the campus of Université Paris-Saclay, and will benefit from the research team's expertise in machine learning, applied mathematics, computer science, statistical physics, fluid mechanics and dynamical systems.

The PhD student will join a vibrant research team focused on scientific machine learning, deep learning, applied mathematics and statistical physics.
The student will be advised by Lionel Mathelin and Onofrio Semeraro, both CNRS researchers.
In addition to the rich scientific environment of Paris-Saclay, the student will benefit from numerous interactions within the team, in particular with other PhD students and postdocs, and from the weekly seminars, which provide exposure to a wide range of state-of-the-art research.

The candidate should ideally have a solid background in machine learning, applied mathematics and/or statistics. Experience with a numerical machine learning framework or language (for instance, PyTorch, JAX or Julia) is a plus.
