[M2 Internship] Equilibrium-propagation-based learning mechanism for Graph Transformers

26 January 2026




★ This internship proposal falls within the scope of two research groups of the CRIStAL laboratory of the Université de Lille: GT Image, whose main focus is developing new tools and algorithms for image analysis, video scene interpretation, and 3D object shape analysis, and GT DatInG, whose focus is machine learning, data mining, and signal processing.

The internship will be jointly supervised by the SCOOL and FOX teams of the CRIStAL laboratory at the University of Lille.

Summary:
In recent years, Graph Convolutional Networks (GCNs) have become one of the most popular choices for modeling relational data, thanks to their ability to capture both the local and global structure of graphs, and they have shown promising results in various tasks involving spatio-temporal data. Similarly, Transformer models have revolutionized the field of Natural Language Processing (NLP) and have become the dominant architecture for many NLP tasks. Beyond NLP, Transformer architectures (e.g., the Vision Transformer) have also been applied to tasks such as skeleton-based human action recognition, landmark-based human gesture recognition, and facial expression recognition.

Note that all three of these tasks (human facial expression recognition, human action recognition, and human gesture recognition from video) can be represented as a sequence of video frames, where each frame is structured as a graph built from the landmarks detected in it (for facial expression and gesture recognition); in the case of human action recognition, the human body is represented by a skeleton, likewise built from detected landmarks. Given this similarity, where all three tasks can be represented by a graph structure and the temporal sequences by dynamic spatial-temporal patterns, in this work we propose to develop a Graph-Transformer model that combines the strengths of spectral GCNs for learning spatial-temporal representations with a Transformer model for capturing long-term dependencies in video sequences.
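
To make this concrete, here is a minimal, illustrative sketch (in Python/NumPy) of how one frame's detected landmarks could be turned into a graph: nodes are landmarks and edges connect k-nearest neighbours. This is only an assumed construction for illustration, not the project's actual pipeline; for body skeletons, the edge set would instead come from a fixed list of bones, and the helper name frame_to_graph is hypothetical.

    import numpy as np

    def frame_to_graph(landmarks: np.ndarray, k: int = 3):
        """Build a k-nearest-neighbour graph over one frame's landmarks.

        landmarks: (N, 2) array of detected 2D keypoints for one frame.
        Returns the (N, N) adjacency matrix and the node features.
        """
        n = landmarks.shape[0]
        # Pairwise Euclidean distances between landmarks.
        diff = landmarks[:, None, :] - landmarks[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)           # exclude self-matches
        adj = np.zeros((n, n))
        for i in range(n):
            for j in np.argsort(dist[i])[:k]:    # k nearest neighbours of node i
                adj[i, j] = adj[j, i] = 1.0      # undirected edge
        return adj, landmarks

    # A video then becomes a sequence of graphs, one per frame:
    video_landmarks = np.random.rand(16, 21, 2)  # e.g. 16 frames, 21 hand keypoints
    graphs = [frame_to_graph(frame) for frame in video_landmarks]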

Motivation:
The main scientific objective of this internship is to contribute to the Graph-Transformer model and to generalize it into a single architecture applicable to all three tasks. The Graph-Transformer-based spatial-temporal model will learn the dynamics of the spatial-temporal correlations of the landmark/skeleton graphs. It consists of a spectral GCN for spatial-temporal feature learning, followed by a Transformer-encoder-only model for capturing global temporal dependencies.
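
As a rough illustration of this pipeline, the following PyTorch sketch stacks a spectral graph convolution (in the spirit of the Kipf-and-Welling GCN) over each frame's landmark graph and feeds the pooled per-frame features to a Transformer encoder that captures global temporal dependencies. This is a minimal sketch under assumed tensor shapes, not the model to be developed in the internship; class names and hyperparameters are illustrative.

    import torch
    import torch.nn as nn

    class SpectralGCNLayer(nn.Module):
        """One graph convolution: X' = relu(D^{-1/2} (A + I) D^{-1/2} X W)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin = nn.Linear(in_dim, out_dim)

        def forward(self, x, adj):
            # x: (T, N, in_dim) node features per frame; adj: (N, N), shared.
            a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # self-loops
            d = a_hat.sum(-1).rsqrt()                     # degree^{-1/2}
            a_norm = d[:, None] * a_hat * d[None, :]      # symmetric normalization
            return torch.relu(self.lin(a_norm @ x))

    class GraphTransformer(nn.Module):
        """Spectral GCN per frame, Transformer encoder across frames."""
        def __init__(self, in_dim, hid_dim, n_classes, n_heads=4, n_layers=2):
            super().__init__()
            self.gcn = SpectralGCNLayer(in_dim, hid_dim)
            enc = nn.TransformerEncoderLayer(hid_dim, n_heads)
            self.temporal = nn.TransformerEncoder(enc, n_layers)
            self.head = nn.Linear(hid_dim, n_classes)

        def forward(self, x, adj):
            h = self.gcn(x, adj).mean(dim=1)              # pool nodes: (T, hid_dim)
            h = self.temporal(h.unsqueeze(1)).squeeze(1)  # frames as a sequence
            return self.head(h.mean(dim=0))               # pool over time -> logits

    model = GraphTransformer(in_dim=2, hid_dim=64, n_classes=10)
    x = torch.randn(16, 21, 2)        # 16 frames, 21 landmarks, (x, y) coordinates
    logits = model(x, torch.eye(21))  # placeholder adjacency for illustration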

As noted in the summary, the data in all three tasks share the same structure: sequences of video frames, each represented as a graph over detected landmarks (a skeleton, in the case of action recognition), whose temporal evolution forms dynamic spatial-temporal patterns. We therefore want to understand why and how to optimally use graph dynamics to learn from spatio-temporal data, specifically with neural networks.

In this project, we would like to develop an energy-based learning model [1] for efficient and neurally plausible learning from spatio-temporal data. The motivation comes from the fact that the backpropagation algorithm [2] used to train neural networks is biologically implausible: it requires a special computational circuit and a special kind of digital computation in the second phase of training. As an alternative, Equilibrium Propagation (EP) [3] has been proposed as a learning algorithm that uses a single circuit for both inference and gradient computation and yields an unbiased, variance-free gradient estimate. We have recently extended EP to Lagrangian EP [4], which can handle time-varying inputs, and to a computationally efficient variant, Recurrent Hamiltonian Echo Learning (RHEL) [5]. However, these methods have yet to be scaled to real-life machine-learning tasks and spatio-temporal data. We would like to develop an efficient algorithm that leverages graph dynamics [6, 7] to obtain an EP-like training method for neural networks operating on spatio-temporal data, and to deploy it at scale.
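
For intuition, here is a minimal, self-contained sketch of the two-phase EP update of [3] on a tiny Hopfield-style network, using autograd-based gradient descent on the energy as a stand-in for the physical relaxation dynamics. The energy form, relaxation schedule, and all constants are illustrative assumptions, and the sketch does not cover the Lagrangian EP [4] or RHEL [5] extensions.

    import torch

    torch.manual_seed(0)
    n_in, n_hid, n_out, beta, lr = 4, 8, 2, 0.1, 0.05
    W1 = torch.randn(n_hid, n_in) * 0.1   # input -> hidden couplings
    W2 = torch.randn(n_out, n_hid) * 0.1  # hidden -> output couplings

    def energy(x, h, o):
        # Hopfield-style energy of a three-layer network (cf. [3]).
        rho = torch.sigmoid
        return (0.5 * (h ** 2).sum() + 0.5 * (o ** 2).sum()
                - (rho(h) * (W1 @ x)).sum() - (rho(o) * (W2 @ rho(h))).sum())

    def relax(x, y=None, beta=0.0, n_iter=100, eta=0.1):
        # Settle the state to equilibrium by gradient descent on the (possibly
        # nudged) energy; the same dynamics serve inference and learning.
        h = torch.zeros(n_hid, requires_grad=True)
        o = torch.zeros(n_out, requires_grad=True)
        for _ in range(n_iter):
            F = energy(x, h, o)
            if beta != 0.0:
                F = F + beta * 0.5 * ((o - y) ** 2).sum()  # weakly clamp outputs
            gh, go = torch.autograd.grad(F, (h, o))
            with torch.no_grad():
                h -= eta * gh
                o -= eta * go
        return h.detach(), o.detach()

    x, y = torch.randn(n_in), torch.tensor([1.0, 0.0])
    for step in range(50):
        h0, o0 = relax(x)                # free phase (inference)
        hb, ob = relax(x, y, beta=beta)  # nudged phase, same circuit
        W1.requires_grad_(True); W2.requires_grad_(True)
        g_free = torch.autograd.grad(energy(x, h0, o0), (W1, W2))
        g_nudge = torch.autograd.grad(energy(x, hb, ob), (W1, W2))
        with torch.no_grad():
            # EP estimate: grad ~ (dE/dW at nudged - dE/dW at free) / beta
            W1 -= lr * (g_nudge[0] - g_free[0]) / beta
            W2 -= lr * (g_nudge[1] - g_free[1]) / beta
        W1, W2 = W1.detach(), W2.detach()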

Desired Profile:

  • Final-year Master’s student (M2) or engineering student specializing in machine learning, computer vision, or a related field.
  • Knowledge of computer vision, machine learning, and deep learning.
  • Programming skills (Python).
  • Autonomy, rigor, and critical thinking skills.

★ The internship will take place in the FOX team of the CRIStAL laboratory, at the University of Lille.

Address of the Internship:
CAMPUS Haute-Borne CNRS IRCICA-IRI-RMN
Parc Scientifique de la Haute Borne, 50 Avenue Halley, BP 70478, 59658 Villeneuve d’Ascq Cedex, France.

Application:

If this proposal interests you, please send the following documents to Dr. Debabrota Basu (debabrota.basu@inria.fr), Dr. Tanmoy Mondal (tanmoy.mondal@univ-lille.fr), and Dr. Deise Santana Maia (deise.santanamaia@univ-lille.fr):

  • CV
  • Motivation Letter
  • Transcripts of grades obtained in Bachelor’s/Master’s/Engineering school as well as class ranking
  • Name and contact details of at least one reference person who can be contacted if necessary

References

  1. Pineda, F., 1987. “Generalization of back-propagation to recurrent and higher order neural networks.” Neural Information Processing Systems (NIPS).
  2. Hertz, J., Krogh, A., Lautrup, B. and Lehmann, T., 1997. “Nonlinear backpropagation: doing backpropagation without derivatives of the activation function.” IEEE Transactions on Neural Networks.
  3. Scellier, B. and Bengio, Y., 2017. “Equilibrium propagation: bridging the gap between energy-based models and backpropagation.” Frontiers in Computational Neuroscience.
  4. Pourcel, G., Basu, D., Ernoult, M. and Gilra, A., 2025. “Lagrangian-based Equilibrium Propagation: generalization to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning.” arXiv preprint arXiv:2506.06248.
  5. Pourcel, G. and Ernoult, M., 2025. “Learning long-range dependencies through time reversal symmetry breaking.” Neural Information Processing Systems (NeurIPS).
  6. Qin, Y., Ju, W., Wu, H., Luo, X. and Zhang, M., 2024. “Learning graph ODE for continuous-time sequential recommendation.” IEEE Transactions on Knowledge and Data Engineering.
  7. Jin, M., Zheng, Y., Li, Y.F., Chen, S., Yang, B. and Pan, S., 2022. “Multivariate time series forecasting with dynamic graph neural ODEs.” IEEE Transactions on Knowledge and Data Engineering.
