Meeting
Deformable Object Modelling Trends: from Perception to Applications
Scientific axes:
- Audio, Vision and Perception
- Machine Learning
Organizers:
Please note that, in order to guarantee access to the meeting rooms for all registrants, registration for meetings is free but mandatory.
Registrations
11 GdR IASIS members and 10 non-members of the GdR are registered for this meeting.
Room capacity: 140 people. 119 places remaining.
Announcement
Description:
This GDR IASIS workday focuses on recent advancements and ongoing challenges in the sensing, perception, registration, reconstruction, tracking and manipulation of deformable objects. These objects, characterized by transformations ranging from elasticity to isometry (geodesic-preserving), display complex behaviors that pose significant obstacles for geometric modeling and machine learning. They also often require considering specific properties of the underlying manifold defined by the object surface, e.g., when performing optimization or feature learning. These challenges often limit the development of effective algorithms for diverse applications. This GDR meeting aims to unite researchers from the computer vision, machine learning and robotics communities to collaboratively discuss these issues, exchange ideas, and explore future directions in this evolving field.
Topics:
The call for contributions covers, but is not limited to, the following topics:
- Sensing for deformable objects: plenoptic, event-based, RGB-D, ultrasound
- Perception: visual description, matching and shape analysis for deformable objects
- Optimization on manifolds, physics-informed feature learning
- Deformable 3D reconstruction, deformable surface registration and tracking, human and animal pose estimation
- Datasets, benchmarks and tools for the analysis and synthesis of deformable object data
- Applications: scene analysis and understanding, medical robotics, robot manipulation, gaming and VR/AR
Keynote speakers:
- Adrien Bartoli – Université Clermont Auvergne ( https://encov.ip.uca.fr/ab/ )
- Shizhe Chen – Inria ( https://cshizhe.github.io/ )
- Maks Ovsjanikov – École Polytechnique ( https://www.lix.polytechnique.fr/~maks/ )
- Nicolas Thome – Sorbonne Université ( https://thome.isir.upmc.fr/ )
Call for contributions:
Beyond the keynotes, this GDR meeting will feature 15-minute short talks covering the topics of interest. Contributed talks will be selected from proposals submitted to the organizers as abstracts (title, authors and a summary of at most 200 words) by March 27, 2025 (extended from March 15, 2025). Please note that registration is free but mandatory on the GdR IASIS website.
Important dates:
- March 27, 2025 (extended from March 15, 2025): deadline for the submission of abstracts
- March 28, 2025 (extended from March 22, 2025): notification to authors & program digest
- April 18, 2025: workshop day
Organizers and contact:
- Shaifali Parashar – LIRIS UMR CNRS 5205, CNRS
Email: shaifali.parashar@liris.cnrs.fr
- Renato Martins – ICB UMR CNRS 6303, Université Bourgogne Europe
Email: renato.martins@u-bourgogne.fr
Acknowledgements: We deeply thank Inria Paris for providing the location for this GDR meeting.
Program
09h00-09h10: Welcome (meet and greet)
09h10-09h15: Introduction to the workshop
09h15-10h15: Keynote by Nicolas Thome - "Machine Learning in Scientific Applications: From Enhanced Forecasting to Hybrid Controllers"
10h15-10h35: Walid Abouzoul - "Optimization and Actuation Strategies for Morphing Wings: From Geometric Prediction to Physical Deformation"
10h35-10h40: Break
10h40-11h40: Keynote by Adrien Bartoli - "Reconstructing 3D Deformable Objects from 2D Images Helps Minimally-Invasive Surgery"
11h40-12h00: Dinh Vinh Thuy Tran - “Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints”
12h00-13h30: Lunch Break
13h30-14h30: Keynote by Shizhe Chen - "Generalizable Robot Manipulation with 3D Vision and Language"
14h30-14h50: Mandela Ouafo Fonkoua - “Visual Servoing of Deformable Objects Based on Latent Spaces and a Dynamic Elastic FEM Model”
14h50-15h00: Break
15h00-16h00: Keynote by Maks Ovsjanikov - "Learning general purpose features on deformable 3D shapes"
16h00-16h20: Ruochen Chen - "Function-based Representation and Learning: A Novel Technique for Efficient Digital 3D Manipulation"
16h20-16h40: Quentin Rapilly - "Parametric Surface Prediction for Multi-object Segmentation in 3D Biological Imaging"
16h40-17h00: Closing remarks and discussions
Program Details
"Machine Learning in Scientific Applications: From Enhanced Forecasting to Hybrid Controllers"
Nicolas Thome - Sorbonne Université
Abstract: Recent progress in Artificial Intelligence (AI) and Machine Learning (ML) holds great promise for scientific applications. In this talk, I present some of my contributions to the field of AI for science. Firstly, I will discuss an ML approach for learning turbulent fluid dynamics. I introduce a publicly available dataset containing ~1M 2D meshes resulting from simulations of unsteady fluid dynamics. Additionally, I present a new mesh transformer that leverages node clustering, graph pooling, and global attention to capture long-range dependencies effectively. In physics-informed ML, I showcase a seminal approach for learning residual models, where a data-driven component enhances an incomplete physical prior. I introduce the APHYNITY framework, which provides a principled approach to decomposing dynamics into physical and statistical components. I then illustrate how this concept has been applied to enhance unsupervised video prediction and optical flow estimation. Finally, I present our efforts to integrate physics-informed predictors within model-based reinforcement learning (RL). I introduce the PhIHP RL framework, which combines the strengths of model-based (MB) and model-free (MF) RL methods. PhIHP achieves remarkable sample efficiency and offers an excellent trade-off between inference time and performance compared to state-of-the-art MB and MF approaches. Additionally, I discuss an extension of this framework designed to handle hard constraints during inference by addressing inference delays and directly learning controllers on robotic platforms.
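To make the residual-decomposition idea concrete, here is a minimal, illustrative sketch in the spirit of APHYNITY (our simplified example, not the speaker's code): the dynamics dx/dt are split into an incomplete physical prior F_p and a neural residual F_a, trained to fit an observed trajectory while penalizing the norm of F_a so that the physical component explains as much of the dynamics as possible. The pendulum prior, network size and loss weight are illustrative assumptions.

```python
# Illustrative sketch of a physics + residual decomposition (APHYNITY-style):
# dx/dt = F_p(x; theta_p) + F_a(x), with ||F_a|| penalized during training.
import torch
import torch.nn as nn

class PendulumPrior(nn.Module):
    """Incomplete physical prior: frictionless pendulum with learnable frequency."""
    def __init__(self):
        super().__init__()
        self.omega2 = nn.Parameter(torch.tensor(1.0))  # squared natural frequency

    def forward(self, x):  # x = (theta, theta_dot)
        theta, theta_dot = x[..., 0], x[..., 1]
        return torch.stack([theta_dot, -self.omega2 * torch.sin(theta)], dim=-1)

class Residual(nn.Module):
    """Data-driven residual F_a capturing unmodelled effects (e.g. friction)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def forward(self, x):
        return self.net(x)

def rollout(f_p, f_a, x0, dt, steps):
    """Explicit-Euler integration of dx/dt = F_p(x) + F_a(x)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * (f_p(x) + f_a(x)))
    return torch.stack(xs)

# One training step: fit the observed trajectory while keeping ||F_a|| small,
# so the physical prior absorbs everything it can explain.
f_p, f_a = PendulumPrior(), Residual()
opt = torch.optim.Adam(list(f_p.parameters()) + list(f_a.parameters()), lr=1e-3)
x_obs = torch.randn(51, 2)  # placeholder for a measured state trajectory
opt.zero_grad()
pred = rollout(f_p, f_a, x_obs[0], dt=0.05, steps=50)
loss = ((pred - x_obs) ** 2).mean() + 1e-2 * f_a(x_obs).norm(dim=-1).mean()
loss.backward(); opt.step()
```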
Bio: Nicolas Thome is a full professor at Sorbonne Université (Paris, France). His research interests include machine learning and deep learning for understanding low-level signals such as vision, time series and acoustics. His current application domains target healthcare, autonomous driving and physics.
_______
“Optimization and Actuation Strategies for Morphing Wings: From Geometric Prediction to Physical Deformation”
Walid Abouzoul1,2, Souad Tayane2, Jaafar Gaber1, Mohamed Ennaji2, Ahmed Amine Chafik1
1 FEMTO-ST/DISC/OMNI, UMR CNRS 6174, Univ. Bourgogne Franche-Comté, UTBM Belfort
2 Complex Cyber-Physical Systems Laboratory, Hassan II University ENSAM Casablanca, Morocco.
Abstract: Programmable matter, an emerging field at the intersection of computer science, mechatronics, and materials science, aims to create systems capable of controlled deformation or adaptation—an idea central to morphing wing technologies. Morphing wings represent a promising approach to enhance aerodynamic performance, adaptability, and energy efficiency in modern aircraft. This work focuses on the integrated modeling and actuation of deformable airfoils, addressing both shape optimization and physical implementation. First, a data-driven framework combining geometric airfoil equations and machine learning models predicts the optimal airfoil profile under specific flight conditions. Once the optimal geometry is identified, shape memory alloy actuators are employed to morph the baseline airfoil into its target configuration. Strategic actuator placement and control schemes ensure smooth, energy-efficient deformations. In parallel, a key focus is placed on the development of a morphing skin that accommodates the airfoil’s shape changes. This skin must balance flexibility for deformation with sufficient stiffness to preserve aerodynamic smoothness and structural integrity. Altogether, this study establishes a continuous pipeline from geometry-based optimization to material-based actuation, highlighting a cohesive approach to the design and control of morphing structures in aerospace applications.
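As a concrete example of the geometric airfoil equations such a pipeline can build on, the sketch below evaluates the classical NACA 4-digit thickness and camber equations (a textbook parameterization chosen for illustration, not the authors' framework); a learned model could then regress the few shape parameters (m, p, t) for given flight conditions.

```python
# Generic NACA 4-digit airfoil equations (thin-airfoil approximation:
# the thickness distribution is added vertically to the camber line).
import numpy as np

def naca4(m, p, t, n=100):
    """m: max camber, p: camber position, t: thickness (chord fractions).
    Returns (x, y_upper, y_lower) along the unit chord."""
    x = np.linspace(0.0, 1.0, n)
    # Thickness distribution (closed trailing edge, -0.1036 coefficient).
    yt = 5 * t * (0.2969 * np.sqrt(x) - 0.1260 * x - 0.3516 * x**2
                  + 0.2843 * x**3 - 0.1036 * x**4)
    # Mean camber line, piecewise in front of / behind the max-camber position p.
    yc = np.where(x < p,
                  m / p**2 * (2 * p * x - x**2),
                  m / (1 - p)**2 * ((1 - 2 * p) + 2 * p * x - x**2))
    return x, yc + yt, yc - yt

# Example: NACA 2412 (2% camber at 40% chord, 12% thickness).
x, y_up, y_lo = naca4(m=0.02, p=0.4, t=0.12)
```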
_______
"Reconstructing 3D Deformable Objects from 2D Images Helps Minimally-Invasive Surgery"
Adrien Bartoli - Université Clermont Auvergne
Abstract: Reconstructing 3D deformable objects from 2D images has a long-standing history in computer vision. The general setting may involve a single image or multiple images, and a deformable object model. It may rely on correspondences and other visual cues such as shading, or on an end-to-end process, together with an object deformation model. Such general techniques find an interesting application in minimally-invasive surgery, where the surgeon observes the patient’s internal organs by means of a surgical camera inserted through a small incision. This surgical approach has limited haptic feedback and is in need of assistive tools to ease the surgeon’s spatial understanding of hidden structures such as tumours. This may be achieved by combining a patient 3D model, obtained before surgery by means of imaging (typically CT or MRI), with the surgical camera images. Interestingly, this forms an instance of the problem of reconstructing 3D deformable objects from 2D images, where the object of interest is an organ and the object model is derived from the preoperative imaging. This reconstruction problem is however particularly challenging, owing to the high organ shape and appearance variability, large organ deformations, and the limited correspondences which can be established. We will see how the general techniques and knowledge from computer vision can be adapted to develop a working surgical assistance system, review numerous clinical results establishing the strong benefit of deploying such a system, and highlight steps for future work.
Bio: Adrien Bartoli has held the position of Professor of Computer Science at Université Clermont Auvergne since fall 2009 and has been a member of Institut Universitaire de France since 2016. He is currently on leave as a research scientist at the University Hospital of Clermont-Ferrand and as Chief Scientific Officer at SurgAR. He leads the Endoscopy and Computer Vision (EnCoV) research group at the University and the AI research department at the Hospital of Clermont-Ferrand. His main research interests are in computer vision, including image registration and Shape-from-X for rigid and deformable environments, and their application to computer-aided medical interventions.
_______
“Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints”
Dinh Vinh Thuy Tran1, Ruochen Chen1, Shaifali Parashar1
1 LIRIS UMR CNRS 5205, CNRS, Ecole Centrale de Lyon
Abstract: Shape-from-Template (SfT) refers to the class of methods that reconstruct the 3D shape of a deforming object from images/videos using a 3D template. Traditional SfT methods require point correspondences between the images and the texture of the 3D template in order to reconstruct 3D shapes from images/videos in real time. Their performance degrades severely under strong occlusions in the images, because correspondences become unavailable. In contrast, modern SfT methods take a correspondence-free approach, incorporating deep neural networks to reconstruct 3D objects and thus requiring huge amounts of data for supervision. Recent advances use a fully unsupervised or self-supervised approach, combining differentiable physics and graphics to deform the 3D template to match the input images. In this paper, we propose an unsupervised SfT which uses only image observations: color features, gradients and silhouettes, along with a mesh inextensibility constraint, and reconstructs 400 times faster than the best-performing unsupervised SfT. Moreover, when it comes to recovering finer details and handling severe occlusions, our method outperforms the existing methodologies by a large margin.
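The mesh inextensibility constraint lends itself to a compact formulation; the sketch below (our illustration with assumed tensor shapes, not the authors' implementation) penalizes deviations of deformed edge lengths from the template's, approximating an isometric deformation.

```python
# Minimal inextensibility loss on a triangle mesh: deformed edge lengths
# should match the template's, so the deformation stays (near-)isometric.
import torch

def inextensibility_loss(verts_def, verts_tpl, edges):
    """verts_*: (V, 3) float tensors of vertex positions;
    edges: (E, 2) long tensor of vertex index pairs."""
    d_def = (verts_def[edges[:, 0]] - verts_def[edges[:, 1]]).norm(dim=-1)
    d_tpl = (verts_tpl[edges[:, 0]] - verts_tpl[edges[:, 1]]).norm(dim=-1)
    return ((d_def - d_tpl) ** 2).mean()
```

In practice such a term would be combined with the image-observation losses (color features, gradients and silhouettes) mentioned in the abstract.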
_______
"Generalizable Robot Manipulation with 3D Vision and Language"
Shizhe Chen - Inria Paris
Abstract: Training robots to assist humans in daily tasks has been a longstanding dream in robotics. To achieve this, robots need to communicate effectively with humans, understand complex 3D environments, and execute precise actions. While current robot policies perform well in controlled environments, they struggle with accurate action execution and generalization to novel objects, scenes, and tasks. In this talk, I will present our recent efforts to enhance robotic perception, reasoning and action capabilities by integrating 3D vision and language. First, I will introduce a 3D point cloud-based vision-language policy that improves spatial understanding and action precision, bridging the gap between natural language instructions and fine-grained 3D motion control. Next, I will discuss two novel methods addressing data scarcity challenges for robotic manipulation: one using simulated data to pretrain 3D representations that transfer robustly to real-world tasks, and another leveraging human videos to learn dexterous robotic skills. Finally, I will present GemBench, a comprehensive benchmark for evaluating robotic policies on novel rigid/articulated objects and long-horizon tasks. I will also share our insights into integrating LLMs and 2D foundation models with 3D-based policies, aiming to build more generalizable, accurate and efficient robot manipulation systems.
Bio: Shizhe Chen is a research scientist in the WILLOW project-team at Inria Paris. She received her bachelor’s and PhD degrees at Renmin University of China in 2015 and 2020, respectively, supervised by Prof. Qin Jin. She then spent two wonderful post-doctoral years at Inria Paris collaborating with Dr. Ivan Laptev and Dr. Cordelia Schmid. Shizhe’s primary interests lie in embodied AI, vision and language, and multimodal deep learning. She has published over 40 peer-reviewed papers in leading conferences in computer vision, machine learning and robotics such as CVPR, ICCV, ECCV, NeurIPS, ICLR, ACM MM, CoRL, ICRA and IROS. She has also served as an area chair for CVPR, ICCV, ECCV, ACM MM, NeurIPS, ICML and ICLR.
_______
“Visual Servoing of Deformable Objects Based on Latent Spaces and a Dynamic Elastic FEM Model”
Mandela Ouafo Fonkoua1, François Chaumette1, Alexandre Krupa1
1Equipe Rainbow, Centre Inria de l'Université de Rennes/IRISA
Abstract: We present an algorithm for controlling the shape of a deformable object with a robotic arm. Our approach relies on the integration of computer vision, dynamic finite element models (FEM) and latent-space representations to achieve fast and accurate deformation control. Our methodology builds on a differentiable function that projects a high-dimensional 3D mesh representing a soft object into a low-dimensional latent space optimized for control. By exploiting the FEM model and the differentiability of this function, we establish an analytical relationship between the evolution of the latent representation and the six-degree-of-freedom (6 DOF) motion of the robot end-effector. On this basis, we develop a closed-loop shape servoing algorithm. To improve robustness against uncertainties and model approximations, we integrate real-time visual tracking with an RGB-D camera, allowing continuous correction of the discrepancies between the actual configuration of the object and its digital representation. The effectiveness of our approach is validated experimentally on various elastic deformable objects, using two distinct shape feature extraction techniques: Principal Component Analysis (PCA) and autoencoders.
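To illustrate the general idea of servoing in a latent shape space, here is a minimal sketch (ours, not the authors' implementation): a PCA basis plays the role of the projection to the latent space, and a classical closed-loop law drives the latent feature s toward its target s* through a feature Jacobian J. The talk derives J analytically from the dynamic FEM model; it is left abstract here.

```python
# Latent-space shape servoing sketch: PCA projection + classical control law.
import numpy as np

def pca_basis(meshes, k=8):
    """meshes: (N, 3V) flattened training meshes -> mean and top-k basis."""
    mean = meshes.mean(axis=0)
    _, _, vt = np.linalg.svd(meshes - mean, full_matrices=False)
    return mean, vt[:k]                      # shapes (3V,), (k, 3V)

def to_latent(x, mean, basis):
    """Project a flattened mesh x onto the latent feature s = B (x - mean)."""
    return basis @ (x - mean)

def servo_step(s, s_star, J, gain=0.5):
    """One step of the classical law dq = -gain * pinv(J) @ (s - s*),
    where J maps end-effector velocity dq to latent velocity ds."""
    return -gain * np.linalg.pinv(J) @ (s - s_star)
```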
_______
- "Learning general purpose features on deformable 3D shapes"
Maks Ovsjanikov - École Polytechnique
Abstract: In this talk I will give an overview of several recent advances in learning features for rigid and non-rigid 3D shapes. My main goals will be two-fold: first, to show how generalizable feature pre-training can be obtained and used in downstream tasks involving non-rigid 3D shapes. Secondly, I'll discuss several applications, including deformable shape matching and protein analysis among others.
Bio: Maks Ovsjanikov is a Professor at École Polytechnique and a Visiting Researcher at Google DeepMind. He works on 3D shape analysis. He obtained his PhD from Stanford University under the supervision of Leonidas Guibas. He has received a Eurographics Young Researcher Award, an ERC Starting Grant, an ERC Consolidator Grant and a CNRS Bronze Medal. His works have received 11 best paper awards or nominations at top conferences, including CVPR, ICCV, 3DV, and a Test of Time Award at SIGGRAPH in 2023, among others. His main research topics include 3D shape comparison and deep learning on 3D data.
_______
“Function-based Representation and Learning: A Novel Technique for Efficient Digital 3D Manipulation”
Ruochen Chen1, Dinh Vinh Thuy Tran1, Shaifali Parashar1
1 LIRIS UMR CNRS 5205, CNRS, Ecole Centrale de Lyon
Abstract: In this paper, we present a novel, function-based representation of surfaces, PolyFit, which is obtained by fitting jet functions locally on surface patches. Such a representation can be learned efficiently in a supervised fashion from both analytical functions and real data. Once learned, it can be generalized to various types of objects. Using PolyFit, surfaces can be efficiently manipulated digitally using only the jet parameters (instead of per-vertex manipulation in the case of meshes) for many downstream tasks in computer vision and graphics. We demonstrate the capabilities of our proposed methodologies with two well-known applications: 1) Shape-from-Template (SfT), where the goal is to deform the input 3D template of an object to match the object as seen in an image/video. Using PolyFit, we adopt an unsupervised scheme that outperforms existing methods in terms of computation and accuracy. 2) Garment draping, where the goal is to transform a garment from its rest state to plausibly fit a given body pose and shape. Using PolyFit, we adopt a self-supervised learning scheme which is mesh- and garment-agnostic and generalizes well to a large set of garments. It is 10 times faster than the existing best-performing methods.
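The core of a jet-based surface representation can be illustrated in a few lines. The sketch below (an illustration under assumed local-coordinate conventions, not the PolyFit code) fits a degree-2 polynomial jet to a surface patch expressed as heights over local (u, v) coordinates, so the patch is summarized by six coefficients instead of per-vertex positions.

```python
# Local jet fitting: z ~ a0 + a1*u + a2*v + a3*u^2 + a4*u*v + a5*v^2,
# solved by least squares over the patch samples.
import numpy as np

def fit_jet2(uv, z):
    """uv: (N, 2) local patch coordinates; z: (N,) heights -> 6 coefficients."""
    u, v = uv[:, 0], uv[:, 1]
    A = np.stack([np.ones_like(u), u, v, u**2, u * v, v**2], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def eval_jet2(coeffs, uv):
    """Evaluate the fitted jet at new (u, v) locations."""
    u, v = uv[:, 0], uv[:, 1]
    A = np.stack([np.ones_like(u), u, v, u**2, u * v, v**2], axis=1)
    return A @ coeffs
```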
_______
"Prédiction de Surfaces Paramétriques pour la Segmentation Multi-objets en Imagerie Biologique 3D"
Quentin Rapilly1, Pierre Maindron, Anaïs Badoual, Guenaelle Bouet Challon, Charles Kervrann1
1 Equipe SAIRPICO, Centre Inria de l'Université de Rennes/IRISA
Abstract: Multi-object segmentation algorithms are used in many fields. The rise of deep learning has brought considerable progress in processing speed and prediction accuracy. Nevertheless, some traditional methods, such as active surfaces, offer features that deep learning methods cannot provide: a continuous geometric representation of the objects and the integration of prior information on their shapes. These features are useful in biology for efficiently segmenting noisy and poorly resolved data, and then for understanding the interactions between the segmented cells. We have developed a hybrid method dedicated to the multi-object segmentation of 3D images that combines the efficiency of deep learning with the powerful representation of active surfaces. We evaluated our method on real and synthetic 3D fluorescence microscopy datasets. We compared it with the algorithms most widely used by biologists: direct 3D methods (Stardist 3D, Omnipose) and aggregation methods based on 2D models (CellStitch, USegment3D, Cellpose). The results show that our method achieves performance comparable to, or better than, state-of-the-art techniques, in particular when the objects to be segmented have complex (non star-convex) shapes.
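As a generic illustration of the continuous geometric representation that parametric surfaces provide (and not the method of this talk), the sketch below samples a closed surface from a handful of radial expansion coefficients at arbitrary resolution. Note that such a radial parameterization can only describe star-convex shapes, which is precisely the limitation that motivates richer parametric surface models.

```python
# A continuous parametric surface: radius r(theta, phi) built from a few
# low-order harmonics, so a handful of coefficients describe the whole
# geometry and the surface can be sampled at any resolution.
import numpy as np

def sample_surface(coeffs, n_theta=64, n_phi=128):
    """coeffs: (c0, c1, c2, c3) -> (n_theta * n_phi, 3) surface points."""
    theta, phi = np.meshgrid(np.linspace(0, np.pi, n_theta),
                             np.linspace(0, 2 * np.pi, n_phi), indexing="ij")
    c0, c1, c2, c3 = coeffs
    # Radial expansion on low-order (real) spherical harmonics.
    r = (c0 + c1 * np.cos(theta)
            + c2 * np.sin(theta) * np.cos(phi)
            + c3 * np.sin(theta) * np.sin(phi))
    xyz = np.stack([r * np.sin(theta) * np.cos(phi),
                    r * np.sin(theta) * np.sin(phi),
                    r * np.cos(theta)], axis=-1)
    return xyz.reshape(-1, 3)

pts = sample_surface((1.0, 0.2, 0.1, 0.0))  # a slightly deformed sphere
```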