Keywords
Computer Vision, Photogrammetry, dense matching, multi-view stereo, deep learning, multi-modal
Context
Traditional 3D reconstruction methods based on dense stereo matching or Multi-View Stereo (MVS) rely solely on photogrammetry and often fail in areas with low texture, specular surfaces, or complex geometries [6,8]. Low texture and thin objects remain particularly difficult for these methods [13]. Meanwhile, LiDAR systems produce dense and accurate point clouds but lack continuous radiometric information, and acquisition is expensive. Learned dense stereo matching and deep MVS methods have since transformed 3D reconstruction by learning stereo correspondence estimation implicitly from large datasets. However, learned stereo matching still requires fusing disparity maps into a point cloud, which erodes the advantage of the learned approach, whereas deep MVS offers an end-to-end method for reconstruction. These methods nevertheless remain limited in two key aspects: (1) geometric accuracy, especially in textureless or repetitive regions; (2) domain generalization, since models trained on specific datasets often perform poorly on new domains. LiDAR, with its dense and metrically precise 3D measurements, offers a complementary source of geometric information [12].
Introduction and goals
State of the art
3D reconstruction is an important topic in both photogrammetry and computer vision. With the development of deep learning, learning-based methods often outperform traditional ones. Dense stereo matching uses only two views; in real 3D reconstruction applications, large image overlap can be achieved, and multi-view stereo can exploit it to improve robustness and reduce matching ambiguity. Recent research on MVS (MVSNet [14], CasMVSNet [11], TransMVSNet [2], PatchmatchNet [10], GeoMVSNet [16]) has advanced 3D reconstruction through deep learning.
Deep learning-based MVS is widely explored in computer vision, in part because many datasets are available, for example DTU [3], Tanks and Temples [4], ETH3D [7], and BlendedMVS [15]. The resulting methods can be grouped into several categories: (1) cost-volume MVS and its cascade variants; (2) PatchMatch and iterative methods; (3) Transformers and global context for improved matching; (4) geometry-aware methods; (5) self-supervised methods; (6) cross-modal fusion; (7) large-scale applications.
Concurrently, Transformer-based models have shown superior feature representation and matching capabilities [9]. Transformers bring significant advantages to deep learning MVS by modeling global context and long-range dependencies that CNNs struggle to capture [1,2]. Through self-attention and cross-view attention, they aggregate information from multiple images, improving correspondence estimation in textureless, reflective, or occluded areas. Their ability to reason jointly about geometry and appearance can improve metric accuracy, while their global feature modeling can help domain generalization across different environments. Moreover, Transformers offer a flexible architecture that can fuse heterogeneous data such as LiDAR and imagery, making them well suited to accurate and domain-adaptive 3D reconstruction.
Goals
However, few studies have integrated LiDAR data to guide MVS learning or explored cross-domain adaptation [5]. The objectives of the project are: (1) to explore the fusion of LiDAR and imagery within the MVS framework using Transformers; (2) to study domain adaptation. The type of data strongly influences performance, and it is impossible to obtain training data for every application, so domain adaptation is also important for real applications.
Organization
Duration: The doctoral contract is for a 3-year period and may or may not include teaching duties, depending on the candidate’s profile and preference.
Workplace: LASTIG Lab, Geodata Paris, Gustave Eiffel University, Champs-sur-Marne (RER A, station Noisy-Champs).
IGN (French Mapping Agency) is a Public Administrative Institution under the French Ministry for Ecology and Sustainable Development. IGN is the national reference operator for the mapping of the territory; in particular, the agency is currently in charge of LiDAR HD, the 3D mapping program of France. LASTIG is one of the research laboratories of IGN, attached to Geodata Paris (ex-ENSG, Ecole Nationale des Sciences Géographiques) and Gustave Eiffel University in the Grand Paris area.
Candidate profile
Only students who are citizens of the European Union, the United Kingdom, or Switzerland are eligible. The candidate should hold a Master’s degree (or engineering school equivalent) in computer science, robotics, or computer vision, with good knowledge of image processing, 3D data processing, and deep learning, as well as strong programming skills (e.g. Python); knowledge of C/C++ is highly recommended. Good interpersonal skills, motivation for research and teamwork, initiative, writing skills, and proficiency in English are required.
Application
Send an email to the contacts below, with a single PDF file containing:
Contact
Bruno VALLET, senior researcher : bruno.vallet@ign.fr
Ewelina RUPNIK, researcher, LASTIG : ewelina.rupnik@ign.fr
Teng WU, researcher, LASTIG : teng.wu@ign.fr
For more information, see the full job description. English version: https://drive.google.com/file/d/1wlnEJtfz-9EKknUZtvTzXZsJ9OhmxEyb/view?usp=sharing
French version: https://drive.google.com/file/d/1Bhg4dRFft_MnUMdDvTV9abPhxsuuz1Yb/view?usp=sharing
