[PhD] LiDAR–Image Fusion for Accurate and Domain-Adaptive Multi-View Stereo Reconstruction using Transformer-based Architectures

Keywords

Computer Vision, Photogrammetry, dense matching, multi-view stereo, deep learning, multi-modal

Context

Traditional 3D reconstruction methods based on dense stereo matching or Multi-View Stereo (MVS) rely solely on photogrammetry and often fail in areas with low texture, specular surfaces, or complex geometries [6,8], as well as on thin objects [13]. Meanwhile, LiDAR systems produce dense and accurate point clouds but lack continuous radiometric information, and their acquisition is expensive. With the development of deep learning, learned dense stereo matching and deep MVS methods have transformed 3D reconstruction by learning correspondence estimation implicitly from large datasets. Learned stereo matching, however, still requires a separate step that fuses the per-pair disparity maps into a point cloud, which erodes the benefit of learning, whereas deep MVS offers an end-to-end reconstruction pipeline. These methods nevertheless remain limited in two key aspects: (1) geometric accuracy, especially in textureless or repetitive regions; (2) domain generalization, since models trained on specific datasets often perform poorly on new domains. LiDAR, with its dense and metrically precise 3D measurements, offers an ideal complementary source of geometric information [12].

Introduction and goals

State of the art

3D reconstruction is an important topic in both photogrammetry and computer vision. With the development of deep learning, learning-based methods now often outperform traditional ones. Dense stereo matching uses only two views, whereas real 3D reconstruction applications usually provide large image overlap, so multi-view stereo can exploit the additional views to improve robustness and reduce matching ambiguity. Recent research on MVS (MVSNet [14], CasMVSNet [11], TransMVSNet [2], PatchmatchNet [10], GeoMVSNet [16]) has advanced 3D reconstruction through deep learning.

Deep learning-based MVS is a widely explored topic in computer vision, in part because many datasets are available, for example DTU [3], Tanks and Temples [4], ETH3D [7], and BlendedMVS [15]. The resulting methods can be grouped into several categories: (1) cost-volume and cascade cost-volume methods; (2) PatchMatch and iterative methods; (3) Transformer and global-context methods for improved matching; (4) geometry-aware methods; (5) self-supervised methods; (6) cross-modal fusion; (7) large-scale applications.

Concurrently, Transformer-based models have shown superior feature representation and matching capabilities [9]. Transformers bring significant advantages to deep learning MVS by modeling global context and long-range dependencies that CNNs, with their local receptive fields, struggle to capture [1,2]. Through self-attention and cross-view attention, they aggregate information from multiple images, improving correspondence estimation in textureless, reflective, or occluded areas. Their ability to reason jointly about geometry and appearance can improve metric accuracy, while their global feature modeling can help domain generalization across different environments. Moreover, Transformers offer a flexible architecture that can fuse heterogeneous data such as LiDAR and imagery, which makes them well suited to accurate and domain-adaptive 3D reconstruction.
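To make the idea of cross-view attention concrete, here is a minimal NumPy sketch of a single attention head in which reference-view features query source-view features. It is only an illustration under stated assumptions: the function name, the random projection matrices, and the feature sizes are hypothetical and do not correspond to any of the cited methods.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(ref_feats, src_feats, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, for illustration).

    ref_feats: (N_ref, C) features of the reference image (queries).
    src_feats: (N_src, C) features of a source image (keys and values).
    w_q, w_k, w_v: (C, C) projection matrices (random here, learned in practice).
    """
    q = ref_feats @ w_q
    k = src_feats @ w_k
    v = src_feats @ w_v
    # Scaled dot-product similarity between every reference and source feature.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Residual connection: reference features enriched with source context.
    return ref_feats + weights @ v, weights

# Toy usage with assumed sizes: 6 reference tokens, 10 source tokens, 8 channels.
rng = np.random.default_rng(0)
ref = rng.normal(size=(6, 8))
src = rng.normal(size=(10, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
fused, weights = cross_view_attention(ref, src, w_q, w_k, w_v)
```

Because every reference feature attends to every source feature, the receptive field is global from the first layer, which is the property the paragraph above contrasts with CNNs.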

Goals

Despite this progress, few studies have integrated LiDAR data to guide MVS learning or explored cross-domain adaptation [5]. The project therefore has two objectives: (1) to explore the fusion of LiDAR and imagery within an MVS framework using Transformers; (2) to study domain adaptation, because the type of data strongly influences performance and training data cannot be obtained for every application, which makes domain adaptation essential for real-world use.

Benchmark Dataset Preparation: IGN is the national geospatial data provider in France, offering nationwide aerial and satellite imagery, as well as the LiDAR HD dataset with a density of nearly 10 points/m² covering most of the country. Assuming that the imagery and LiDAR point clouds are accurately co-registered, the first objective of the project will be to generate high-quality benchmark datasets from these multimodal geospatial data sources.
Transformer-based MVS with LiDAR Fusion: The main objective of the PhD project is to develop a transformer-based framework for LiDAR-image fusion in multi-view stereo (MVS) reconstruction. Existing state-of-the-art methods will first be evaluated on the benchmark dataset. The project will then investigate how to integrate 3D LiDAR features into the MVS network in order to improve the robustness and accuracy of 3D reconstruction.
Domain Adaptation: Domain adaptation remains a major challenge in photogrammetry applications. Deep learning models trained on data from one city often suffer significant performance degradation when applied to another city, and the gap becomes even larger when transferring across different sensor types. Considering the availability of both aerial and satellite imagery, this project will explore domain adaptation strategies between these two sensing modalities to improve the generalization capability of the proposed methods.
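As an illustration of how co-registered LiDAR could enter an image-based pipeline, the sketch below projects a point cloud into one view to obtain a sparse depth map, which could then serve as a benchmark reference or as an extra network input. It assumes a simple pinhole camera with intrinsics K and a world-to-camera pose (R, t); satellite imagery would typically need a different sensor model. The function name and the toy values are hypothetical and are not part of the project description.

```python
import numpy as np

def lidar_to_sparse_depth(points_world, K, R, t, height, width):
    """Project LiDAR points into a pinhole camera (assumed model).

    points_world: (N, 3) points in world coordinates.
    K: (3, 3) intrinsics; R: (3, 3) and t: (3,) map world to camera frame.
    Returns a (height, width) depth map with 0 where no point projects.
    """
    cam = points_world @ R.T + t           # world -> camera coordinates
    z = cam[:, 2]
    front = z > 0                          # keep points in front of the camera
    cam, z = cam[front], z[front]
    uv = cam @ K.T                         # homogeneous pixel coordinates
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.full((height, width), np.inf)
    # Simple z-buffer: when several points hit one pixel, keep the nearest.
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth

# Toy usage with assumed values.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 5.0],     # projects to the principal point
                   [0.0, 0.0, 8.0],     # same pixel, farther: discarded
                   [0.1, 0.0, 10.0],    # one pixel to the right
                   [0.0, 0.0, -1.0],    # behind the camera: discarded
                   [100.0, 0.0, 1.0]])  # outside the image: discarded
depth = lidar_to_sparse_depth(points, K, np.eye(3), np.zeros(3), 48, 64)
```

At a density of about 10 points/m², most pixels of a high-resolution image would stay empty, which is one reason a learned fusion is of interest rather than direct use of the projected depth.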

Organization

Duration: The doctoral contract runs for a 3-year period and may or may not include teaching duties, depending on the candidate’s profile and preference.

Workplace: LASTIG Lab, Geodata Paris, Gustave Eiffel University, Champs-sur-Marne (RER A, station Noisy-Champs).

IGN (French Mapping Agency) is a Public Administrative Institution that is part of the French Ministry for Ecology and Sustainable Development. IGN is the national reference operator for the mapping of the territory; in particular, the agency is currently in charge of the 3D mapping program of France with LiDAR HD. LASTIG is one of the research laboratories of IGN, attached to Geodata Paris (formerly ENSG, Ecole Nationale des Sciences Géographiques) and Gustave Eiffel University, in the Grand Paris area.

Candidate profile

Only students who are citizens of the European Union, the United Kingdom, or Switzerland are eligible. The candidate should hold a Master’s degree (or an engineering school degree) in computer science, robotics, or computer vision, with good knowledge of image processing, 3D data processing, and deep learning, and strong programming skills (e.g. Python); knowledge of C/C++ is highly recommended. Good interpersonal skills, motivation for research and teamwork, initiative, writing skills, and proficiency in English are required.

Application

Send an email to the contacts below with a single PDF file containing:

CV
Motivation letter
Two recommendation letters, or the names of persons to contact
Transcript of grades from the last two years of study

Contact

Bruno VALLET, senior researcher: bruno.vallet@ign.fr

Ewelina RUPNIK, researcher, LASTIG: ewelina.rupnik@ign.fr

Teng WU, researcher, LASTIG: teng.wu@ign.fr

For more information, see the full job description. English version: https://drive.google.com/file/d/1wlnEJtfz-9EKknUZtvTzXZsJ9OhmxEyb/view?usp=sharing

French version: https://drive.google.com/file/d/1Bhg4dRFft_MnUMdDvTV9abPhxsuuz1Yb/view?usp=sharing
