Announcement


[PhD] Language-aided Detection and Matching of Semantic Landmarks for Visual Localization in Complex Environments

24 April 2025


Category: PhD positions

General information

  • Position: PhD
  • Duration: 36 months, starting in September/October 2025
  • Location: Loria, Nancy, France
  • Affiliation: TANGRAM team (Inria-Loria)
  • Supervision: Gilles Simon (supervisor), Vincent Gaudillière (co-advisor) and Marie-Odile Berger

Description

Context

Landmark detection, description and matching form the cornerstone of autonomous visual localization systems deployed in unknown environments. While the most widely adopted and accurate solutions exploit low-level landmarks such as points or lines, dealing with large-scale and/or visually ambiguous environments remains highly challenging due to the inherent multiplicity, ambiguity and sensitivity of such local primitives. For visual localization systems with a broader scope of application, high-level landmarks such as objects present in the scene have proven to offer key advantages over their local counterparts: lower multiplicity, higher detection repeatability across viewpoints and sensors, and potentially lower ambiguity. However, current solutions are limited to pre-defined object categories, and detectors need to be fine-tuned to handle novel, uncommon categories. The recent emergence of zero-shot or open-vocabulary object detectors based on vision-only and vision-language foundation models represents a promising alternative, but their suitability for precise visual localization tasks (i.e., pose estimation) remains to be demonstrated. Moreover, the challenges posed by complex man-made environments such as factories, which often feature intra-class variations of specialized equipment rather than common distinctive objects, have yet to be addressed. Finally, the question of environments that do not contain objects per se, such as natural terrains, remains largely unexplored.

Objectives

The research in this PhD will be structured around the concept of a landmark that is useful for localization and can fit different environments and application scenarios. Indeed, unlike cases where object detection or segmentation methods are used with no objective other than their own, using objects as landmarks for localization introduces specific constraints. Notably, landmarks must be consistently perceived from a wide range of viewpoints and reliably re-identified when they reappear in new images. Such requirements may be more or less stringent depending on the type of environment in which the system is deployed. In other words, perceiving common objects in moderately complex scenes is less demanding than perceiving uncommon objects in real-life specialized environments. To understand the complexity of landmark selection and derive automated processes, we target challenging application scenarios within complex unknown environments, such as autonomous computer vision systems operating in a factory or on an extraterrestrial surface.

To address these challenges, we propose to exploit the possibilities offered by pre-trained foundation models, and we are particularly interested in the possible contributions of vision-language alignment models such as CLIP. More precisely, we first want to examine how general-purpose unsupervised detection and segmentation models can be guided towards extracting Potential Objectness Landmarks (POLs) in specialized environments in a zero-shot manner, by leveraging adequate visual and text prompting strategies. We then want to study how language-based descriptions of POLs can encapsulate geometric and semantic properties relevant for POL re-identification across viewpoints, depending on the way these descriptions are extracted from images. Finally, we want to combine the proposed landmark detection and description approaches with off-the-shelf object-based localization methods, and to test the resulting system in two complementary types of environments: industrial settings (e.g., factories, plants, ships) and extraterrestrial terrains (i.e., the surface of the Moon or Mars).

Profile

  • The candidate holds a Master’s or engineering degree in Computer Vision, Electrical Engineering, Computer Science, Applied Mathematics or a related field.
  • A strong background in image processing and/or computer vision is required.
  • Strong programming skills in Python.
  • Strong mathematical background.
  • Familiarity with deep learning frameworks such as PyTorch.
  • Commitment, teamwork and a critical mind.
  • Fluent verbal and written communication skills in English.

How to apply

Interested candidates are encouraged to send their applications (detailed CV, transcripts and a brief motivation letter) as soon as possible to the following addresses: gilles.simon@loria.fr and vincent.gaudilliere@loria.fr. Applications will be processed upon receipt.
