Announcement

M2 internship - Unsupervised language-aided landmark discovery and matching for visual localization in complex environments

9 December 2024


Category: Intern


1. General information

  • Position: M2 internship
  • Duration: 5 to 6 months, starting in February 2025 (flexible)
  • Location: Loria, Nancy, France
  • Affiliation: TANGRAM team (Inria-Loria)
  • Supervisors: Vincent Gaudillière, Marie-Odile Berger and Gilles Simon


2. Context, description and objectives

This internship will deal with the problem of visual localization, which involves determining a camera's viewpoint by automatically matching features in an image with elements from a known 3D model of the environment. These features are referred to as landmarks.

Object-based localization uses "high-level" landmarks, such as objects (e.g., chairs, tables, cupboards), as opposed to the more commonly used "low-level" keypoints (e.g., SIFT, ORB). This approach offers the advantage of relying on fewer, more discriminative landmarks but is currently limited to environments that are rich in common objects, often artificially created for research purposes. Moreover, creating the 3D model requires manually matching objects detected across multiple images, a process that can be time-consuming and tedious.

For this internship, we will focus on complex industrial environments (e.g., factories, power plants, ships) where the concept of an object is not always clearly defined. The goal is to automatically identify high-level landmarks in each image and ensure automatic matching of the detected landmarks across different images. To achieve this, we will employ "unsupervised" methods, which do not require environment-specific training, and explore the role of language in describing objects.

The first part of the internship will involve a literature review on unsupervised object localization in images and the use of vision-language models (e.g., CLIP). The second part will involve applying some of these methods to images of industrial environments and analyzing their results in terms of relevance and repeatability. The final part will focus on proposing methods for automatically matching detected landmarks across different images.

3. Candidate profile

  • The candidate is in the final year of a Master's or engineering degree in Computer Vision, Electrical Engineering, Computer Science, Applied Mathematics, or a related field.
  • A strong background in image processing and/or computer vision is required.
  • Strong programming skills in Python (or MATLAB) are required.
  • An interest in deep learning frameworks (PyTorch) is also required.
  • Commitment, teamwork and a critical mind.
  • Good oral and written communication skills in English.


4. How to apply

Interested candidates are encouraged to send their application (detailed CV, transcripts and a brief motivation letter) as soon as possible to the following address: vincent.gaudilliere@loria.fr. Applications will be processed as they are received.

Original proposal: https://vincentgaudilliere.github.io/files/Master_internship_proposal_2025.pdf