Supervisors: A. Macario Barros · F.-M. Ngoule Mboula
Laboratoire Instrumentation Intelligente, Distribuée et Embarquée (LIIDE) · CEA
Start: October 2026 · Deadline: May 18, 2026
Automatic detection, 3D localization, and classification of acoustic events (falls, alarms, gunshots) in uncontrolled environments remain an open problem. Existing approaches rely either on microphone arrays with a fixed, known geometry, or on single platforms whose localization capability is inherently limited to one observation point.
A network of heterogeneous mobile platforms offers a distributed and redundant sensing architecture with spatial coverage that no individual platform can achieve. The central scientific challenge is that platforms have no prior knowledge of their relative spatial configuration. Without GPS, the network geometry must be inferred dynamically from the audio-visual observations themselves, without any external infrastructure.
The research is structured around three axes:
- Axis 1: Network spatial calibration. Acoustic events perceived simultaneously by several platforms generate geometric constraints between their poses, which are exploited to infer relative positions and orientations incrementally, without any dedicated calibration procedure.
- Axis 2: Direction fusion and 3D localization. Each platform estimates the direction of arrival of the sound in its local frame; these estimates are fused in a common world frame to reconstruct the 3D position of the source, even under partial occlusion or signal degradation.
- Axis 3: Cooperative classification. Audio-visual representations extracted by each platform are combined by a fusion module to produce a robust final classification (fall, alarm, gunshot, mechanical failure, etc.).
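As a minimal illustration of Axis 2, once per-platform directions of arrival have been expressed in a common world frame, the source position can be recovered by least-squares triangulation over the bearing rays. The function below is a sketch of that idea only, with an illustrative toy geometry; it is not the project's pipeline, which must additionally handle occlusion and signal degradation.

```python
import numpy as np

def triangulate(positions, directions):
    """Least-squares 3D source position from platform positions and unit
    direction-of-arrival vectors (one ray per platform).

    Minimizes the sum of squared point-to-ray distances by solving
    (sum_i (I - d_i d_i^T)) x = sum_i (I - d_i d_i^T) p_i.
    Requires at least two non-parallel rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(positions, directions):
        d = d / np.linalg.norm(d)        # enforce unit norm
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

# Synthetic check: three platforms observe the same source (toy geometry)
source = np.array([2.0, -1.0, 3.0])
platforms = [np.array([0.0, 0.0, 0.0]),
             np.array([5.0, 0.0, 1.0]),
             np.array([0.0, 4.0, 0.0])]
bearings = [(source - p) / np.linalg.norm(source - p) for p in platforms]
estimate = triangulate(platforms, bearings)
```

With noise-free bearings the estimate recovers the source exactly; with noisy bearings the same normal equations return the maximum-likelihood point under isotropic errors.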
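For Axis 3, one common baseline for combining per-platform predictions is late fusion of class posteriors by a weighted geometric mean (log-linear pooling), where degraded platforms are down-weighted and failed ones simply omitted. The snippet below is a hedged sketch of that baseline, with invented class names and probabilities; the thesis itself targets a learned fusion module.

```python
import numpy as np

def fuse_posteriors(posteriors, weights=None):
    """Late fusion of per-platform class posteriors via a weighted
    geometric mean in log space, renormalized to a distribution."""
    logp = np.log(np.clip(posteriors, 1e-12, None))  # (n_platforms, n_classes)
    if weights is None:
        weights = np.ones(len(logp))
    fused = np.average(logp, axis=0, weights=weights)
    fused = np.exp(fused - fused.max())              # stable renormalization
    return fused / fused.sum()

classes = ["fall", "alarm", "gunshot"]
# Illustrative inputs: two platforms favor "gunshot"; a third, noisier
# platform is ambiguous and gets a lower fusion weight.
p = np.array([[0.1, 0.1, 0.8],
              [0.2, 0.1, 0.7],
              [0.4, 0.4, 0.2]])
fused = fuse_posteriors(p, weights=[1.0, 1.0, 0.3])
label = classes[int(np.argmax(fused))]  # → "gunshot"
```

Log-linear pooling keeps the combination robust to a single confident outlier, which is the failure mode the posting's "partial failures" requirement points at.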
Algorithms will be validated both in simulation and on real platforms equipped with cameras and microphone arrays.
Expected outcomes: inter-platform calibration algorithm without external infrastructure; cooperative 3D localization pipeline robust to partial failures; cooperative classifier outperforming single-platform approaches.
Profile: MSc in Signal Processing, Computer Vision, Robotics or ML · Python/PyTorch · background in audio and/or vision · embedded systems experience is a plus · English required, French appreciated.
Eligibility: Nationals of the EU, Switzerland or the United Kingdom only.
To apply, send CV and motivation letter (1 page max) to andrea.barros@cea.fr before May 18, 2026.
More information: https://instn.cea.fr/these/apprentissage-multimodaldistribue-pour-la-localisation-et-la-classification-cooperatives-de-sources-acoustiques/
