★ The internship will take place in the FOX team of the CRIStAL laboratory, at the University of Lille.
Summary :
Humans naturally acquire knowledge in a continual manner, progressively learning new concepts while retaining previously acquired ones. In contrast, machine learning models are prone to catastrophic forgetting [1]: when adapting to non-independent and non-identically distributed (non-i.i.d.) data, previously learned knowledge is overwritten. This phenomenon is equally severe in object detection [2,3]. In this context, the incremental object detection (IOD) protocol is defined as follows: training samples for different object categories are presented sequentially across phases, and the learner is not allowed to access data from earlier phases, i.e., the data already used in previous training.
Two dominant strategies for mitigating forgetting in continual learning, particularly in classification, are Knowledge Distillation (KD) and Exemplar Replay (ER). KD [4] introduces regularization terms that encourage the current model to preserve previous knowledge while training on new data; the key idea is to keep the new model's logits or feature maps close to those of the old model.
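As a minimal illustration (not the exact loss used in any particular method), a logit-distillation term in PyTorch might look as follows; `new_logits` and `old_logits` are assumed to be class scores of the current model and a frozen copy of the previous model on the same inputs, and the temperature `T` is a standard hyperparameter:

```python
# Minimal logit-distillation sketch (illustrative only).
import torch
import torch.nn.functional as F

def kd_loss(new_logits: torch.Tensor, old_logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened old and new class distributions."""
    log_p_new = F.log_softmax(new_logits / T, dim=-1)
    p_old = F.softmax(old_logits.detach() / T, dim=-1)  # the old model is frozen
    # T**2 rescales gradients back to the scale of the hard-label loss.
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)
```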
ER methods [5], by contrast, memorize a limited set of training data (the exemplars) from earlier phases and replay them during subsequent phases to retain knowledge of the old object categories.
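A rough sketch of a fixed-budget exemplar memory is given below; the random per-phase rebalancing is only one possible selection rule (herding-based selection is also common), and the `Sample` record type is hypothetical:

```python
# Fixed-budget exemplar memory sketch; random selection is one possible rule.
import random
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str
    annotations: list  # boxes and labels for this image

class ExemplarMemory:
    def __init__(self, capacity: int = 200):
        self.capacity = capacity
        self.per_phase: list[list[Sample]] = []  # exemplars kept for each past phase

    def update(self, phase_data: list[Sample]) -> None:
        """After a phase ends, rebalance so all phases share the fixed budget."""
        self.per_phase.append(list(phase_data))
        quota = self.capacity // len(self.per_phase)
        self.per_phase = [random.sample(p, min(len(p), quota)) for p in self.per_phase]

    def replay(self, new_data: list[Sample]) -> list[Sample]:
        """Training set for the next phase = new data + stored exemplars."""
        return new_data + [s for phase in self.per_phase for s in phase]
```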
Meanwhile, state-of-the-art performance in object detection has been driven by transformer-based architectures, such as DETR [6], Deformable DETR [7], and UP-DETR [8]. However, existing studies indicate that KD and ER, when applied naively to these detectors, fail to deliver satisfactory incremental performance. In particular, applying standard KD or ER to Deformable DETR results in a significant performance gap compared to joint training with access to all data, i.e., the non-incremental setting.
The literature identifies two primary causes for this degradation. First, transformer-based detectors evaluate a large set of object queries in parallel, the majority of which correspond to background regions; this leads to a highly imbalanced KD signal dominated by negative samples. Moreover, because old and new object categories can co-exist in any given training image, the KD loss and the regular training objective can provide contradictory evidence. Second, ER methods designed for image classification sample the same number of exemplars per category. In IOD this is a poor strategy, because the true object category distribution is typically highly skewed: balanced sampling creates a mismatch between the training and testing data statistics.
Requirement of Active Learning : As incremental learning proceeds, new training samples arrive at each stage along with new object categories, and this data must be annotated before it can be used for training. Since annotation is expensive and tedious, the annotation effort at each stage should be minimized by annotating only the right images. If some annotation is unavoidable, it makes sense to annotate the images that will bring the greatest benefit when used for training. But how do we know which ones to choose? Answering this question is the goal of active learning. Given a large unlabelled data pool, active learning (AL) [9] aims to select the data that would maximally improve a model's performance if it were annotated and used for training. There are typically two main streams of active learning :
- Uncertainty-based AL methods: This category of methods selects samples that maximize a measure of model uncertainty, e.g., those with the least mutual information with the current set of labelled data.
- Diversity-based AL approaches: This category of methods instead selects samples that are representative of the whole distribution of unlabeled data; this can be achieved by minimizing the similarities between the features or posterior probability vectors inside this subset.
In this context, we would like to work on Plug and Play Active Learning (PPAL) [9], a plug-and-play AL algorithm for object detection. The PPAL module should be easy to use, requiring no modifications to architectures or training pipelines, and should work across a wide range of object detectors.
Furthermore, we plan for PPAL to be a two-stage algorithm that combines uncertainty-based and diversity-based sampling.
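To make the intended two-stage design concrete, here is a hedged sketch: stage one shortlists the most uncertain images, and stage two greedily picks a diverse subset of the shortlist via farthest-point sampling on image features. The uncertainty scores and features are assumed to come from the current detector, and the shortlist factor is an arbitrary illustrative choice:

```python
# Two-stage selection sketch: uncertainty shortlist, then diversity re-ranking.
import numpy as np

def two_stage_select(uncertainty: np.ndarray, features: np.ndarray,
                     budget: int, shortlist_factor: int = 4) -> list[int]:
    """Return indices of `budget` images to annotate."""
    # Stage 1: keep the most uncertain candidates.
    shortlist = np.argsort(-uncertainty)[: budget * shortlist_factor]
    feats = features[shortlist]
    # Stage 2: greedy farthest-point sampling for diversity inside the shortlist.
    chosen = [0]  # start from the most uncertain image
    dists = np.linalg.norm(feats - feats[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(feats - feats[nxt], axis=1))
    return [int(shortlist[i]) for i in chosen]
```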
Research Directions :
We propose to address these limitations through the following directions:
- Confident Prediction Merging :
We aim to select high-confidence object predictions from the previous model and merge them with the ground-truth annotations of newly introduced categories. Conflicts between predictions and annotations are explicitly resolved, after which standard bipartite matching is performed between the merged label set and the current model's predictions during incremental training.
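A simplified sketch of this merging step follows; the score and IoU thresholds, and the "trust the new ground truth on conflict" rule, are illustrative design choices rather than fixed parts of the method:

```python
# Illustrative merging of confident old-model detections with new ground truth.
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_labels(old_preds, new_gt, score_thr=0.4, iou_thr=0.7):
    """Keep confident old-model detections unless they conflict with new ground truth."""
    merged = list(new_gt)
    for pred in old_preds:
        if pred["score"] < score_thr:
            continue  # drop low-confidence hypotheses
        # Conflict: strong overlap with a new-category box -> trust the annotation.
        if any(iou(pred["box"], gt["box"]) > iou_thr for gt in new_gt):
            continue
        merged.append({"box": pred["box"], "label": pred["label"]})
    return merged  # serves as the pseudo-ground-truth for bipartite matching
```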
- Foreground-Aware Distillation :
Integrating old-model predictions directly into the matching process implicitly folds the KD objective into the detection loss. The plan is to apply the KD loss only to foreground predictions that are correctly matched to the old model's hypotheses, avoiding contradictory gradients and redundant background supervision.
- Distribution-Aware Exemplar Replay :
We further plan to improve ER by introducing a calibration mechanism that preserves the distribution of object categories observed in the training data.
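One way such a calibration could look: allocate the exemplar budget per category in proportion to the category frequencies observed during training, rather than uniformly. The helper names below are hypothetical:

```python
# Sketch: exemplar quotas proportional to the observed category distribution.
import random
from collections import Counter

def calibrated_quotas(category_counts: Counter, budget: int) -> dict:
    """Per-category exemplar quotas that mirror the skewed training distribution."""
    total = sum(category_counts.values())
    return {cat: max(1, round(budget * n / total)) for cat, n in category_counts.items()}

def select_exemplars(images_by_category: dict, quotas: dict) -> list:
    """Randomly draw up to each category's quota from its image pool."""
    selected = []
    for cat, quota in quotas.items():
        pool = images_by_category.get(cat, [])
        selected.extend(random.sample(pool, min(quota, len(pool))))
    return selected
```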
- Revised Incremental Detection Protocol :
In previous works, in each phase the incremental detector is allowed to observe every image that contains a certain type of object. Because images often contain a mix of old and new object classes, the same image can be observed in different training phases. This violates the standard continual learning assumption that training samples do not repeat across phases. We propose a revised IOD protocol that eliminates this overlap.
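For illustration, one simple assignment rule (an assumption for this sketch, not a fixed part of the proposal) sends each image to the phase of the newest category it contains, so every image is seen exactly once:

```python
# Sketch: assign each image to exactly one phase so no image repeats.
def split_without_overlap(images: list, phase_of_category: dict) -> dict:
    """Partition images across phases; here an image goes to the phase of the
    newest (largest-index) category it contains."""
    phases: dict[int, list] = {}
    for img in images:
        # img["categories"] is the set of category ids present in the image
        phase = max(phase_of_category[c] for c in img["categories"])
        phases.setdefault(phase, []).append(img)
    return phases
```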
- Plug-and-play Active Learning for Object Detection :
This active learning algorithm for object detection should combine uncertainty-based and diversity-based sampling. It should be plug-and-play, requiring no architectural modifications or changes to training pipelines.
We will validate our approach, CL-DETR, on multiple transformer-based detectors, including Deformable DETR [7] and UP-DETR [8].
Summary of initial research directions :
- Detector Knowledge Distillation (DKD) that resolves conflicts between old knowledge and new supervision while suppressing redundant background detections;
- A calibrated exemplar replay strategy that aligns the exemplar memory with the true training data distribution;
- A refined IOD benchmark protocol that prevents image reuse across incremental phases and adheres to standard continual learning principles.
Desired Profile :
- Final-year Master’s student (M2) or engineering student specializing in machine learning, computer vision, or a related field.
- Knowledge of computer vision, machine learning, and deep learning.
- Programming skills (Python).
- Autonomy, rigor, and critical thinking skills.
Address of the Internship :
CAMPUS Haute-Borne CNRS IRCICA-IRI-RMN
Parc Scientifique de la Haute Borne, 50 Avenue Halley, BP 70478, 59658 Villeneuve d'Ascq Cedex, France.
Candidature :
If this proposal interests you, please send the following documents to Dr. Tanmoy MONDAL (tanmoy.mondal@univ-lille.fr) :
- CV
- Motivation Letter
- Transcripts of grades obtained in Bachelor’s/Master’s/Engineering school as well as class ranking
- Name and contact details of at least one reference person who can be contacted if necessary
References
[1] James Kirkpatrick, Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks. PNAS, pages 3521–3526, 2017.
[2] K. J. Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, and Vineeth N Balasubramanian. Incremental object detection via meta-learning. TPAMI, 2021.
[3] Binbin Yang, Xinchi Deng, et al. Continual object detection via prototypical task correlation guided gating mechanism. In CVPR, pages 9255–9264, 2022.
[4] Bowen Zhao, Xi Xiao, Guojun Gan, et al. Maintaining discrimination and fairness in class incremental learning. In CVPR, pages 13208–13217, 2020.
[5] Liyuan Wang, Xingxing Zhang, et al. Memory replay with data compression for continual learning. In ICLR, 2022.
[6] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, pages 213–229, 2020.
[7] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. In ICLR, 2021.
[8] Zhigang Dai, Bolun Cai, Yugeng Lin, and Junying Chen. UP-DETR: Unsupervised pre-training for object detection with transformers. In CVPR, pages 1601–1610, 2021.
[9] Chenhongyi Yang, Lichao Huang, and Elliot J. Crowley. Plug and play active learning for object detection. In CVPR, pages 17784–17793, 2024. http://arxiv.org/abs/2211.11612
