In collaboration with our partner EdF, and specifically its Automated Non-Destructive Testing Department, you will be in charge of developing a defect detection algorithm for images obtained from CCTV inspections. The task raises several challenges: industrial defects must be detected, yet they occur rarely, are tiny and visually subtle, and appear on surfaces with complex textures (machining marks, scratches), possibly in blurred images, all of which hinders their detection.

Cracks are scarce, resulting in an extremely imbalanced dataset, with a positive-to-negative ratio as low as 1:1000. In such a setting, a standard training regime tends to yield a model that predicts "no crack" for everything and thereby reaches 99.9% accuracy. Classical techniques will be tried first: Class-Balanced Sampling, which oversamples the minority class to ensure that each batch contains at least 10–20% positive examples, combined with various Data Augmentation techniques applied specifically to the crack patches (Buslaev et al., 2020).
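The class-balanced sampling idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual data pipeline: the function name `balanced_batches`, its parameters, and the toy label list are hypothetical, and the 15% positive share is simply one value inside the 10–20% range mentioned above.

```python
import random
from typing import Iterator, List, Sequence


def balanced_batches(
    labels: Sequence[int],
    batch_size: int = 32,
    positive_fraction: float = 0.15,  # assumed value inside the 10-20% range
    num_batches: int = 100,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield lists of dataset indices with a fixed share of positive samples.

    Positives are drawn with replacement (oversampling the minority class);
    negatives are drawn without replacement within each batch.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    n_pos = max(1, round(batch_size * positive_fraction))
    n_neg = batch_size - n_pos
    for _ in range(num_batches):
        batch = rng.choices(positives, k=n_pos)   # with replacement
        batch += rng.sample(negatives, k=n_neg)   # without replacement
        rng.shuffle(batch)
        yield batch


# Toy label list at roughly the 1:1000 ratio mentioned above (hypothetical data).
labels = [1] * 5 + [0] * 5000
batches = list(balanced_batches(labels, batch_size=40, num_batches=10))
pos_per_batch = [sum(labels[i] for i in batch) for batch in batches]
print(pos_per_batch)
```

Because the few positive patches are repeated many times, each repetition would normally be passed through a different random augmentation so that the model does not simply memorize them.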
Tiny objects at the limit of optical resolution require architectures that preserve high-resolution spatial information. First experiments with standard state-of-the-art models such as EfficientNet (Tan and Le, 2019) will provide baseline results and scores. Several modifications appear as possible improvements:
- Shallow Detection Heads: modify the backbone to include a specialized "shallow" detection head that operates on the earliest, high-resolution feature maps (Yang and Bai, 2023).
- Attention Mechanisms: integrating Efficient Channel Attention or Coordinate Attention modules is likely to help suppress the "machining tool" noise and focus on the fine, non-periodic textures of cracks (Wang et al., 2020; Hou et al., 2021).
- Feature Pyramid Networks (FPN): use an improved Path Aggregation Network (PANet) to ensure that semantic information from deep layers is effectively fused with the fine structural details of the shallow layers (Lin et al., 2017; Liu et al., 2018).
Optimization & Ensemble Strategy: this phase focuses on improving recall without sacrificing precision through specialized loss functions and model aggregation. Hybrid Loss Function: experiment with a combination of Focal Loss and Generalized Dice Loss (Lin et al., 2017; Sudre et al., 2017; Yeung et al., 2021), with Focal Loss addressing the extreme imbalance by down-weighting "easy" negatives, and Dice Loss focusing on the spatial overlap, which is more effective for small, pixel-thin objects.
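To make the hybrid loss concrete, here is a minimal NumPy sketch for a binary crack mask. It is an illustration under assumptions rather than the loss that will finally be used: the mixing weight `lam`, the defaults `gamma=2.0` and `alpha=0.25`, the function names, and the toy 64×64 mask are all choices made for this example, and a training implementation would be written in a differentiable framework such as PyTorch or TensorFlow.

```python
import numpy as np


def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (after Lin et al., 2017)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(target == 1, prob, 1.0 - prob)        # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)  # class balancing weight
    # (1 - p_t)^gamma shrinks the contribution of well-classified ("easy") pixels.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def generalized_dice_loss(prob, target, eps=1e-7):
    """Two-class Generalised Dice loss (after Sudre et al., 2017)."""
    # Rows: background channel, crack channel. Columns: pixels.
    p = np.stack([1.0 - prob.ravel(), prob.ravel()])
    r = np.stack([1.0 - target.ravel(), target.ravel()]).astype(float)
    w = 1.0 / (r.sum(axis=1) ** 2 + eps)  # inverse squared class volume
    intersection = (w * (p * r).sum(axis=1)).sum()
    union = (w * (p + r).sum(axis=1)).sum()
    return float(1.0 - 2.0 * intersection / (union + eps))


def hybrid_loss(prob, target, lam=0.5):
    """Weighted sum of both terms; lam is an assumed mixing weight."""
    return lam * focal_loss(prob, target) + (1.0 - lam) * generalized_dice_loss(prob, target)


# Hypothetical 64x64 mask with a one-pixel-wide vertical crack.
target = np.zeros((64, 64), dtype=int)
target[:, 32] = 1
good = np.where(target == 1, 0.9, 0.1)   # confident, mostly correct prediction
no_crack = np.full(target.shape, 0.01)   # "no crack everywhere" prediction
print(hybrid_loss(good, target), hybrid_loss(no_crack, target))
```

On this toy mask the "no crack everywhere" prediction receives the larger loss, even though it classifies more than 98% of the pixels correctly.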
Once this baseline is established, the effort to raise the scores further will focus on modern dataset oversampling based on deep synthetic image generation. Traditional oversampling methods such as SMOTE and ADASYN provide an accessible baseline for addressing class imbalance by interpolating new minority samples between existing neighbors, but their reliance on local feature-space relationships renders them ineffective for high-dimensional image data, where they fail to capture global defect appearance distributions and produce unrealistic synthetic textures (Chawla et al., 2002; He et al., 2008). Deep generative models have largely superseded these approaches for image-based defect detection. Vanilla GANs learn to synthesize realistic defect images through adversarial generator-discriminator training, though they remain susceptible to mode collapse and limited sample diversity (Goodfellow et al., 2014). Conditional GANs (CGANs) extend this by conditioning both networks on class labels, enabling targeted generation of specific defect categories, though they are highly sensitive to the imbalance ratio, with F1-scores progressively collapsing as it rises (Fajardo et al., 2021). VAEs offer a more stable latent-space alternative, while their conditional variant, CVAEs, further improves minority-class F1-scores through class-guided generation, at the cost of higher computational overhead (Fajardo et al., 2021). BAGAN addresses data scarcity more directly by initializing the GAN from a pretrained autoencoder, enabling class-conditional image generation with minimal degradation across imbalance ratios, which makes it among the most robust image-oriented architectures under severe scarcity of positives (Fajardo et al., 2021).
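The interpolation step behind SMOTE-style oversampling can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference algorithm of Chawla et al. (2002): the function name `smote_like_samples`, the brute-force neighbour search, and the random 8×8 "patches" are assumptions made for this example.

```python
import numpy as np


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    x = np.asarray(minority, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Brute-force pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                        # seed sample per new row
    picked = neighbours[base, rng.integers(0, k, size=n_new)]    # one of its k neighbours
    gap = rng.random((n_new, 1))                                 # position on the segment
    return x[base] + gap * (x[picked] - x[base])


# Six hypothetical flattened 8x8 crack patches with random pixel values.
rng = np.random.default_rng(42)
patches = rng.random((6, 8 * 8))
synthetic = smote_like_samples(patches, n_new=20)
print(synthetic.shape)  # (20, 64)
```

Applied to flattened image patches, each synthetic sample is a pixel-wise blend of two existing patches, which is consistent with the remark above that such methods produce unrealistic textures on image data.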
For hyperspectral images, 3D-HyperGAMO extends this paradigm to spectral-spatial image volumes through a dedicated conditional patch generator, sustaining strong classification performance even at high imbalance ratios (Roy et al., 2022). Finally, GANSO addresses the most extreme scarcity regime by incorporating vector Markov Random Field structural priors into the adversarial framework, synthesizing realistic instances from very few original samples, a property particularly valuable in industrial contexts where collecting labeled defect images is costly or operationally constrained (Salazar et al., 2021).
The results of the research are to be evaluated on test images, with the scores and robustness reported numerically.
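As an illustration of which scores are informative under a 1:1000 imbalance, the sketch below computes accuracy, precision, recall and F1 from binary labels. The function name `detection_scores` and the toy label vectors are assumptions made for this example; the project's actual evaluation protocol is not specified here.

```python
from typing import Dict, Sequence


def detection_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = crack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# One crack among 1000 defect-free images, and a model that always answers "no crack".
y_true = [1] + [0] * 1000
always_negative = [0] * len(y_true)
scores = detection_scores(y_true, always_negative)
print(scores)
```

The always-negative model reaches an accuracy of about 0.999 while its recall and F1 are both zero, which is why recall, precision and F1 are more informative than accuracy in this setting.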
The findings are expected to be published in highly ranked conferences and high-impact journals. The outcome (a software tool) is to be used daily on the industrial premises for inspection.
Proposed Timeline and Location
October 2026 – September 2029
Part-time at EdF, Automated Non-Destructive Testing Department, Seine-Saint-Denis, and part-time at the STIM (Statistics and Images) research center at MINES Paris PSL, Fontainebleau
Expected Profile
- Completed CS/EE/Applied-Math engineering program or research MSc program, with an excellent academic record.
- Excellent coding skills in Python; proficiency in signal processing techniques and machine learning methods applied to images (TensorFlow/PyTorch).
- Excellent communication skills (oral and written).
- Excellent level of English (C1 or C2).
How to apply
Please send your application, including your CV, academic transcripts, cover letter and two or three contacts able to provide references, to: petr.dokladal@minesparis.psl.eu
Bibliography
Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin, A. A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125. https://doi.org/10.3390/info11020125
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Fajardo, V. A., et al. (2021). On oversampling imbalanced data with deep conditional generative models. Expert Systems with Applications, 169. https://doi.org/10.1016/j.eswa.2020.114463
Goodfellow, I. J., et al. (2014). Generative adversarial networks. arXiv preprint arXiv:1406.2661. https://doi.org/10.48550/arXiv.1406.2661
He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of the International Joint Conference on Neural Networks, 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
Hou, Q., Zhou, D., & Feng, J. (2021). Coordinate attention for efficient mobile network design. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2117–2125. https://doi.org/10.1109/CVPR.2017.227
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2980–2988.
Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
Roy, S. K., Haut, J. M., Paoletti, M. E., Dubey, S. R., & Plaza, A. (2022). Generative adversarial minority oversampling for spectral-spatial hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 60. https://doi.org/10.1109/TGRS.2021.3052048
Salazar, A., Vergara, L., & Safont, G. (2021). Generative adversarial networks and Markov random fields for oversampling very small training sets. Expert Systems with Applications, 163. https://doi.org/10.1016/j.eswa.2020.113819
Solovyev, R., Wang, W., & Gabruseva, T. (2021). Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing, 107, 104117. https://doi.org/10.1016/j.imavis.2021.104117
Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S., & Cardoso, M. J. (2017). Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 240–248.
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv. https://doi.org/10.48550/arXiv.1905.11946
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., & Hu, Q. (2020). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Yang, M., & Bai, H. (2023). A shallow information enhanced efficient small object detector based on YOLOv5. https://doi.org/10.1007/978-3-031-44195-0_1
Yeung, M., Sala, E., Schönlieb, C. B., & Rundo, L. (2021). Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. arXiv. https://doi.org/10.48550/arxiv.2102.04525
