January 3, 2025
Category: Intern
Scientific fields: Computer science, Artificial Intelligence, Computer Vision
Keywords: Generative Few-Shot Learning; Transformers; Mixture-of-Experts; RGB+D Datasets; Human-System Interaction (HSI)
Research interest: Computer Vision; Machine Learning; Deep Learning
Research work: Deep Learning Models for 3D Skeleton-based Human Action Recognition
3D Skeleton-based Human Action Recognition (HAR) [1], [2] is a fundamental task in pattern recognition and computer vision and a key issue in many applications, e.g., medical and industrial imaging, robotics, and VR/AR.
Human Action Recognition (HAR) [1], [2] decodes human movements by analyzing sequential 3D skeletal joint coordinates obtained through sensor technologies such as motion capture devices, depth cameras (e.g., Microsoft Kinect, Intel RealSense), and wearable motion sensors. These sensors track body joint positions in real time, enabling sophisticated computational analysis of human actions and gestures across diverse domains.
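As a minimal illustration of this data structure, a skeleton sequence can be stored as a tensor of shape (frames, joints, 3); the frame count below is hypothetical, and the 25-joint layout follows Kinect-style skeletons such as those in NTU RGB+D.

import numpy as np

# Hypothetical skeleton sequence: T frames, J joints, 3D coordinates (x, y, z).
T, J = 64, 25                      # 25 joints as in Kinect-style skeletons (NTU RGB+D)
sequence = np.random.randn(T, J, 3).astype(np.float32)

# A simple, sensor-agnostic preprocessing step: center each frame on a root joint
# (joint index 0 is assumed to be the spine/hip base; adapt to the actual skeleton layout).
root = sequence[:, 0:1, :]
normalized = sequence - root

print(sequence.shape, normalized[:, 0, :].max())   # (64, 25, 3), and the root joint is at the origin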
Recently, the authors of [3] introduced few-shot generative models for skeleton-based human action recognition, enabling accurate action classification from limited training samples in specific domains. They leveraged large public datasets (NTU RGB+D 120 [4] and NTU RGB+D [5]) to develop cross-domain generative models. By introducing novel entropy-regularization losses, they effectively transferred motion diversity from the source to the target domain, enabling more robust action recognition with limited training samples. They used a standard model, the Spatial Temporal Graph Convolutional Network (ST-GCN) [6], to generate action samples, and then trained the few-shot generative model on the real data concatenated with the samples generated by ST-GCN.
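As a minimal sketch of this generate-then-augment idea (not the exact pipeline of [3]), synthetic skeleton sequences produced by a pretrained generator can be concatenated with the few available real samples before training the recognition model; the generator interface below is a hypothetical placeholder.

import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

def build_augmented_loader(real_x, real_y, generator, n_synthetic=200, batch_size=32):
    """Concatenate a small real dataset with generator-produced skeleton sequences.

    real_x: (N, T, J, 3) tensor of real skeleton sequences
    real_y: (N,) tensor of action labels
    generator: hypothetical callable mapping labels -> synthetic sequences of the same shape
    """
    classes = real_y.unique()
    synth_y = classes[torch.randint(len(classes), (n_synthetic,))]
    with torch.no_grad():
        synth_x = generator(synth_y)   # assumed to return (n_synthetic, T, J, 3)
    real_ds, synth_ds = TensorDataset(real_x, real_y), TensorDataset(synth_x, synth_y)
    return DataLoader(ConcatDataset([real_ds, synth_ds]), batch_size=batch_size, shuffle=True)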
Few-shot scenarios [3], [6], [7] arise when training HAR models with very limited labeled data; this is a major obstacle to practical deployment because collecting human action data and annotating labels correctly are time consuming and labor intensive. In [8], the authors proposed few-shot learning for cross-domain HAR: self-training is used to adapt representations learned in a labeled source domain (defined by activities, sensor positions, and users) to a target domain with very limited labeled data.
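The self-training idea of [8] can be sketched roughly as follows: a model pretrained on the labeled source domain pseudo-labels the unlabeled target data, and only confident predictions are kept for fine-tuning. The confidence threshold and model interface here are assumptions, not the exact recipe of [8].

import torch

@torch.no_grad()
def pseudo_label(model, target_loader, threshold=0.9):
    """Keep target-domain samples whose predicted class probability exceeds a threshold."""
    model.eval()
    kept_x, kept_y = [], []
    for x in target_loader:                        # batches of unlabeled target sequences
        probs = torch.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        mask = conf > threshold
        kept_x.append(x[mask])
        kept_y.append(pred[mask])
    return torch.cat(kept_x), torch.cat(kept_y)    # pseudo-labeled set for fine-tuning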
In this internship, we will develop Generative Few-Shot Learning models for HAR that aim to generate action samples and augment limited training data. We will propose a novel approach to 3D Skeleton-based Human Action Recognition (HAR) that combines Generative Few-Shot Learning with Mixture-of-Experts (MoE) Transformers [9], [10], [11]. The proposed approach aims to improve the efficiency and accuracy of action recognition on RGB+D datasets while addressing the challenges of limited training data.
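As a rough sketch of the Mixture-of-Experts component we intend to combine with Transformers [9], [10], [11], the feed-forward block of a Transformer layer can be replaced by several expert MLPs plus a gating network that routes each token (here, a skeleton frame embedding) to a weighted combination of experts; the dimensions and the simple soft routing below are illustrative assumptions.

import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative soft mixture-of-experts replacement for a Transformer feed-forward block."""
    def __init__(self, d_model=128, d_hidden=256, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)            # per-token routing scores

    def forward(self, x):                                    # x: (batch, tokens, d_model)
        weights = torch.softmax(self.gate(x), dim=-1)        # (B, T, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, d_model, E)
        return torch.einsum("btde,bte->btd", expert_out, weights)

# Example: 64 frame embeddings of a skeleton sequence routed through the MoE block.
x = torch.randn(2, 64, 128)                                  # hypothetical sizes
print(MoEFeedForward()(x).shape)                             # torch.Size([2, 64, 128])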
The main key concepts of Generative Few-Shot Learning for HAR are:
1. 3D Skeleton-based Human Action Recognition (HAR):
2. Generative Few-Shot Learning:
3. Generative MoE Transformers:
The main potential of Generative Few-Shot Learning with MoE Transformer Architecture for 3D Skeleton-based HAR:
In this internship, we will train the proposed models on different datasets for 3D human action recognition. Finally, to measure accuracy and study performance, we will test the proposed models on the “NTU RGB+D” and “NTU RGB+D 120” datasets.
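In the few-shot setting, evaluation is typically episodic: each episode samples N classes, K labeled support examples per class, and a set of query samples. A small sketch of such an episode sampler follows; the class and shot counts are illustrative and do not correspond to an official NTU RGB+D protocol.

import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=15, seed=None):
    """Return support/query indices for one N-way K-shot episode over a list of labels."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]
        query += picked[k_shot:]
    return support, query

# Toy usage with fake labels (10 classes, 30 samples each): 5 support and 75 query indices.
labels = [c for c in range(10) for _ in range(30)]
support, query = sample_episode(labels, seed=0)
print(len(support), len(query))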
Work plan:
The work plan is divided into two phases:
1) In the first phase (about two months), the student will review the state-of-the-art (SOTA) of few-shot learning models (machine/deep learning) applied to 3D Skeleton-based Human Action Recognition (HAR). Then, the student will test SOTA models on the “NTU RGB+D” and “NTU RGB+D 120” datasets.
2) In the second phase (about four months), the student will propose contributions to the following research directions:
Expected scientific production
Several scientific productions are expected, including an international peer-reviewed conference paper or an indexed journal paper:
Introduction to the laboratory: CESI LINEACT - Research Unit
CESI LINEACT (Digital Innovation Laboratory for Business and Learning at the service of the competitiveness of territories) is the CESI group laboratory, whose activities are carried out on CESI campuses.
Link to the laboratory website:
CESI LINEACT (EA 7527), the Digital Innovation Laboratory for Business and Learning at the service of the Competitiveness of Territories, anticipates and accompanies the technological transformations of the sectors and services related to industry and construction. CESI's historical proximity to companies is a determining factor for our research activities and has led us to focus our efforts on applied research, close to companies and in partnership with them. A human-centered approach, coupled with the use of technologies, as well as the territorial network and the links with training, has allowed us to build transversal research: it places people, their needs and their uses, at the center of its problems and approaches the technological angle through these contributions.
Its research is organized according to two interdisciplinary scientific themes and two application areas.
These two themes develop and cross their research in the two application areas of the Industry of the Future and the City of the Future, supported by research platforms, mainly the one in Rouen dedicated to the Factory of the Future and the one in Nanterre dedicated to the Factory and Building of the Future.
CESI LINEACT RESEARCH THEME:
Human-System Interactions (HSI)
Your application must include:
Please send all documents in one file.
Your skills:
Scientific and technical skills:
1. Master's research student or final-year student of an Engineering School in Computer Science
2. Python programming skills and experience with standard Computer Vision and Machine/Deep Learning libraries
3. Basics of Machine Learning and Deep Learning: Neural Networks, GANs, Transformers, and Mixture-of-Experts
4. Skills in Machine/Deep Learning frameworks: PyTorch, Keras, TensorFlow
5. Computer Vision applications: Image Classification, Action Recognition, etc.
6. Practical computing skills with the expected tools, e.g., Python with Google Colab, Jupyter Lab/Notebook, Weights & Biases (wandb), etc.
7. Ability to write a Master's report
8. Fluency in English to write an international peer-reviewed conference paper or an indexed journal paper with impact factor
Interpersonal skills:
References
[1] A. Ali, E. Pinyoanuntapong, P. Wang, and M. Dorodchi, “Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN),” Jan. 30, 2023, arXiv: arXiv:2301.13360. Accessed: Sep. 09, 2024. [Online]. Available: http://arxiv.org/abs/2301.13360
[2] H. Duan, Y. Zhao, K. Chen, D. Lin, and B. Dai, “Revisiting Skeleton-based Action Recognition,” Apr. 02, 2022, arXiv: arXiv:2104.13586. Accessed: Oct. 10, 2024. [Online]. Available: http://arxiv.org/abs/2104.13586
[3] K. Fukushi, Y. Nozaki, K. Nishihara, and K. Nakahara, “Few-shot generative model for skeleton-based human action synthesis using cross-domain adversarial learning,” in 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA: IEEE, Jan. 2024, pp. 3934–3943. doi: 10.1109/WACV57701.2024.00390.
[4] J. Liu, A. Shahroudy, M. Perez, G. Wang, L.-Y. Duan, and A. C. Kot, “NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 10, pp. 2684–2701, Oct. 2020, doi: 10.1109/TPAMI.2019.2916873.
[5] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis,” Apr. 11, 2016, arXiv: arXiv:1604.02808. Accessed: Sep. 20, 2024. [Online]. Available: http://arxiv.org/abs/1604.02808
[6] L. Wang, J. Liu, and P. Koniusz, “3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve,” Dec. 23, 2021, arXiv: arXiv:2112.12668. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2112.12668
[7] L. Xu, Q. Wang, X. Lin, and L. Yuan, “An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation,” Jul. 20, 2022, arXiv: arXiv:2207.09925. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2207.09925
[8] M. Thukral, H. Haresamudram, and T. Ploetz, “Cross-Domain HAR: Few Shot Transfer Learning for Human Activity Recognition,” Oct. 22, 2023, arXiv: arXiv:2310.14390. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2310.14390
[9] A. Alboody and R. Slama, “Graph Transformer Mixture-of-Experts (GTMoE) for 3D Hand Gesture Recognition,” in Intelligent Systems and Applications, K. Arai, Ed., Lecture Notes in Networks and Systems, vol. 1067, Cham: Springer Nature Switzerland, 2024, pp. 317–336. doi: 10.1007/978-3-031-66431-1_21.
[10] A. Alboody and R. Slama, “EPT-MoE: Toward Efficient Parallel Transformers with Mixture-of-Experts for 3D Hand Gesture Recognition,” presented at the 10th World Congress on Electrical Engineering and Computer Systems and Science, Aug. 2024. doi: 10.11159/mvml24.105.
[11] T. Chen et al., “AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France: IEEE, Oct. 2023, pp. 17300–17311. doi: 10.1109/ICCV51070.2023.01591.
[12] C. Plizzari, M. Cannici, and M. Matteucci, “Spatial Temporal Transformer Network for Skeleton-Based Action Recognition,” in Pattern Recognition. ICPR International Workshops and Challenges, A. Del Bimbo, R. Cucchiara, S. Sclaroff, G. M. Farinella, T. Mei, M. Bertini, H. J. Escalante, and R. Vezzani, Eds., Lecture Notes in Computer Science, vol. 12663, Cham: Springer International Publishing, 2021, pp. 694–701. doi: 10.1007/978-3-030-68796-0_50.
[13] X. Wang, X. Wang, B. Jiang, and B. Luo, “Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification,” Aug. 26, 2022, arXiv: arXiv:2208.12398. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2208.12398