MASTER INTERNSHIP
Scientific fields: Computer science, Artificial Intelligence, Computer Vision
Keywords: Generative Deep Learning; Zero-Shot Learning; Transformers; Mixture-of-Experts; 3D Point Cloud Sequences; Multi-LiDAR Datasets; Real-Time Object Tracking; Human-Robot Collaboration (HRC); Human-System Interaction (HSI); Real Industrial Environment
Research interests: Computer Vision; Machine Learning; Deep Learning
Research work: Zero-Shot Deep Learning Models for Real-Time Object Tracking using 3D Point Cloud Sequences (LiDAR Datasets) in a Collaborative Robot Environment
In the emerging paradigm of Industry 5.0, more intelligent and adaptable collaborative robots (cobots) are replacing traditional robots [1]. Ensuring safe and efficient interaction between cobots and humans depends heavily on the cobots’ ability to achieve a comprehensive semantic understanding of dynamic actions and to perform precise object detection and tracking in industrial environments. These capabilities are also critical for various robotics applications [1], including autonomous driving [2], [3], [4]. Despite the growing importance of semantic understanding in such applications, research on object detection [5], [6] and real-time 3D object tracking [7], [8] in collaborative robot workspaces remains insufficient. Nevertheless, 3D object tracking [7], [9] is gaining significant attention from both academia and industry as a fundamental task in AI-driven domains such as robotics [7], [8], autonomous vehicles [2], [10], and Extended Reality [11].
The integration of zero-shot learning (ZSL) [5], [12], [13] with real-time object tracking in collaborative robotics represents a significant leap forward in artificial intelligence and 3D computer vision. This research takes a generative approach that leverages ZSL models for real-time object tracking on 3D point cloud sequences (multi-LiDAR datasets [14]) in collaborative robotic environments. ZSL models can process and interpret complex 3D point cloud sequences and enable robots to detect and track previously unseen objects [5], [12], [13] without task-specific retraining. By integrating multiple LiDAR datasets and advanced generative deep learning architectures [7], [8], [9], including transformer-based models [15] and mixture-of-experts models [16], [17], we aim to demonstrate robust performance in tracking previously unseen objects without prior training examples.
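To make the zero-shot principle concrete, the short PyTorch sketch below projects a pooled point cloud feature into a semantic embedding space and scores it against class word vectors, so that categories absent from training can still be recognized. The module names, layer sizes, and embedding dimensions are illustrative assumptions of ours, not details taken from [5] or [12].

import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotPointCloudClassifier(nn.Module):
    def __init__(self, feat_dim=1024, sem_dim=300):
        super().__init__()
        # Placeholder point feature extractor: maps (B, N, 3) coordinates to
        # per-point features; a real system would use a PointNet-like backbone.
        self.backbone = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        # Projection from the visual feature space to the semantic space.
        self.project = nn.Linear(feat_dim, sem_dim)

    def forward(self, points, class_embeddings):
        # points: (B, N, 3); class_embeddings: (C, sem_dim) word vectors.
        per_point = self.backbone(points)           # (B, N, feat_dim)
        global_feat = per_point.max(dim=1).values   # (B, feat_dim), max-pooling
        sem = F.normalize(self.project(global_feat), dim=-1)  # (B, sem_dim)
        protos = F.normalize(class_embeddings, dim=-1)        # (C, sem_dim)
        return sem @ protos.t()                     # cosine-similarity scores

# At test time, class_embeddings may contain vectors of *unseen* classes,
# so the model can score categories it was never trained on.
model = ZeroShotPointCloudClassifier()
points = torch.randn(2, 2048, 3)         # two dummy point clouds
unseen_classes = torch.randn(5, 300)     # e.g. word vectors of 5 unseen labels
predictions = model(points, unseen_classes).argmax(dim=-1)   # (2,)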
The proposed approach will uniquely combine spatial-temporal feature extraction from point cloud sequences with a scalable generative transformer [15] built on a mixture-of-experts architecture [16], [17], enabling efficient processing of high-dimensional LiDAR data streams.
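As a rough illustration of this architectural direction, the following PyTorch sketch combines a standard self-attention layer with a mixture-of-experts feed-forward network routed per token. The layer sizes, the top-1 routing rule, and the token count are illustrative assumptions, not the final design of [15], [16], [17].

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Feed-forward sub-layer with top-1 expert routing per token."""
    def __init__(self, d_model=256, d_hidden=512, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                             # x: (B, T, d_model)
        weights = F.softmax(self.gate(x), dim=-1)     # (B, T, num_experts)
        top_w, top_idx = weights.max(dim=-1)          # top-1 routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                       # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

class MoETransformerBlock(nn.Module):
    """Pre-norm self-attention block with an MoE feed-forward network."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = MoEFeedForward(d_model)

    def forward(self, tokens):                        # tokens: (B, T, d_model)
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h)[0]
        tokens = tokens + self.moe(self.norm2(tokens))
        return tokens

# Dummy run on 512 point cloud tokens (e.g. voxel/pillar features) per frame.
block = MoETransformerBlock()
out = block(torch.randn(2, 512, 256))                # (2, 512, 256)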
The main objective of the proposed internship is to provide a comprehensive literature review on three technologies: zero-shot learning (ZSL), real-time object detection/tracking, and LiDAR sequence datasets. These technologies have strong potential to promote and optimize operational efficiency and intelligent systems integration within Industry 5.0, facilitating advancements in automation, quality control, and safety measures in industrial environments.
In this internship, the student will propose zero-shot learning (ZSL) models. The student will then train these models on different 3D point cloud datasets (the COVERED CollabOratiVE Robot Environment Dataset [14] and the PandaSet LiDAR dataset [18]) for smart real-time object detection/tracking for Human-Robot Collaboration (HRC) in a real industrial environment.
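For orientation, the sketch below shows a minimal frame-by-frame tracking loop over a LiDAR point cloud sequence. The on-disk frame format, the detect_objects() placeholder, and the association threshold are all hypothetical; they stand in for the actual COVERED [14] / PandaSet [18] tooling and for the zero-shot detector to be developed during the internship.

from pathlib import Path
import numpy as np

def load_frame(path):
    # Assumed frame format: a raw (N, 4) float32 array of x, y, z, intensity.
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

def detect_objects(points):
    # Placeholder detector returning a list of (centroid_xyz, label) tuples;
    # in the internship this would be the proposed zero-shot detection model.
    return [(points[:, :3].mean(axis=0), "unknown")]

def track_sequence(frame_dir, max_dist=1.0):
    tracks, next_id = {}, 0                           # track_id -> last centroid
    for path in sorted(Path(frame_dir).glob("*.bin")):
        for centroid, label in detect_objects(load_frame(path)):
            # Greedy nearest-centroid association with existing tracks.
            best_id, best_d = None, max_dist
            for tid, prev in tracks.items():
                d = np.linalg.norm(centroid - prev)
                if d < best_d:
                    best_id, best_d = tid, d
            if best_id is None:                       # no match: open a new track
                best_id, next_id = next_id, next_id + 1
            tracks[best_id] = centroid
            print(f"{path.name}: track {best_id} ({label}) at {centroid.round(2)}")

# track_sequence("/path/to/lidar/frames")   # a folder of per-frame .bin files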
Work plan:
The work plan is broadly divided into two phases:
1) In the first phase (about two months), the student will survey the state-of-the-art (SOTA) of zero-shot learning models (machine/deep learning) applied to real-time object detection and object tracking using 3D point cloud LiDAR datasets. The student will then test SOTA models on the COVERED 3D point cloud dataset [14].
2) In the second phase (about four months), the student will propose contributions along the following research directions:
- Proposing a new zero-shot learning model based on 3D Point Cloud Transformers and Mixture-of-Experts for smart real-time object detection and object tracking
- Studying the properties of such models (complexity, expressivity, frugality)
- Applying the proposed zero-shot learning model to 3D point cloud datasets such as the COVERED dataset [14] and PandaSet LiDAR data [18] for smart real-time object detection and object tracking tasks, in order to measure accuracy and study performance.
Expected scientific production:
Several scientific productions are expected, including an international peer-reviewed conference paper or an indexed journal paper:
- A journal publication presenting the literature review on generative zero-shot learning for real-time 3D object tracking using 3D point cloud datasets
- A publication presenting the proposed generative zero-shot learning model for real-time 3D object tracking, with performance evaluation based on training, validation, and testing on Human-Robot Collaboration 3D point cloud datasets from a real industrial environment.
Introduction to the laboratory: CESI LINEACT Research Unit
CESI LINEACT (Digital Innovation Laboratory for Business and Learning at the Service of Territorial Competitiveness) is the CESI group laboratory whose activities are carried out on CESI campuses.
Link to the laboratory website:
https://lineact.cesi.fr/en/
https://lineact.cesi.fr/en/research-unit/presentation-lineact/
CESI LINEACT (EA 7527), the Digital Innovation Laboratory for Business and Learning at the Service of Territorial Competitiveness, anticipates and accompanies the technological transformations of the sectors and services related to industry and construction. CESI’s historical proximity to companies is a determining factor for our research activities and has led us to focus our efforts on applied research, close to companies and in partnership with them. A human-centered approach coupled with the use of technologies, as well as the territorial network and the links with training, have allowed us to build transversal research that puts the human being, their needs and uses, at the center of its questions and approaches the technological angle through these contributions.
Its research is organized according to two interdisciplinary scientific themes and two application areas.
- Theme 1 « Learning and Innovation » is mainly concerned with Cognitive Sciences, Social Sciences and Management Sciences, Training Sciences and Techniques and Innovation Sciences. The main scientific objectives of this theme are to understand the effects of the environment, and more particularly of situations instrumented by technical objects (platforms, prototyping workshops, immersive systems, etc.) on the learning, creativity and innovation processes.
- Theme 2 « Engineering and Digital Tools » is mainly concerned with Digital Sciences and Engineering. The main scientific objectives of this theme concern the modeling, simulation, optimization and data analysis of industrial or urban systems. The research work also focuses on the associated decision support tools and on the study of digital twins coupled with virtual or augmented environments.
These two themes develop and cross their research in the two application areas of the Industry of the Future and the City of the Future, supported by research platforms, mainly the one in Rouen dedicated to the Factory of the Future and the one in Nanterre dedicated to the Factory and Building of the Future.
CESI LINEACT RESEARCH THEME:
Human-System Interactions (HSI)
https://lineact.cesi.fr/en/engineering-and-numerical-tools/thematics/human-system-interactions-theme
Your application must include:
- A detailed curriculum vitae.
- A cover letter explaining why the candidate is interested in this internship.
- Master 1 and 2 transcripts (to be adapted to the level of the internship)
- Recommendation letters if available
- Any other documents you consider useful such as project reports, publications, datasets, codes, related to this internship topic.
Please send all documents in one file.
Your skills:
Scientific and technical skills:
1. Master’s research student or final-year student at an engineering school in Computer Science
2. Python programming skills and experience with standard Computer Vision and Machine/Deep Learning libraries
3. Basics in Machine Learning and Deep Learning: neural networks, CNNs, Transformers, and Mixture-of-Experts
4. Skills in Machine/Deep Learning frameworks: PyTorch, Keras, TensorFlow
5. Computer Vision applications, 3D Point Cloud Object Detection, Object Tracking, etc.
6. Practical computing skills with the expected tools, e.g. Python with Google Colab, Jupyter Lab/Notebook, Weights & Biases (wandb), etc.
7. Other skills in MATLAB R2024a (Lidar Toolbox, Computer Vision Toolbox, Deep Learning and Machine Learning Toolboxes)
8. Ability to write a Master’s report
9. Fluency in English, sufficient to write an international peer-reviewed conference paper or an indexed journal paper with an impact factor
Interpersonal skills:
- Be autonomous and have a spirit of initiative and curiosity,
- Know how to work in a team and have good interpersonal skills,
- Be rigorous
References
[1] M. H. Zafar, E. F. Langås, and F. Sanfilippo, “Exploring the synergies between collaborative robotics, digital twins, augmentation, and industry 5.0 for smart manufacturing: A state-of-the-art review,” Robotics and Computer-Integrated Manufacturing, vol. 89, p. 102769, Oct. 2024, doi: 10.1016/j.rcim.2024.102769.
[2] S. Y. Alaba and J. E. Ball, “A Survey on Deep-Learning-Based LiDAR 3D Object Detection for Autonomous Driving,” Sensors, vol. 22, no. 24, p. 9577, Dec. 2022, doi: 10.3390/s22249577.
[3] J. Fang et al., “LiDAR-CS Dataset: LiDAR Point Cloud Dataset with Cross-Sensors for 3D Object Detection,” Mar. 05, 2024, arXiv: arXiv:2301.12515. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2301.12515
[4] D. Choi, W. Cho, K. Kim, and J. Choo, “iDet3D: Towards Efficient Interactive Object Detection for LiDAR Point Clouds,” Dec. 24, 2023, arXiv: arXiv:2312.15449. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2312.15449
[5] A. Cheraghian, S. Rahman, and L. Petersson, “Zero-shot Learning of 3D Point Cloud Objects,” Feb. 27, 2019, arXiv: arXiv:1902.10272. Accessed: Nov. 09, 2024. [Online]. Available: http://arxiv.org/abs/1902.10272
[6] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast Encoders for Object Detection From Point Clouds,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA: IEEE, Jun. 2019, pp. 12689–12697. doi: 10.1109/CVPR.2019.01298.
[7] L. B. Chung and D. D. Nguyen, “Real-Time Object Detection and Tracking for Mobile Robot Using YOLOv8 and Strong SORT,” UniTech, vol. 116, no. 11, Nov. 2023, doi: 10.32743/UniTech.2023.116.11.16223.
[8] T. B. Tuli and M. Manns, “Real-Time Motion Tracking for Humans and Robots in a Collaborative Assembly Task,” in The 6th International Electronic Conference on Sensors and Applications, MDPI, Nov. 2019, p. 48. doi: 10.3390/ecsa-6-06636.
[9] M. Simon, S. Milz, K. Amende, and H.-M. Gross, “Complex-YOLO: Real-time 3D Object Detection on Point Clouds,” Sep. 24, 2018, arXiv: arXiv:1803.06199. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/1803.06199
[10] J. Mao, S. Shi, X. Wang, and H. Li, “3D Object Detection for Autonomous Driving: A Comprehensive Survey,” Apr. 04, 2023, arXiv: arXiv:2206.09474. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2206.09474
[11] Y. Feddoul, N. Ragot, F. Duval, V. Havard, D. Baudry, and A. Assila, “Exploring human-machine collaboration in industry: a systematic literature review of digital twin and robotics interfaced with extended reality technologies,” Int J Adv Manuf Technol, vol. 129, no. 5–6, pp. 1917–1932, Nov. 2023, doi: 10.1007/s00170-023-12291-3.
[12] A. Cheraghian, S. Rahman, T. F. Chowdhury, D. Campbell, and L. Petersson, “Zero-Shot Learning on 3D Point Cloud Objects and Beyond,” Apr. 11, 2021, arXiv: arXiv:2104.04980. Accessed: Nov. 19, 2024. [Online]. Available: http://arxiv.org/abs/2104.04980
[13] F. Pourpanah et al., “A Review of Generalized Zero-Shot Learning Methods,” Jul. 13, 2022. doi: 10.1109/TPAMI.2022.3191696.
[14] C. Munasinghe, F. M. Amin, D. Scaramuzza, and H. W. van de Venn, “COVERED, CollabOratiVE Robot Environment Dataset for 3D Semantic segmentation,” Apr. 04, 2023. doi: 10.1109/ETFA52439.2022.9921525.
[15] G. K. Erabati and H. Araujo, “Li3DeTr: A LiDAR based 3D Detection Transformer,” Oct. 27, 2022, arXiv: arXiv:2210.15365. Accessed: Nov. 20, 2024. [Online]. Available: http://arxiv.org/abs/2210.15365
[16] T. Chen et al., “AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France: IEEE, Oct. 2023, pp. 17300–17311. doi: 10.1109/ICCV51070.2023.01591.
[17] A. Alboody and R. Slama, “EPT-MoE: Toward Efficient Parallel Transformers with Mixture-of-Experts for 3D Hand Gesture Recognition,” presented at the The 10th World Congress on Electrical Engineering and Computer Systems and Science, Aug. 2024. doi: 10.11159/mvml24.105.
[18] P. Xiao et al., “PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving,” Dec. 23, 2021, arXiv: arXiv:2112.12610. doi: 10.48550/arXiv.2112.12610.