Master's Internship
Location: Ecole d’ingénieurs de numérique (Isep), Paris, France
Starting date: February 2025 (negotiable if necessary)
Supervisors: Idowu AJAYI and Wafa NJIMA
General Presentation / Technological Context
A common premise in the field of Large Language Models (LLMs) is that the larger the pre-trained model, the better its performance [1]. However, many existing LLMs are generic rather than domain-specific, and full instruction fine-tuning of these models is computationally expensive and often infeasible with limited resources. In this project, we are interested in an efficient fine-tuning method known as Low-Rank Adaptation (LoRA) [2]. Candidate pre-trained models for this project include FLAN-T5 small [3] from the FLAN family, DistilBERT [4], ALBERT [5], and MiniLM [6]. The original model weights are frozen, and only the additional, much lighter LoRA weights are trained, using telecommunication-related documents. This approach helps prevent catastrophic forgetting [7] of other domains, makes the fine-tuned model proficient in the chosen domain, and, most importantly, is computationally feasible.
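To make the idea concrete, the following is a minimal sketch of a LoRA-adapted linear layer in plain Python, assuming the formulation in [2]: the frozen weight W is left untouched and a low-rank update (alpha / r) * B A is added, with B initialised to zero. The helper names, the toy dimensions, and the 512 x 512 example matrix are illustrative assumptions, not part of the project specification; in practice a library implementation would be used.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha):
    """Return W x + (alpha / r) * B (A x), the LoRA-adapted linear map [2]."""
    r = len(A)                       # rank = number of rows of A
    base = matvec(W, x)              # frozen pre-trained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def param_counts(d_out, d_in, r):
    """Trainable parameters for one weight matrix: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4.0  # toy sizes, for illustration only
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B = 0, the adapted layer reproduces the frozen layer exactly.
unchanged_at_init = lora_forward(x, W, A, B, alpha) == matvec(W, x)

# Hypothetical 512 x 512 projection adapted with rank 8.
full, lora = param_counts(d_out=512, d_in=512, r=8)
print(unchanged_at_init, full, lora)
```

For the hypothetical 512 x 512 matrix, the rank-8 update trains 8,192 parameters instead of 262,144, which is where the computational saving comes from.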
Specifications / Tasks
Some of the internship objectives include:
• Review the state of the art in LLM fine-tuning, with a focus on Parameter-Efficient Fine-Tuning (PEFT).
• Study the model cards of pre-trained models such as FLAN-T5 and BERT to understand how they work.
• Use prompt engineering with in-context learning to manually evaluate model performance and establish a baseline.
• Instruction fine-tune the pre-trained models on telecommunication documents from sources such as 3GPP and IEEE.
• Test the effectiveness of the fine-tuned models using metrics such as ROUGE and BERTScore [8].
• Test the fine-tuned models to verify that catastrophic forgetting has not occurred.
• Conduct a comparative analysis of the pre-trained models in relation to the task.
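As an illustration of the evaluation tasks above, here is a minimal sketch of a ROUGE-1 score (unigram overlap) and a simple check for catastrophic forgetting. It assumes lowercase whitespace tokenisation, which is simpler than what standard ROUGE packages do. The function names, the example sentences, and the 0.05 tolerance are hypothetical choices for illustration, not project requirements.

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

def mean_f1(candidates, references):
    """Average ROUGE-1 F1 over paired model outputs and reference texts."""
    scores = [rouge1(c, r)["f1"] for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)

def forgetting_flag(general_before, general_after, tolerance=0.05):
    """Flag a drop on a general-domain test set larger than the tolerance.

    general_before / general_after are mean scores of the pre-trained and
    fine-tuned model on the same out-of-domain data (hypothetical protocol).
    """
    return (general_before - general_after) > tolerance

# Toy example sentences, written only to exercise the functions.
score = rouge1("the UE sends a request",
               "the UE sends a registration request")
print(score)
```

The same `mean_f1` helper can be applied to the telecommunication test set and to a general-domain set, before and after fine-tuning, to support the comparative analysis across models.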
Qualifications and Required Skills
This internship lasts a minimum of five months, beginning in February 2025. Internships will be awarded on a rolling basis, and candidates are encouraged to apply early.
• A master’s student in Data Science, Artificial Intelligence, or a related discipline.
• Good mastery of AI libraries such as PyTorch, TensorFlow, Keras, Scikit-Learn, etc.
• Good programming skills in Python.
• Good understanding of LLMs.
• Excellent problem-solving and analytical skills.
• Strong communication skills and teamwork experience.
• Good oral and written English (French optional).
How to Apply
Interested candidates should send a detailed CV, a one-page motivation letter, two academic references, and their M1 transcript to idowu.ajayi@isep.fr and wafa.njima@isep.fr. The subject of the email should be “Internship Application – Domain-Specific Parameter-Efficient Fine-Tuning”.
Closing date: 11:59 pm, 31 December 2024 (GMT+1)
Interviews will be conducted by videoconference between 13 and 17 January 2025.
References
[1] N. Ding et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nat. Mach. Intell., vol. 5, no. 3, pp. 220–235, Mar. 2023, doi: 10.1038/s42256-023-00626-4.
[2] E. J. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models,” arXiv.org. Accessed: Sep. 06, 2024. [Online]. Available: https://arxiv.org/abs/2106.09685v2
[3] S. Longpre et al., “The Flan Collection: Designing Data and Methods for Effective Instruction Tuning,” in Proceedings of the 40th International Conference on Machine Learning, PMLR, Jul. 2023, pp. 22631–22648. Accessed: Sep. 06, 2024. [Online]. Available: https://proceedings.mlr.press/v202/longpre23a.html
[4] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” arXiv, Oct. 2019. Accessed: Dec. 03, 2024. [Online]. Available: https://www.semanticscholar.org/paper/DistilBERT%2C-a-distilled-version-of-BERT%3A-smaller%2C-Sanh-Debut/a54b56af24bb4873ed0163b77df63b92bd018ddc
[5] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations,” presented at the International Conference on Learning Representations, Sep. 2019. Accessed: Dec. 03, 2024. [Online]. Available: https://openreview.net/forum?id=H1eA7AEtvS
[6] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, “MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2020, pp. 5776–5788. Accessed: Dec. 03, 2024. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[7] A. Kumar, S. Agarwal, and D. J. Hemanth, “A Methodology-Oriented Study of Catastrophic Forgetting in Incremental Deep Neural Networks,” arXiv.org. Accessed: Sep. 06, 2024. [Online]. Available: https://arxiv.org/abs/2405.08015v1
[8] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating Text Generation with BERT,” arXiv.org. Accessed: Sep. 06, 2024. [Online]. Available: https://arxiv.org/abs/1904.09675v3