Like other vision foundation models, the Segment Anything Model (SAM) is trained on a large-scale dataset and can generate fine-grained masks from manually defined visual prompts. Despite its remarkable success, it does not generalize easily to the segmentation of flexible objects such as garments: unlike most objects, garments exhibit complex topological and geometric configurations, in particular strong self-occlusions and deformations. In this internship, we aim to apply the promptable segmentation capability of SAM to the challenging problem of garment image segmentation. Our focus will be on developing a dedicated prompt tuning and learning strategy that generates optimal prompts for SAM, enabling accurate and efficient segmentation of garment images.
We will proceed with the following tasks:
- Testing of foundation models: We will first evaluate the performance of existing foundation models for segmentation, including SAM (Segment Anything Model).
- Adaptation to garment images: We will adapt the chosen foundation model to our downstream task, i.e. the segmentation of garment images. A dedicated prompt tuning/learning strategy will be developed, possibly based on an available 3D dataset. A self-supervised loss will be integrated to exploit the domain-specific properties of garments.
- Experiments: Segmentation performance will be evaluated by comparing the results to the ground truth. Both the developed model and state-of-the-art methods will be evaluated so that their performance can be compared. Additionally, a number of ablation studies will be conducted to assess the impact of individual components on performance.
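As an illustration of the evaluation step, a predicted mask can be compared to its ground-truth mask with overlap scores. The sketch below is only an assumption about how this might be done: the proposal does not name its metrics, and intersection-over-union (IoU) and the Dice coefficient are chosen here because they are common for segmentation. The function name `iou_and_dice`, the use of NumPy, and the convention that two empty masks score 1.0 are all hypothetical choices.

```python
import numpy as np


def iou_and_dice(pred, gt):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()

    # Assumed convention: two empty masks count as a perfect match.
    iou = 1.0 if union == 0 else float(intersection / union)
    dice = 1.0 if total == 0 else float(2 * intersection / total)
    return iou, dice


# Toy 3x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
gt = [[1, 0, 0],
      [0, 1, 1],
      [0, 0, 0]]
iou, dice = iou_and_dice(pred, gt)
```

Averaging such per-image scores over a test set would give one number per method, which is one possible basis for the comparison with state-of-the-art methods and for the ablation studies.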
https://mlms.icube.unistra.fr/img_auth_namespace.php/4/4c/Stage-Garment-SAM-2025_En.pdf