Announcement

M2 internship: Enhancing and Diversifying Low-Resolution Videos using AI-driven Image Generation

7 November 2024


Category: Intern


Context and motivation:

This internship aims to develop a method for enhancing and diversifying low-resolution videos using large image generation models, specifically Stable Diffusion [1]. Many existing computer vision datasets contain low-quality video footage, which limits the effectiveness of model training and of real-world applications [2]. The intern will use AI to upscale video resolution and enrich visual variety, improving both the quality and the diversity of these datasets. The core objective is to apply AI-driven image generation techniques to alter scenes within videos by adjusting environments, modifying object appearances, and experimenting with different visual styles, while maintaining coherence and realism across frames. For example, low-resolution videos of common settings such as urban areas or natural scenes could be enhanced to high-definition quality, diversified into various conditions, or reimagined with distinct visual themes, such as futuristic, seasonal, or artistic transformations.

Proposed Solution:

To achieve these goals, the project will use ComfyUI [3], an open-source tool for building complex Stable Diffusion workflows. ComfyUI's node-based system lets users construct custom pipelines for image processing and frame transformation, making it a flexible platform for tasks such as image upscaling, style transfer, and scene editing. ComfyUI also lets developers write custom nodes for specialized processing steps, which the intern can use to optimize and streamline the video transformation process.
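To give a sense of what a custom node involves, here is a minimal sketch that follows the class conventions ComfyUI uses for node discovery (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, `NODE_CLASS_MAPPINGS`). The node itself, `FrameStride`, together with its category and parameter names, is hypothetical and is not part of the project description; it only keeps every `step`-th frame of an image batch.

```python
class FrameStride:
    """Hypothetical custom node: keep every `step`-th frame of an image batch."""

    CATEGORY = "video/utils"   # menu location in the UI (name chosen for illustration)
    RETURN_TYPES = ("IMAGE",)  # one output socket carrying an image batch
    FUNCTION = "select"        # name of the method ComfyUI calls on execution

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the input sockets and widgets shown on the node.
        return {
            "required": {
                "images": ("IMAGE",),
                "step": ("INT", {"default": 2, "min": 1, "max": 64}),
            }
        }

    def select(self, images, step):
        # Slicing along the first (batch) axis works for tensors and plain lists.
        return (images[::step],)


# ComfyUI looks for this module-level mapping when loading custom nodes.
NODE_CLASS_MAPPINGS = {"FrameStride": FrameStride}
```

Outside ComfyUI the class can be exercised directly, for example `FrameStride().select(frames, 2)` on a list of frames, which makes such nodes easy to unit-test before wiring them into a workflow.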

Key Tasks and Responsibilities:

  1. Developing Enhancement Pipelines: The intern will design and implement pipelines in ComfyUI to upscale and refine low-resolution video frames, ensuring continuity and visual coherence.
  2. Creating Diversity through Scene Transformation: Using AI-driven style transfer and object manipulation, the intern will explore ways to add diversity to videos, experimenting with new scenes, weather conditions, and visual styles.
  3. Optimizing Workflow for Automation: By designing custom nodes within ComfyUI, the intern will contribute to an automated process that applies these enhancements across large video datasets.
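As one possible shape for the automation task, the sketch below builds one request per video for a locally running ComfyUI server. It assumes a workflow exported in ComfyUI's API format and posted as JSON under a `prompt` key to the `/prompt` endpoint, as in ComfyUI's example scripts. The server address, the node id `"1"`, the `video_path` input, and the function names are assumptions made for illustration only.

```python
import copy
import json
import urllib.request
from pathlib import Path

COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # assumed address of a local server


def build_payload(workflow, node_id, input_name, value):
    """Return a JSON request body with one workflow input overridden."""
    patched = copy.deepcopy(workflow)  # leave the template untouched
    patched[node_id]["inputs"][input_name] = value
    return json.dumps({"prompt": patched}).encode("utf-8")


def queue_dataset(workflow, dataset_dir, node_id="1",
                  input_name="video_path", send=False):
    """Build (and optionally send) one request per .mp4 file under dataset_dir."""
    payloads = []
    for video in sorted(Path(dataset_dir).rglob("*.mp4")):
        body = build_payload(workflow, node_id, input_name, str(video))
        if send:
            request = urllib.request.Request(
                COMFYUI_URL,
                data=body,
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request).close()
        payloads.append(body)
    return payloads
```

With `send=False` (the default) nothing is transmitted, so the generated payloads can be inspected before a large batch is queued.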

Requirements:

Prospective interns should have a strong background in computer vision and deep learning, as well as experience with deep learning frameworks such as PyTorch or TensorFlow. Strong programming skills in Python are essential. Experience with video data and video processing is a plus.

Practical information:

1. Location: CIAD laboratory, Montbéliard, France.

2. This internship is remunerated.

Application:

Send a curriculum vitae, contact details for referees, and grade transcripts for the last two years before 15 December 2024 to:

Ibrahim.kajo@utbm.fr

yassine.ruichek@utbm.fr

References:


[1] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).

[2] Kajo, I., Kas, M., Ruichek, Y., & Kamel, N. (2023). Tensor based completion meets adversarial learning: A win–win solution for change detection on unseen videos. Computer Vision and Image Understanding, 226, 103584.

[3] ComfyUI (2023). ComfyUI: The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI