Meeting


Détection de contenus générés

Date: 27 April 2026
Time: 09h00 - 17h00
Venue: Centre Inria de Paris, Amphithéâtre Jacques-Louis Lions, Bâtiment A, 48 Rue Barrault, 75013 Paris

Scientific axes:
  • Multimedia coding and security


Please note that, in order to guarantee room access for all registered participants, registration for meetings is free but mandatory.

Registration

31 GdR IASIS members and 57 non-members are registered for this meeting.

Room capacity: 140 people. 52 places remaining.

Announcement

Joint day organized by GdR IASIS, GdR Sécurité Informatique, and PEPR Cybersécurité (COMPROMIS project)

The GdR administrative office will be closed from 13 to 20 April. It is therefore strongly recommended to submit requests for travel funding from GdR IASIS before 6 April.

In a context where AI-based content generation is becoming ever simpler and more accessible, the ability to distinguish generated content from authentic content is essential. The working group (GT) “Détection de Contenus Générés (DCG)” organizing this event focuses on the analysis and detection of content synthesized by deep learning models, such as large language models (LLMs) for text, diffusion models for images, and many other applications.

Detecting generated content goes beyond simple discrimination; it also lies at the heart of the fight against digital forgeries. Indeed, only part of a piece of content may be generated, which further complicates the identification of manipulations. AI-based approaches are progressively taking over from traditional image and video manipulation tools (Photoshop, GIMP, etc.) and are now integrated directly into those tools (for example, Adobe Firefly in Photoshop). This evolution raises important issues of reliability, security, and ethics for digital content.

The event focuses on several key themes around AI-based content detection and manipulation:

  • Explainability of AI detectors and classifiers: understanding and making transparent how detection systems work, in order to improve their reliability and interpretability.
  • Detection of AI-generated content: text, audio, video, and other formats.
  • Localization of AI manipulations: identifying where and how artificial intelligence was used to modify or create content.
  • AI watermarking: techniques for embedding digital watermarks that make it possible to trace the origin of generated content.

Invited speakers

  • Luca Cuccovillo (Fraunhofer IDMT)
  • Cecilia Pasquini (Center for Cybersecurity, Fondazione Bruno Kessler)

Organizers:

  • Jan Butora (CRIStAL)
  • Eva Giboulot (Inria Rennes)
  • Vincent Itier (IMT Nord Europe)
  • Pauline Puteaux (LIRMM)

Program

09h00: Opening of the day

09h10 - 10h10: Invited talk: Luca Cuccovillo – “Breaking the neural encoding illusion in synthetic speech detection”, Fraunhofer IDMT

10h10 - 10h35: Coffee break

10h35 - 11h00: Gaspard Defreville, Christian Launay, Gohar Dashyan – “Assessing and improving operational detection of artificial content and measuring its dissemination on digital platforms”, PEReN

11h00 - 11h25: Syamantak Sarkar – “Latent Trajectory Analysis for Detection of AI-Generated Videos”, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC

11h25 - 11h50: Abderrezzaq Sendjasni, Chaker Larabi – “When Statistics, Semantics, and Texture Align: A Multi-Feature Fusion Approach for GenAI Images Detection”, CNRS, Univ. Poitiers, XLIM, UMR 7252

11h50 - 12h15: Gautier Evennou – “The Forensic Cost of Watermark Removal”, Inria, IRISA, Univ. Rennes, ARTISHAU, UMR 6074

----- Lunch break -----

14h15 - 15h15: Invited talk: Cecilia Pasquini – “Learning metadata and signals in media forensics: happy marriage or toxic relationship?”, Center for Cybersecurity, Fondazione Bruno Kessler

15h15 - 15h40: Coffee break

15h40 - 16h05: Anh Kiet Duong, Petra Gomez-Krämer – “Scalable Framework for Classifying AI-Generated Content Across Modalities”, L3i, La Rochelle Université

16h05 - 16h30: Pol Labarbarie – “AI-generated image detection for anonymized face security analysis”, LIRMM, Univ. Montpellier, CNRS

16h30 - 16h55: Elliot Cole – “Geometry-Aware Watermarking of Large Language Models via RoPE”, Télécom SudParis, Institut Polytechnique de Paris

16h55: Closing of the day

Abstracts of the contributions

Breaking the neural encoding illusion in synthetic speech detection

Invited speaker: Luca Cuccovillo (Fraunhofer IDMT)

Synthetic speech detection has been identified by Europol and Interpol as a critical challenge in combating organized crime and digital media manipulation. While data-driven approaches have achieved impressive benchmark results, their reliability is increasingly in question.

In this talk, we show that dominant detection methods, from SincNet-based to self-supervised learning (SSL)-based approaches, share a common flaw: rather than detecting genuine acoustic anomalies, they respond to neural encoding artifacts introduced by the vocoding stage. This "neural encoding illusion" exposes existing systems to a fundamental fragility: as neural codecs become ubiquitous in natural speech transmission, these detectors will abruptly become obsolete. We present experimental evidence showing that state-of-the-art detectors fail significantly on neurally encoded natural speech, and outline actionable directions, including improved datasets, standardized benchmarking, and explainability tools.

As a constructive alternative, we advocate hypothesis-driven detection paradigms that are explainable by design. Special attention will be devoted to formant-analysis-based methods, which ground the detection decision in the acoustic physiology of human speech production, making them inherently more interpretable, more legally admissible, and more resilient to shortcut learning than their data-driven counterparts.
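
To give a concrete sense of what formant-analysis methods build on, below is a minimal Python sketch of classical LPC-based formant estimation. This is our illustration, not the speaker's system; the file name is a placeholder, and a real detector would add voicing detection, frame-by-frame formant tracking, and a decision rule on top.

```python
# Hedged sketch: LPC-based formant estimation, the classical building block
# behind formant-analysis detectors. Illustration only.
import numpy as np
import librosa

def estimate_formants(frame: np.ndarray, sr: int, order: int = 12) -> np.ndarray:
    """Estimate formant frequencies (Hz) of one voiced frame via LPC roots."""
    a = librosa.lpc(frame, order=order)           # all-pole model of the vocal tract
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)    # pole angle -> frequency in Hz
    return np.sort(freqs[freqs > 90])             # drop near-DC poles

y, sr = librosa.load("speech.wav", sr=16000)      # placeholder input file
frame = y[:512] * np.hamming(512)                 # one 32 ms analysis window
print(estimate_formants(frame, sr)[:4])           # roughly F1..F4
```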


Assessing and improving operational detection of artificial content and measuring its dissemination on digital platforms

Gaspard Defreville, Christian Launay, Gohar Dashyan (Pôle d'Expertise de la Régulation Numérique (PEReN))

PEReN is an interdepartmental government office with national competence, placed under the joint authority of the French Ministers of the Economy, Culture, and Digital Technology. We support public authorities in the regulation of digital platforms and AI. PEReN started working on AI-generated image detection in 2024, in preparation for the French AI summit, and has conducted several projects on the topic since then. In this presentation we will provide an overview of our work and outline what we consider the current operational priorities in this area. We will then take a closer look at a recent project: an internal initiative in which we attempted a real-world evaluation of state-of-the-art generated-content detectors on YouTube and TikTok.


Latent Trajectory Analysis for Detection of AI-Generated Videos

Syamantak Sarkar (Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC)

The rapid progress of generative models has made the creation of realistic synthetic videos increasingly accessible, raising critical challenges for reliable content authentication. In this work, we propose a novel approach for detecting AI-generated videos based on the analysis of latent trajectories extracted from vision foundation models. We represent each video as a temporal trajectory in a high-dimensional embedding space using DINOv2 features. By interpreting this trajectory as a signal, we extract a set of complementary descriptors capturing temporal, spectral, and structural properties of motion. These include spectral features (entropy, flatness, dominant energy), high-frequency energy ratios, curvature-based motion irregularity, patch-level consistency, and participation ratio for manifold complexity. Our experiments show that these features improve detection performance compared to geometric baselines and generalize well to unseen generative models. Furthermore, we demonstrate that different generators exhibit distinct spectral and structural signatures, enabling generator classification and providing insights into their temporal behavior. Through feature importance analysis, ablation studies, and visualization techniques such as t-SNE and spectral profiling, we highlight the interpretability of the proposed approach and its ability to capture fundamental temporal artifacts introduced by generative models. Overall, this work suggests that latent trajectory analysis provides a robust and explainable framework for AI-generated video detection.
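
As a rough illustration of the descriptor families named above, the numpy sketch below computes spectral, curvature, and participation-ratio statistics from an embedding trajectory. It assumes the per-frame DINOv2 features have already been extracted into an array `E` of shape (T, D); it is our reconstruction from the abstract, not the authors' code.

```python
# Illustrative only: simple trajectory descriptors from per-frame embeddings.
import numpy as np

def trajectory_descriptors(E: np.ndarray) -> dict:
    """E: (T, D) array of per-frame embeddings (e.g., DINOv2 CLS tokens)."""
    v = np.diff(E, axis=0)                        # frame-to-frame motion vectors
    speed = np.linalg.norm(v, axis=1)             # 1-D "motion signal" over time
    # Spectral descriptors of the motion signal
    psd = np.abs(np.fft.rfft(speed - speed.mean())) ** 2
    p = psd / (psd.sum() + 1e-12)
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    flatness = float(np.exp(np.log(psd + 1e-12).mean()) / (psd.mean() + 1e-12))
    # Curvature-based irregularity: turning angle between consecutive steps
    cos = (v[1:] * v[:-1]).sum(axis=1) / (
        np.linalg.norm(v[1:], axis=1) * np.linalg.norm(v[:-1], axis=1) + 1e-12)
    curvature = float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
    # Participation ratio: effective dimensionality of the trajectory
    lam = np.linalg.eigvalsh(np.cov(E.T))
    pr = float(lam.sum() ** 2 / (np.square(lam).sum() + 1e-12))
    return {"entropy": entropy, "flatness": flatness,
            "curvature": curvature, "participation_ratio": pr}
```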


When Statistics, Semantics, and Texture Align: A Multi-Feature Fusion Approach for GenAI Images Detection

Abderrezzaq Sendjasni, Chaker Larabi (CNRS, Univ. Poitiers, XLIM, UMR 7252)

The rapid evolution of Generative AI has produced highly realistic synthetic images, challenging traditional detection methods. Existing detectors often rely on single-feature spaces (e.g., statistical regularities, semantic embeddings, or texture patterns) and lack robustness across diverse generative models. We investigate a multi-feature fusion framework combining three complementary cues: MSCN features (low-level statistics), CLIP embeddings (semantic coherence), and multi-scale LBP (mid-level texture anomalies). Extensive experiments on four benchmarks show that individual features vary significantly across generators, while their fusion yields superior and consistent performance, especially in mixed-model scenarios. Compared to state-of-the-art methods, our framework consistently improves detection across all datasets, highlighting the importance of hybrid representations for robust GenAI image detection.
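
For readers less familiar with the hand-crafted cues, the sketch below shows one plausible way to compute MSCN statistics and multi-scale LBP histograms and to concatenate them with a CLIP embedding before a standard classifier. It is our illustration of the abstract, not the authors' implementation; `clip_vec` stands in for any precomputed CLIP image embedding.

```python
# Illustrative feature extraction for the three complementary cues.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import local_binary_pattern

def mscn(gray: np.ndarray, sigma: float = 7 / 6, c: float = 1.0) -> np.ndarray:
    """Mean-subtracted contrast-normalized coefficients (low-level statistics)."""
    mu = gaussian_filter(gray, sigma)
    var = gaussian_filter(gray * gray, sigma) - mu * mu
    return (gray - mu) / (np.sqrt(np.maximum(var, 0.0)) + c)

def lbp_hist(gray: np.ndarray, radii=(1, 2, 3)) -> np.ndarray:
    """Multi-scale uniform LBP histograms (mid-level texture)."""
    feats = []
    for r in radii:
        codes = local_binary_pattern(gray, P=8 * r, R=r, method="uniform")
        hist, _ = np.histogram(codes, bins=8 * r + 2, density=True)
        feats.append(hist)
    return np.concatenate(feats)

def fused_features(gray: np.ndarray, clip_vec: np.ndarray) -> np.ndarray:
    """Concatenate statistical, texture, and semantic cues for a classifier."""
    m = mscn(gray)
    stats = np.array([m.mean(), m.std(), np.mean(np.abs(m) ** 0.5)])
    return np.concatenate([stats, lbp_hist(gray), clip_vec])
```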


The Forensic Cost of Watermark Removal

Gautier Evennou (Inria, IRISA, Univ. Rennes, ARTISHAU, UMR 6074)

Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates, with 0.92 TPR at 10⁻³ FPR across every removal method tested. We introduce the compound success rate to better report attacks that both erase the watermark and avoid forensic detection, revealing that the best practical threat achieves only 10%. We provide an extensive evaluation across text-to-image and editing models spanning 2022–2026, finding that generative removal attacks face a fundamental dilemma: either they fail to remove the watermark, or they introduce distortions strong enough to be forensically exposed. No current method balances all three criteria simultaneously.
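
Reading the abstract, the compound success rate can be understood as requiring an attack to evade both the watermark decoder and the WRD classifier at once; a minimal sketch of that metric, with illustrative variable names of our own, follows.

```python
# Sketch of the compound-success-rate idea as we read it from the abstract.
import numpy as np

def compound_success_rate(wm_detected: np.ndarray,
                          wrd_score: np.ndarray,
                          wrd_threshold: float) -> float:
    """Fraction of attacked images that evade BOTH detectors.

    wm_detected: boolean per image, True if the watermark decoder still fires.
    wrd_score: removal-detector score per image (higher = more suspicious).
    wrd_threshold: operating point, e.g. calibrated at a 10^-3 FPR.
    """
    evades_watermark = ~wm_detected.astype(bool)
    evades_forensics = wrd_score < wrd_threshold
    return float(np.mean(evades_watermark & evades_forensics))
```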


Learning metadata and signals in media forensics: happy marriage or toxic relationship?

Invited speaker: Cecilia Pasquini (Center for Cybersecurity, Fondazione Bruno Kessler)

Media forensics techniques mainly focus on signal-level detection, leveraging intrinsic traces to identify data manipulations and origin. However, digital media are embedded within data containers that expose richer metadata, potentially offering complementary forensic evidence in practical scenarios.

This talk examines the evolving role of metadata-based cues in the image and video forensics literature, showing how they can be leveraged as a valuable asset for tasks such as manipulation detection and platform provenance analysis. Particular attention is given to hybrid approaches that integrate metadata- and signal-based cues in supervised learning pipelines, with a critical assessment of their performance benefits and common pitfalls.

The discussion extends to the emerging technological trend of cryptographically verifiable metadata, outlining opportunities and practical open challenges in the deployment of provenance frameworks for reliably characterizing AI-generated and manipulated data.
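
As a toy example of the container-level cues the talk refers to, the snippet below pulls a few EXIF fields with Pillow. Actual forensic pipelines use far richer container descriptors, so treat this purely as an illustration of the cue type.

```python
# Toy illustration: container metadata as coarse forensic cues.
from PIL import Image
from PIL.ExifTags import TAGS

def exif_cues(path: str) -> dict:
    """Collect a few metadata cues often combined with signal-level features."""
    exif = Image.open(path).getexif()
    named = {TAGS.get(tag, tag): value for tag, value in exif.items()}
    return {
        "software": named.get("Software"),     # editors/generators may stamp this
        "n_exif_fields": len(named),           # fully stripped metadata is itself a cue
        "has_camera_model": "Model" in named,  # expected for genuine camera output
    }
```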


Scalable Framework for Classifying AI-Generated Content Across Modalities

Anh Kiet Duong and Petra Gomez-Krämer (L3i, La Rochelle Université)

The rapid growth of generative AI technologies has heightened the importance of effectively distinguishing between human and AI-generated content, as well as classifying outputs from diverse generative models. We present a scalable framework that integrates perceptual hashing, similarity measurement, and pseudo-labeling to address these challenges. Our method enables the incorporation of new generative models without retraining, ensuring adaptability and robustness in dynamic scenarios. Comprehensive evaluations on the Defactify4 dataset demonstrate competitive performance in text and image classification tasks, achieving high accuracy both in distinguishing human from AI-generated content and in classifying among generative methods. These results highlight the framework’s potential for real-world applications as generative AI continues to evolve.
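
A minimal sketch of the hash-and-match idea follows, assuming the `imagehash` package and a gallery of reference hashes labeled per known generator (the names are ours, not the authors'). New generators are supported by simply adding their hashes to the gallery, with no retraining.

```python
# Minimal hash-and-match sketch; gallery labels act as pseudo-labels.
import imagehash
from PIL import Image

def pseudo_label(query_path: str,
                 gallery: dict[str, list[imagehash.ImageHash]],
                 max_dist: int = 10) -> str:
    """Assign the label of the closest gallery hash, or 'unknown' if too far."""
    q = imagehash.phash(Image.open(query_path))   # 64-bit perceptual hash
    best_label, best_dist = "unknown", max_dist + 1
    for label, hashes in gallery.items():
        for h in hashes:
            d = q - h                             # Hamming distance between hashes
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```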


AI-generated image detection for anonymized face security analysis

Pol Labarbarie (LIRMM, Univ. Montpellier, CNRS)

Face anonymization aims to protect privacy by transforming facial images into realistic yet identity-obscured alternatives, ensuring both visual realism and utility for downstream computer vision tasks. For real-world applications, such as criminal investigations using CCTV footage or testimonial videos, a reversible de-anonymization process is essential to enable authorized re-identification of individuals. In this presentation, we analyze and discuss the security of reversible face anonymization methods through the lens of AI-generated image detection. We consider an adversarial scenario in which an attacker attempts to distinguish between anonymized and original faces. We assess the diversity of generated anonymized faces across distinct secret keys, a critical factor because low diversity may facilitate an attacker's classification. Our analysis demonstrates that the diffusion-based method we recently proposed achieves the highest diversity, closely aligning with the natural variability observed in human faces. In contrast, we show that the limited diversity of previous methods, such as GAN-based approaches, facilitates adversarial detection. Limited diversity suggests that the anonymization process is deterministic with respect to the secret key, potentially enabling adversaries to infer or reconstruct the original face more easily.
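
The diversity assessment can be pictured as follows: anonymize the same face under many secret keys, embed the results, and measure how spread out they are. The sketch below uses placeholder `anonymize` and `face_embed` callables; it is our schematic reading of the abstract, not the authors' protocol.

```python
# Schematic diversity probe across secret keys (placeholder callables).
import numpy as np

def key_diversity(face, anonymize, face_embed, keys) -> float:
    """Mean pairwise distance between embeddings of anonymizations of one
    face under different keys; low values suggest a near-deterministic
    anonymizer that an attacker can more easily classify."""
    E = np.stack([face_embed(anonymize(face, k)) for k in keys])
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize embeddings
    n = len(E)
    dists = [np.linalg.norm(E[i] - E[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```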


Geometry-Aware Watermarking of Large Language Models via RoPE

Elliot Cole (Télécom SudParis, Institut Polytechnique de Paris)

The proposed methodology focuses on a black-box trigger set watermarking technique for Large Language Models. The core innovation involves leveraging Rotary Positional Embeddings (RoPE) to embed a unique geometric signature directly into the model's latent space. This signature allows for robust model identification and authentication via a dedicated detector.
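
The abstract does not specify how the signature is embedded, so as background only, the sketch below renders the standard RoPE rotation in numpy with a hypothetical key-derived phase offset, to suggest where a keyed geometric perturbation could live. This is a speculative illustration, not the proposed method.

```python
# Background sketch: RoPE rotation with a hypothetical secret phase offset.
import numpy as np

def rope(x: np.ndarray, pos: int, key_phase: np.ndarray) -> np.ndarray:
    """Apply rotary positional embedding to one head vector x (even dim d);
    key_phase (shape d/2) is a speculative per-pair secret offset."""
    d = x.shape[-1]
    inv_freq = 1.0 / (10000 ** (np.arange(0, d, 2) / d))  # standard RoPE frequencies
    theta = pos * inv_freq + key_phase                    # keyed rotation angles
    x1, x2 = x[0::2], x[1::2]                             # interleaved pairs
    rotated = np.stack([x1 * np.cos(theta) - x2 * np.sin(theta),
                        x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)
    return rotated.reshape(d)
```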



