Abstract: The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper presents a survey of existing auditing solutions, categorizing them along four key dimensions: data modality, model training stage, data overlap scenario, and model access level. We highlight major trends, such as the predominance of black-box auditing methods and the focus on the fine-tuning stage rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and to develop auditing frameworks that remain effective at low watermark densities and are applicable across diverse deployment settings.
Keywords: dataset copyright auditing; large language models; diffusion models; backdoor attacks; membership inference