Following the entry into force of the AI Act (Regulation (EU) 2024/1689) on August 1, 2024, one of the main issues of debate among stakeholders has been the obligation set out in Article 53(1)(d), read together with Recital 107, concerning transparency about the data used to train general-purpose AI models.
The regulation requires providers of such models to make publicly available a sufficiently detailed summary of the content used for training – that is, the informational corpus employed to tune and optimize the model's parameters. From the outset, the expression "sufficiently detailed" has sparked intense debate: what exactly counts as "sufficient"? And, more importantly, what criteria should guide providers in drafting this summary?
It is precisely on how elastically or rigidly "sufficiently detailed" is interpreted that a decisive legal battle will unfold between content owners and AI platforms. On one side, rights holders demand meaningful, verifiable access to information about the data used, as a precondition for enforcing their rights. On the other, providers will likely push for a more flexible reading that protects their strategic assets and avoids disclosing too much, not least for competitive reasons. The line between genuine transparency and merely formal compliance will be thin, and it will inevitably be drawn by the first court rulings.
The rationale behind the obligation is clear: to enable holders of legitimate interests to more effectively exercise their rights. The most immediate reference is, of course, to copyright holders, who may use the disclosed information to verify whether and how their content was used without authorization.
But the scope of protected interests goes well beyond copyright. Also at stake are personal data protection, the right to scientific research, and the increasingly urgent need to detect and mitigate bias – with implications across a wide range of contexts, from service platforms to public decision-making systems, to commercial AI products.
Recital 107, in setting out how compliance should work, also underscores the need to strike a balance: on one hand, stakeholders' interest in knowing what data was used; on the other, providers' legitimate concern not to disclose strategic assets such as trade secrets, algorithms, or data collection and processing methods.
To offer initial practical guidance, in January 2025 the European Commission published a template designed to assist providers in preparing the required summary. The template was developed through a broad consultation process involving both AI industry representatives and rights holders already engaged in drafting the Code of Practice on General-Purpose AI.
The template guides providers through all stages of the data lifecycle – from pre-training to fine-tuning – and requires clear and comprehensible language, designed to be accessible even to those without advanced technical knowledge.
It is structured into three sections:
1. General information: identification of the provider and the model, the modalities of the data involved (text, images, audio, video), and the overall size of the training corpus;
2. List of data sources: publicly available datasets, data licensed from third parties, data scraped from the web, data collected from the provider's own services and users, and synthetic data;
3. Relevant data processing aspects: the measures adopted to respect rights reserved under the text-and-data-mining opt-out and to detect and remove illegal or unwanted content.
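To make that structure more concrete, the minimal sketch below renders the three sections as a machine-readable record. It is purely illustrative: the Commission's template is a narrative document, not a data schema, and every class name, field name, and example value here is a hypothetical rendering of the structure described above, not an official format.

```python
from dataclasses import dataclass

# Hypothetical, illustrative schema for a training-data summary.
# Names and fields are assumptions modeled on the template's three
# sections; the actual template is prose, not a data format.

@dataclass
class GeneralInformation:
    provider: str               # legal name of the model provider
    model_name: str             # name of the general-purpose AI model
    data_modalities: list[str]  # e.g. ["text", "images", "audio"]
    overall_size: str           # order of magnitude of the corpus

@dataclass
class DataSources:
    public_datasets: list[str]  # publicly available datasets used
    licensed_data: list[str]    # data licensed from third parties
    scraped_data: list[str]     # web crawls or domains drawn on
    user_data: list[str]        # data from the provider's own services
    synthetic_data: list[str]   # model-generated data used in training

@dataclass
class ProcessingAspects:
    tdm_optout_measures: str        # how reserved rights (TDM opt-outs) were respected
    illegal_content_measures: str   # how illegal or unwanted content was filtered

@dataclass
class TrainingDataSummary:
    general: GeneralInformation
    sources: DataSources
    processing: ProcessingAspects

# A fictional example instance:
summary = TrainingDataSummary(
    general=GeneralInformation(
        provider="ExampleAI Ltd.",
        model_name="example-gpt",
        data_modalities=["text"],
        overall_size="~5 TB of text",
    ),
    sources=DataSources(
        public_datasets=["Wikipedia dumps"],
        licensed_data=["news archive licensed from a press agency"],
        scraped_data=["filtered Common Crawl snapshots"],
        user_data=[],
        synthetic_data=["instruction-tuning data generated in-house"],
    ),
    processing=ProcessingAspects(
        tdm_optout_measures="robots.txt and machine-readable opt-outs honoured at crawl time",
        illegal_content_measures="URL blocklists and classifier-based filtering",
    ),
)

print(summary.general.model_name)
```

A structured record like this could at most serve as an internal compliance checklist; the published summary itself must remain in plain, comprehensible language, as the template requires.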
The final version of the template and its accompanying guidelines is expected in the second quarter of 2025, before the obligations for general-purpose AI models become applicable on August 2, 2025.
What is certain is that this regulation, and its practical implementation, will have a significant impact on the choices of AI providers worldwide. Some countries may choose to align with the European model, thereby creating an international standard. Others, conversely, may opt for more flexible regulations to attract research, investment, and development to their own jurisdictions.
However, the real test will come with the first legal disputes, which will give concrete form to the principles currently set forth in the regulation. Those rulings will shape the future direction of European AI regulation.