    17.04.2025

    The European Commission’s Template on Training Data Transparency: First Guidelines for the AI Act


    Following the adoption of the AI Act (Reg. EU 2024/1689) on August 1, 2024, one of the main issues of debate among stakeholders has been the obligation set forth in Article 53.1, letter (d), and Recital 107, particularly regarding transparency over training data used in general-purpose AI models.

    The regulation requires providers of such models to make publicly available a sufficiently detailed summary of the data used for training – that is, the informational corpus employed to tune and optimize the model’s parameters. From the outset, the expression “sufficiently detailed” has sparked intense debate: what does “sufficient” mean, exactly? And more importantly, what criteria should guide providers in drafting this summary?

    It is precisely on the elasticity or rigidity of the interpretation of what constitutes “sufficiently detailed” that a decisive legal battle will unfold between content owners and AI platforms. On one side, rights holders demand meaningful and verifiable access to information about the data used, as a prerequisite for enforcing their rights. On the other, providers will likely advocate for a more flexible approach that protects their strategic assets and avoids disclosing too much, not least for competitive reasons. The boundary between genuine transparency and mere formal compliance will be a fine one, and it will inevitably be drawn by the first court rulings.

    The rationale behind the obligation is clear: to enable holders of legitimate interests to more effectively exercise their rights. The most immediate reference is, of course, to copyright holders, who may use the disclosed information to verify whether and how their content was used without authorization.

    But the scope of protected interests goes well beyond copyright. Also at stake are personal data protection, the right to scientific research, and the increasingly urgent need to detect and mitigate bias – with implications across a wide range of contexts, from service platforms to public decision-making systems, to commercial AI products.

    Recital 107, in laying out the modalities of compliance, also underscores the need to strike a balance: on one hand, the interest of stakeholders in knowing what data was used; on the other, the legitimate concern of providers to avoid disclosing strategic assets such as trade secrets, algorithms, or data collection and processing methods.

    To offer initial practical guidance, the European Commission published a template in January 2025 designed to assist providers in preparing the required summary. The model was developed through a broad consultation process involving both AI sector representatives and rights holders already engaged in drafting the Code of Practice on General-Purpose AI (CPAI).

    The template guides providers through all stages of the data lifecycle – from pre-training to fine-tuning – and requires clear and comprehensible language, designed to be accessible even to those without advanced technical knowledge.

    It is structured into three sections:

    1. General Information
      This section collects general details about the model: who developed it, when it was released, and what the knowledge cut-off date is (i.e., the date of the last content update). It also requires information on the overall size and characteristics of the data (number of images, minutes of audio, languages, and geographic origin).
    2. List of Data Sources
      Here, providers must list the sources of data used: public datasets, third-party datasets, data collected via web crawling (with an indication of the tools used), user-submitted data, or data self-generated by the provider.
      A controversial aspect is that the template focuses only on “major” or “large” datasets – defined as those representing more than 5% of the total. This could distort the picture, as:
      • some providers might artificially split large datasets into smaller subsets to avoid disclosure;
      • visual datasets (images/videos) are, by their nature, larger than textual ones, potentially leading to unjustified discrimination on purely technical grounds.
    3. Relevant Data Processing Aspects
      This section requires a description of the measures taken to protect copyright, such as the identification and removal of protected content, as well as the handling of inappropriate materials.
      However, some criticisms have emerged: the section appears overly focused on copyright protection, while overlooking crucial aspects such as pre-processing steps – particularly methods of anonymization or data filtering.

    The final publication of the template and accompanying guidelines is expected in the second quarter of 2025, ahead of the full entry into force of the obligations, scheduled for August 2, 2025.

    What is certain is that this regulation, and its practical implementation, will have a significant impact on the choices of AI providers worldwide. Some countries may choose to align with the European model, thereby creating an international standard. Others, conversely, may opt for more flexible regulations to attract research, investment, and development to their own jurisdictions.

    However, the real test will come with the first legal disputes, which will give concrete form to the principles currently set forth in the regulation. Those rulings will shape the future direction of European AI regulation.
