    17.04.2025

    The European Commission’s Template on Training Data Transparency: First Guidelines for the AI Act


    Since the AI Act (Regulation (EU) 2024/1689) entered into force on August 1, 2024, one of the main points of debate among stakeholders has been the obligation set out in Article 53(1)(d) and Recital 107 concerning transparency about the training data used in general-purpose AI models.

    The regulation requires providers of such models to make publicly available a sufficiently detailed summary of the data used for training – that is, the informational corpus employed to tune and optimize the model’s parameters. From the outset, the expression “sufficiently detailed” has sparked intense debate: what does “sufficient” mean, exactly? And more importantly, what criteria should guide providers in drafting this summary?

    A decisive legal battle between content owners and AI platforms will turn precisely on how elastically or rigidly “sufficiently detailed” is interpreted. On one side, rights holders demand meaningful and verifiable access to information about the data used, as a prerequisite for enforcing their rights. On the other, providers will likely advocate a more flexible approach that protects their strategic assets and limits disclosure, not least for competitive reasons. The line between genuine transparency and mere formal compliance will be a fine one, and it will inevitably be drawn by the first court rulings.

    The rationale behind the obligation is clear: to enable holders of legitimate interests to more effectively exercise their rights. The most immediate reference is, of course, to copyright holders, who may use the disclosed information to verify whether and how their content was used without authorization.

    But the scope of protected interests goes well beyond copyright. Also at stake are personal data protection, the right to scientific research, and the increasingly urgent need to detect and mitigate bias – with implications across a wide range of contexts, from service platforms to public decision-making systems, to commercial AI products.

    Recital 107, in laying out the modalities of compliance, also underscores the need to strike a balance: on one hand, the interest of stakeholders in knowing what data was used; on the other, the legitimate concern of providers to avoid disclosing strategic assets such as trade secrets, algorithms, or data collection and processing methods.

    To offer initial practical guidance, the European Commission published a template in January 2025 designed to assist providers in preparing the required summary. The template was developed through a broad consultation process involving both AI sector representatives and rights holders, who were already engaged in drafting the Code of Practice on General-Purpose AI (CPAI).

    The template guides providers through all stages of the data lifecycle – from pre-training to fine-tuning – and requires clear and comprehensible language, designed to be accessible even to those without advanced technical knowledge.

    It is structured into three sections:

    1. General Information
      This section collects general details about the model: who developed it, when it was released, and what the knowledge cut-off date is (i.e., the date of the last content update). It also requires information on the overall size and characteristics of the data (number of images, minutes of audio, languages, and geographic origin).
    2. List of Data Sources
      Here, providers must list the sources of data used: public datasets, third-party datasets, data collected via web crawling (with an indication of the tools used), user-submitted data, or data self-generated by the provider.
      A controversial aspect is that the template focuses only on “major” or “large” datasets – defined as those representing more than 5% of the total. This could distort the picture, as:
      • some providers might artificially split large datasets into smaller subsets to avoid disclosure;
      • visual datasets (images/videos) are by their nature larger in data volume than textual ones, so a size-based threshold could treat the two differently for purely technical reasons.
    3. Relevant Data Processing Aspects
      This section requires a description of the measures taken to protect copyright, such as the identification and removal of protected content, as well as the handling of inappropriate materials.
      However, some criticisms have emerged: the section appears overly focused on copyright protection, while overlooking crucial aspects such as pre-processing steps – particularly methods of anonymization or data filtering.
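
The concern about the 5% threshold raised under section 2 can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: it assumes the threshold is measured as a dataset's share of the total corpus size, and the dataset names, sizes, and the `major_datasets` helper are all hypothetical, not taken from the template.

```python
THRESHOLD = 0.05  # the "more than 5% of the total" cut-off described in the article


def major_datasets(sizes, threshold=THRESHOLD):
    """Return the names of datasets whose share of the total exceeds the threshold."""
    total = sum(sizes.values())
    return sorted(name for name, size in sizes.items() if size / total > threshold)


# Hypothetical corpus; sizes are in arbitrary units (for example, gigabytes).
corpus = {"web_crawl": 800, "licensed_images": 120, "books": 60, "forum_posts": 20}

# The same data, with "licensed_images" split into three equal subsets.
split_corpus = {k: v for k, v in corpus.items() if k != "licensed_images"}
split_corpus.update({f"licensed_images_{i}": 40 for i in range(1, 4)})

print(major_datasets(corpus))        # includes "licensed_images" (12% of the total)
print(major_datasets(split_corpus))  # each 4% subset now falls below the threshold
```

In this toy example the total amount of data is unchanged, yet the image dataset drops out of the list of "major" sources once it is split, which is the distortion the criticism points to.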

    The final publication of the template and accompanying guidelines is expected in the second quarter of 2025, before the obligations become applicable on August 2, 2025.

    What is certain is that this regulation, and its practical implementation, will have a significant impact on the choices of AI providers worldwide. Some countries may choose to align with the European model, thereby creating an international standard. Others, conversely, may opt for more flexible regulations to attract research, investment, and development to their own jurisdictions.

    However, the real test will come with the first legal disputes, which will give concrete form to the principles currently set forth in the regulation. Those rulings will shape the future direction of European AI regulation.
