How to Comply with the EU AI Act Using PrivateGPT


The amended EU AI Act proposal, which we introduced in this blog post, would regulate “foundation models,” defined in Art. 3(1c) as “an AI model that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of distinctive tasks.” This blog post sets out the compliance obligations that will be imposed on OpenAI’s ChatGPT and other foundation models once the proposed Act comes into force. Parliament recently endorsed the latest amendments.

The obligations imposed on a provider of a foundation model include additional transparency criteria, such as disclosing that the content is generated by artificial intelligence, ensuring the model is designed to prevent the generation of unlawful content, and providing summaries of copyrighted data used for training purposes. See how current foundation model providers fare on compliance here. Spoiler alert: they’re not doing great. We’ll go through each of the requirements under the Act in detail and then explain what PrivateGPT can do to facilitate compliance with these new obligations.

Obligations of the Provider of a Foundation Model

Classifying ChatGPT as its own type of AI system by introducing the new category of “foundation model” and the subcategory of “generative AI” is apparently a compromise intended to support innovation. If ChatGPT were instead classified as a “high-risk” AI system under the EU AI Act, more stringent obligations would apply than the ones that follow:

1. Registration and Transparency: Like high-risk AI models, foundation models must be registered in an EU database established and managed by the Commission. The registration must include the following information:

a) Name, address and contact details of the provider;

b) Where submission of information is carried out by another person on behalf of the provider, the name, address and contact details of that person;

c) Name, address and contact details of the authorised representative, where applicable;

d) Trade name and any additional unambiguous reference allowing the identification of the foundation model;

e) Description of the data sources used in the development of the foundation model;

f) Description of the capabilities and limitations of the foundation model, including the reasonably foreseeable risks and the measures that have been taken to mitigate them, as well as remaining non-mitigated risks with an explanation of why they cannot be mitigated;

g) Description of the training resources used by the foundation model, including computing power required, training time, and other relevant information related to the size and power of the model;

h) Description of the model’s performance, including on public benchmarks or state-of-the-art industry benchmarks;

i) Description of the results of relevant internal and external testing and optimisation of the model;

j) Member States in which the foundation model is or has been placed on the market, put into service or made available in the Union;

k) URL for additional information (optional).
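To make the registration list above more concrete, here is a minimal sketch of how a provider might track these items internally before submitting them. The class name, field names, and the `missing_fields` helper are our own illustrative assumptions. They do not reflect any official schema of the EU database, which the Act does not specify at this level of detail.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class FoundationModelRegistration:
    """Hypothetical internal record mirroring items (a) to (k) above."""
    provider_contact: str                      # (a) name, address, contact details
    trade_name: str                            # (d) trade name / unambiguous reference
    data_sources: str                          # (e) description of data sources
    capabilities_and_limitations: str          # (f) incl. risks and mitigations
    training_resources: str                    # (g) compute, training time, size
    performance: str                           # (h) benchmark results
    testing_results: str                       # (i) internal and external testing
    member_states: List[str] = field(default_factory=list)  # (j)
    submitter_contact: Optional[str] = None    # (b) only if someone else submits
    authorised_representative: Optional[str] = None  # (c) where applicable
    info_url: Optional[str] = None             # (k) optional


# Items that the list above marks as conditional or optional.
OPTIONAL_FIELDS = {"submitter_contact", "authorised_representative", "info_url"}


def missing_fields(record: FoundationModelRegistration) -> List[str]:
    """Return the names of required items that are still empty."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL_FIELDS and not getattr(record, f.name)
    ]


draft = FoundationModelRegistration(
    provider_contact="Example Provider Ltd., 1 Example Street",
    trade_name="example-model-v1",
    data_sources="",  # not yet documented
    capabilities_and_limitations="General-purpose text generation; known risks listed in model card.",
    training_resources="Documented separately.",
    performance="Documented separately.",
    testing_results="Documented separately.",
)
print(missing_fields(draft))
```

A check like this only confirms that each item has been filled in. Whether a description is sufficient under the Act remains a legal judgment.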

2. Demonstrable risk mitigation: Providers of foundation models are required to ensure that the risk of identification, as well as risks to health, safety, fundamental rights, the environment, democracy, and the rule of law, are demonstrably reduced and mitigated through design, testing, and analysis, and that remaining non-mitigable risks are documented. (Art. 28b(2)(a)).

3. Suitable data sources: Data sources must be examined for their suitability, possible biases, and appropriate mitigation. (Art. 28b(2)(b)).

4. Long-term view of design and development: Design and development of foundation models must achieve appropriate levels of performance, predictability, interpretability, corrigibility, safety, and cybersecurity throughout their lifecycle. Relevant documentation must be kept for 10 years. (Art. 28b(2)(c)).

5. Environmental sustainability: The foundational model must be designed and developed making use of applicable standards to reduce energy use, resource use and waste, as well as to increase energy efficiency, and the overall efficiency of the system. (Art. 28b(2)(d)).

6. Instructions to downstream providers: Providers of foundational models must enable downstream providers to comply with their respective obligations under the Act by drawing up extensive technical documentation and intelligible instructions. (Art. 28b(2)(e)).

7. Quality management system: Providers of foundation models must also implement a quality management system to ensure and document their compliance. The Act does, however, explicitly allow for “the possibility to experiment in fulfilling this requirement.”

Additional Obligations for Generative AI Models

Foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video are called “generative AI.” Providers of these kinds of AI systems have to meet the same obligations set out above, and in addition, the following:

1. Additional Transparency

a) Notify individuals that they are interacting with an AI system, unless this is obvious or the AI system is authorized to be used to detect, prevent, investigate and prosecute criminal offences. The latter exception does not apply to systems that are available for the public to report a criminal offence.

b) If the foundation model is used as an emotion recognition or a biometric categorization system, the person exposed to it must be informed of the operation of the system, with the same exception noted under (a).

c) Users of an AI system that generates or manipulates image, audio or video content that appreciably resembles existing persons, objects, places or other entities or events and would falsely appear to a person to be authentic or truthful (‘deep fake’), shall disclose that the content has been artificially generated or manipulated, with certain exceptions for law enforcement purposes and fundamental rights protections.

2. Safeguards against rights violations and breaches of Union law: Adequate, state-of-the-art safeguards must be implemented to ensure that the generated content does not breach Union law or fundamental rights, including freedom of expression.

3. Copyright law compliance: Providers of generative AI systems must make publicly available a sufficiently detailed summary of the use of training data protected under copyright law without such publication itself violating applicable copyrights.

Consequences of Non-Compliance

A provider of a foundation model that violates the EU AI Act provisions discussed above faces a fine of €10 million or, for businesses, 2% of total worldwide annual turnover, whichever is higher (Art. 71(4)). Orders and warnings can also be imposed, either instead of or in addition to monetary fines (Art. 71(6)).
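As a rough illustration of the “whichever is higher” rule, the sketch below computes the fine exposure from a turnover figure. The function name and parameters are our own, and the result is only the figure described in this section. The amount actually imposed in a given case would be set by the competent authority.

```python
FIXED_AMOUNT_EUR = 10_000_000   # EUR 10 million, as cited above
TURNOVER_PERCENT = 2            # 2% of total worldwide annual turnover


def fine_exposure_eur(annual_turnover_eur: float, is_business: bool = True) -> float:
    """Return the higher of the fixed amount and the turnover-based amount.

    Illustrative only: the turnover-based amount applies to businesses.
    """
    if not is_business:
        return FIXED_AMOUNT_EUR
    turnover_based = annual_turnover_eur * TURNOVER_PERCENT / 100
    return max(FIXED_AMOUNT_EUR, turnover_based)


# A business with EUR 100 million turnover: 2% is EUR 2 million,
# so the fixed EUR 10 million figure is the higher one.
print(fine_exposure_eur(100_000_000))

# A business with EUR 1 billion turnover: 2% is EUR 20 million,
# which exceeds the fixed figure.
print(fine_exposure_eur(1_000_000_000))
```

The turnover-based figure overtakes the fixed amount once annual turnover exceeds €500 million.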

How Can Private AI Help with EU AI Act Compliance When Using ChatGPT?

First, Private AI’s PrivateGPT is uniquely equipped to help achieve compliance with the “demonstrable risk mitigation” requirement, in particular, the requirement to mitigate the risk of identification. The EU AI Act doesn’t say anything more about this requirement. It stands to reason that the risk of identification is a subtle reference to the GDPR, with which the foundation model of course also has to comply. For example, this requirement demands that the inclusion of personal information in the data set is closely monitored and ideally avoided.

PrivateGPT filters out more than 50 entity types, including personally identifiable information, protected health information, and Payment Card Industry data, before such identifiers are sent to ChatGPT.
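The general idea of filtering identifiers out of a prompt before it leaves your environment can be sketched as follows. To be clear, this is not PrivateGPT’s implementation or API. It is a deliberately simple, regex-based stand-in covering three entity types, and the pattern set and function names (`redact`, `restore`) are assumptions made for illustration. A production system would rely on a trained detection model rather than hand-written patterns.

```python
import re
from typing import Dict, Tuple

# Toy patterns for illustration only; they will miss many real-world formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected identifiers with placeholders.

    Returns the redacted text and a placeholder-to-original mapping
    that stays on the caller's side and is never sent to the model.
    """
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def _substitute(match: re.Match, label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(_substitute, text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into a model response, locally."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Email jane.doe@example.com or call +1 416 555 0199 about card 4111 1111 1111 1111."
safe_prompt, mapping = redact(prompt)
print(safe_prompt)  # only this redacted version would be sent to the LLM
```

Because the mapping never leaves the caller’s environment, placeholders in the model’s response can be swapped back locally with `restore`, while the model provider only ever sees the redacted text.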

Had ChatGPT been trained on data that was rendered safe by Private AI, there would be no risk of re-identification, and the “suitable data sources” requirement could have been met more easily as well. PrivateGPT can be used to render any training data set safe if employed early on in the development and training process.

View the Huggingface guide for privacy-preserving sentiment analysis 

Furthermore, when deployed in front of an LLM that is already in use, PrivateGPT can make sure that no personal data of users is fed to the model, whether or not user inputs are later used to train it. In either case, the privacy filter prevents the disclosure of personal data, so third parties with access to the chat history will not be able to identify the user.

Second, by removing personal identifiers from the input before it is disclosed to ChatGPT, PrivateGPT can help mitigate the risk of biased output, as we have demonstrated in this blog post. This ability can help address the requirement under Art. 28b(2)(b), namely, to examine data sources for their suitability, possible biases, and appropriate mitigation. To be clear, PrivateGPT can’t render a data source unbiased once the bias is in there, but it can mitigate the risk of biased outputs that result from biased content in the data source, for example by removing any indicators of race, ethnicity, gender, etc.

Conclusion

In conclusion, the introduction of the EU AI Act and its recent amendments highlights the need for increased transparency, risk mitigation, environmental consciousness and more in the deployment of foundation models like ChatGPT. PrivateGPT by Private AI emerges as a valuable solution to address some of these requirements. PrivateGPT can contribute to a more privacy-conscious and ethically sound AI ecosystem. By leveraging PrivateGPT’s capabilities, compliance with the EU AI Act can be facilitated, fostering responsible AI development and improved protection of individuals’ privacy.  



