Canada’s Principles for Responsible, Trustworthy and Privacy-Protective Generative AI Technologies and How to Comply Using Private AI

The realm of generative AI, encompassing technologies that generate content like text, images, and videos, has seen a significant surge in usage and development. In response, the Office of the Privacy Commissioner of Canada (OPC) introduced key privacy principles tailored to generative AI technologies on December 7, 2023. This framework is crucial for organizations navigating the complex intersection of AI innovation and data privacy. Here, we explore the significance of these principles for generative AI development and use, and the pivotal role of Private AI’s technology in ensuring compliance with a broad subset of them.

1. Legal Authority and Consent

The Principle: Legal authority for collecting, using, or disclosing personal data in generative AI systems, whether for training, development, deployment, operation, or decommissioning, is as crucial as it is for any other use case. The OPC emphasizes that this principle also applies when data containing personal information is obtained from third parties, or when an inference is drawn from personal information, as drawing inferences is considered collection for which legal authority is required. Often, the required legal authority will be consent. Obtaining valid consent may not be easy; the first hurdle is that training data scraped from the internet may contain an unmanageable amount of personal data.

Private AI’s Role: While Private AI’s technology does not directly handle consent, it can aid organizations in minimizing the scope of personal data used, thereby reducing the breadth of consent required. Private AI can detect, redact, or replace personal information with synthetic data, and it does so in 52 languages, across various file types, and with unparalleled accuracy.
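To make this concrete, here is a minimal sketch of what such a de-identification call might look like. The endpoint URL, payload shape, and response field are assumptions for illustration only; consult Private AI’s documentation for the actual interface.

```python
# A minimal sketch of a de-identification call. The endpoint URL,
# payload shape, and response field below are hypothetical assumptions,
# not Private AI's real API.
import requests

def deidentify(text: str, use_synthetic: bool = True) -> str:
    """Send raw text to a PII-detection service and get back a version
    with personal information redacted or replaced with synthetic data."""
    response = requests.post(
        "http://localhost:8080/deidentify",  # hypothetical endpoint
        json={
            "text": text,
            # hypothetical flag: synthetic replacements vs. redaction markers
            "synthetic": use_synthetic,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["processed_text"]  # hypothetical field name

# e.g. deidentify("Call Maria Chen at 416-555-0199 about her claim.")
# -> "Call [synthetic name] at [synthetic number] about her claim."
```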

2. Limiting Collection, Use, and Disclosure to What Is Necessary

The Principle: Data collection, use, and disclosure must be restricted to what is necessary for the AI model’s training and operation. Unnecessary data collection can lead to privacy risks and regulatory non-compliance. The OPC proposes the use of anonymized, synthetic, or de-identified data rather than personal information where the latter is not required to fulfill the identified appropriate purpose(s). The OPC further reminds developers of AI systems that personal information available on the internet is not outside of the purview of applicable privacy laws.

Private AI’s Advantage: Private AI’s technology helps ensure that only essential data is utilized for AI training and operations, by rendering personal information anonymized or de-identified, or by creating synthetic data in its place. For the use of AI systems, organizations are well advised to use Private AI’s PrivateGPT, which ensures that user prompts are sanitized, i.e., personal information is filtered out of the prompts, before they are sent to the AI system. Depending on the use case, the personal information to be excluded from the prompt can be selected on a very granular level to preserve the prompt’s utility. Before the system’s answer is sent back to the user, the personal information is automatically re-inserted into the output, without ever having been disclosed to the model.
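The sanitize-query-restore flow just described can be sketched as follows. The redact and restore helpers are illustrative stand-ins, not PrivateGPT’s actual API, and the regex is a toy substitute for real model-based detection.

```python
# Illustrative sketch of the sanitize -> query -> re-insert flow.
# These helpers are hypothetical stand-ins, not PrivateGPT's API.
import re

def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with placeholders and return the sanitized
    prompt plus a mapping for restoring the original values later.
    (Real detection is model-based; a toy regex stands in here.)"""
    mapping: dict[str, str] = {}
    def _sub(match: re.Match) -> str:
        key = f"[NAME_{len(mapping) + 1}]"
        mapping[key] = match.group(0)
        return key
    sanitized = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", _sub, prompt)
    return sanitized, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-insert the original values into the model's answer."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

sanitized, mapping = redact("Draft a follow-up email to Maria Chen.")
# The sanitized prompt goes to the LLM; the model never sees "Maria Chen".
answer = f"Dear {next(iter(mapping))}, thank you for your time."  # stand-in LLM reply
print(restore(answer, mapping))  # -> "Dear Maria Chen, thank you for your time."
```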

3. Openness

The Principle: The openness principle is very broad: it asks for transparency by developers, providers, and organizations using AI systems regarding the collection, use, and disclosure of personal information; the associated risks and their mitigation; the training data set(s); whether an AI system will be used as part of a decision-making process; and more.

Private AI’s Role: Private AI can help with one aspect of compliance with the openness principle: being open about the use of personal information first requires knowing what personal information is being used, and whether using it is indeed necessary for the use case. Given the enormous amount of data AI models are usually trained on, this is not an easy ask. Private AI’s algorithm can detect personal information at scale, making otherwise overwhelmingly large data sets reportable.
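As a rough sketch of what “reportable” can mean in practice, the snippet below tallies detected entity types across a corpus. The detect_entities function is a hypothetical stand-in for a real PII-detection call; its toy regexes only illustrate the shape of the output.

```python
# Sketch: make a large corpus "reportable" by tallying entity types.
# detect_entities() is a hypothetical stand-in for a real detector.
import re
from collections import Counter

def detect_entities(text: str) -> list[dict]:
    """Toy stand-in for a PII-detection call; a real system would use
    a trained model, not these illustrative regexes."""
    entities = []
    for email in re.findall(r"\S+@\S+\.\w+", text):
        entities.append({"type": "EMAIL", "text": email})
    for phone in re.findall(r"\b\d{3}-\d{3}-\d{4}\b", text):
        entities.append({"type": "PHONE_NUMBER", "text": phone})
    return entities

def summarize_corpus(documents: list[str]) -> Counter:
    """Tally entity types across a corpus so an otherwise unmanageable
    data set can be summarized in a report."""
    counts: Counter = Counter()
    for doc in documents:
        for entity in detect_entities(doc):
            counts[entity["type"]] += 1
    return counts

docs = ["Reach me at jane@example.com or 416-555-0199."]
print(summarize_corpus(docs))  # Counter({'EMAIL': 1, 'PHONE_NUMBER': 1})
```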

4. Accountability

The Principle: Accountability is a broad topic that includes, among other aspects, an internal governance structure and clearly defined roles and responsibilities. Importantly, it also requires explainability of the model’s output. The OPC advises that one way of achieving this is to conduct Privacy Impact Assessments (PIAs) and/or Algorithmic Impact Assessments (AIAs) to identify and mitigate privacy and other fundamental rights impacts.

Private AI’s Role: For PIAs and AIAs, it is again crucial to know what the model has been trained or fine-tuned on; these models are all about data, and lots of it. Private AI can help with this onerous task.

5. Access and Correction

The Principle: The principle of individual access necessitates that users can access their personal data used by generative AI systems. They also have a right to ask for their information to be corrected, especially if the information is included in the model’s output. Both requirements pose a particular challenge in the context of generative AI: these models do not store their training data in any straightforward way that would allow it to be retrieved as from a folder, yet the encoded training data may still surface in the model’s output in production. Removing incorrect information from an AI system can therefore mean that the model has to be retrained, which is expensive and time-consuming.

Private AI’s Role: Private AI’s technology can rapidly identify and categorize personal data within AI training data, whether this is pre-training data or stored user prompts, aiding organizations in efficiently responding to user access or correction requests. There are limits to this, though: the technology can only help identify what went into the model; getting the information out or corrected remains a challenge, which is another great reason to ensure that training data contains as little personal information as possible.
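A simple sketch of that identification step, assuming stored prompts are kept as records with a text field (an illustrative layout, not a prescribed one):

```python
# Sketch: service an access request by finding every stored record
# that mentions the requester. The record layout is an assumption.
def find_matches(records: list[dict], identifiers: list[str]) -> list[dict]:
    """Return stored records mentioning any of the data subject's
    known identifiers (name, email, etc.)."""
    needles = [i.lower() for i in identifiers]
    return [
        r for r in records
        if any(n in r["text"].lower() for n in needles)
    ]

stored_prompts = [
    {"id": 1, "text": "Summarize Maria Chen's support ticket."},
    {"id": 2, "text": "Translate this paragraph to French."},
]
print(find_matches(stored_prompts, ["Maria Chen", "mchen@example.com"]))
# -> [{'id': 1, 'text': "Summarize Maria Chen's support ticket."}]
```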

6. Data Safeguards in Generative AI Operations

The Principle: Implementing safeguards for personal information is essential in generative AI, particularly given the vast amounts of data these systems can process and store, and the pervasive impact these tools are expected to have on society.

Private AI’s Contribution: Aside from the previously discussed help with anonymization and pseudonymization, Private AI’s tools can also aid in mitigating bias. When sensitive identifiers such as gender, race, origin, and religion are removed from the model’s training data, the model’s generated output is more neutral and less likely to be biased or discriminatory.
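As an illustration, the sketch below strips only the sensitive entity categories named above from a text, given detector output with character offsets. The entity format is an assumption for illustration, not Private AI’s actual schema.

```python
# Sketch: entity-type filtering for bias mitigation. Strip only the
# sensitive categories before training; the entity format (type plus
# character offsets) is a hypothetical assumption.
SENSITIVE_TYPES = {"GENDER", "RACE", "ORIGIN", "RELIGION"}

def strip_sensitive(text: str, entities: list[dict]) -> str:
    """Replace detected sensitive-category spans with a neutral marker,
    leaving all other content untouched."""
    # Process right-to-left so earlier offsets stay valid after edits.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["type"] in SENSITIVE_TYPES:
            text = text[:ent["start"]] + "[REMOVED]" + text[ent["end"]:]
    return text

sample = "The applicant, a Catholic woman, has ten years of experience."
detected = [
    {"type": "RELIGION", "start": 17, "end": 25},
    {"type": "GENDER", "start": 26, "end": 31},
]
print(strip_sensitive(sample, detected))
# -> "The applicant, a [REMOVED] [REMOVED], has ten years of experience."
```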

Conclusion: Private AI as an Ally in Generative AI Privacy Compliance

In the rapidly evolving field of generative AI, adhering to Canadian privacy principles is a complex but critical endeavor. Technologies like Private AI’s detection and redaction products play a crucial role in this landscape, offering tools for anonymization, pseudonymization, and synthetic data generation that can help protect privacy and reduce bias in model outputs. While challenges like retraining AI models and preventing data extraction persist, leveraging Private AI’s solutions is a substantial step towards responsible, trustworthy, and privacy-compliant AI development and usage in Canada. Try it on your own data using our web demo or sign up to get a free API key.
