Privacy and Compliance Considerations for ChatGPT Applications

With the API offerings from OpenAI and Azure OpenAI, businesses can develop their own applications on top of the powerful large language models (LLMs) underlying ChatGPT, as well as Whisper and OpenAI’s other models. Close to a thousand commercial applications already make use of OpenAI’s foundation models, including our own PrivateGPT, which acts as a privacy layer for ChatGPT by removing personally identifiable information from prompts before they are shared with the third party.

This boost to innovation comes with certain drawbacks. If OpenAI gets things wrong in developing its LLMs, the applications built on the foundation models will likely suffer from the same flaws. This is an issue for responsible AI development in general, but this blog post focuses only on data privacy compliance.

OpenAI’s Data Protection in General

OpenAI’s Data Processing Addendum (DPA), not its Privacy Policy, governs the data collected from organizations using OpenAI’s API services for businesses, in the absence of any other individually negotiated agreements. The DPA says that all parties agree to comply with applicable data privacy and data protection laws, which may include certain listed US privacy laws such as the CCPA, as well as the GDPR and the subordinate legislation and regulations that implement those laws.

This language is not particularly comforting if your business is offering services built on ChatGPT outside of these jurisdictions, e.g., in Canada or the UK. OpenAI also states that the DPA cannot be customized on a case-by-case basis, hence it may be advisable to obtain legal counsel to assess whether your or your customers’ data would be appropriately protected under the DPA or whether an individual agreement with OpenAI may be necessary to ensure data protection in accordance with the data privacy laws applicable to your organization.

OpenAI’s SOC 2 Type 2 report, and the fact that an independent auditor has audited the company against the 2017 Trust Services Criteria for Security, may provide some assurance as to the company’s data protection practices. However, prudent business practice calls for reviewing these audit reports yourself.

Location of Data

OpenAI represents that all customer data is processed and stored in the US. No data centres are located in the EU or elsewhere, and no capability currently exists to self-host.

Disclosure of Customer Data

Subservice providers used by OpenAI are, at the time of writing, Microsoft for cloud infrastructure; OpenAI affiliates for services and support; Snowflake for data warehousing; and TaskUs for user support and human annotation of data for service improvement. Microsoft and TaskUs are located ‘worldwide’ and the other two are in the US. Given that OpenAI says all customer data is processed and stored in the US, it seems that the Microsoft servers hosting OpenAI customer data are located in the US as well. It may be possible, however, to arrange storage in a jurisdiction more appropriate for an organization’s needs, as Microsoft Azure’s data centres are located in many different geographies.

For European businesses wishing to build an application based on OpenAI’s models, it makes little difference in which US state the data is stored, as no US state has currently received an adequacy decision from the European Commission that would allow the data to be transferred across borders without further due diligence. To still be able to transfer personal data to the US, Art. 46 of the GDPR requires, among some other alternatives, that standard contractual clauses be included in an agreement with OpenAI to ensure the protection of the personal data of individuals in the EU. Whether the DPA meets the GDPR’s standards may need to be assessed by legal counsel.

For businesses intending to use ChatGPT as the basis for an application that will process personal information of residents of Quebec, a province in Canada, the data location information provided by OpenAI is insufficient. Starting September 22, 2023, the data protection law of Quebec requires a privacy impact assessment to be conducted before any personal information is disclosed outside of Quebec. This assessment includes a jurisdiction-specific review of the adherence to recognized privacy principles. Hence, an inquiry into the location of the data warehouse where the information will be stored is necessary. Similarly, starting September 1, 2023, businesses to which the new Swiss data protection law applies are required to disclose to individuals the country to which their personal information is transferred, as well as certain guarantees in place that ensure adequate data protection.

Data Transmission

According to OpenAI’s FAQ as well as the DPA, data in transmission via the API is encrypted. Transport Layer Security (TLS) ensures that the data cannot be altered or viewed by third parties during the transfer.
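On the client side, an application can also make sure it never falls back to an unverified or outdated connection. The following is a minimal sketch using Python's standard `ssl` module; the helper name `make_strict_tls_context` and the TLS 1.2 floor are assumptions chosen for illustration, not requirements stated by OpenAI.

```python
import ssl


def make_strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies certificates and hostnames."""
    # create_default_context() turns on certificate verification and
    # hostname checking for client connections.
    context = ssl.create_default_context()
    # Assumption: internal policy refuses anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to standard-library clients, for example `http.client.HTTPSConnection(host, context=context)`, so that a failed certificate check aborts the request instead of sending the prompt.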

Data Retention

OpenAI retains data transmitted via the API for 30 days, after which it is either deleted, aggregated, or stored in a manner that does not identify individuals or customers.
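A business that logs when it sent a prompt can work out when the 30-day window for that prompt ends, which may help when answering customer questions about retention. This is a small sketch of that date arithmetic; the function names are hypothetical, and it only models the period described above, not OpenAI's internal process.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # retention period described in the text above


def retention_deadline(sent_at: datetime) -> datetime:
    """Return the moment the 30-day retention window ends for a request."""
    return sent_at + timedelta(days=RETENTION_DAYS)


def is_past_retention(sent_at: datetime, now: datetime) -> bool:
    """True once the retention window for a request sent at sent_at has elapsed."""
    return now >= retention_deadline(sent_at)


# Example: a prompt sent on June 1, 2023 at noon UTC.
sent = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
print(retention_deadline(sent).isoformat())  # 2023-07-01T12:00:00+00:00
```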

Data Deletion Requests

OpenAI is committed to responding to reasonable deletion requests from its customers and will delete customer data upon termination of the business relationship. If an individual makes a deletion request directly to OpenAI, OpenAI will promptly contact the business concerned.

Personal Information Used for Product Improvement

The use of identifiable personal information for product improvement purposes will only take place upon explicit opt-in by the business using OpenAI’s API services. However, in aggregated or de-identified form, the data can be retained for longer than 30 days and be used for the purposes of improving OpenAI’s systems and services. 

HIPAA Compliance

The US Health Insurance Portability and Accountability Act (HIPAA) requires health service providers to enter into so-called “business associate contracts” with entities that perform functions or activities on behalf of, or provide certain services to them that involve access to protected health information (PHI). This contract ensures that PHI is properly safeguarded, that the data is used for permissible purposes only, and that it is only disclosed further as required under the contract or the law.  

OpenAI indicates that they are able to enter into business associate contracts if required. Hence, compliance with HIPAA seems to be achievable, allowing businesses to build applications supporting healthcare services based on ChatGPT.

Conclusion

OpenAI has taken important steps towards compliance with data privacy laws. Several of the measures taken were enhancements made in reaction to pressure from privacy regulators, for example the temporary ban of ChatGPT in Italy. These measures include the opt-in approach to using data for product improvement purposes as well as limited data retention. Businesses building applications on top of models such as ChatGPT rely on OpenAI’s proper data handling processes, but blind trust in this regard is misplaced. The necessary due diligence should not be skipped in the race to build the next hit ChatGPT application.

The most foolproof measure to protect your customers’ personal information is to not transmit it to OpenAI in the first place. Private AI’s PrivateGPT filters out more than 50 entity types including PHI and Payment Card Industry (PCI) data from your prompt before it is sent to ChatGPT. The generated response is then repopulated with the original personal information before it is displayed to you. This seamless process allows your business to safely use the benefits of OpenAI’s models for your application while maintaining the trust of your customers who may otherwise be skeptical about the proper protection of their personal information. Try PrivateGPT today.
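The redact-then-repopulate flow described above can be illustrated with a toy example. This is a sketch of the general idea only, not PrivateGPT's implementation: the two regular expressions, the placeholder format, and the `model_call` parameter (a stand-in for whatever function sends text to the model) are all assumptions made for illustration, and a production system would need far broader entity coverage than regular expressions provide.

```python
import re
from typing import Callable

# Illustrative patterns only (assumption); real systems detect many more entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with placeholders and remember the originals."""
    mapping: dict[str, str] = {}
    redacted = prompt
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        redacted = pattern.sub(substitute, redacted)
    return redacted, mapping


def reidentify(response: str, mapping: dict[str, str]) -> str:
    """Put the original values back wherever a placeholder appears."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response


def private_completion(prompt: str, model_call: Callable[[str], str]) -> str:
    """Send only the redacted prompt to the model, then restore the response."""
    redacted, mapping = redact(prompt)
    return reidentify(model_call(redacted), mapping)
```

In this sketch the mapping from placeholders to original values never leaves the caller's process; only the redacted text is handed to `model_call`.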


