How to Protect Confidential Corporate Information in the ChatGPT Era


Many of the security measures businesses put in place to protect the personal information of their customers and employees will also help safeguard confidential corporate information. For example, access controls and incident protection systems help keep this information out of the hands of internal and external threat agents. However, if that’s all businesses do, the protection provided to confidential business information lags behind that of personal information.

You may be aware, for example, that individuals, subject to certain restrictions, can ask OpenAI to remove their personal information from ChatGPT’s output. This right is not a courtesy by OpenAI but flows from the applicable data privacy laws, such as the GDPR. Under Articles 17 and 18 of the GDPR, individuals have the right to have their personal data erased or its processing restricted under certain conditions, e.g., if the data has been processed unlawfully.

A similar right exists for confidential corporate information but with important caveats. While certain mechanisms are available to protect such information from disclosure and misuse, companies do not have the same control over this information, particularly once it has been leaked. 

In this post we go over the protection mechanisms available to safeguard confidential corporate information, highlight the gaps in that protection, which have become more and more obvious in several recent data leakage incidents involving ChatGPT, and consider ways to expand the existing legal frameworks to remedy these gaps.

What Rights Can Currently Protect Confidential Corporate Information?

Mechanisms such as copyright and patent protection exist to protect publicly available works and inventions from being commercially exploited by anyone other than the copyright or patent owner, and unfair competition legislation and case law can protect against violations of trade secrets. The EU Market Abuse Regulation prohibits unlawful disclosure of material non-public information (“MNPI”) that could influence the stock price of a public corporation. Information that does not fall within those categories can be protected by non-disclosure agreements and clear corporate guidelines. These mechanisms provide a patchwork of protection, but not the kind of tight-knit safety net personal information enjoys (at least on paper, and in some jurisdictions).

Let’s take a closer look at the scenarios in which confidential corporate information is protected.

Copyright

Copyright commonly protects the financial and reputational interests of an author or owner of an original literary, scientific, or artistic work by prohibiting the use and reproduction of the work without the author’s or owner’s permission. Courts can remedy a copyright infringement by ordering recalls (removal of the work from the distribution channel or destruction of copies), injunctions, damages (monetary compensation for financial losses), and, upon request, the publication of judicial decisions against the copyright infringer. Copyright holders do not have to wait for a court order, though. They can put the infringer on notice by substantiating their claim to the work and its infringement, and by requesting removal of the work and the prevention of future infringements.

The copyright holder’s interests are also protected in the digital space. Online copyright enforcement is a major topic under the EU’s Digital Services Act, which entered into force in November 2022 and applies in full from February 2024. Even today, copyright enforcement accounts for by far the largest share of online content removals (we’re talking billions of removals), according to Google’s Transparency Report. This report lists removal requests due to copyright, privacy, government initiatives, and more.

That’s great news for copyright holders, but when it comes to confidential corporate information, it only helps so much. Copyright law can protect relevant confidential corporate information, such as software code and, under the letter of the law, also emails, from the moment of creation without an obligation to publish or register it. Even inadvertent infringement generally gives rise to the same enforcement rights mentioned above. Thus, in many important ways, unpermitted use of a copyrighted work gives the copyright holder the same rights as individuals whose personal information was used without their consent: You own it, you have the right to say who can use it, and you can request its removal from where you don’t want it to be.

But there are a few caveats to this. First, you will hardly have any issue proving that something constitutes personal information about you. But unless you register your copyrighted work, you may have a hard time proving it is copyrighted and that you hold that right. Even a copyright certificate does not conclusively prove ownership; it constitutes prima facie evidence of your right, which means the infringer can still disprove that you are the copyright owner. Second, there isn’t a lot of clarity as to whether copyrights are infringed during the development, training, or improvement of foundation models such as ChatGPT. OpenAI, for example, explains that:

Machine learning models are made up of large strings of numbers, called “weights” or “parameters,” and code that interprets and executes those numbers. Models do not contain or store copies of information that they learn from. Instead, as a model learns, some of the numbers that make up the model change slightly to reflect what it has learned. 

Yet, we have seen instances where chatbots have unmistakably reproduced original works, rendering the foregoing assurance substantially less comforting.

Plus, foundation model providers are quite unhelpful in terms of providing clarity as to whether and how they train their models on copyrighted works. Stanford researchers have found that transparency with regard to copyrighted training data is the worst area of non-compliance with the proposed EU AI Act, and they demand that companies be more transparent:

“Disclosure of copyrighted training data is the area where we find foundation model providers achieve the worst compliance. Legislators, regulators and courts should clarify how copyright relates to (i) the training procedure, including the conditions under which copyright or licenses must be respected during training as well as the measures model providers should take to reduce the risk of copyright infringement and (ii) the output of generative models, including the conditions under which machine-generated content infringes on the rights of content creators in the same market.” 

But here is the biggest problem with relying on copyright protection for keeping confidential corporate information safe: Copyright law protects the actual work, but not the ideas contained in it. From a copyright perspective, there is no problem with going around and telling everyone how the latest Harry Potter ends or, more topically, with repeating what you have heard board members say about a planned merger that isn’t public yet.

In summary, the scope of copyright protection is fairly broad and exists without a publication requirement. The content must, however, be original and tangible. Enforcement is possible against anyone who uses the work without permission, with some exceptions. However, copyright does not protect the ideas revealed in the written work.

Trade Secrets

The next mechanism that enables corporations to protect their confidential information is trade secret protection. Information must meet three key requirements to qualify as a trade secret:

  • The information must not be known or readily accessible to the public
  • The secrecy of the information constitutes a competitive advantage
  • The company or individual in control of the information has taken active steps to keep the information a secret

Trade secrets need not be novel, there is no registration required, and no time limit exists. However, for the protection to apply, a business must implement measures to keep the information secret, such as data handling procedures, technical safeguards, employee training, and non-disclosure agreements. 

The protection granted to trade secrets and its enforcement depend on the jurisdiction. Directive 2016/943 of the European Parliament and the Council of June 2016 on the protection of undisclosed know-how and business information (trade secrets) aims to provide uniform protection against unauthorized benefit from the unlawful acquisition, disclosure, use, or trade of trade secrets in the EU. Unlike an EU regulation, a directive does not apply directly in the member states; it must first be transposed into national law. It nonetheless helps streamline the legal protection granted to trade secrets across the EU. In Canada, on the other hand, trade secret protection is largely a common law concept, while in the US it rests mainly on state statutes modelled on the Uniform Trade Secrets Act and on the federal Defend Trade Secrets Act.

As opposed to a copyrighted work, trade secrets can be intangible as well as tangible. A trade secret can include business, financial, technical, or scientific information. Common examples are the Coca-Cola recipe and Google’s search algorithm. 

Trade secret protection can also apply to early-stage inventions that are in the process of being patented (at which time the information becomes public and thus no longer a trade secret), manufacturing processes, and lists of suppliers and clients. 

There are two main downsides to trade secret protection. For one, if individuals or organizations independently acquire or develop technical or commercial information through their own research and development, reverse engineering, or marketing analysis, trade secret owners cannot prevent them from utilizing the same information. Unlike patents that are publicly disclosed, trade secrets do not offer “defensive” protection under priority claims. 

Second, in most jurisdictions, the trade secret protection only applies against an intentional infringer. If the infringer were to leak the data to, say, OpenAI, no rights against the recipient of the information exist unless the recipient knows or has a reason to know that an obligation existed not to disclose the information.  

In summary, trade secret protection applies to a broad range of information but only so long as it is successfully kept secret. Enforcement is for the most part limited to the infringer, and no protection exists against the use of information that was obtained by lawful means, e.g., reverse engineering.

Patents

A patent is an intellectual property right to an invention that the government protects in exchange for making the invention public, thereby enriching public knowledge and fostering innovation. To be patentable, an invention must be new, inventive (as opposed to obvious), and capable of industrial application. A patent holder can exclude others from using, making, or selling the same product for a limited amount of time, usually 20 years, depending on the jurisdiction.

Patents do not protect the confidentiality of information but its exclusive use. Unlike copyright, patent protection comes with an absolute requirement to publish the content whose use is to be protected. In addition, the patent registration process is time-consuming and costly.

Material Non-Public Information

Material Non-Public Information (MNPI) is a concept from the world of trading. It denotes information that is known only to certain insiders to a corporation and which, if made public, could influence the stock price of that corporation. There are disclosure obligations attached to MNPI, and individuals in possession of that information, no matter how it was obtained, are prohibited from trading the corporation’s stock before the information is made public. A violation would constitute illegal insider trading if the information is used to gain an unfair advantage over other traders and investors. 

Disclosure of MNPI itself can be illegal under certain circumstances. First, it can be a breach of contract if the disclosing individual is under a contractual obligation not to disclose the information. Depending on the person’s position in the company, there may even exist a fiduciary duty to keep the information confidential. Disclosure can also be a criminal offence, namely if an insider discloses MNPI to a recipient who the insider has reason to believe will use this information to conduct a trade or pass this information on further. Some jurisdictions tie stricter requirements to “tipping,” such as knowledge of the information recipient that the disclosing person was under a duty to protect the information, and intent to defraud. 

MNPI does not necessarily have to originate inside the corporation. It could be information held by regulators, legislators, or financial institutions. But most commonly, MNPI is information related to expected news events, such as earnings reports; planned corporate actions, such as IPOs, mergers, and acquisitions; an agreed-upon legal settlement; or changes in the supply chain or operational models.

MNPI is usually not kept secret for very long. Public corporations are under an obligation to disclose material information. However, the timing and manner of the release can be important and hence the corporation has an interest in keeping MNPI secret until it decides to make it public. 

The securities commissions provide certain disclosure standards to ensure all investors have equal access to the information that may affect their investments. Generally, a material change in a business must be disclosed immediately, that is, once the decision has been made to implement the change. Exceptions apply when immediate disclosure would be unduly detrimental to the corporation’s interests. There are also ongoing disclosure obligations requiring corporations to disclose financial information publicly on a regular basis.

It’s important to note that the rules around MNPI are not designed primarily to protect the interest of the corporation in the secrecy of its information. Rather, the intention is first and foremost to ensure equal opportunity and fairness among shareholders of a corporation and those who may want to acquire shares.

To summarize, MNPI can generally only be kept confidential for a limited period of time. Yet, within that time frame, selective disclosure, that is, communication of this information to anyone but the general public, is most problematic and may in fact constitute a criminal offence punishable by imprisonment. Somewhat paradoxically, the best protection can be complete publication of the information because from the moment the material information is public, the interests of other investors are protected and no unfair disadvantage can ensue. This shows that the interest of the shareholders is in focus here, not that of the corporation and its interest in keeping its information confidential.

How does protection of confidential corporate information measure up against protection of personal data?

The following table provides a high-level summary of the mechanisms available to exercise control over confidential corporate information. A comparison is drawn to the rights individuals enjoy with regard to their personal information.

| Asset / Protection | Personal information | Copyrighted works | Patents | Trade secrets | MNPI |
| --- | --- | --- | --- | --- | --- |
| Scope of protected information | Almost everything about an identifiable individual | Limited – Must be an original, tangible work, not an idea | Limited – Novel inventions | Broad scope, but the information must remain secret | Limited – Must be material, as in, disclosure can influence stock price, and pertain to a publicly traded corporation |
| Transparency before using/disclosing data | Yes | Uncertain; potentially yes, but not provided in practice | Irrelevant as patents are not confidential | No. Measures are in place to protect from disclosure; once breached, protection ceases | No, as selective disclosure is illegal as “tipping” |
| Enforcement | Yes – Millions of dollars of fines; private right of action | Yes – Fines and effective removal procedures | Yes | Difficult | Yes – Fines and imprisonment |
| Protection persists after disclosure | Yes | Yes | Public by default | No | No |
| Protection requirement | None – everyone has the rights per default | No registration is required, but for enforcement, it’s needed | Registration required – Onerous in terms of time and cost | Proof of measures taken to keep it secret required | None, so long as the materiality requirement is met and it’s not public |

We can see that IP law in its many different forms does not focus on comprehensive data protection. Instead, for the most part, it aims to protect intellectual creations. In the case of MNPI protection, it isn’t even data protection that is the central goal of the law. And even where the law’s mandate and scope are broader than that, only certain select aspects of corporate information are granted protection under the various mechanisms discussed herein. To conclude, no broad concept equivalent to personal information as it pertains to individuals exists for corporations.

To some extent, the different scope of protection is due to the nature of corporations. Publicly traded corporations in particular are considered “corporate citizens” that play a role in and have certain responsibilities to society and the environment beyond their economic activities. They are, however, much more like public figures than individual citizens, with some of their information being everybody’s business. Since individuals can own a part of these corporations in the form of shares, they have a right to know what is going on, and the corporation cannot expect to do all of its business behind closed doors. Responsibilities to society can only be effective with transparency and accountability.

On the other hand, there are forceful arguments being made for certain personal information to be treated as a public good as well. Considering the immense value for society that can be leveraged by sharing personal information, e.g., in healthcare, to name only one obvious use case, the difference between corporate information that should form a part of public knowledge and personal information vanishes to some extent.

A balance needs to be struck between information that should properly be public and that which deserves to be guarded as confidential corporate information, namely proprietary information which ensures the economic competitiveness of a business. As our analysis has shown, safeguarding confidential corporate information is an onerous endeavour. Trade secrets as well as MNPI rely on contractual and administrative safeguards implemented by the corporation, without which the respective legal protection does not even apply. Basically, if, and only if, corporations are successful in keeping their information safe, the laws geared towards trade secrets and MNPI protection add an additional layer of protection in the form of punishment of unauthorized use. The only mechanisms that actually empower corporations to take control of their information, then, are copyright and patent law, the latter of which does not protect the confidentiality of information. Compared to the extensive rights under privacy law, the protection granted to the information of corporations is hence notably thin. Yet, their interest is in many ways similar to the interests of natural persons. Just consider the definition of privacy, according to the International Association of Privacy Professionals:

“[…] privacy is the right to be let alone, or freedom from interference or intrusion. Information privacy is the right to have some control over how your personal information is collected and used.”

The key theme here is control. When employees use ChatGPT to check source code, or use similar technology to transcribe and summarize meeting recordings, control over confidential corporate information is just as much in jeopardy as control over personal information is when employees in other departments use such tools to write employment offers or respond to customer complaints, or when doctors use Large Language Models (LLMs) to help with diagnostics after learning about a patient.

Expanding existing legislative frameworks to protect confidential corporate information more broadly

In light of the attention that data leakage incidents involving ChatGPT and other generative AI models have recently received, it seems like the right time to have a conversation that reconsiders how we protect confidential corporate information. First signs of a move towards providing much needed clarity in this regard are the developments around the EU AI Act and its most recent amendments including requirements of transparency by AI developers around copyrighted works used for training their models. 

Japan is an example of a different approach where some usage of copyrighted works related to AI development is excluded from the concept of copyright infringements, under certain circumstances, thereby placing greater emphasis on facilitating innovation. 

As a thought experiment, let’s consider whether a comprehensive corporate data protection regime comparable to that provided by the GDPR would make sense. 

For one, the value of corporate information differs from that of personal information. Personal data is as good as gold in advertising, as we have learned in the last decade from the new business model of social media platforms. Personal data also informs strategic business decisions and product roadmaps, as data provides insights into patterns that can be leveraged and into untapped business opportunities.

Confidential corporate information could be those very insights that a competitor would like to get their hands on, or innovative designs, software code, recipes, and other achievements that are responsible for the business’s competitive advantage, an advantage that would be lost if they were widely shared. Direct competition and intellectual property concerns are different from the drivers behind the protection of personal information.

These concerns need to be addressed differently than the protection of personal privacy. For example, the data minimization principle would be difficult to apply in the context of certain confidential corporate information. How would one use a trade secret “only to the extent necessary to achieve the purposes for which the information was obtained” and still give effect to the interests that are protected by trade secrets? It’s often an all-or-nothing approach when intellectual property rights are concerned. What would be a legitimate interest for using a competitor’s invention? Surely different considerations would have to come into play when corporate information is concerned as opposed to personal information. Can you expect an employee to self-report a “data breach” when the information is a copyrighted work, a trade secret, or MNPI? Hardly.

Even the consent requirement that is so central to the use of personal information does not fit very well with all IP rights. And where it does, for copyrighted works and patents, it already exists in the form of a licensing regime. 

It seems, then, that the justified need to take control of confidential corporate information which is only partially met by the current regimes could not be appropriately met by expanding the concept of personal information, granting the same rights to the owners of that information as are enjoyed by natural persons with regard to their personal information. Rather, the existing regimes in place for protecting confidential corporate information need to be supplemented to address the novel risks such information is facing in the age of ChatGPT. 

But as a first step, awareness training would be a cost-efficient and fairly effective measure to mitigate the risks companies face today. The recent developments in data privacy have raised the level of awareness around the importance and the required protection of personal information considerably. In particular the GDPR has played an important role in that. Bringing the level of awareness regarding the need to protect confidential corporate information up to par with that would already make a big difference. 

What technology can do

Instead of waiting for regulators and legislators to go through their slow process of bringing about change, Private AI has developed a solution, integrated in its established, high-performing redaction tool, that removes confidential corporate information from unstructured data sets. Of course, this functionality is not limited to MNPI of publicly traded corporations but can be used by anyone with an interest in keeping their information confidential. This solution can enable businesses to keep benefitting from the enormous efficiency boost provided by ChatGPT and similar technologies, while being confident that their information is not inadvertently disclosed.
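To make the general idea of redaction concrete, here is a minimal Python sketch of stripping confidential details from a prompt before it leaves the company. It is not Private AI's implementation and does not use its API; it is a deliberately naive, pattern-based illustration. The term list, the placeholder labels, the money pattern, and the `redact` function are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical examples of terms a company might treat as confidential.
CONFIDENTIAL_TERMS = {
    "Project Falcon": "[PROJECT_NAME]",
    "Acme Holdings": "[COUNTERPARTY]",
}

# Hypothetical pattern for dollar amounts such as "$4.2 billion" or "$1,500,000".
MONEY_PATTERN = re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?")


def redact(text: str) -> str:
    """Replace listed confidential terms and dollar amounts with placeholders."""
    for term, placeholder in CONFIDENTIAL_TERMS.items():
        # re.escape keeps the term literal; matching is case-insensitive.
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return MONEY_PATTERN.sub("[AMOUNT]", text)


prompt = "Summarize: Project Falcon is the planned merger with Acme Holdings for $4.2 billion."
print(redact(prompt))
```

A fixed list of terms like this only catches what someone thought to list in advance, which is why it should be read as an illustration of the workflow (redact first, then send the prompt) rather than as a substitute for a dedicated redaction tool.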
