Copyright in the Age of AI: Examining Ownership of AI-Generated Works 

The rapid advancement of generative artificial intelligence (AI) has raised intriguing questions about the copyrightability of AI-generated output. As AI systems become increasingly capable of producing what we would commonly consider original creative works, it becomes essential to examine the copyright implications and ask ourselves who, if anyone, can claim copyright in the output of AI systems such as ChatGPT.

Several lawsuits brought by copyright holders whose works were used as training data are currently pending. Under the concept of “derivative works,” we will consider whether such authors could make a copyright claim to the output generated by the AI trained on their works. We also examine whether the AI developers could be considered copyright holders regarding the output of the model they built, or whether the individuals who selected the training data and performed the training of the AI could lay such a claim to the generated output. Lastly, we will consider arguments for and against whether the user, if different from the parties listed above, can be the author of a copyrighted work that the AI produced based on their prompt.

What Does Copyright Protect and Enable?

Copyright law is a fundamental pillar of intellectual property law that seeks to achieve several key goals and protect the diverse interests of creators, innovators, and society as a whole. At its core, copyright law aims to strike a balance between fostering creativity and ensuring the fair and equitable treatment of creators, on the one hand, and access by the public to creative works on the other. 

By granting exclusive rights to creators for a certain amount of time, copyright law incentivizes the production of original works, encouraging innovation, artistic expression, and the advancement of knowledge. Copyright legislation thereby seeks to safeguard the economic interests of creators, allowing them to reap the rewards of their creative efforts by controlling the reproduction, distribution, and public display of their works. Others may generally only reproduce the work of an author with permission; e.g., by purchasing a licence to do so.

At the same time, copyright law recognizes the broader societal interest in access to and the dissemination of creative works, promoting cultural enrichment, education, and the public’s right to enjoy and benefit from artistic and intellectual creations. Hence, copyright laws commonly carve out exceptions to the prohibition on reproducing an author’s work for purposes such as private study, criticism, news reporting, and education.

Training Data Copyright Holders as Authors or Owners of ChatGPT’s Output

AI now has the ability to mimic the style of an author, painter, and even singer or composer. Trained on vast amounts of copyrighted work, ChatGPT could then produce a work that closely resembles that of a human author and write a sequence almost identical to, say, Harry Potter. It does not seem unreasonable at all that people would buy such content, and given the incredible speed at which the output is generated, no human could ever compete with that. This poses a real threat to copyright holders, financially and reputationally. It also conflicts starkly with copyright’s aim of incentivizing the creation of original works.

One way to protect the interests of copyright holders whose works the LLM has been trained on could be to grant those authors copyright in the output. Full disclosure: this proposition is a real stretch given the existing legal frameworks and the serious limitations on explaining how generative AI systems produce their output. The US Copyright Act’s concept of “derivative works” protects authors against the production of a work that is based on or derived from an existing, copyrighted work. Without the author’s permission, no one may produce such a derivative work. The US Copyright Office says: “where a copyrighted work is used without the permission of the copyright owner, copyright protection will not extend to any part of the work in which such material has been used unlawfully. The unauthorized adaptation of a work may constitute copyright infringement.” The last sentence here is important, though: the original author will not be granted copyright in the work that is based on theirs. Rather, the person who produced the unauthorized derivative work may be found to have infringed copyright.

AI System as Copyright Holder

When we have no information regarding the genesis of the output, we will often not be able to tell whether a particular text was written by a human author or by ChatGPT. It is hard to dispute that AI produces original content. Nevertheless, many copyright laws require, either explicitly or implicitly, a creative act by a human author before copyright protection extends to the work. There is a principled reason for this. As explained above, copyright law aims to provide an incentive to produce original works, and no such incentive could ever motivate AI.

An argument before a court or copyright authority to the effect that ChatGPT should hold a copyright in its output will likely fail. For example, the US Copyright Office explicitly requires a human to have created the work. Similarly, in Canada, the Copyright Act requires the author to be “a citizen or subject of, or a person ordinarily resident in, a treaty country,” which implies that the author must be a natural person. Hence, our second contestant must be disqualified as a potential copyright owner.

Furthermore, as we have seen above, if the output is based on copyrighted input and constitutes an unauthorized derivative work, copyright protection will not extend to any part of the new work in which that material has been used unlawfully. This argument will likely only matter if the output obviously mimics the style or uses the characters of an author who holds a copyright in their work.

Developers as Copyright Owners

Considering the pivotal role of developers in designing and developing the AI system itself, we can argue that developers should be recognized as the authors or owners of the AI-generated output. Developers create the framework, algorithms, and training methods that enable the AI system’s creative abilities. Perhaps copyright protection should be granted to developers as they are responsible for the AI system’s capabilities and its ability to generate original works. Conversely, developers as such do not have direct creative control over the AI system’s output, and their role is primarily technical, thereby limiting their claim to copyright ownership.

LLM Users as Copyright Owners

An argument could be made that LLM users who purposefully and carefully curate the training data of an LLM to ensure that it learns only a particular style are comparable to other creators of copyrighted works, who use their tools and skills when performing their craft. There may even be instances where someone trains an AI model exclusively on data they generated themselves. In that case, ownership or authorship under copyright law may be achievable. Yet, at the other end of the spectrum, there is of course the more common user who is not involved in selecting or generating the training data. They merely write a few lines of prompts, and ChatGPT provides an entire article. Everything in between is also possible. Since an LLM is trained on a vast and almost indiscriminate amount of data, but is also able to learn from a user’s input, it is conceivable that a user makes a very skillful and creative effort to tailor the prompt, then iterates on it and asks the LLM to change its output to their liking. In such a scenario the user may well be able to meet the requirements for copyright in the output. Given this broad range of effort that users may exhibit, it is difficult to make a sweeping judgment on whether the AI user should be granted copyright in the output.

But then there is also the contractual aspect of copyright law. Looking at OpenAI’s Terms of Use, we can see that, should there be any uncertainty as to whether OpenAI or the user of its models is entitled to the generated output, OpenAI assigns its rights to the user, along with the obligations that come with them:

Subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms. OpenAI may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.

Conclusion

The question of whether the output of generative AI can be copyrighted remains complex and subject to ongoing debate. As AI systems continue to evolve and push the boundaries of creative expression, it is essential to carefully evaluate the perspectives of different personas involved in the creation process. Achieving a consensus on copyrightability will require legal frameworks that adapt to technological advancements, recognizing both the contributions of AI systems and the roles of human creators and developers. Striking the right balance will promote innovation, protect the rights of creators, and foster responsible AI development in a rapidly changing digital landscape.
