26 June 2024 • 6 minute read

Intellectual property in the age of Generative AI

As generative AI is increasingly incorporated into new and existing products and services, a number of intellectual property issues have come to the forefront.

Generative AI tools typically operate by responding to a user-provided prompt. Depending on the implementation, the response could be text (as with a chatbot), an image or even a song. The sophistication of the tools varies, with some providing more accurate results and others more whimsical ones. Behind the scenes, the developers of these tools have ‘trained’ them using huge libraries of training data. The results are based on probabilities and predictions and, while they may appear creative or original, can at times too closely reflect portions of the training data.

In this article, we look at what protections are offered by intellectual property, not only for the tools themselves, but also for their inputs and outputs.

Training data and the inputs

Obtaining and using training data is the first area of potential intellectual property concern. The data used to train AI tools may have been obtained from a variety of sources. These could include public domain material, such as old books that are no longer protected by copyright. Some of the training data may be material licensed from a publisher or other rights holder. Other portions of the data may have been scraped from public websites or otherwise obtained without explicit permission.

Since copyright protects works ranging from photos, songs and video to written words, entities using unlicensed material as training data may be relying on one or more exceptions to copyright. For example, in the United States, ‘fair use’ can protect transformative uses of copyright-protected works in certain circumstances. Similarly, ‘fair dealing’ under Canada's Copyright Act is an exception to copyright infringement if the use is for research, private study, education, parody or satire, criticism, review or news reporting.

Users of AI tools may not be aware of what training data has been used to create the tool. However, issues related to intellectual property may arise if the output of the generative AI tool reproduces one or more of the works used as training data. Depending on the prompt used, the AI tool may produce output that, whether by design or inadvertently, corresponds to material that was used for training without a license. Whether this results in liability is still to be determined, and several lawsuits are testing these issues. Some providers of AI tools have offered indemnities to users in the event that rights holders pursue claims against them.

Rights holders of material useful for training AI tools can negotiate licenses for this use with developers of these tools. The quality, quantity and license status of training data can be a key differentiator for generative AI tools.

In addition, the prompts entered by the user may also be used to train the tool. This can create issues if sensitive or proprietary data is entered into a generative AI tool, which then incorporates that data into its training and uses it to create outputs for other users.

The black box

AI tools that have been trained and are used to generate outputs have been referred to as a ‘black box’, particularly from the perspective of the user. Developers of these tools rely on a variety of intellectual property to protect them. Beyond copyright in the source code and related software, trade secrets and patents would typically be used.

Patents protect new and inventive ideas and provide a 20-year term of exclusivity in return for sharing details of the invention with the public, allowing others to learn about and improve on the ideas. Unfortunately, protecting ideas with patents can be risky: an innovation may be disclosed during the application process and yet the patent may not be granted by the patent office, leaving the innovator with no protection. For example, the patent office may determine that the application merely claims an algorithm or an abstract idea, or decide that the invention is obvious over existing technology.

Trade secret protection may be a viable alternative to patents if the innovation can be kept confidential. This may involve protecting the idea legally, such as through non-disclosure and confidentiality agreements, and physically, by limiting access through passwords and computer use policies. A trade secret loses its value once it has been publicly disclosed; however, it can be protected indefinitely as long as it is kept confidential. This can be useful for the algorithms behind tools where access to the servers can be carefully controlled and use of the service does not disclose the innovation.

Outputs and who owns what

With generative AI tools creating human-like outputs, the question arises whether these outputs are protected by the same intellectual property regimes that protect human-created works.

Other than in some limited circumstances, the current consensus is that if there is no human author or inventor, there is no protectable intellectual property. In other words, if a generative tool creates a poem, and this poem is copied without permission, there may be no copyright that can be infringed.

However, depending on the role that a human had in using the generative AI, such as creating appropriate prompts and fine-tuning those prompts to achieve the desired result, the user may still be the author for the purposes of creating a protectable work. Many jurisdictions are reviewing and consulting on this issue, and we may see changes to the law in this area in the future.

Similarly, if a human contributed to the claimed invention, even if an AI tool was used as part of the process, then there is likely a human inventor. On the other hand, owning or operating an AI system that comes up with an invention may not be a sufficient contribution for the owner to be named as an inventor.

Conclusion

While the impact of intellectual property on generative AI can seem somewhat abstract, there can be potential liability for using the outputs of generative tools if the creator of the training data finds that inputs have been reproduced. If the outputs of generative AI are being used as part of key business processes, the uncertainty over whether they are protected by intellectual property could make them more difficult to enforce, license and otherwise monetize. As these tools mature, there will likely be further clarity on the risks involved, which will reduce uncertainty for users.
