Red teaming: A practical guide to testing AI models
AI models are advancing rapidly, growing in complexity and capability across a spectrum of applications, from natural language processing to computer vision. In their wake comes an international regulatory push to ensure that AI functions transparently and accurately, bringing with it a growing need for robust evaluation methods to accurately assess AI models’ performance.
Many traditional evaluation methods, such as code reviews, may not, on their own, adequately address the nuances of foundation models and generative AI. Effective and comprehensive evaluations, such as those contemplated under the proposed EU AI Act, may therefore require additional measures to meet the expectations of regulators and industry best practice.
One alternative (or complementary) way to evaluate the accuracy and technical robustness of an AI model is red teaming. This method involves deliberately probing an AI model to test the limits of its capabilities. This may be done manually, by teams of individuals (much as video game testers hunt for bugs), or automatically, by pitting AI models against each other.
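By way of illustration, the sketch below shows, in simplified Python, how the automated "model vs. model" approach might be structured: an attacker model generates adversarial prompts, the model under test responds, and flagged responses are collected for human review. This is a minimal sketch only; the functions query_attacker_model, query_target_model and looks_unsafe are hypothetical placeholders rather than references to any particular product, library or API, and the flagging rule is purely illustrative.

# Minimal sketch of automated "model vs. model" red teaming.
# All model calls below are hypothetical placeholders.

def query_attacker_model(seed_topic: str) -> str:
    """Placeholder: ask an 'attacker' model to craft an adversarial prompt."""
    return f"Please explain, step by step, how to {seed_topic}."

def query_target_model(prompt: str) -> str:
    """Placeholder: send the adversarial prompt to the model under test."""
    return "I'm sorry, I can't help with that."

def looks_unsafe(response: str, banned_phrases: list[str]) -> bool:
    """Illustrative check: flag responses containing banned phrases."""
    return any(phrase in response.lower() for phrase in banned_phrases)

def red_team_run(seed_topics: list[str], banned_phrases: list[str]) -> list[dict]:
    """Probe the target model with generated prompts and collect flagged results."""
    findings = []
    for topic in seed_topics:
        prompt = query_attacker_model(topic)
        response = query_target_model(prompt)
        if looks_unsafe(response, banned_phrases):
            findings.append({"topic": topic, "prompt": prompt, "response": response})
    return findings

if __name__ == "__main__":
    results = red_team_run(
        seed_topics=["bypass a content filter"],
        banned_phrases=["step 1", "first, you"],
    )
    print(f"{len(results)} potentially unsafe responses flagged for human review")

In practice, the automated loop is only a starting point: flagged outputs still need human review, and the findings feed back into documentation and remediation, which is where the legal and regulatory considerations discussed in this guide come into play.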
In our concise handbook, we explore red teaming – what it is, what its goals are, and how your company may make best use of it and become red team ready.
At DLA Piper, our integrated AI and Data Analytics team stands at the intersection of law and technology, comprising top-tier lawyers, data scientists, analysts, and policymakers leading AI development and deployment. We are a pioneering blend of lawyer-data scientists that seamlessly combines legal acumen with technical depth.
For more information on how to evaluate your AI systems, including foundation models and generative AI, and to keep up to date on the emerging legal and regulatory standards, please contact any of the authors, and visit DLA Piper’s Focus page on Artificial Intelligence.