Millions of people use Generative AI (GenAI) every day. While GenAI applications are creating significant benefits for users, they also pose risks. For example, we know that bad actors have used GenAI to create child sexual abuse material, low-cost deepfake adverts and synthetic terrorist content.
As the new regulator for online safety, Ofcom is exploring how online services could employ safety measures to protect their users from harm posed by GenAI. One such safety intervention is red teaming, a type of evaluation method that seeks to find vulnerabilities in AI models. Put simply, this involves ‘attacking’ a model to see if it can be made to generate harmful content. The findings can then be used to fix those vulnerabilities by introducing additional safeguards, for example filters that block such content.
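To illustrate the basic mechanics, the sketch below shows what a minimal red-team loop might look like in Python: a set of adversarial prompts is sent to a model, each response is checked by a content filter, and any prompt that elicits harmful output is recorded as a finding for developers to address. This is an illustrative sketch only; `query_model` and `content_filter_flags` are hypothetical placeholders for a GenAI model endpoint and a harmful-content classifier, not references to any particular system.

```python
# Minimal sketch of a red-team loop (illustrative only).
# `query_model` and `content_filter_flags` are hypothetical placeholders
# for a GenAI model endpoint and a harmful-content classifier.

from typing import Callable, Dict, List


def red_team(
    adversarial_prompts: List[str],
    query_model: Callable[[str], str],
    content_filter_flags: Callable[[str], bool],
) -> List[Dict[str, str]]:
    """Send each adversarial prompt to the model and record any harmful output."""
    findings = []
    for prompt in adversarial_prompts:
        response = query_model(prompt)
        if content_filter_flags(response):
            # A vulnerability: the model produced content the filter deems harmful.
            findings.append({"prompt": prompt, "response": response})
    return findings


if __name__ == "__main__":
    # Stubbed components for demonstration purposes only.
    prompts = ["<attack prompt 1>", "<attack prompt 2>"]
    results = red_team(
        prompts,
        query_model=lambda p: "model response to: " + p,        # stub model
        content_filter_flags=lambda r: "harmful" in r.lower(),  # stub filter
    )
    print(f"{len(results)} prompts elicited harmful output")
```

In practice, the findings from such a loop would feed back into mitigations, for example strengthening the filters or retraining the model, before the exercise is repeated.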
Many of the major model developers now say they conduct some form of red teaming, and it is widely seen as a critical tool for ensuring the safe deployment of GenAI models. However, there is not yet a clear consensus on its strengths and weaknesses, how it should be conducted, the skills and resources it requires, or what outcomes it should lead to.
In our discussion paper, Red Teaming for GenAI Harms, we:
- Explore how red teaming, a type of AI model evaluation, differs from other evaluation techniques
- Unpack the steps involved in a red team exercise
- Outline a case study which illustrates the potential resources required
- Assess the strengths and limitations of this method
- Set out 10 practices that firms can adopt today to maximise the impact of red teaming exercises they already conduct
We will continue to examine the merits and limitations of red teaming, and in our paper we highlight several questions on which we welcome stakeholders’ views.
Discussion papers allow us to share our research and encourage debate in areas of our remit. This discussion paper does not constitute official guidance.
Read our research
Red Teaming for GenAI Harms - Revealing the Risks and Rewards for Online Safety (PDF)