
Published: 13 February 2025

Automated classifiers are tools used to detect harmful content, such as hate speech. These safety measures can significantly reduce people's experiences of harm online. Researchers also use these tools to measure how a change to a platform (for example, when it changes its rules or removes certain content or users) impacts the frequency of hate speech.

However, according to Ofcom analysis published today, it is important for researchers to report which classifiers they have used and how those classifiers performed. This is because classifier performance can vary substantially: for example, widely used classifiers may perform poorly on some datasets.

Ofcom has analysed the performance of two hate speech classifiers: Perspective API – the most commonly used 'off-the-shelf' classifier – and HateXplain, which was trained on data similar to the test dataset used for this assessment. The purpose was to explore how these different classifiers perform, and the implications for research on the effectiveness of these types of safety measures.
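Comparisons of this kind typically score each classifier's predictions against a human-labelled test set using standard metrics such as precision, recall and F1. The sketch below illustrates that evaluation pattern with made-up labels and predictions; it is not Ofcom's methodology or data, and the two classifiers here are hypothetical stand-ins for tools like Perspective API and HateXplain.

```python
# Illustrative sketch: scoring two hate-speech classifiers against a
# labelled test set. All labels and predictions below are invented
# example data, not drawn from the Ofcom analysis.

def precision_recall_f1(gold, pred, positive=1):
    """Compute precision, recall and F1 for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# 1 = hate speech, 0 = not hate speech (illustrative gold labels)
gold = [1, 1, 1, 0, 0, 0, 1, 0]
classifier_a = [1, 0, 1, 0, 0, 1, 1, 0]  # e.g. an off-the-shelf classifier
classifier_b = [1, 1, 1, 0, 1, 0, 1, 0]  # e.g. a domain-matched classifier

for name, pred in [("A", classifier_a), ("B", classifier_b)]:
    p, r, f = precision_recall_f1(gold, pred)
    print(f"classifier {name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting these per-dataset scores alongside research findings lets readers judge how much measured changes in hate speech frequency depend on the particular classifier chosen.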
