Automated classifiers are tools used to detect harmful content, such as hate speech. These safety measures can significantly reduce people’s experiences of harm online. Researchers also use these tools to identify how a change to a platform (for example, when it changes its rules or removes certain content or users) impacts the frequency of hate speech.
However, according to Ofcom analysis published today, it is important for researchers to indicate which classifiers they have used and how those classifiers performed. This is because the performance of classifiers may vary substantially. For example, widely-used classifiers may perform poorly on some datasets.
Ofcom has analysed the performance of two hate speech classifiers: Perspective API – the most commonly used ‘off-the-shelf’ classifier – and HateXplain, which was trained on similar data to the test dataset used for this assessment. The purpose was to explore how these different classifiers perform and, in turn, the implications for research on the effectiveness of these types of safety measures.
We found that Perspective API identified 13% of all hate speech in the test dataset, compared with 78% for HateXplain. This highlights that detection rates improve significantly when using a classifier trained on a dataset from the same platform and user base.
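As an illustration of how such detection rates are typically computed, the sketch below calculates recall – the share of genuine hate speech a classifier flags – for two sets of predictions against a labelled test set. The data, score threshold and variable names are hypothetical and are not drawn from our study.

```python
from sklearn.metrics import recall_score

# Hypothetical labelled test set: 1 = hate speech, 0 = not hate speech
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]

# Hypothetical classifier scores in [0, 1], one per test item
classifier_a_scores = [0.2, 0.7, 0.4, 0.1, 0.3, 0.45, 0.2, 0.6, 0.15, 0.35]
classifier_b_scores = [0.8, 0.9, 0.6, 0.2, 0.4, 0.75, 0.3, 0.85, 0.1, 0.7]

# Convert scores to binary predictions at an illustrative threshold
threshold = 0.5
preds_a = [int(s >= threshold) for s in classifier_a_scores]
preds_b = [int(s >= threshold) for s in classifier_b_scores]

# Recall: the proportion of actual hate speech each classifier identified
print("Classifier A recall:", recall_score(y_true, preds_a))
print("Classifier B recall:", recall_score(y_true, preds_b))
```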
We also found that the performance of the classifiers varied depending on the target of the hate speech. In the dataset we used, the Perspective API classifier made so many errors in identifying hate speech targeted at certain ethnic groups that it performed no better than random guessing.
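One way a researcher might check whether a classifier does better than chance for a particular target group is to evaluate it separately within each group and compare it against a random baseline. The sketch below, using hypothetical data and group labels, reports per-group recall and ROC AUC; an AUC close to 0.5 indicates ranking no better than random guessing for that group.

```python
from collections import defaultdict
from sklearn.metrics import recall_score, roc_auc_score

# Hypothetical examples: (target_group, true_label, classifier_score)
examples = [
    ("group_a", 1, 0.9), ("group_a", 1, 0.8), ("group_a", 0, 0.2), ("group_a", 0, 0.3),
    ("group_b", 1, 0.6), ("group_b", 1, 0.3), ("group_b", 0, 0.5), ("group_b", 0, 0.4),
]

# Gather labels and scores separately for each target group
by_group = defaultdict(lambda: ([], []))
for group, label, score in examples:
    by_group[group][0].append(label)
    by_group[group][1].append(score)

for group, (labels, scores) in by_group.items():
    preds = [int(s >= 0.5) for s in scores]
    # An AUC of roughly 0.5 means the classifier separates hate speech
    # from non-hate speech no better than chance within this group
    print(group,
          "recall:", recall_score(labels, preds),
          "AUC:", roc_auc_score(labels, scores))
```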
The results suggest that, on certain datasets and in comparison with classifiers developed using similar datasets, Perspective API may make biased errors when predicting hate speech targeted at certain protected characteristics. We used Perspective API because it is easily available and widely used. The purpose of our analysis is not to suggest that Perspective API is a generally poor-performing classifier, but rather that it may sometimes perform poorly, either in absolute terms or in comparison with other automated classifiers.
Based on this analysis, we believe it is important that, when researchers use hate speech classifiers to measure the frequency of hate speech, they also report how well the classifier performed and how it performed relative to other available classifiers. Otherwise, the results they present may not be robust.
To implement the UK’s online safety laws, Ofcom must produce Codes of Practice and Guidance that set out safety measures online services can adopt to protect their users and comply with their new duties.
Today’s study forms part of a substantial programme of research to inform our regulatory approach. We will update our Codes over time as our evidence base improves and as technology and harms evolve.