Hashing is an umbrella term for techniques to create fingerprints of files on a computer system. An algorithm known as a hash function is used to compute a fingerprint, known as a hash, from a file. Comparing such a hash with another hash stored in a database is called hash matching. In the context of online safety, hash matching can be a primary means for the detection of known illegal or otherwise harmful images and videos.
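As an illustration only, hash matching can be sketched in a few lines of Python. This sketch assumes SHA-256 as the hash function and uses an in-memory set in place of a database; the names `file_hash`, `known_hashes` and `is_known`, and the sample byte strings, are hypothetical and chosen for this example.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Compute a fingerprint (hash) of a file's contents using SHA-256."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a database of hashes of known items.
known_hashes = {
    file_hash(b"contents of known item A"),
    file_hash(b"contents of known item B"),
}


def is_known(data: bytes) -> bool:
    """Hash matching: compare the file's hash against the stored hashes."""
    return file_hash(data) in known_hashes
```

Under these assumptions, `is_known(b"contents of known item A")` returns `True`, while any file whose contents differ by even one byte is not matched.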
Hash functions fall into one of two categories: cryptographic or perceptual. Cryptographic hashing can be used to identify exact matches, while perceptual hashing can be used to determine whether images (or videos) are very similar. Importantly, perceptual hashing assesses the similarity of the images themselves, not of the content they depict: for example, very similar images of different items could be determined to be a match, while very different images of the same item would not.
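The difference between the two categories can be illustrated with a minimal sketch. It assumes a simple "average hash" as a stand-in for perceptual hashing (deployed systems use more sophisticated algorithms) and represents images as small grids of greyscale values from 0 to 255. The images and the names `average_hash`, `hamming_distance` and `crypto_hash` are hypothetical.

```python
import hashlib


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def crypto_hash(pixels):
    """Cryptographic hash (SHA-256) of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Hypothetical 4x4 greyscale images.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
# The same image, slightly brightened.
brightened = [[min(255, p + 5) for p in row] for row in original]
# A visually different image (vertical bands rather than a chequerboard).
different = [[200, 200, 10, 10] for _ in range(4)]

# A small change alters the cryptographic hash completely...
print(crypto_hash(original) == crypto_hash(brightened))  # False
# ...but leaves the perceptual hash unchanged.
print(hamming_distance(average_hash(original), average_hash(brightened)))  # 0
# A visually different image differs in half of its 16 bits.
print(hamming_distance(average_hash(original), average_hash(different)))  # 8
```

In a sketch like this, two images would be treated as a match when the distance between their hashes falls below a chosen threshold. The hash reflects only the pattern of light and dark pixels, not what the image depicts.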
In this paper, we explain how perceptual hashing technologies can be used for the detection of known illegal or harmful visual media items. We also discuss some of the issues that may arise where perceptual hashing technology is used, including its reported limitations and the potential implications should hashing be deployed inappropriately or without considering the risks from adversarial exploitation.