Recently, Facebook released the Image Similarity data set and announced an associated competition hosted by DrivenData with a whopping $200,000 prize pool. The competition began on June 19, 2021, and will conclude on October 28, 2021. The challenge is being supported by Pinterest, BBC, Getty Images, iStock, and Shutterstock.
The data set contains nearly 1 million reference images and 50,000 query images, some of which are manipulated versions of a reference image. Through this data set and challenge, Facebook hopes to enable new ML-based systems that can predict the similarity of two pieces of visual content and help the industry detect manipulated images at scale.
Facebook AI believes that the best solutions will come from open collaboration across the AI community. It describes the Image Similarity data set as the most extensive known data set on image similarity, including both human and automated edits that represent on-platform behavior.
Many social media networks now use content tracing and image similarity detection to block or slow the spread of malicious content that has a negative social impact. These networks combine manual content moderation with automated matching tools.
Image similarity detection involves identifying the source of a doctored image within an extensive collection of unrelated images. This technology applies to various content moderation domains, including scams, misinformation, and copyright infringement.
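To make the task concrete, here is a minimal sketch of source lookup as nearest-neighbor search. It assumes every image has already been turned into a numeric descriptor vector by some feature extractor, which is not shown. The function names, the toy vectors, and the 0.9 threshold are all hypothetical and are not taken from the challenge.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_source(query_vec, reference_vecs, threshold=0.9):
    """Return (reference_id, score) for the closest reference.

    If even the best score is below the threshold, the query is treated
    as having no source in the collection and (None, score) is returned.
    """
    best_id, best_score = None, -1.0
    for ref_id, ref_vec in reference_vecs.items():
        score = cosine_similarity(query_vec, ref_vec)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical toy descriptors standing in for real image features.
references = {
    "ref_a": [1.0, 0.0, 0.0],
    "ref_b": [0.0, 1.0, 0.0],
    "ref_c": [0.0, 0.0, 1.0],
}
edited_query = [0.9, 0.1, 0.05]     # slightly perturbed copy of ref_a
unrelated_query = [0.5, 0.5, 0.5]   # not close to any single reference

match_id, match_score = find_source(edited_query, references)
no_match_id, no_match_score = find_source(unrelated_query, references)
```

The linear scan is only for illustration. With around a million references, a real system would more likely use an approximate nearest-neighbor index, and the threshold is what lets it report that a query has no source at all.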
Facebook designed the Image Similarity data set to provide a benchmark for work in image similarity detection, consisting of a reference collection of 1 million images and a set of 50,000 query images. They selected images with broad licenses from the YFCC100M data set and still images from their Deepfake Detection Challenge data set. They also used some images from the Casual Conversations data set, applying the CIAGAN deepfake technique to modify the faces and make it harder for AI to identify source images.
Facebook transformed these source images in various ways. They also applied a wide range of automated transformations to a subset of the 50,000 query images using the recently open-sourced AugLy library.
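As a rough illustration of what chaining automated transformations looks like, the sketch below derives a query image from a reference by applying a flip, a brightness change, and a crop in sequence. It does not use AugLy or reproduce its API. The image is a toy grid of grayscale values, and every function name here is hypothetical.

```python
def flip_horizontal(image):
    """Mirror each row of a grayscale image (a list of rows of 0-255 values)."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by factor, clamping the result to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def crop(image, top, left, height, width):
    """Keep a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def compose(*steps):
    """Build one transformation that applies the given steps in order."""
    def apply(image):
        for step in steps:
            image = step(image)
        return image
    return apply


# Toy 3x4 reference image.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]

make_query = compose(
    flip_horizontal,
    lambda img: adjust_brightness(img, 1.5),
    lambda img: crop(img, 0, 1, 2, 2),
)
query = make_query(reference)
```

Each step returns a new image, so the reference stays unchanged and the same source can yield many different query variants.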
Facebook AI also worked with trained third-party annotators to manually transform a smaller subset of the images, so that the data set better represents how a human user would edit images. The accompanying Image Similarity Challenge lets participants test their image matching techniques on the Image Similarity data set. The challenge has been accepted for the NeurIPS 2021 competition track.
Participants in the Image Similarity Challenge are expected to find the source reference image, where one exists, for each query in the data set. Baseline methods draw on techniques from the instance matching literature. The researchers worked with several image matching experts from the Czech Technical University in Prague to choose the right evaluation metrics.
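The article does not say which evaluation metrics were chosen. As an illustration only, the sketch below computes average precision over a single ranked list of predicted (query, reference) pairs, a common way to score retrieval when some queries have no true match. The function name, input format, and sample data are hypothetical.

```python
def micro_average_precision(predictions, ground_truth):
    """Average precision over all predictions pooled into one ranked list.

    predictions: list of (query_id, reference_id, score) tuples, assumed to
        contain each (query_id, reference_id) pair at most once.
    ground_truth: set of (query_id, reference_id) pairs that truly match.
    """
    if not ground_truth:
        return 0.0
    # Rank every prediction by confidence, highest first.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    correct_so_far = 0
    precision_sum = 0.0
    for rank, (query_id, reference_id, _score) in enumerate(ranked, start=1):
        if (query_id, reference_id) in ground_truth:
            correct_so_far += 1
            # Precision at this rank, counted only at correct predictions.
            precision_sum += correct_so_far / rank
    # True pairs that were never predicted contribute zero.
    return precision_sum / len(ground_truth)


# Hypothetical example: q1 and q2 have sources, q3 has none.
truth = {("q1", "r7"), ("q2", "r3")}
preds = [
    ("q1", "r7", 0.95),  # correct, high confidence
    ("q3", "r2", 0.80),  # false match for a query with no source
    ("q2", "r3", 0.60),  # correct, but ranked below the false match
]
score = micro_average_precision(preds, truth)
```

In this example the confident false match for `q3` outranks a correct prediction, so the score falls below 1.0. A metric of this kind rewards systems that give low confidence to queries with no source.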
Facebook AI is confident that the Image Similarity Challenge will speed up progress across the industry in dealing with harmful or malicious content and help advance the similarity detection domain by providing a data set built explicitly to aid researchers in tackling this problem. The data set also provides a benchmark for work in image similarity detection.