Facebook AI’s Latest Computer Vision Model SEER Teaches Itself To Classify A Billion Images Accurately With No Human Annotations

Source: https://ai.facebook.com/blog/seer-the-start-of-a-more-powerful-flexible-and-accessible-era-for-computer-vision/

Facebook recently unveiled SEER, a computer vision model that learns from a billion images without any labels or captions and can then detect and classify objects in images with no human annotation of its pretraining data.

What is SEER?

Short for SElf-supERvised, SEER is a computer vision model with more than a billion parameters that learns directly from random, uncurated images and classifies them based on the objects it detects.

When the Facebook AI research team pretrained SEER on one billion random public images from Instagram, without annotations or labels, and then fine-tuned it on ImageNet, SEER classified images with a top-1 accuracy of 84.2%. According to Facebook, this beat the best previous self-supervised system by one percentage point.
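The accuracy figure above is a top-1 accuracy: the share of test images for which the model's single highest-scoring class matches the true label. The sketch below shows how such a number is computed. The function name `top1_accuracy` and the toy scores are illustrative assumptions, not part of SEER or VISSL.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one list per image (toy data here).
    labels: list of true class indices, one per image.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the class with the largest score for this image.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical scores for 5 images over 3 classes.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],  # true label is 1, so this one is counted as wrong
    [0.2, 0.1, 0.7],
]
toy_labels = [0, 1, 2, 1, 2]

accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"top-1 accuracy: {accuracy:.1%}")
```

In this toy run, four of the five predictions match their labels, so the reported top-1 accuracy is 80%.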

How does SEER work?

SEER has three major components:

1. SwAV algorithm: Developed by FAIR in collaboration with Inria, it is an algorithm that uses online clustering to group images with similar visual concepts. SwAV let Facebook researchers cluster images by their shared features with six times less training time than earlier self-supervised algorithms.

2. RegNet: It is a family of Convolutional Neural Networks that can scale to billions, and potentially even trillions, of parameters. It can be optimized to fit an extensive range of runtime and memory requirements.

3. VISSL Library: It is a PyTorch-based library capable of self-supervised training at both small and massive scales. It also includes a benchmark suite and a model zoo of more than 60 pre-trained models that help researchers compare modern self-supervised methods. Facebook has open-sourced this library so that researchers worldwide can build diverse self-supervised learning models for computer vision.
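To make the online clustering idea behind SwAV more concrete, here is a minimal sketch in plain Python. It assumes each image has two augmented views and that we already have similarity scores between every view and a small set of cluster "prototypes". A Sinkhorn-style normalization turns the scores into balanced soft cluster assignments (codes), and the loss asks each view to predict the code of the other view. All names, the toy scores, and the hyperparameters (`epsilon`, `temperature`, `n_iters`) are illustrative assumptions; this is not the VISSL implementation.

```python
import math


def softmax(scores, temperature=0.1):
    """Convert one image's prototype scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sinkhorn(scores, epsilon=0.05, n_iters=3):
    """Soft cluster codes in which every prototype receives a similar share of the batch."""
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: give each prototype a total mass of 1/k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Row step: give each image a total mass of 1/n.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n) for v in q[i]] for i in range(n)]
    # Rescale so that each image's code sums to 1.
    return [[v * n for v in row] for row in q]


def swapped_loss(scores_a, scores_b):
    """Cross-entropy between the code of one view and the prediction from the other view."""
    codes_a, codes_b = sinkhorn(scores_a), sinkhorn(scores_b)
    n = len(scores_a)
    loss = 0.0
    for i in range(n):
        pred_a, pred_b = softmax(scores_a[i]), softmax(scores_b[i])
        loss -= sum(q * math.log(p) for q, p in zip(codes_b[i], pred_a))
        loss -= sum(q * math.log(p) for q, p in zip(codes_a[i], pred_b))
    return loss / (2 * n)


# Toy prototype scores for 4 images and 2 prototypes (hypothetical values).
view_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
view_b = [[0.8, 0.2], [0.9, 0.0], [0.2, 0.8], [0.1, 0.9]]

consistent = swapped_loss(view_a, view_b)
mismatched = swapped_loss(view_a, view_b[::-1])  # pair each image with the wrong view
print(f"consistent views: {consistent:.4f}, mismatched views: {mismatched:.4f}")
```

The loss is low when both views of an image land in the same cluster and high when they do not, which is the signal that lets the network learn without labels. In a real training loop the scores would come from the network's embeddings and learned prototypes, and the loss would be minimized by gradient descent.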

Why is SEER being hailed as a breakthrough?

Self-supervised learning has long been successful in other branches of AI, most notably Natural Language Processing. It is easier for machines to learn patterns in language than in images because there is much greater diversity in pixels than in words. SEER marks a breakthrough in bringing self-supervised learning to computer vision, opening new avenues for growth in this sphere of AI.

SEER is highly beneficial in studying vast and diverse data sets. Earlier, scientists had to label or caption images before feeding them into AI models for classification. When billions of images are involved, this becomes a daunting task. SEER addresses this challenge because it can group images by analyzing common patterns, even in the absence of human annotations.


How can SEER be used to steer change?

Facebook researchers believe that they can use SEER to identify images that promote hate on the platform. These images can then be effectively removed to ensure that Facebook stays a safe space for users worldwide.

SEER could also help identify biases that seep into data curation and assist content creators in ensuring that they create inclusive and meaningful content.

In addition, self-supervised models like SEER are expected to help with diagnosing diseases such as cancer by improving the speed and quality of medical image classification.

The way AI sees the world and learns about it is rapidly evolving. We must embrace this evolution and explore how it can help human societies evolve into more inclusive ones.

Paper: https://arxiv.org/pdf/2103.01988.pdf

GitHub: https://github.com/facebookresearch/vissl