Researchers at Google AI have recently launched the Crowdsourcing Adverse Test Sets for Machine Learning (CATS4ML) Data Challenge. The challenge aims to improve evaluation datasets for machine learning (ML) by encouraging participants to explore existing ML benchmarks for adverse examples that may be ‘unknown unknowns.’
The performance of machine learning (ML) models depends heavily on both the learning algorithms and the data used for training and evaluation. Researchers worldwide are working to improve this data, including through a series of workshops addressing ML evaluation issues. However, research and challenges that focus specifically on the data used to evaluate ML models remain rare.
Moreover, many evaluation datasets contain easy-to-evaluate items and therefore miss the natural ambiguity of real-world contexts. Evaluating ML models without real-world examples makes it difficult to test their performance reliably, and as a result, ML models are likely to develop “weak spots.”
Google AI’s CATS4ML Data Challenge at HCOMP 2020 addresses the difficulty of identifying these weaknesses. The principal aim of the challenge is to raise the bar for ML evaluation sets by spotting new data examples that machine learning models classify with high confidence but actually get wrong. The challenge’s outcomes will help detect and avoid future errors and provide insights into model explainability.
What are Weak Spots in Machine Learning models?
Weak spots are classes of examples that are difficult or impossible for a model to evaluate accurately because the evaluation dataset does not include them.
They are of two categories, as below:
- Known unknowns: Examples for which the ML model is unsure about the correct classification.
- Unknown unknowns: Examples for which the ML model is confident about its answer but is actually wrong.
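The distinction above can be sketched in code. The following is a minimal, hypothetical illustration; the confidence threshold, tuple layout, and function name are assumptions for the sake of the example, not part of the challenge:

```python
# Hypothetical sketch: partition a model's predictions into "known unknowns"
# (low confidence) and "unknown unknowns" (high confidence but wrong).
# Threshold and data layout are illustrative assumptions.

def partition_predictions(examples, threshold=0.9):
    """Each example is a (predicted_label, confidence, true_label) tuple."""
    known_unknowns, unknown_unknowns = [], []
    for predicted, confidence, true in examples:
        if confidence < threshold:
            # The model itself signals uncertainty.
            known_unknowns.append((predicted, confidence, true))
        elif predicted != true:
            # Confident and wrong: the dangerous "weak spot" case.
            unknown_unknowns.append((predicted, confidence, true))
    return known_unknowns, unknown_unknowns

preds = [
    ("cat", 0.55, "cat"),   # uncertain -> known unknown
    ("cat", 0.97, "fox"),   # confident but wrong -> unknown unknown
    ("dog", 0.99, "dog"),   # confident and right -> neither
]
known, unknown = partition_predictions(preds)
```

Unknown unknowns are the harder category precisely because no signal from the model itself (here, confidence) flags them; they only surface when ground truth is checked.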
Researchers continue to study ‘known unknowns’ in a field called active learning. The community has developed methods to interactively obtain new labels from people for examples the model is uncertain about. For instance, if a model is unsure whether the subject of a photo is a cat, a person is asked to verify it; if the model is confident, no verification is requested. In this setting, the model’s confidence is correlated with its performance, i.e., one knows what the model doesn’t know.
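The routing step of that interactive loop can be sketched as follows. This is purely illustrative: the function name, threshold, and batch format are hypothetical, not taken from any active-learning library:

```python
# Minimal active-learning routing sketch (illustrative assumptions):
# send low-confidence predictions to a human annotator for verification,
# and accept high-confidence predictions automatically.

def route_for_verification(predictions, threshold=0.8):
    """predictions: list of (image_id, label, confidence) tuples."""
    needs_human, auto_accepted = [], []
    for image_id, label, confidence in predictions:
        if confidence < threshold:
            needs_human.append(image_id)    # ask a person to verify the label
        else:
            auto_accepted.append(image_id)  # trust the model's answer
    return needs_human, auto_accepted

batch = [("img1", "cat", 0.62), ("img2", "cat", 0.95), ("img3", "dog", 0.40)]
to_verify, accepted = route_for_verification(batch)
```

Note how this scheme relies entirely on confidence as a proxy for correctness, which is exactly why it cannot surface unknown unknowns: a confidently wrong prediction sails through the automatic branch.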
Several efforts are also underway to discover ‘unknown unknowns,’ and they have helped uncover many unintended machine behaviors. One such effort uses Generative Adversarial Networks (GANs), which generate unknown unknowns for image recognition models in the form of optical illusions that cause deep learning models to make mistakes humans would not.
But real-world examples can provide better insight into a model’s failures in day-to-day use. The CATS4ML data challenge therefore aims to collect unmanipulated samples that humans can reliably interpret but on which many ML models would make mistakes.
First Edition of CATS4ML Data Challenge
The first edition of the CATS4ML Data Challenge focuses on visual recognition, using images and labels from the Open Images Dataset. Participants select target images from the Open Images Dataset to pair with a pre-selected list of 24 target labels from the same dataset. ML researchers and practitioners are invited to invent unique and creative ways of exploring this publicly available dataset to discover examples that are unknown unknowns for ML models with respect to the target labels. The challenge also encourages researchers to create benchmark datasets for ML that are more balanced, diverse, and socially aware.
The challenge is open to researchers and developers worldwide until 30 April 2021. Participants can register on the challenge website, download the target images and labels, and submit the adverse examples they discover.
Challenge website: https://cats4ml.humancomputation.com/