This Machine Learning Research Develops an AI Model for Effectively Removing Biases in a Dataset

Data collection is a prime opportunity for the unintended introduction of texture biases. When a model is trained on biased data and then applied to out-of-distribution data, performance often drops dramatically because the source and nature of the biases are typically unknown. The literature is rich with research aimed at reducing or eliminating such biases. Prior work proposed extracting bias-independent features through adversarial learning, enabling the model to solve the intended classification task without relying on biased cues. However, since it is challenging to fully decouple biased features through adversarial learning, texture-based representations are commonly retained after training.

A team from Daegu Gyeongbuk Institute of Science and Technology (DGIST) has created a new image translation model with the potential to significantly reduce data biases. When an AI model is built from a collection of images drawn from multiple sources, data biases may exist despite the user’s best efforts to avoid them. The model achieves high image-analysis performance because it can eliminate data biases without prior knowledge of their source or nature. Developments in autonomous vehicles, content creation, and healthcare could all benefit from this solution.

Deep learning models are often trained on biased datasets. For example, when building a dataset to distinguish bacterial pneumonia from coronavirus disease 2019 (COVID-19), image acquisition conditions may vary because of the risk of COVID-19 infection. These variations introduce subtle differences in the images, causing existing deep-learning models to diagnose diseases based on attributes arising from differences in imaging protocols rather than the qualities that actually matter for disease identification.

Using a spatial self-similarity loss, a texture co-occurrence loss, and GAN losses, the model generates high-quality images with the desired qualities: consistent content and similar local and global textures. Once images are produced from the training data, a debiased classifier or a modified segmentation model can be trained on them. The most important contributions are as follows:
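The paper's exact loss formulations are not reproduced here, but the two translation losses can be illustrated with a simplified numpy sketch. The feature shapes, function names, and mean-squared-error aggregation below are assumptions for illustration; the actual model applies these ideas over deep feature maps and trains them jointly with a GAN objective.

```python
import numpy as np

def self_similarity(feats):
    """Cosine self-similarity matrix of an (N, D) feature map,
    where rows are spatial locations and columns are channels."""
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return norm @ norm.T  # (N, N): how similar each location is to every other

def self_similarity_loss(src_feats, gen_feats):
    """Content loss: the translated image should preserve the source
    image's spatial self-similarity structure."""
    return np.mean((self_similarity(src_feats) - self_similarity(gen_feats)) ** 2)

def gram(feats):
    """Gram matrix of channel co-occurrences, a common texture statistic."""
    return feats.T @ feats / feats.shape[0]

def texture_cooccurrence_loss(ref_feats, gen_feats):
    """Texture loss: the translated image should match the reference
    domain's channel co-occurrence statistics."""
    return np.mean((gram(ref_feats) - gram(gen_feats)) ** 2)

rng = np.random.default_rng(0)
src = rng.normal(size=(64, 16))  # 64 spatial locations, 16 channels
print(self_similarity_loss(src, src))         # 0.0 — identical structure
print(texture_cooccurrence_loss(src, src))    # 0.0 — identical texture stats
```

Minimizing the first loss keeps the content of the source image; minimizing the second pushes the output's texture statistics toward a reference domain, which is what lets the translator swap textures without altering content.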

As an alternative, the team suggests translating images using texture co-occurrence and spatial self-similarity losses. These losses have not previously been studied for the image translation task in isolation from other objectives. The team demonstrates that images well suited to debiasing and domain adaptation can be obtained by optimizing both losses.

The team presents a strategy for learning downstream tasks that effectively mitigates unexpected biases during training by explicitly enriching the training dataset, without using bias labels. The approach is also independent of the segmentation module, which allows it to work with state-of-the-art segmentation tools: it can efficiently adapt to these models and boost their performance by enriching the training dataset.
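The enrichment strategy described above can be sketched as follows. The `translate` callable stands in for the trained image-translation model, and all names and interfaces here are illustrative assumptions, not the authors' API: the key point is that labels are reused unchanged, because translation alters texture, not content.

```python
def enrich_dataset(images, labels, domains, translate):
    """Return the original samples plus one translated copy per texture
    domain. Any downstream classifier or segmenter can then be trained
    on the enlarged, texture-balanced dataset."""
    enriched_images, enriched_labels = [], []
    for img, lbl in zip(images, labels):
        enriched_images.append(img)
        enriched_labels.append(lbl)
        for d in domains:
            # Translated copy keeps the content (and thus the label),
            # but carries the texture of domain d.
            enriched_images.append(translate(img, d))
            enriched_labels.append(lbl)
    return enriched_images, enriched_labels

# Toy usage with a fake "translator" that tags each image with its
# target domain, just to show the bookkeeping.
imgs, labs = ["x1", "x2"], [0, 1]
fake_translate = lambda img, d: f"{img}->{d}"
out_imgs, out_labs = enrich_dataset(imgs, labs, ["A", "B"], fake_translate)
print(len(out_imgs))  # 6: each of 2 images plus 2 translated copies
```

Because the enrichment happens purely at the data level, the downstream model sees every content class under every texture, which is what removes the texture shortcut.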

The team demonstrated the superiority of the approach over state-of-the-art debiasing and domain adaptation techniques by evaluating it on five biased datasets and two domain adaptation datasets, and by generating higher-quality images than previous image translation models.

The deep learning model outperforms preexisting algorithms because it first creates a texture-debiased dataset through image translation and then trains on that dataset.

It achieved superior performance compared to existing debiasing and image translation techniques when tested on datasets with texture biases, such as a classification dataset for distinguishing handwritten digits, one for classifying dogs and cats with different fur colours, and one with different imaging protocols for distinguishing COVID-19 from bacterial pneumonia. It also performed better than prior methods on datasets containing biases, such as a classification dataset designed to differentiate between multi-label digits and one intended to differentiate between still photographs, GIFs, and animated GIFs.


Check out the Paper and Reference Article. All credit for this research goes to the researchers on this project.
