AWS AI Labs Propose A Method That Predicts Bias In Face Recognition Models Using Unlabeled Data

Algorithmic bias has emerged as a major area of study in artificial intelligence in recent years. An examination of facial recognition software in 2018 defined bias as a disparity in the software’s performance when applied to people of diverse racial or ethnic backgrounds.

To check whether a face recognition algorithm is biased, the most straightforward approach is to test it on a large dataset of human faces from various demographic groups and compare the results across groups. However, this requires identity-annotated data, which is prohibitively expensive to collect, especially at the massive scale needed to assess a face recognition model.

New Amazon research introduces a method for evaluating bias in face recognition systems without using identity annotations. The findings show that, although the method only estimates a model’s performance on data from various demographic groups, it effectively detects the performance disparities that indicate bias.

This unexpected finding suggests an evaluation paradigm that should make it much more practical for creators of face recognition software to test their models for bias.

The approach is inexpensive and can be readily adapted to other demographic groups. Identity labels are not required, but there must be some way to determine which demographic group each subject belongs to.

Cloud computing provider AWS has released code to aid in the fight against bias in machine learning models. To evaluate the method, the researchers trained face recognition models on data sets from which certain demographic information had been withheld, deliberately introducing bias. In every case, the technique reliably identified differential performance on the withheld demographic groups.

They compared their method to Bayesian calibration, the gold standard for forecasting an ML model’s results. Their results demonstrate that, even though the proposed method relies solely on unannotated data, it consistently outperforms Bayesian calibration, sometimes by a substantial margin.

Most current face recognition models are trained on identity-annotated data to generate vector representations (embeddings) of input photos, and they compare faces by measuring the distance between embeddings in the embedding space. Any two embeddings whose distance falls below a chosen threshold are deemed to represent the same individual.
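The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: the choice of cosine distance, the function names, the toy three-dimensional embeddings, and the threshold of 0.4 are all assumptions made for the example.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two 1-D embeddings (0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(emb_a, emb_b, threshold=0.4):
    """Declare a match when the distance falls below the threshold.

    The default of 0.4 is an arbitrary placeholder, not a value from the paper.
    """
    return cosine_distance(emb_a, emb_b) < threshold

# Toy 3-D embeddings; real models typically use hundreds of dimensions.
anchor = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]   # nearly the same direction as the anchor
far = [0.0, 1.0, 0.0]     # orthogonal to the anchor

print(same_person(anchor, close), same_person(anchor, far))
```

Raising the threshold accepts more pairs as matches, which lowers false non-matches at the cost of more false matches; bias shows up when that trade-off differs between demographic groups.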

Their work rests on the hypothesis that distances between true matches follow one distribution and distances between non-matching faces follow a separate one. Their approach is to infer the parameters that describe these two distributions.

Because the scores were empirically found to be skewed, the team modeled them with two-piece distributions. A two-piece distribution is split in two at the mode (the most common value), with each side having its own parameters.
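To make the idea of a two-piece distribution concrete, here is a small sketch of a two-piece normal density, which uses one standard deviation to the left of the mode and another to the right. The choice of the normal family, the function name, and the parameter values are assumptions for illustration; the article does not say which family the researchers used.

```python
import numpy as np

def two_piece_normal_pdf(x, mode, sigma_left, sigma_right):
    """Density of a two-piece normal: a different spread on each side of the mode."""
    x = np.asarray(x, dtype=float)
    # A shared normalising constant keeps the density continuous at the mode
    # and makes the total area equal to 1.
    scale = np.sqrt(2.0 / np.pi) / (sigma_left + sigma_right)
    sigma = np.where(x < mode, sigma_left, sigma_right)
    return scale * np.exp(-0.5 * ((x - mode) / sigma) ** 2)

# Hypothetical parameters: a right-skewed score distribution peaking at 0.3.
grid = np.linspace(-2.0, 2.0, 40001)
pdf = two_piece_normal_pdf(grid, mode=0.3, sigma_left=0.05, sigma_right=0.15)

dx = grid[1] - grid[0]
area = pdf.sum() * dx                      # numerical check that the density integrates to ~1
right_mass = pdf[grid >= 0.3].sum() * dx   # share of probability to the right of the mode
print(round(area, 3), round(right_mass, 3))
```

With these parameters most of the probability mass lies to the right of the mode, which is the kind of skew a single symmetric distribution cannot capture.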

To assess its performance, the trained face recognition model was fed pairs of photos annotated with demographic data but not identity data. Some of the pairs in this face verification task will be true matches and others will not, but without identity labels there is no way of knowing which is which.

From the resulting scores, the method learns two distributions: one for matches and one for non-matches. The separation between these distributions provides a measure of the model’s accuracy. The analysis is repeated for each demographic group in the data set, and the results are then compared across groups.
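The following sketch illustrates the general idea on synthetic data: fit a two-component mixture to unlabeled distance scores for each group, then read estimated error rates off the fitted components. It is a deliberately simplified stand-in, not the researchers' method. It uses plain symmetric Gaussians fitted with expectation-maximization rather than two-piece distributions, and the group names, score parameters, and threshold of 0.6 are all invented for the example.

```python
import math
import numpy as np

def fit_two_gaussians(scores, n_iter=200):
    """EM for a 1-D mixture of two Gaussians; components are returned sorted by mean."""
    x = np.asarray(scores, dtype=float)
    means = np.percentile(x, [25, 75]).astype(float)  # start at the quartiles
    stds = np.full(2, x.std())
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        z = (x[:, None] - means) / stds
        dens = weights * np.exp(-0.5 * z ** 2) / (stds * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        stds = np.maximum(stds, 1e-6)  # guard against a collapsed component
    order = np.argsort(means)
    return weights[order], means[order], stds[order]

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def estimated_error_rates(scores, threshold):
    """Estimated (false non-match rate, false match rate) without identity labels.

    Scores are distances, so the lower-mean component is treated as matches.
    """
    _, means, stds = fit_two_gaussians(scores)
    fnmr = 1.0 - normal_cdf((threshold - means[0]) / stds[0])
    fmr = normal_cdf((threshold - means[1]) / stds[1])
    return fnmr, fmr

rng = np.random.default_rng(0)

def synthetic_scores(match_mean, match_std, n=2000):
    """Unlabeled pair distances: n true matches mixed with n non-matches."""
    matches = rng.normal(match_mean, match_std, n)
    non_matches = rng.normal(0.9, 0.1, n)
    return np.concatenate([matches, non_matches])

# Hypothetical groups: the model separates matches less cleanly for group_b.
groups = {
    "group_a": synthetic_scores(0.30, 0.08),
    "group_b": synthetic_scores(0.50, 0.12),
}
rates = {name: estimated_error_rates(s, threshold=0.6) for name, s in groups.items()}
for name, (fnmr, fmr) in rates.items():
    print(f"{name}: estimated FNMR={fnmr:.4f}, FMR={fmr:.4f}")
```

A noticeably higher estimated false non-match rate for one group than for another is the kind of disparity signal the article describes, obtained here without ever knowing which pairs were true matches.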

To compute error bounds on the accuracy estimates, the researchers used hierarchical clustering of the test samples. The results show that, even after accounting for error, the proposed method still conveys a strong signal of performance disparity.

The researchers believe their approach will be useful for AI researchers and developers working on biometric authentication systems like facial recognition.


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and she is passionate about exploring new advancements in technology and their real-life applications.