Google AI Introduces A Dataset for Studying Gender Bias in Machine Translation

Neural machine translation (NMT) has advanced by leaps and bounds and now produces natural, fluent translations to a considerable degree. However, these models repeatedly reflect societal biases, because the data they are trained on already contains stereotypes. Gender is an especially sensitive case: choosing the correct pronouns matters because pronouns refer directly to how people self-identify. Google says it has been working to reduce these biases with new machine learning techniques.

Gender Bias in NMT 


Gender is one central area where societal bias needs to be addressed, and gender bias can be reduced by using context from the surrounding sentences and passages. This is a challenge in most cases because conventional NMT systems translate one sentence at a time, and an individual sentence may not contain the gendered information. Going beyond that requires translation techniques that look at context across sentences and paragraphs, along with new metrics and datasets to measure progress.

To address these contextual translation issues, Google has released the Translated Wikipedia Biographies dataset, which is designed to evaluate gender bias in translation models. If it works as Google claims, it could help machine learning systems handle pronouns more accurately and reduce gender bias considerably. Its primary role is as a benchmark: translation accuracy can be tested against it, and the effect of changes to a model can be measured.

The Making of the Dataset

The Translated Wikipedia Biographies dataset is designed to analyze the common gender errors described above. Each instance represents either a person (identified as feminine or masculine) or a rock band or sports team (considered neutral or genderless). Each instance consists of roughly 8 to 15 connected sentences that refer to a central subject, and all the articles were written natively in English. They were then professionally translated into Spanish and German.

To avoid building bias into the dataset itself, the instances were chosen to represent geographies and genders equally. Google selected biographies from Wikipedia according to occupation, profession, job, or activity. Based on Wikipedia statistics, nine occupations were chosen that together span a range of stereotypical gender associations. To make the representation geographically diverse, the instances were then divided by region, with the aim of one individual per region for each occupational category. Each individual was also checked for a relevant relationship with a country in the designated region. Following this selection procedure, the dataset includes people from more than 90 countries.
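The selection procedure described above can be illustrated with a small Python sketch. This is not Google's actual pipeline: the `Candidate` record, its field names, the placeholder people, and the tie-breaking rule are all assumptions made for illustration. It only shows the general idea of picking one individual per occupation and region while keeping the two genders balanced.

```python
from collections import Counter
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical record layout; the real dataset schema may differ.
    name: str
    occupation: str
    region: str
    gender: str  # "feminine" or "masculine"


def select_balanced(candidates):
    """Pick one candidate per (occupation, region) cell, preferring the
    gender that has been picked less often so far."""
    cells = {}
    for c in candidates:
        cells.setdefault((c.occupation, c.region), []).append(c)

    selected = []
    counts = Counter()
    for key in sorted(cells):
        pool = cells[key]
        # min() returns the first minimal element, so ties keep input order.
        choice = min(pool, key=lambda c: counts[c.gender])
        selected.append(choice)
        counts[choice.gender] += 1
    return selected


# Placeholder data: two occupations, two regions, both genders in every cell.
example = [
    Candidate(f"Person {i}", occ, reg, gen)
    for i, (occ, reg, gen) in enumerate(
        (o, r, g)
        for o in ("occupation_a", "occupation_b")
        for r in ("region_1", "region_2")
        for g in ("feminine", "masculine")
    )
]

picked = select_balanced(example)
print(Counter(c.gender for c in picked))
```

A greedy pass like this keeps the two counts within one of each other whenever every cell offers both genders; a real curation process would also involve the manual country check described in the text.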

Although gender is not strictly binary, Google focused, to begin with, on equal representation of feminine and masculine entities. In addition, 12 instances with no gender were included. These are rock bands and sports teams, which in everyday language are usually referred to with a non-gendered third-person pronoun such as "it" or "they". The team added these instances to study cases where models produce gender-specific pronouns when they should not.


Results of the Dataset and Its Application

The dataset provides a new method for evaluating gender bias in translation models and could be used to improve them. Because each instance centers on a subject whose gender is known, Google says, the accuracy of the gender-specific translations that refer to that subject can be computed directly. The computation is easiest in English, which has explicit gender-specific pronouns. Google reports a 67% reduction in errors for context-aware models compared with earlier models. The dataset could also support new research directions, for example by extending it to other regions and occupations.
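As a rough illustration of how such a pronoun-based accuracy could be computed for English text, here is a minimal Python sketch. It is not Google's evaluation code: the pronoun lists, the `(text, expected_gender)` input format, and the scoring rule (every gendered pronoun counts as one decision, and any gendered pronoun in a neutral instance counts as an error) are assumptions chosen to keep the example short. It also assumes that every pronoun in an instance refers to the central subject, which will not always hold in real biographies.

```python
import re

# Assumed pronoun inventories; a real evaluation would need a fuller list
# and some way of checking which entity each pronoun refers to.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Return (feminine, masculine) pronoun counts for an English text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    fem = sum(t in FEMININE for t in tokens)
    masc = sum(t in MASCULINE for t in tokens)
    return fem, masc


def gender_accuracy(instances):
    """instances: iterable of (english_text, expected_gender), where
    expected_gender is "feminine", "masculine", or "neutral".
    Returns the share of gendered pronouns that match the expected
    gender, or None if no gendered pronouns were found."""
    correct = total = 0
    for text, expected in instances:
        fem, masc = pronoun_counts(text)
        total += fem + masc
        if expected == "feminine":
            correct += fem
        elif expected == "masculine":
            correct += masc
        # For "neutral" subjects every gendered pronoun is an error,
        # so nothing is added to `correct`.
    return correct / total if total else None


sample = [
    ("She was born in 1950. His career began early.", "feminine"),
    ("He won the title. He retired.", "masculine"),
]
print(gender_accuracy(sample))
```

Running the same function on the output of two systems over the same instances gives two accuracies that can be compared directly, which is the kind of before-and-after measurement a benchmark like this is meant to support.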

While this dataset could help establish new benchmarks for identifying gender bias in machine learning models, it does not cover the full range of the problem. It focuses on specific areas and could, if successful, inform how gender bias should be approached more broadly.



Amreen Bawa is a consulting intern at MarktechPost. While pursuing a BA (Hons) in Social Sciences at Panjab University, Chandigarh, she is also a keen learner and writer with a special interest in the application and scope of artificial intelligence in various facets of life.
