Researchers’ positionality—their perspectives formed by their own experience, identity, culture, and background—influences their design decisions while developing NLP datasets and models.
Latent design choices and the researcher’s positionality are two sources of design bias in producing datasets and models. These biases lead to discrepancies in how well datasets and models work for different populations. Moreover, by imposing one group’s standards on the rest of the world, they can help perpetuate systemic inequities. Such biases are difficult to characterize because building a dataset or model involves a wide variety of design decisions, and only a subset of them may be documented. Furthermore, many widely used production models are accessible only through APIs, which makes it difficult to characterize their design biases directly.
Recent research by the University of Washington, Carnegie Mellon University, and the Allen Institute for AI presents NLPositionality, a framework for characterizing the positionality and design biases of natural language processing (NLP) datasets and models. The researchers recruit a global community of volunteers from diverse cultural and linguistic backgrounds to annotate a sample of a dataset. They then quantify design biases by comparing annotations across identities and contexts to see which groups align most closely with the original dataset labels or model predictions.
NLPositionality has three benefits over other methods (such as paid crowdsourcing or in-lab experiments):
- Compared to other crowdsourcing platforms and conventional laboratory studies, LabintheWild, the online experimentation platform through which the volunteers are recruited, has a more diverse participant population.
- Instead of relying on monetary compensation, this method relies on participants’ intrinsic motivation to learn more about themselves. This gives participants more learning opportunities and yields higher-quality data than paid crowdsourcing platforms. And unlike the one-time paid studies used in other research, the platform can keep collecting new annotations over extended periods, so its measurements can reflect more recent observations of design biases.
- This method can be applied post hoc to any dataset or model without requiring pre-existing labels or predictions.
The researchers apply NLPositionality to two NLP tasks known to exhibit design biases: social acceptability and hate speech detection. They examine the associated datasets and supervised models as well as task-specific and task-general large language models (e.g., GPT-4). As of May 25, 2023, 1,096 annotators from 87 countries have contributed 16,299 annotations, an average of 38 annotations per day. The team found that the datasets and models they examine align best with White, college-educated millennials from English-speaking countries, a subset of “WEIRD” (Western, Educated, Industrialized, Rich, Democratic) populations. They also observe that datasets align strongly with their original annotators, which underscores the importance of collecting data and annotations from a wide range of sources. Their findings point to the need for NLP research to include more diverse models and datasets.
Dhanshree Shenwai is a Computer Science Engineer with solid experience at FinTech companies covering the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today’s evolving world.