Word embeddings capture semantic and syntactic similarities between words. Word2Vec, GloVe, and FastText are popular examples. Despite the growing popularity of contextual word embeddings such as BERT and ELMo, current research continues to use static word embeddings as input to state-of-the-art algorithms in downstream natural language processing and computer vision applications. Despite their efficiency, word embeddings encode biases in the form of undesirable associations between concepts. Researchers first observed that the offset between the vectors for man and woman is similar to the offset between those for programmer and housewife. Similar phenomena lead to biased results in the word analogy task, with specific terms tied to gender, racial, and religious prejudices. If used in downstream tasks, biased word embeddings can cause allocational and representational harms.
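To make the offset observation concrete, here is a minimal sketch that compares two word-pair offsets with cosine similarity. The three-dimensional vectors are invented purely for illustration and do not come from any trained embedding; the helper names `cosine` and `offset` are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def offset(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


# Toy vectors (made up for this example, not real embeddings).
toy = {
    "man": [1.0, 0.2, 0.1],
    "woman": [-1.0, 0.2, 0.1],
    "programmer": [0.8, 0.9, 0.3],
    "housewife": [-0.7, 0.8, 0.4],
}

gender_offset = offset(toy["man"], toy["woman"])
occupation_offset = offset(toy["programmer"], toy["housewife"])

# A value close to 1 means the two offsets point in nearly the same direction,
# which is the pattern described as bias in the analogy task.
alignment = cosine(gender_offset, occupation_offset)
print(f"offset alignment: {alignment:.3f}")
```

With real embeddings the alignment would be computed the same way, only with vectors looked up from the trained model.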
It is therefore critical to learn bias-free word embeddings. The authors argue that dictionary definitions are a relatively neutral resource for this purpose: a dictionary's objective, impartial, and concise definitions of terms can serve as unbiased reference points. They propose encouraging word embeddings to stay close to their relatively neutral dictionary representations. They train and debias the word embeddings concurrently from scratch, learning the distributional model while mitigating biases with dictionary definitions. Furthermore, some gender-debiasing algorithms rely on a pre-compiled list of seed words to approximate the gender direction, along which the vector component is removed to mitigate bias.
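The seed-word approach mentioned above can be sketched as follows. This is a simplified illustration, not the procedure of any specific published algorithm: it assumes the gender direction is estimated as the plain average of seed-pair differences (some methods use PCA instead), and all vectors and function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def estimate_direction(seed_pairs):
    """Average the differences of seed pairs (an assumed simplification)."""
    dims = len(seed_pairs[0][0])
    total = [0.0] * dims
    for first, second in seed_pairs:
        for i in range(dims):
            total[i] += first[i] - second[i]
    return normalize([t / len(seed_pairs) for t in total])


def remove_component(vec, unit_direction):
    """Subtract the projection of vec onto a unit-length direction."""
    scale = dot(vec, unit_direction)
    return [x - scale * d for x, d in zip(vec, unit_direction)]


# Toy seed pairs (invented values): (he, she) and (man, woman).
seed_pairs = [
    ([1.0, 0.1, 0.0], [-1.0, 0.1, 0.0]),
    ([0.9, 0.3, 0.2], [-0.9, 0.3, 0.2]),
]
gender_direction = estimate_direction(seed_pairs)

nurse = [-0.6, 0.5, 0.4]  # toy vector with a gendered component
nurse_debiased = remove_component(nurse, gender_direction)
print(nurse_debiased)
```

After the projection is removed, the vector has no component left along the estimated direction, while its other coordinates are unchanged.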
The authors present DD-GloVe, a train-time debiasing approach that learns bias-reduced GloVe word embeddings by leveraging dictionary definitions. They found that, given a single pair of initial seed words, dictionary definitions can guide an automated search for additional appropriate seed words, so compiling the seed word list becomes automatic. They also found that, despite requiring less human effort, the automatically generated seed words better capture the concept of gender in the word embedding space.
In summary, they contribute the following:
- They propose four dictionary-guided loss functions that encourage word embeddings to contain less biased information and richer semantic meaning by referring to their relatively neutral dictionary definition representations.
- Given only one pair of initial seed words, DD-GloVe automatically approximates the bias direction. This approach identifies the most attribute-specific definitions by projecting the definition embeddings onto the difference of the definition embeddings of the original seed words. They average the embeddings of the most attribute-specific terms to approximate the bias direction.
- They empirically show that DD-GloVe successfully learns bias-reduced word embeddings, achieving state-of-the-art WEAT results. Furthermore, their experiments indicate that debiasing can be accomplished without compromising semantic meaning.
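The seed-word search described in the second contribution can be sketched roughly as below. This is an illustrative reading of the summary, not the DD-GloVe code: the toy vectors, the function `find_bias_direction`, the parameter `k`, and the choice to take the difference between the two group averages as the bias direction are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def find_bias_direction(def_emb, word_emb, seed_a, seed_b, k):
    """Pick the k most attribute-specific words on each side of the seed axis."""
    axis = sub(def_emb[seed_a], def_emb[seed_b])
    norm = math.sqrt(dot(axis, axis))
    unit_axis = [x / norm for x in axis]
    # Project every definition embedding onto the seed axis.
    scores = {word: dot(vec, unit_axis) for word, vec in def_emb.items()}
    ranked = sorted(scores, key=scores.get)
    group_b, group_a = ranked[:k], ranked[-k:]
    # Assumed step: the direction is the difference of the group averages.
    direction = sub(
        mean_vector([word_emb[w] for w in group_a]),
        mean_vector([word_emb[w] for w in group_b]),
    )
    return group_a, group_b, direction


# Toy definition embeddings and word embeddings (invented values).
def_emb = {
    "he": [1.0, 0.2], "she": [-1.0, 0.2],
    "father": [0.8, 0.5], "mother": [-0.8, 0.5],
    "table": [0.0, 0.9],
}
word_emb = {
    "he": [0.9, 0.1], "she": [-0.9, 0.1],
    "father": [0.7, 0.4], "mother": [-0.7, 0.4],
    "table": [0.1, 0.8],
}

group_a, group_b, direction = find_bias_direction(def_emb, word_emb, "he", "she", k=2)
print(group_a, group_b, direction)
```

Starting from only the pair ("he", "she"), the sketch recovers "father" and "mother" as extra seed words, while the neutral word "table" is left out because its definition projects near zero on the seed axis.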
The code for DD-GloVe, a train-time debiasing approach for learning GloVe word embeddings using dictionary definitions, is publicly available on GitHub.
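For readers unfamiliar with WEAT, the metric cited in the contributions, the following is a minimal sketch of its effect size. It compares how strongly two sets of target words (X, Y) associate with two sets of attribute words (A, B) via cosine similarity. The toy vectors are invented, and the use of the sample standard deviation is an assumption, since implementations differ on that detail.

```python
import math
import statistics


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    sim_a = statistics.mean(cosine(w, a) for a in attrs_a)
    sim_b = statistics.mean(cosine(w, b) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations, scaled by the pooled standard deviation."""
    assoc_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    assoc_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation over all target words.
    pooled_std = statistics.stdev(assoc_x + assoc_y)
    return (statistics.mean(assoc_x) - statistics.mean(assoc_y)) / pooled_std


# Toy 2-d vectors (invented): attribute sets and two target sets.
attrs_male = [[1.0, 0.0]]
attrs_female = [[-1.0, 0.0]]
targets_career = [[0.9, 0.1], [0.8, 0.3]]
targets_family = [[-0.9, 0.1], [-0.7, 0.4]]

effect = weat_effect_size(targets_career, targets_family, attrs_male, attrs_female)
print(f"WEAT effect size: {effect:.2f}")
```

An effect size far from zero indicates a strong differential association between the target and attribute sets; values near zero are what a debiasing method aims for.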
This article is a research summary written by Marktechpost Staff based on the research paper 'Learning Bias-reduced Word Embeddings Using Dictionary Definitions'. All credit for this research goes to the researchers on this project.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.