Google AI Introduces ‘GoEmotions’: An NLP Dataset for Fine-Grained Emotion Classification

The emotions people experience daily can motivate them to act and influence the decisions, large and small, that they make in their lives. They therefore greatly influence how people socialize and form connections. 

Language lets us express a vast range of subtle and complicated emotions with only a few words. With recent advancements in NLP, several datasets for language-based emotion classification have become available. Most of them focus on specific genres (news headlines, movie subtitles, even fairy tales) and the six basic emotions (anger, surprise, disgust, joy, fear, and sadness). There is therefore a need for a larger-scale dataset covering a greater range of emotions to enable a broader range of future applications.

A recent Google study introduces GoEmotions: a human-annotated dataset of fine-grained emotions comprising 58k Reddit comments drawn from major English-language subreddits and labeled with 27 emotion categories. It includes 12 positive, 11 negative, and 4 ambiguous emotion categories, plus 1 “neutral” category, making it broadly useful for conversation-understanding tasks that demand fine discrimination between emotion expressions. They also provide a full tutorial showing how to use GoEmotions to train a neural model and apply it to recommending emojis based on conversational text.
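The sentiment groupings described above can be sketched in code. The category names below are taken from the GoEmotions paper; the grouping into positive/negative/ambiguous/neutral follows the counts stated in this article.

```python
# The 27 GoEmotions categories plus "neutral", grouped by sentiment as
# described in the paper: 12 positive, 11 negative, 4 ambiguous, 1 neutral.
TAXONOMY = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire",
        "excitement", "gratitude", "joy", "love", "optimism", "pride",
        "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse",
        "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
    "neutral": ["neutral"],
}

counts = {group: len(labels) for group, labels in TAXONOMY.items()}
print(counts)  # {'positive': 12, 'negative': 11, 'ambiguous': 4, 'neutral': 1}
```

The dataset itself is published in the `google-research` GitHub repository and is also available through common dataset hubs.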

Dataset Creation

Their goal was to compile a large dataset focused on conversational data, in which emotion plays a vital role in communication. Reddit is a valuable resource for emotion research because it provides a vast, publicly available volume of content that includes direct user-to-user dialogue. The researchers collected Reddit comments from subreddits with at least 10,000 comments, removing deleted and non-English comments.
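The selection criteria above can be illustrated with a minimal sketch. This is hypothetical code, not Google's actual pipeline; the placeholder language check stands in for a real language-identification model.

```python
# Hypothetical sketch of the selection step described above: keep comments
# only from subreddits with at least 10,000 comments, and drop deleted or
# non-English comments.
MIN_SUBREDDIT_COMMENTS = 10_000

def is_english(text: str) -> bool:
    # Placeholder check; a real pipeline would use a language-ID model
    # (e.g. langdetect or fastText).
    return all(ord(ch) < 128 for ch in text)

def select_comments(comments, sub_counts):
    """comments: list of dicts with 'subreddit' and 'body' keys."""
    kept = []
    for c in comments:
        if sub_counts.get(c["subreddit"], 0) < MIN_SUBREDDIT_COMMENTS:
            continue
        if c["body"] in ("[deleted]", "[removed]"):
            continue
        if not is_english(c["body"]):
            continue
        kept.append(c)
    return kept

comments = [
    {"subreddit": "askscience", "body": "That makes sense, thanks!"},
    {"subreddit": "askscience", "body": "[deleted]"},
    {"subreddit": "tinysub", "body": "Nice post."},
]
sub_counts = {"askscience": 250_000, "tinysub": 42}
print(select_comments(comments, sub_counts))
```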

They used data curation procedures to ensure that the dataset did not encode general or emotion-specific linguistic biases, allowing them to build broadly representative emotion models. This was especially important because Reddit has a well-documented demographic skew toward young male users, which is not representative of the world’s population, and the platform also tends toward toxic and inflammatory language.

They identified harmful comments using predefined term lists for offensive/adult and vulgar content, as well as for identity and religion, and used these lists to filter and mask the data to address the concerns above. They also filtered the data to remove profanity, limit text length, and balance the emotions and sentiments conveyed. Finally, they balanced the data across subreddit communities to avoid over-representing prominent subreddits and to ensure that the comments also reflect less active ones.
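The masking step can be sketched as follows. The term lists here are illustrative stand-ins, not the actual curated lists, and the placeholder tags are modeled on the `[NAME]`/`[RELIGION]`-style masks the paper describes.

```python
import re

# Hypothetical sketch of masking with predefined term lists: tokens that
# match an identity or religion list are replaced with a placeholder tag.
TERM_LISTS = {
    "RELIGION": {"christian", "muslim", "buddhist"},
    "NAME": {"alice", "bob"},
}

def mask_terms(text: str) -> str:
    out = []
    for tok in text.split():
        bare = re.sub(r"\W", "", tok).lower()  # strip punctuation for lookup
        tag = next((t for t, terms in TERM_LISTS.items() if bare in terms), None)
        out.append(f"[{tag}]" if tag else tok)
    return " ".join(out)

print(mask_terms("Alice said the Buddhist temple was beautiful."))
# [NAME] said the [RELIGION] temple was beautiful.
```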

They focused on three goals while constructing the taxonomy: 

  • Give the most comprehensive coverage of emotions expressed in Reddit data
  • Provide the most comprehensive coverage of types of emotional expressions
  • Restrict the overall number of emotions and their overlap. 

A taxonomy like this enables data-driven fine-grained emotion analysis while also resolving data scarcity for specific emotions.

The emotion categories were defined and refined through an iterative taxonomy-building process. A total of 56 emotion categories were considered during data labeling. From this set, they identified and removed emotions that raters seldom chose, that had low interrater agreement because they resembled other emotions, or that were difficult to detect from text alone. They also added emotions that raters frequently proposed and that were well represented in the data. Finally, they renamed emotion categories to improve interpretability. The result was strong interrater agreement: at least two raters agreed on at least one emotion label for 94 percent of examples.
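The agreement statistic quoted above can be computed straightforwardly, sketched here with toy data: the fraction of examples on which at least two raters agree on at least one emotion label.

```python
# Toy sketch of the interrater agreement measure described above.
def has_two_rater_agreement(ratings):
    """ratings: list of per-rater label sets for one example."""
    seen = set()
    for labels in ratings:
        if labels & seen:        # some label was already chosen by another rater
            return True
        seen |= labels
    return False

examples = [
    [{"joy"}, {"joy", "excitement"}, {"surprise"}],  # "joy" shared -> agreement
    [{"anger"}, {"sadness"}, {"fear"}],              # no overlap -> none
    [{"neutral"}, {"neutral"}],                      # agreement
]
rate = sum(has_two_rater_agreement(r) for r in examples) / len(examples)
print(f"{rate:.0%}")  # 67%
```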

They employ Principal Preserved Component Analysis (PPCA) to verify that the taxonomy choices reflect the underlying data. This helped them identify emotional dimensions with high agreement among raters. 


Each component is shown to be statistically significant, demonstrating that each emotion captures a distinct element of the data. Based on correlations among rater judgments, they also investigate how the defined emotions cluster: when two emotions are often co-selected by raters, they cluster together under this method. Although the taxonomy imposes no predefined notion of sentiment, they find that emotions related in sentiment (negative, positive, and ambiguous) cluster together, demonstrating the quality and consistency of the ratings. 
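The co-selection analysis above can be sketched with synthetic rater judgments; this is a toy illustration of the idea, not the paper's actual analysis.

```python
import numpy as np

# Toy sketch: build a matrix of binary rater judgments (rows = rated
# examples, columns = emotions), then correlate the columns. Emotions that
# raters tend to co-select correlate strongly and would cluster together.
rng = np.random.default_rng(0)
emotions = ["joy", "excitement", "sadness", "grief"]

base_pos = rng.random(200) < 0.3   # latent positive signal
base_neg = rng.random(200) < 0.3   # latent negative signal
judgments = np.column_stack([
    base_pos, base_pos ^ (rng.random(200) < 0.05),  # joy ~ excitement
    base_neg, base_neg ^ (rng.random(200) < 0.05),  # sadness ~ grief
]).astype(float)

corr = np.corrcoef(judgments, rowvar=False)
# Within-pair correlations (joy/excitement, sadness/grief) come out far
# stronger than the cross-sentiment ones.
print(np.round(corr, 2))
```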


Similarly, emotions that are related but differ in intensity, such as joy and excitement, nervousness and fear, sadness and grief, and annoyance and anger, are closely linked.


While GoEmotions provides a large collection of human-annotated emotion data, many emotion datasets instead rely on heuristics to assign weak emotion labels automatically. The most popular heuristic treats emotion-related Twitter hashtags as emotion categories, making it cheap to build large datasets. However, this method is limited for several reasons.

They propose a more reliable heuristic in which emojis appearing in user conversations serve as a proxy for emotion categories. Emojis are more standardized and less varied than Twitter hashtags, so they introduce fewer inconsistencies. This approach can be applied to any language corpus containing a reasonable number of emojis, including many conversational corpora.
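The emoji heuristic can be sketched as follows. The emoji-to-emotion mapping here is illustrative, not taken from the paper.

```python
# Hypothetical sketch of the heuristic described above: treat emojis found
# in conversational text as weak emotion labels.
EMOJI_EMOTIONS = {
    "😂": "amusement",
    "😢": "sadness",
    "😡": "anger",
    "❤️": "love",
    "😮": "surprise",
}

def weak_labels(text: str) -> set[str]:
    """Return the set of emotion labels implied by emojis in `text`."""
    return {emotion for emoji, emotion in EMOJI_EMOTIONS.items() if emoji in text}

print(sorted(weak_labels("I can't believe it worked 😂❤️")))  # ['amusement', 'love']
```

Inverting the same mapping gives the emoji-recommendation task the tutorial targets: predict emotions from text, then surface the matching emojis.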

This data can help in developing expressive conversational agents and in generating contextual emojis, and it is a promising area for future research.



