ArtEmis (Affective Language for Visual Art) is a large-scale dataset, released with accompanying ML models, that aims to provide a detailed understanding of the interplay between visual content, the emotional effects it may have, and explanations of those effects in language. It was developed by researchers at Stanford University, Laboratoire d’Informatique de l’Ecole Polytechnique (LIX), and King Abdullah University of Science and Technology (KAUST).
In contrast to existing annotation datasets in computer vision, the researchers’ primary focus is on the affective experience triggered by visual artworks. The annotators are asked to indicate the dominant emotion they feel for a given image and, crucially, to provide a grounded verbal explanation for their emotion choice. This yields a rich set of signals for both the factual content and the affective impact of an image, creating associations with abstract concepts or references beyond what is directly visible, including visual similes and references to personal experiences. The team focused on visual art (such as paintings or artistic photographs) because it is a clear example of imagery created to elicit emotional responses from its viewers.
The dataset contains 439K emotion attributions and explanations from humans, covering 81K artworks from WikiArt. Building on this data, the researchers trained and demonstrated a series of captioning systems that can express and explain emotions evoked by visual stimuli. The captions these systems produce often succeed in reflecting the abstract content and semantics of the image.
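To make the shape of such data concrete, here is a minimal Python sketch of how one annotation (an artwork, a dominant emotion, and a free-text explanation) might be represented and aggregated per artwork. The field names (`artwork_id`, `emotion`, `explanation`), the helper functions, and the sample rows are all hypothetical and invented for illustration; they are not taken from the released dataset's schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout: one annotator's response to one artwork.
    artwork_id: str
    emotion: str       # dominant emotion chosen by the annotator
    explanation: str   # free-text justification for that choice


def emotion_distribution(annotations):
    """Return, per artwork, the fraction of annotators choosing each emotion."""
    counts = defaultdict(Counter)
    for ann in annotations:
        counts[ann.artwork_id][ann.emotion] += 1
    return {
        artwork: {emo: n / sum(c.values()) for emo, n in c.items()}
        for artwork, c in counts.items()
    }


def majority_emotion(annotations):
    """Return, per artwork, the emotion chosen most often (ties: first seen)."""
    counts = defaultdict(Counter)
    for ann in annotations:
        counts[ann.artwork_id][ann.emotion] += 1
    return {artwork: c.most_common(1)[0][0] for artwork, c in counts.items()}


# Invented sample rows, for illustration only.
samples = [
    Annotation("painting-001", "awe", "The sky looks endless, like an ocean."),
    Annotation("painting-001", "awe", "The scale of the mountains is humbling."),
    Annotation("painting-001", "sadness", "It reminds me of leaving home."),
]

if __name__ == "__main__":
    print(emotion_distribution(samples))
    print(majority_emotion(samples))
```

Keeping a full distribution rather than a single label reflects that different viewers can react differently to the same artwork. With the figures quoted above, 439K annotations over 81K artworks works out to roughly 5.4 annotations per artwork on average.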
Human cognition has a vital affective component that has remained relatively underdeveloped in AI systems. Language that explains the emotions evoked by visual stimuli offers a way to reason about how image content relates to its emotional effect. Learning from such explanations can lead to agents that emulate human emotional responses through data-driven approaches. The team takes a first step in this direction through:
(1) Releasing the ArtEmis dataset, which focuses on linguistic explanations for affective responses triggered by visual artworks with rich emotion-provoking content.
(2) Demonstrating a neural speaker capable of expressing emotions and providing a relevant explanation.
The ability to handle the emotional attributes of images opens an exciting new direction in human-computer communication and interaction.