Meet JoyTag: An Inclusive Vision Model for Image Tagging

With recent advances in Artificial Intelligence (AI), AI systems are being applied across nearly every sphere of life. Machine vision models are a category of AI that analyzes visual information and makes decisions based on that analysis. They are used in several industries, including healthcare, security, automotive, entertainment, and social media. However, most publicly available models rely heavily on filtered training datasets, which limits their performance on many concepts, and strict censorship policies often prevent them from understanding the world comprehensively.

In this realm, an interesting post on Reddit introduced a new model named JoyTag, designed to tag images with a focus on gender positivity and inclusivity. The model is based on the ViT-B/16 architecture, with 448x448x3 input dimensions and 91 million parameters, and was trained on 660 million samples. JoyTag stands out from its counterparts thanks to its multi-label classification target task, its 5000 unique tags, its use of the Danbooru tagging schema, and its applicability across various image types.
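The key design choice here is multi-label classification: rather than picking a single class with a softmax, the model scores each of its 5000 tags independently, so one image can carry many tags at once. The sketch below illustrates the idea with a plain sigmoid-and-threshold decision rule; the logits, tag names, and threshold value are hypothetical toy values, not JoyTag's actual outputs or configuration.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function, mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_tags(logits, tag_names, threshold=0.4):
    """Multi-label prediction: each tag gets an independent sigmoid
    probability, and every tag above the threshold is emitted, so a
    single image can receive many tags (unlike softmax classification,
    which would force exactly one winner)."""
    return [name for logit, name in zip(logits, tag_names)
            if sigmoid(logit) >= threshold]

# Toy logits for three hypothetical tags (JoyTag itself scores ~5000).
logits = [2.1, -1.3, 0.7]
tags = ["1girl", "outdoors", "smile"]
print(predict_tags(logits, tags))  # → ['1girl', 'smile']
```

Because every tag is scored independently, the threshold can be tuned per application to trade precision against recall.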

Check the demo here

JoyTag is trained on a combination of the Danbooru 2021 dataset and a set of manually tagged images that broadens its generalization beyond the anime/manga-centric focus of Danbooru. While the Danbooru dataset offers size, quality, and diversity, it is limited in content diversity, particularly in photographic images. To address this, the JoyTag team manually tagged a selection of pictures from the internet, emphasizing the kinds of photographs that are underrepresented in the primary dataset.

JoyTag is based on the ViT architecture with a CNN stem and a GAP (global average pooling) head. The researchers also emphasized that JoyTag's design is not constrained by major IT companies' arbitrary wholesomeness standards, and the model achieves a mean F1 score of 0.578 across all tags, covering both photographs and anime/manga-styled images.
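For context on the 0.578 figure: in multi-label tagging, a mean (macro) F1 score is typically computed by taking the F1 score of each tag separately and averaging across tags. The snippet below shows that calculation on invented per-tag counts; the numbers are purely illustrative, not JoyTag's evaluation data.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-tag F1: harmonic mean of precision and recall."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical per-tag counts: (true positives, false positives, false negatives).
per_tag = [(80, 20, 10), (5, 5, 15), (40, 10, 40)]

# Macro-averaging weights every tag equally, so rare tags with poor
# scores pull the mean down just as much as common, easy tags lift it.
mean_f1 = sum(f1(*counts) for counts in per_tag) / len(per_tag)
print(round(mean_f1, 3))  # → 0.597
```

This equal weighting is why scarce or inconsistently tagged concepts, like the ones discussed below, can noticeably depress the overall mean.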

JoyTag has some limitations, however. It struggles with concepts for which data is scarce, such as facial expressions. Some subjective concepts, such as breast size, also pose difficulties, because the Danbooru dataset's tagging guidelines are not consistently followed. JoyTag's ultimate purpose is to prioritize inclusivity and diversity while handling a wide variety of content with equal proficiency. The researchers note that, to improve the F1 score and address particular deficiencies, they plan to expand the dataset significantly as part of the ongoing effort against bias.

In conclusion, JoyTag represents a significant step forward in image tagging. Its ability to overcome restrictive filtering while remaining inclusive is substantial. JoyTag opens new possibilities for automated image tagging, contributing to the evolution of machine learning models with a deeper and more inclusive understanding of visual content. Its capacity to predict more than 5000 distinct labels and manage large amounts of multimedia content without violating user rights gives developers strong tools applicable across a wide range of disciplines. Overall, JoyTag provides a solid foundation upon which future work can build toward fully inclusive and equitable AI solutions.

๐Ÿ Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...