AI2’s PRIOR Team Introduces Unified-IO: The First Neural Model To Execute Various AI Tasks Spanning Classical Computer Vision, Image Synthesis, Vision-and-Language, and Natural Language Processing (NLP)

Almost all industries now use machine learning systems to improve the efficiency and dependability of their work. As ML use has grown, companies have invested heavily in the resources needed to support ML systems. Additionally, a single ML process often requires running several distinct models, which further complicates the process and increases costs.

The idea of “unified models” has emerged in recent years: a single model is built to power a process or product instead of a collection of connected but independent models. By combining all of the necessary data into one input and passing it to the model, a unified model can deliver all of the results at once rather than requiring individual models to be called one at a time.

A successful unified model must handle both the complexity of dense data, such as images, and the distinct methods used for sequential data. Much of the recent progress in NLP has been built on transformer models. Transformer architectures are sequence-to-sequence designs: they typically accept a sequence of words or tokens as input and produce a sequence of tokens as output. Because most NLP tasks can be represented as sequences of language tokens, researchers can use large, powerful transformer models to complete a wide range of NLP tasks.
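As a minimal sketch of why sequence-to-sequence framing covers so many NLP tasks, the snippet below writes three different tasks as plain (input text, output text) pairs and encodes them with one shared vocabulary. The task prefixes, the example sentences, and the whitespace tokenizer are all illustrative assumptions, not details from the Unified-IO work.

```python
from typing import Dict, List, Tuple

# Hypothetical examples: each task is just an (input text, target text) pair.
EXAMPLES: List[Tuple[str, str]] = [
    ("translate English to German: good morning", "guten Morgen"),
    ("classify sentiment: the film was wonderful", "positive"),
    ("answer question: what color is the sky ? context: the sky is blue", "blue"),
]


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to every whitespace-separated token."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into a sequence of token ids."""
    return [vocab[token] for token in text.split()]


# One vocabulary and one encoder serve every task.
vocab = build_vocab([text for pair in EXAMPLES for text in pair])
encoded = [(encode(src, vocab), encode(tgt, vocab)) for src, tgt in EXAMPLES]

for (src, tgt), (src_ids, tgt_ids) in zip(EXAMPLES, encoded):
    print(f"{src!r} -> {tgt!r}")
    print(f"  input ids:  {src_ids}")
    print(f"  target ids: {tgt_ids}")
```

Because every example reduces to a pair of integer sequences, the same model interface can in principle serve translation, classification, and question answering. A real system would use a subword tokenizer rather than whitespace splitting.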

In contrast, the input and output representations for computer vision tasks are quite varied. For instance, an image segmentation task produces binary masks outlining regions, while an object detection task produces bounding boxes around objects in an image. Some tasks even combine image and language inputs, such as visual question answering, which accepts an image and text as input and outputs an answer as text. This variability in inputs and outputs makes it extremely difficult to design a single, all-encompassing model for these tasks.

A recent work by AI2’s PRIOR team introduces Unified-IO, the first neural model to execute a broad range of AI tasks, including traditional computer vision, image synthesis, vision-and-language, and natural language processing. Unified-IO marks a key milestone in the quest for a single, unified general-purpose system capable of parsing and creating visual, linguistic, and other structured data.

To unify such varied data, the model converts the input and output of every task into sequential data. Using a universal compressor, Unified-IO transforms dense inputs like images, masks, and depth maps into sequences. It can also convert sparse structured data, such as bounding boxes and human joint locations, into sequences. Language data, which is naturally sequential, is tokenized using byte pair encoding, a common NLP method for supplying data to neural networks.
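One common way to turn a bounding box into a token sequence is to quantize its coordinates into a fixed number of bins. The sketch below illustrates that idea only; the article does not specify Unified-IO's exact scheme, so the bin count (`NUM_BINS = 1000`), the `<loc_i>` token names, and both helper functions are assumptions made for this example.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed resolution; not taken from the article

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_to_tokens(box: Box, width: int, height: int,
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize pixel coordinates into discrete, hypothetical location tokens."""
    x_min, y_min, x_max, y_max = box
    normalized = [x_min / width, y_min / height, x_max / width, y_max / height]
    tokens = []
    for value in normalized:
        clipped = min(max(value, 0.0), 1.0)
        # A value of exactly 1.0 falls into the last bin.
        bin_index = min(int(clipped * num_bins), num_bins - 1)
        tokens.append(f"<loc_{bin_index}>")
    return tokens


def tokens_to_box(tokens: List[str], width: int, height: int,
                  num_bins: int = NUM_BINS) -> Box:
    """Map location tokens back to pixel coordinates using bin centers."""
    centers = [(int(token[len("<loc_"):-1]) + 0.5) / num_bins for token in tokens]
    return (centers[0] * width, centers[1] * height,
            centers[2] * width, centers[3] * height)


box = (64.0, 32.0, 192.0, 160.0)
tokens = box_to_tokens(box, width=256, height=256)
print(tokens)  # ['<loc_250>', '<loc_125>', '<loc_750>', '<loc_625>']

# A detection can then be written as location tokens followed by a label word.
sequence = tokens + ["cat"]
print(sequence)

recovered = tokens_to_box(tokens, width=256, height=256)
print(recovered)
```

The round trip loses at most one bin width of precision per coordinate (here 256 / 1000 pixels), which is the price of expressing a box in a finite vocabulary. Dense outputs such as masks and depth maps need a learned compressor and are not covered by this sketch.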


According to the team, converting input and output data into this common sequence format allows a single Unified-IO model to be trained to execute tasks across more than 80 different computer vision and NLP benchmarks.

What distinguishes Unified-IO from other systems is the common representation it uses for a wide range of output types. Unified-IO is the first model to successfully complete all seven tasks on the recently introduced GRIT benchmark for computer vision. The shared representation allows the team to train Unified-IO simultaneously on more computer vision and NLP tasks than was previously feasible.

Unified-IO significantly outperforms other general-purpose models, such as GPV-1, GPV-2, VL-T5, and Gato, which either support fewer tasks or are limited to tasks whose outputs are language or other sequential outputs, like button presses.

This article is a summary written by Marktechpost Staff based on the research article 'Introducing AI2’s Unified-IO'. All credit for this research goes to the researchers on this project. Check out the demo and reference article.


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.