Apple Researchers Propose Large Language Model Reinforcement Learning Policy (LLaRP): An AI Approach Using Which LLMs Can Be Tailored To Act As Generalizable Policies For Embodied Visual Tasks

Natural language processing, understanding, and generation have entered a new phase with the introduction of Large Language Models (LLMs). Models like GPT-3 have strong language understanding abilities because they have been trained on enormous volumes of text. Their usefulness goes far beyond language tasks: they have proven skilled in areas such as reasoning, embodied reasoning, visual comprehension, dialogue systems, code generation, and even robot control.

The fact that many of these abilities appear without specialized training data is intriguing because it shows how broad and general these models’ understanding is. LLMs can also handle tasks whose inputs and outputs are not easily expressed in language: for example, they can take images as inputs or produce robot commands as outputs.

In Embodied AI, the goal is to develop agents whose decisions generalize and transfer to other tasks. Progress in applying LLMs to Embodied AI has historically relied on static datasets, which demand large and costly quantities of diverse expert data. As an alternative, embodied AI simulators let agents learn in virtual environments through interaction, exploration, and reward feedback. However, such agents’ generalization abilities frequently fall short of what has been shown in other domains.

In recent research, a team of researchers from Apple has proposed a new approach called Large Language Model Reinforcement Learning Policy (LLaRP), with which LLMs can be tailored to act as generalizable policies for embodied visual tasks. Using a pre-trained, frozen LLM, this approach takes text instructions and egocentric visual observations as input and generates actions in real time within an environment. LLaRP is trained to perceive its environment and act solely through interaction with it, using reinforcement learning.
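The data flow described above can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the class names, the `toy_embed` helper, the mean-pooling "backbone", and all dimensions are hypothetical stand-ins. It also assumes that only an observation encoder and an action head are trained while the LLM stays frozen, which this article does not spell out.

```python
import math
import random


def toy_embed(word, dim):
    """Hypothetical stand-in for the LLM's token embedding table."""
    rng = random.Random(word)  # string seeds give deterministic vectors
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class Linear:
    """A bias-free linear layer stored as a list of weight rows."""

    def __init__(self, in_dim, out_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def forward(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class FrozenBackbone:
    """Crude stand-in for the pre-trained LLM; its weights are never updated."""

    def __init__(self, dim, rng):
        self.layer = Linear(dim, dim, rng)

    def forward(self, tokens):
        # Mean-pool all tokens, then apply one fixed layer (a real LLM
        # would run a transformer over the token sequence instead).
        dim = len(tokens[0])
        pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
        return [math.tanh(h) for h in self.layer.forward(pooled)]


class ToyPolicy:
    """Text instruction + observation in, action distribution out."""

    def __init__(self, obs_dim, dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.backbone = FrozenBackbone(dim, rng)        # frozen
        self.obs_encoder = Linear(obs_dim, dim, rng)    # assumed trainable
        self.action_head = Linear(dim, n_actions, rng)  # assumed trainable

    def trainable_parameters(self):
        # An RL update would touch only these; the backbone is excluded.
        return [self.obs_encoder.w, self.action_head.w]

    def action_probs(self, instruction, observation):
        tokens = [toy_embed(w, self.dim) for w in instruction.split()]
        tokens.append(self.obs_encoder.forward(observation))
        logits = self.action_head.forward(self.backbone.forward(tokens))
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


policy = ToyPolicy(obs_dim=4, dim=8, n_actions=3)
probs = policy.action_probs("bring me the apple", [0.2, 0.0, 0.9, 0.1])
action = random.Random(0).choices(range(3), weights=probs)[0]
```

In a reinforcement learning loop, the sampled `action` would be sent to the simulator, and the resulting reward would drive updates to the lists returned by `trainable_parameters()` only.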

The primary findings of the research shared by the team are as follows.

  1. Robustness to Complex Paraphrasing: LLaRP is resilient to intricately worded paraphrases of task instructions. It can comprehend and carry out instructions phrased in a variety of ways while maintaining the intended behaviour, adjusting to new linguistic phrasing for the same task.
  1. Generalization to New Tasks: One notable aspect of LLaRP is its ability to generalize. It can take on new tasks that call for novel optimal behaviours, and it demonstrates its versatility by handling tasks it never encountered during training.
  1. Success Rate: LLaRP achieves a 42% success rate on a set of 1,000 unseen tasks, which is 1.7 times the success rate of other widely used learning baselines or zero-shot LLM applications. This illustrates the LLaRP approach’s stronger performance and generalization ability.
  1. Benchmark Release: To enhance the research community’s understanding of language-conditioned, massively multi-task, embodied AI challenges, the research team has published a new benchmark named ‘Language Rearrangement.’ The benchmark includes 150,000 training tasks and 1,000 testing tasks for language-conditioned rearrangement, making it a useful resource for researchers who want to study and advance this branch of AI.
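As a quick sanity check on the success-rate finding in the list above, the short calculation below works out what the reported figures imply. The baseline rate is derived here from the rounded 1.7x ratio and is not a number reported in this article, so treat it as approximate.

```python
# Figures quoted in the article.
llarp_success = 0.42      # success rate on unseen tasks
ratio_to_baseline = 1.7   # LLaRP success rate relative to baselines (rounded)
n_test_tasks = 1000       # size of the unseen test set

# Derived values (assumptions, not reported results).
tasks_solved = round(llarp_success * n_test_tasks)
implied_baseline = llarp_success / ratio_to_baseline

print(f"Tasks solved by LLaRP: about {tasks_solved} of {n_test_tasks}")
print(f"Implied baseline success rate: about {implied_baseline:.1%}")
```

Under these assumptions, LLaRP solves roughly 420 of the 1,000 test tasks, and the strongest comparison point sits near 25%.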

To sum up, LLaRP adapts pre-trained LLMs for embodied visual tasks and performs well in terms of overall success rate, robustness to paraphrasing, and generalization to new tasks.


Check out the Paper and Project. All credit for this research goes to the researchers on this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
