Large language models (LLMs) encode a great deal of semantic knowledge about the world. Nevertheless, they often generate responses that, while logically sound, are not helpful for controlling a robot. This lack of contextual grounding is a severe disadvantage: a language model might produce a reasonable narrative in response to a user's request for instructions on cleaning up a spill, but a robot performing that task in a specific environment might not find it useful. As a result, LLMs are challenging to use for decision-making in real-world contexts.
In a recent paper titled "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances," researchers from Google's Robotics team offered an innovative strategy. The paper presents SayCan, a robot control approach that uses an LLM to plan a sequence of robotic skills that accomplish a user-specified goal. The method employs prompt engineering to translate the user's input—in this case, a request for help cleaning up spilled milk—into a dialogue in which the robot is asked to deliver a sponge to the user. According to experimental analyses, SayCan generated the right action sequence 84% of the time.
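To make the prompt-engineering step concrete, here is a minimal sketch of how a user request might be wrapped in a few-shot dialogue prompt before being sent to the LLM. The example requests, plan steps, and the `build_prompt` helper are illustrative assumptions, not code or prompts from the paper.

```python
# Hypothetical few-shot examples in the "Human: ... / Robot: ..." dialogue style;
# the paper's actual prompt uses 17 such examples.
FEW_SHOT_EXAMPLES = [
    ("How would you bring me a fruit?",
     ["1. find an apple", "2. pick up the apple", "3. bring it to you", "4. done"]),
    ("How would you throw away the soda can?",
     ["1. find the soda can", "2. pick up the soda can",
      "3. go to the trash can", "4. put down the soda can", "5. done"]),
]

def build_prompt(user_request: str) -> str:
    """Prepend worked examples so the LLM continues the pattern with a plan."""
    parts = []
    for question, steps in FEW_SHOT_EXAMPLES:
        parts.append(f"Human: {question}\nRobot: " + ", ".join(steps))
    # The raw user input goes last; the LLM is asked to complete the robot's plan.
    parts.append(f"Human: {user_request}\nRobot: ")
    return "\n".join(parts)

print(build_prompt("I spilled my milk, can you help?"))
```

The LLM's continuation after the final `Robot: ` is then interpreted as the candidate plan, one skill description at a time.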
The approach's premise is that the language model can provide high-level semantic knowledge about the task, while the robot serves as its "hands and eyes." The researchers show how low-level skills can be combined with LLMs so that the language model supplies high-level knowledge about how to carry out complex, temporally extended instructions, while value functions associated with these skills provide the grounding required to connect that knowledge to a specific physical environment. The method was tested on a variety of robotic tasks, demonstrating its viability for executing long-horizon, abstract, natural-language commands on a mobile manipulator.
In SayCan, the raw user input is preceded by a chain-of-thought prompt containing 17 example inputs and their corresponding plans, which enhances the LLM's capacity to plan a sequence of actions. Because the LLM outputs a probability distribution over text tokens for the next item in a sequence, the probability it assigns to a skill's text description (how useful that skill is for the instruction) can be combined with that skill's value-function output (the probability of successfully executing the skill) to choose the best action at each step of the plan.
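The action-selection idea above can be sketched in a few lines: score each candidate skill by multiplying the LLM's likelihood for its text description with the value function's estimated success probability, then pick the maximum. This is a minimal illustration, not the authors' implementation; `llm_skill_logprob` and `value_fn` are hypothetical stand-ins with toy numbers in place of a real LLM and real learned value functions.

```python
import math

def llm_skill_logprob(instruction: str, skill: str) -> float:
    """Stand-in: log-probability the LLM assigns to this skill's text as the
    next plan step (in SayCan this comes from token log-probabilities)."""
    toy_scores = {
        "find a sponge": -0.5,
        "pick up the sponge": -1.2,
        "go to the table": -2.0,
        "done": -4.0,
    }
    return toy_scores.get(skill, -6.0)

def value_fn(state: dict, skill: str) -> float:
    """Stand-in: probability of successfully executing the skill from the
    current state, as estimated by the skill's learned value function."""
    return state.get(skill, 0.1)

def select_next_skill(instruction: str, skills: list, state: dict) -> str:
    # SayCan-style combination: usefulness (LLM) times feasibility (value fn).
    def combined_score(skill):
        return math.exp(llm_skill_logprob(instruction, skill)) * value_fn(state, skill)
    return max(skills, key=combined_score)

# Toy state: the sponge is visible (easy to find) but not yet graspable.
state = {"find a sponge": 0.9, "pick up the sponge": 0.2, "go to the table": 0.8}
skills = list(state) + ["done"]
print(select_next_skill("I spilled my milk, can you help?", skills, state))
# → "find a sponge": it is both highly rated by the LLM and highly feasible.
```

Note that "pick up the sponge" loses here despite a decent LLM score, because its value function reports low feasibility in the current state; this is exactly the grounding the value functions contribute.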
To test SayCan, a robot from Everyday Robots, which collaborated with Google on this project, was given a list of 101 commands to follow, ranging from "bring me a fruit" to "I spilled my coke on the table, throw it away and bring me something to clean." PaLM and FLAN are two of the LLMs Google integrated SayCan with. With a planning success rate of 84% and an execution success rate of 74%, PaLM-SayCan outperformed FLAN-SayCan, which had success rates of 70% and 61%, respectively. The team noticed that PaLM-SayCan had trouble with instructions containing a negation, but noted that this is a typical problem with LLMs in general.
The impressive results of PaLM-SayCan open up new research directions. The study explains how the model can solve reasoning problems by utilizing chain-of-thought reasoning and how new skills can be added to the system. Additionally, it demonstrates that the system can handle multilingual queries even though it was not designed for them. The researchers also believe that PaLM-SayCan's interpretability enables safe user interactions with robots in the real world.
As they explore future directions for this work, the researchers want to further understand how data from the robot's real-world experience could be used to improve the language model, and to what extent natural language is the right ontology for programming robots. To give academics a helpful tool for upcoming research that blends robotic learning with sophisticated language models, Google Research has also open-sourced a robot simulation setup. An open-source desktop version of SayCan is now available on GitHub.
This article is written as a research summary by Marktechpost Staff based on the research paper 'Do As I Can, Not As I Say: Grounding Language in Robotic Affordances'. All credit for this research goes to the researchers on this project. Check out the paper, project, GitHub link, and reference article.
Khushboo Gupta is a consulting intern at Marktechpost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development, and enjoys learning more about the technical field by participating in several challenges.