Large Language Models are improving with every new development in the artificial intelligence industry. With each new version, LLMs become capable of serving a wider range of applications and scenarios. ChatGPT, OpenAI's recently released chatbot built on the GPT transformer architecture, is one of the most popular LLMs, and with the latest GPT-4 architecture it now handles multimodal data as well.
The goal of AI has always been to develop models and techniques that automate repetitive tasks and solve complex problems by imitating humans. LLMs manipulate text with ease, but when performing computer tasks through keyboard and mouse actions they face several challenges: ensuring that the generated actions are appropriate for the given task, feasible in the agent's current state, and executable. These three challenges are known as task grounding, state grounding, and agent grounding, respectively.
A new study has introduced an approach called Recursive Criticism and Improvement (RCI), which uses a pre-trained LLM agent to execute computer tasks guided by natural language. RCI uses a prompting scheme in which the LLM first generates an output, then identifies problems with that output, and finally generates an updated output based on the critique.
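The generate-critique-improve loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `llm` callable is a placeholder for whatever API or model wrapper you use, and the prompt wording is illustrative.

```python
def rci(llm, task, rounds=2):
    """Sketch of one RCI loop: generate an answer, then repeatedly
    critique it and improve it based on the critique.

    llm: a callable that takes a prompt string and returns a completion string
         (placeholder for a real LLM API call).
    """
    # Step 1: generate an initial output for the task.
    output = llm(f"Task: {task}\nAnswer:")
    for _ in range(rounds):
        # Step 2: prompt the LLM to identify problems with its own output.
        critique = llm(
            f"Task: {task}\nAnswer: {output}\n"
            "Review the answer above and list any problems with it."
        )
        # Step 3: generate an updated output conditioned on the critique.
        output = llm(
            f"Task: {task}\nAnswer: {output}\n"
            f"Problems found: {critique}\n"
            "Based on these problems, write an improved answer:"
        )
    return output
```

In practice the critique and improvement prompts would also carry the few per-task demonstrations the paper mentions; the loop structure stays the same.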
RCI addresses all three grounding challenges of previous approaches, i.e., task grounding, state grounding, and agent grounding, resulting in better performance on computer tasks. For computer tasks, RCI prompting is applied in three stages: first, the LLM generates a high-level plan; then it generates an action based on the plan and the current state; finally, it formats the action into the correct keyboard or mouse command.
Task grounding involves producing a high-level plan from the task text, ensuring that the actions taken by the agent are appropriate for the given task. State grounding connects the high-level concepts from that plan with the actual HTML elements present in the agent's current state, ensuring that the proposed actions are feasible in that state. Finally, agent grounding ensures that the generated actions are executable and in the correct format.
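The three grounding stages can be pictured as a small prompt pipeline. This is a hedged sketch under stated assumptions: the function names, prompt texts, and the `click(...)`/`type(...)` command format are illustrative stand-ins, and `llm` is again a placeholder for a real model call.

```python
def plan_task(llm, task):
    """Task grounding: produce a high-level plan from the task text."""
    return llm(f"Task: {task}\nWrite a high-level plan to complete this task:")

def choose_action(llm, task, plan, html_state):
    """State grounding: pick a next step that is feasible given the
    HTML elements actually present on the current page."""
    return llm(
        f"Task: {task}\nPlan: {plan}\nCurrent page HTML: {html_state}\n"
        "Considering the plan and the elements on the page, the next action is:"
    )

def format_action(llm, action):
    """Agent grounding: format the action as an executable
    keyboard/mouse command (command syntax here is hypothetical)."""
    return llm(
        f"Proposed action: {action}\n"
        "Rewrite it as a single command such as click(selector) "
        "or type(selector, text):"
    )

def next_action(llm, task, html_state):
    # Chain the three grounding stages into one executable command.
    plan = plan_task(llm, task)
    action = choose_action(llm, task, plan, html_state)
    return format_action(llm, action)
```

Each stage's output feeds the next, which is what lets the critique step in RCI catch mismatches, for example a planned step that refers to a button missing from the current page.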
This new approach can be used in ChatGPT to solve general computer tasks with a keyboard and mouse, without the need for plugins. In RCI prompting, the LLM first identifies problems with its original answer and, based on those problems, improves the answer. A unique feature of this approach is that it requires only a few demonstrations per task, unlike existing methods that require thousands.
The RCI approach outperforms existing LLM methods for automating computer tasks and surpasses supervised learning and reinforcement learning methods on the MiniWoB++ benchmark. When comparing RCI to Chain-of-Thought (CoT) prompting, a method recognized for its effectiveness on reasoning tasks, the researchers found a strong synergistic effect between RCI prompting and the two CoT baselines. In conclusion, Recursive Criticism and Improvement (RCI) looks promising for solving complex computer tasks and reasoning problems with LLMs.
Check out the Paper, Github, and Project. All credit for this research goes to the researchers on this project.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.