AI Agents Can Learn to Think While Acting: A New AI Research Introduces A Novel Imitation Learning Framework Called Thought Cloning

Language gives humans an extraordinary level of general intelligence and sets them apart from all other creatures. Importantly, language not only helps people communicate with others, it also improves our capacity to think. The authors of the work covered here first discuss the benefits of agents that understand language (a frequent topic in AI) before turning to the benefits of agents that think in language, which have received far less attention. Agents that can understand language gain several advantages, and this ability is essential for generalizing to the new tasks we ask of them. 

This is because giving an agent a task description, rather than letting the agent figure the task out on its own, is far more sample efficient. In addition, language-capable agents let us define new tasks at test time without having to anticipate every request users may later make of their trained agents. This contrasts with traditional hand-designed task descriptions, which can be extensive but still limit what an agent can be asked to do. While the advantages of agents that can interpret language are frequently explored, the advantages of agents that think in language have received far less attention in AI, particularly in Reinforcement Learning (RL). 

Humans who think in language can better generalize, extrapolate, adapt to new circumstances, combine prior knowledge in novel ways, explore, plan, replan when advantageous, and so on. Despite these advantages, AI agents seldom think, at least not in human language. Although the internal vector activations of a neural network can be regarded as thinking, many hypothesize that there are particular advantages to thinking in the discrete, symbolic form of language (such as the ability to combine ideas in an exponential number of ways). This suggests that agents that think in language may learn more quickly, perform better, and generalize more effectively than agents that do not. Beyond being more capable, agents that think in human language offer significant advantages for AI Safety and Interpretability. 

If one can observe an agent’s thought process during training, one can identify where its abilities or values need to be improved, or determine that the agent is not yet ready for deployment. The agent’s thoughts can also be monitored continuously during testing so that unsafe plans are stopped before they are carried out. For instance, if an agent thinks, “My goal is to take my passenger to the store as quickly as possible so I will run through this red light without stopping,” one can intervene to prevent that behaviour in advance. Additionally, observing how agents think makes them easier to steer. 

If an agent is struggling with a difficult problem, one can supply it with one's own thoughts to help it solve the problem in the desired way. Agents that think in human language also make it easier to develop more capable and safer AI systems. Instead of merely seeing that something is broken, one can identify why it is broken, which suggests how to fix the problem or improve AI training. Taken together, the authors argue, these points suggest that giving AI agents the capacity to think in language could bring many important benefits, and that imitating human thinking is the most practical way to achieve this. 

People do not acquire thinking skills entirely on their own; these skills are partly taught through feedback and examples from teachers. A natural approach, then, is to teach agents with demonstrations in which human actors think aloud while they act. This method differs from approaches that use pre-trained Large Language Models (LLMs) for planning, because those LLMs are not trained on data in which people think aloud while acting. 

Thought data, such as YouTube videos and their transcripts, captures millions of hours of people talking aloud while carrying out activities. This type of data reveals the reasoning behind people’s actions, plans, decisions, and replanning, for example while playing video games. Thought data is very useful and widely available (as the paper discusses in its Section 2), yet it has not been thoroughly investigated, and this study aims to stimulate more research into using it to teach thinking skills to agents. There are enormous benefits to be gained from developing more capable AI, or perhaps AGI, provided that the real and substantial concerns about AI Safety and existential risk can be addressed. 

In this research, the authors from the University of British Columbia and the Vector Institute propose a novel Imitation Learning framework called Thought Cloning, in which agents not only learn how to act from human demonstrations, as in Behavioural Cloning, but also learn how to think from demonstrations in which human actors think aloud as they act. Although the authors expect Thought Cloning to truly shine when trained on massive web datasets of synchronized human thoughts and actions, this work validates the idea with synthetic thought data in a challenging domain, BabyAI. Their experiments show that Thought Cloning outperforms Behavioural Cloning, even when the Behavioural Cloning agents have the capacity to think (in latent vectors) but must learn that skill without the thought supervision that Thought Cloning provides. 
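
To make the contrast with Behavioural Cloning concrete, the following is a minimal sketch of what a combined training objective could look like. It assumes the thought and action terms are plain negative log-likelihoods added together with a weight `alpha`; the function names, the toy probabilities, and the exact form of the weighting are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def behavioral_cloning_loss(action_probs: Sequence[float], demo_action: int) -> float:
    """Behavioural Cloning: supervise only the demonstrated action."""
    return cross_entropy(action_probs, demo_action)


def thought_cloning_loss(
    thought_token_probs: Sequence[Sequence[float]],
    demo_thought: Sequence[int],
    action_probs: Sequence[float],
    demo_action: int,
    alpha: float = 1.0,  # hypothetical weight on the thought term
) -> float:
    """Supervise the demonstrated thought (token by token) and the action."""
    thought_loss = sum(
        cross_entropy(probs, token)
        for probs, token in zip(thought_token_probs, demo_thought)
    )
    action_loss = cross_entropy(action_probs, demo_action)
    return alpha * thought_loss + action_loss


# Toy example: a two-token thought over a two-word vocabulary and two actions.
thought_probs = [[0.5, 0.5], [0.5, 0.5]]
demo_thought = [0, 1]
action_probs = [0.5, 0.5]
demo_action = 0

bc = behavioral_cloning_loss(action_probs, demo_action)
tc = thought_cloning_loss(thought_probs, demo_thought, action_probs, demo_action)
```

In this sketch, setting `alpha` to 0 reduces the objective to ordinary Behavioural Cloning, which is one way to see the thought term as extra supervision layered on top of action imitation.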

Additionally, they show that Thought Cloning generalizes better than Behavioural Cloning to out-of-distribution tasks, in both zero-shot and fine-tuning settings. Finally, they offer empirical support for the Safety and Interpretability benefits described earlier, showing that unsafe behaviour can be stopped almost perfectly before it is executed. Overall, the findings are encouraging and offer a glimpse of Thought Cloning’s potential to make AI more capable, safer, and easier to understand.
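
The idea of stopping unsafe behaviour before it is executed can be sketched as a simple check on the agent's thought prior to acting. This is only an illustration under stated assumptions: the phrase-matching rule, the `guarded_step` helper, and the toy `toy_think` and `toy_act` functions are all hypothetical and are not taken from the paper or its code.

```python
from typing import Callable, Iterable, Optional


def is_unsafe(thought: str, banned_phrases: Iterable[str]) -> bool:
    """Flag a thought if it contains any banned phrase (case-insensitive)."""
    lowered = thought.lower()
    return any(phrase.lower() in lowered for phrase in banned_phrases)


def guarded_step(
    think: Callable[[str], str],
    act: Callable[[str, str], str],
    observation: str,
    banned_phrases: Iterable[str],
) -> Optional[str]:
    """Generate a thought, inspect it, and only then produce an action.

    Returns None when the thought is flagged, so no action is executed.
    """
    thought = think(observation)
    if is_unsafe(thought, banned_phrases):
        return None
    return act(observation, thought)


# Hypothetical stand-ins for a trained agent's thought and action components.
def toy_think(observation: str) -> str:
    if observation == "red light":
        return "I will run through this red light without stopping."
    return "The light is green so I will drive on."


def toy_act(observation: str, thought: str) -> str:
    return "drive"


banned = ["run through this red light"]
blocked = guarded_step(toy_think, toy_act, "red light", banned)
allowed = guarded_step(toy_think, toy_act, "green light", banned)
```

Here `blocked` is `None` because the thought was flagged before any action was chosen, while `allowed` is `"drive"`. A real monitor would need something more robust than substring matching, but the control flow is the point: the thought is available for inspection before the action happens.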

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
