This AI Framework Called Read and Reward Speeds up Reinforcement Learning Algorithms on Atari Games by Reading Manuals Released by the Atari Game Developers

Recent breakthroughs have greatly improved the performance of artificial intelligence (AI) agents and models. One technique for building AI models that can solve diverse problems is reinforcement learning (RL), a domain of machine learning in which agents learn to take actions that maximize cumulative reward. In other words, RL is driven by a reward function, and it is behind major breakthroughs in game-playing AI such as DeepMind’s Go-playing system, AlphaGo. Despite their remarkable performance, RL agents rely on trial and error to find an effective strategy. An algorithm can therefore spend the equivalent of several years blundering through the search space before it hits on a winning formula, which limits the application of reinforcement learning to real-world situations. Moreover, the performance gains observed in AI agents often come at the cost of the time, computational resources, and large amounts of data required to train these models.

Current AI models are quite inefficient compared to humans, who can quickly learn through interaction, demonstration, and reading text documents such as instruction manuals. This observation gave a team of researchers from Carnegie Mellon University (CMU) the idea of drastically speeding up AI agents by having them read instruction manuals before attempting a challenge. Their approach, a framework called Read and Reward, was used to train an AI agent to play Atari video games. By reading the instructions, the agent learned roughly 6,000 times faster, measured in game frames, than a leading state-of-the-art model developed by DeepMind.

Instruction manuals can be extremely useful for understanding the important features and policies of a task-specific environment, and they tell the reader how the reward system works. This motivated the CMU researchers to teach AI agents to learn policies for specific tasks from human-written manuals, improving both performance and efficiency. Atari video games, meanwhile, have long been a popular benchmark for reinforcement learning research because they offer a controlled environment and a scoring system that can serve as the reward signal. Combining these observations, the CMU researchers introduced the Read and Reward framework, which speeds up RL algorithms on Atari games by reading the manuals released by the Atari game developers.

The framework consists primarily of two modules. The first, a QA Extraction module, extracts and summarizes the important information from the game’s official instruction manual. The second, the Reasoning module, receives that extracted information. It is a pre-trained language model, comparable to GPT-3 in size and capability, that assesses object-agent interactions by answering queries built from the manual data. These responses are then used to give the reinforcement learning algorithm auxiliary rewards beyond the game’s built-in scoring. The additional rewards help the algorithm learn the game faster.
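The idea of adding manual-derived rewards on top of the game score can be illustrated with a minimal sketch. This is not the authors' implementation: the `Interaction` class, the function names, the distance-based "interaction" check, and the example objects (`flag`, `tree`) are all assumptions made for illustration. The sketch assumes some earlier step has already turned the manual into a judgment of whether touching each object type is good or bad for the player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical output of a reasoning step: how the manual rates an object."""
    obj: str       # object type named in the manual, e.g. "flag"
    effect: float  # +1.0 if interacting is beneficial, -1.0 if harmful


def auxiliary_reward(agent_xy, objects, interactions, radius=5.0, scale=1.0):
    """Sum manual-derived bonuses for every known object within `radius` of the agent.

    `objects` is a list of (name, (x, y)) pairs, assumed to come from some
    object detector running on the game frame.
    """
    effects = {i.obj: i.effect for i in interactions}
    ax, ay = agent_xy
    bonus = 0.0
    for name, (ox, oy) in objects:
        distance = ((ax - ox) ** 2 + (ay - oy) ** 2) ** 0.5
        if distance <= radius and name in effects:
            bonus += scale * effects[name]
    return bonus


def shaped_reward(game_reward, agent_xy, objects, interactions):
    """Reward handed to the RL algorithm: game score plus the auxiliary bonus."""
    return game_reward + auxiliary_reward(agent_xy, objects, interactions)


# Example: the agent is next to a flag (rated good) and far from a tree (rated bad).
interactions = [Interaction("flag", 1.0), Interaction("tree", -1.0)]
objects = [("flag", (3.0, 4.0)), ("tree", (100.0, 100.0))]
print(shaped_reward(-1.0, (0.0, 0.0), objects, interactions))
```

In this toy setup the game itself penalizes the step, while the auxiliary bonus for being near a beneficial object offsets it, giving the learner a denser signal than the game score alone.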

To assess their strategy, the researchers used Skiing, one of the most challenging Atari games for AI to master. Whereas the previous state-of-the-art agent, Agent57, needed 80 billion frames to perform as well as a human, the new method needed only 13 million frames to get the hang of the game. However, it managed to score only roughly half as well as the top method. Even though the new approach falls short of an average person's performance, it is still vastly better than a number of other leading reinforcement learning approaches, which were completely unable to grasp the game’s concepts.
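The "roughly 6,000 times" speedup follows directly from the two frame counts quoted in this article. A quick check, using only those figures (the variable names are illustrative):

```python
# Frame counts as reported in the article.
agent57_frames = 80_000_000_000       # 80 billion frames
read_and_reward_frames = 13_000_000   # 13 million frames

# How many times fewer frames the Read and Reward agent needed.
ratio = agent57_frames / read_and_reward_frames
print(f"{ratio:,.0f}x fewer frames")  # prints "6,154x fewer frames"
```

Note that this compares the amount of game experience needed, not wall-clock training time.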

The CMU researchers state that their study is the first of its kind to show that a fully automated reinforcement learning framework can benefit from the instruction manual of a well-known game. The team has already begun trials on 3D games such as Minecraft, where they have seen some encouraging results, and they hope the approach can be extended to more complicated situations in future work. They also hope the AI community will view their work as a significant step toward making reinforcement learning-based AI agents more efficient.

Check out the Paper and Reference Article. All credit for this research goes to the researchers on this project.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and she enjoys deepening her technical knowledge by taking part in challenges.
