Recall to Imagine (R2I): A New Machine Learning Approach that Enhances Long-Term Memory by Incorporating State Space Models into Model-based Reinforcement Learning (MBRL)

With recent advances in Machine Learning (ML), Reinforcement Learning (RL), one of its branches, has become increasingly popular. In RL, an agent learns to interact with its environment by choosing actions that maximize its cumulative reward.

Incorporating world models into RL has emerged as a powerful paradigm in recent years. A world model captures the dynamics of the surrounding environment, allowing an agent to perceive, simulate, and plan within the learned dynamics. This integration underlies Model-Based Reinforcement Learning (MBRL), in which an agent learns a world model from past experience in order to predict the outcomes of its actions and make informed decisions.
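To make the idea of planning inside a learned model concrete, here is a minimal sketch. It is a toy tabular stand-in, not the neural world model used by DreamerV3 or R2I: the corridor environment and the names `true_step`, `learn_world_model`, `imagine_return`, and `plan` are all hypothetical and introduced only for illustration.

```python
import random
from itertools import product


def true_step(state, action):
    """Hypothetical toy environment: a 5-cell corridor, reward 1.0 on reaching cell 4."""
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


def learn_world_model(num_transitions=2000, seed=0):
    """Fit a tabular model (state, action) -> (next_state, reward) from random experience."""
    rng = random.Random(seed)
    model = {}
    state = 0
    for _ in range(num_transitions):
        action = rng.choice([0, 1])
        next_state, reward = true_step(state, action)
        # The toy environment is deterministic, so one sample per pair is enough.
        model[(state, action)] = (next_state, reward)
        # Start a new episode once the goal cell is reached.
        state = 0 if next_state == 4 else next_state
    return model


def imagine_return(model, state, actions):
    """Roll out an action sequence inside the learned model only (no environment calls)."""
    total = 0.0
    for action in actions:
        if (state, action) not in model:
            break  # the model has never seen this situation; stop imagining
        state, reward = model[(state, action)]
        total += reward
    return total


def plan(model, state, horizon=4):
    """Pick the action sequence with the highest imagined return (exhaustive search)."""
    candidates = product([0, 1], repeat=horizon)
    return list(max(candidates, key=lambda seq: imagine_return(model, state, seq)))


if __name__ == "__main__":
    model = learn_world_model()
    print(plan(model, state=0))
```

The agent never queries `true_step` while planning: every candidate action sequence is evaluated purely in imagination, which is the part of the MBRL loop the paragraph above describes.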

One of the major challenges in MBRL is handling long-term dependencies. These arise when an agent must recall distant observations in order to make decisions, or when there are long temporal gaps between the agent’s actions and their outcomes. Current MBRL agents frequently struggle in these settings, and as a result they perform poorly on tasks that require temporal coherence.

To address this, a team of researchers has proposed a new method called ‘Recall to Imagine’ (R2I), which aims to improve agents’ ability to handle long-term dependencies. R2I integrates a set of state space models (SSMs) into the world models of MBRL agents. The goal of this integration is to improve both the agents’ long-term memory and their capacity for credit assignment.
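As a rough intuition for why a state space model can carry information over long horizons, the sketch below runs a simplified discrete-time, diagonal linear SSM, with state update x[k] = a * x[k-1] + b * u[k] and output y[k] = c * x[k]. This is an illustrative assumption, not the modified S4 layer used in R2I; the function name `ssm_scan` and all parameter values are hypothetical.

```python
def ssm_scan(a, b, c, inputs):
    """Run a diagonal linear SSM step by step.

    a, b, c are lists with one entry per state dimension:
      state[i] <- a[i] * state[i] + b[i] * u
      output   =  sum_i c[i] * state[i]
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_i * x_i + b_i * u for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # A single "observation" at the first step, followed by 99 steps of silence.
    inputs = [1.0] + [0.0] * 99

    slow = ssm_scan(a=[0.999], b=[1.0], c=[1.0], inputs=inputs)  # decay close to 1
    fast = ssm_scan(a=[0.5], b=[1.0], c=[1.0], inputs=inputs)    # rapid decay

    print("slow-decay state after 100 steps:", slow[-1])
    print("fast-decay state after 100 steps:", fast[-1])
```

With a decay factor close to 1 the first input is still clearly present in the output after 100 steps, while with a factor of 0.5 it has effectively vanished. How far back the model can remember is governed by the state transition, which is the kind of property an SSM-based world model relies on.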

The team has demonstrated the effectiveness of R2I through an extensive evaluation on a wide range of tasks. First, R2I sets a new state of the art on demanding memory and credit assignment tasks from the POPGym and BSuite environments. R2I also achieves superhuman performance in Memory Maze, a challenging memory domain, which shows its ability to handle difficult memory-related tasks.

R2I not only excels in memory-intensive tasks but also performs comparably in standard reinforcement learning tasks such as those in the Atari and DeepMind Control (DMC) environments. This suggests that the approach is both effective in specific memory domains and generalizable to different RL scenarios.

The team has also shown that R2I converges faster in wall-clock time than DreamerV3, the state-of-the-art MBRL approach. This faster convergence makes R2I a practical option for real-world applications where time efficiency is critical, since it can reach good results more quickly.

The team has summarized their primary contributions as follows: 

  1. R2I is a memory-enhanced MBRL agent built on DreamerV3. It uses a modified version of S4 to handle temporal dependencies. It preserves the generality of DreamerV3 and offers up to 9 times faster computation while using fixed world model hyperparameters across domains.
  2. R2I outperforms its competitors in memory-intensive domains such as POPGym, BSuite, and Memory Maze. In Memory Maze, a difficult 3D environment that tests long-term memory, R2I even surpasses human performance.
  3. R2I’s performance has been evaluated on standard RL benchmarks such as DMC and Atari. The results highlight R2I’s adaptability by showing that its improved memory capabilities do not degrade its performance across a variety of control tasks.
  4. To evaluate the effects of the design choices made for R2I, the team carried out ablation studies. These provide insight into the effectiveness of the system’s architecture and its individual components.
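One property of linear SSMs such as S4 that is often cited in connection with fast training is that the step-by-step recurrence can be rewritten as a single convolution over the whole input sequence, which lends itself to parallel computation. The article does not say whether this is the source of R2I's reported speedup, so the sketch below only checks the equivalence itself on a simplified diagonal SSM. All function names and parameter values are hypothetical.

```python
def ssm_recurrent(a, b, c, inputs):
    """Sequential form: state[i] <- a[i] * state[i] + b[i] * u, output = sum_i c[i] * state[i]."""
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_i * x_i + b_i * u for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def ssm_kernel(a, b, c, length):
    """Convolution kernel K[j] = sum_i c[i] * a[i]**j * b[i] for j = 0 .. length-1."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]


def ssm_convolution(a, b, c, inputs):
    """Convolutional form: y[k] = sum_{j=0..k} K[j] * u[k-j].

    Each output depends only on the inputs and a precomputed kernel,
    so all time steps could in principle be computed independently.
    """
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


if __name__ == "__main__":
    a, b, c = [0.9, 0.5], [1.0, 0.5], [0.3, -0.2]
    inputs = [1.0, 0.0, 2.0, -1.0, 0.5]
    sequential = ssm_recurrent(a, b, c, inputs)
    parallelizable = ssm_convolution(a, b, c, inputs)
    print(all(abs(s - p) < 1e-9 for s, p in zip(sequential, parallelizable)))
```

The two forms produce the same outputs. The recurrent form is convenient for acting one step at a time, while the convolutional form exposes the whole sequence at once during training.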

Check out the Paper. All credit for this research goes to the researchers of this project.

Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
