With steady advances in technology, Artificial Intelligence is enabling computers to learn and reason in ways that resemble human thinking. Recent advances in Artificial Intelligence, Machine Learning (ML), and Deep Learning have helped improve multiple fields, including healthcare, finance, and education. Large Language Models (LLMs), which have recently attracted a lot of attention, have shown strong human-imitating skills. They perform well on tasks ranging from question answering and text summarization to code generation and code completion.
LLMs are often fine-tuned using a Machine Learning paradigm called Reinforcement Learning (RL). In Reinforcement Learning, an agent learns to make decisions by interacting with its environment, and it seeks to maximize a cumulative reward signal over time through the actions it takes. Model-based RL has advanced recently and has shown promise in a variety of settings, especially ones that call for planning. However, these successes have been limited to fully observed and deterministic environments.
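To make the idea of a cumulative reward concrete, here is a minimal sketch in Python. It assumes the common discounted formulation, in which a reward received t steps in the future is weighted by gamma to the power t. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not something taken from the paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum a sequence of rewards, discounting later rewards by gamma per step.

    Assumption: the standard discounted-return formulation; gamma=1.0
    reduces this to a plain sum of rewards.
    """
    total = 0.0
    # Work backwards so each step applies one extra factor of gamma
    # to everything that comes after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three steps with reward 1 each, heavily discounted.
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

An agent that maximizes this quantity prefers rewards that arrive sooner whenever gamma is below 1.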
In recent research, a team of researchers from DeepMind has proposed a new strategy for planning using Vector Quantized models. The approach is meant to solve problems in environments that are stochastic and partially observable. It uses a state VQVAE (Vector Quantized Variational Autoencoder) and a transition model to encode future observations into discrete latent variables. This makes the method applicable to stochastic or partially observed settings, since planning can cover future observations as well as future actions.
The team has shared that discrete autoencoders are used in this approach to capture the various possible outcomes of an action in a stochastic setting. Autoencoders are neural network architectures that take input data, encode it into a latent representation, and then decode it to reconstruct the original input. With a discrete autoencoder, the latent representation is one of a finite set of codes, so the several alternative outcomes that can follow an agent’s action can each be represented by a different code.
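The quantization step that makes a latent representation discrete can be sketched in a few lines of Python. This is only an illustration of nearest-neighbor vector quantization, the lookup at the core of a VQVAE, and not the team's implementation: the `quantize` function, the two-dimensional vectors, and the four-entry `codebook` are assumptions chosen for readability, and the encoder, decoder, and training are left out entirely.

```python
def quantize(vector, codebook):
    """Return (index, entry) for the codebook entry closest to `vector`.

    Closeness is measured by squared Euclidean distance. The index is
    the discrete latent code; the entry is what a decoder would receive.
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    index = min(
        range(len(codebook)),
        key=lambda k: squared_distance(vector, codebook[k]),
    )
    return index, codebook[index]


# Hypothetical codebook with four entries; a real one is learned.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A continuous encoder output is snapped to its nearest codebook entry.
index, code_vector = quantize([0.9, 0.2], codebook)
```

Because every encoder output maps to one of a fixed number of indices, a planner can enumerate or sample those indices as the possible outcomes of an action.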
The team has used a stochastic variant of Monte Carlo tree search (MCTS) to plan in these settings. Monte Carlo tree search is a popular approach for planning and decision-making. In this case, the stochastic variant takes environmental uncertainty into account: in addition to the actions of the agent, the planning process includes discrete latent variables that represent the possible responses of the environment. This is intended to capture the complexity introduced by partial observability as well as stochasticity.
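The alternation between agent actions and environment responses can be sketched as a small tree search in Python. This is a simplified illustration and not the algorithm from the paper: it uses a plain UCB rule at decision nodes, samples a discrete code from a fixed prior at chance nodes, and has no learned networks. `ToyModel`, its two actions, its code probabilities, and its rewards are all hypothetical stand-ins for a learned model.

```python
import math
import random


class Node:
    """Search statistics for one tree node (decision or chance)."""

    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # decision node: action -> Node; chance node: code -> Node

    def mean(self):
        return self.value_sum / self.visits if self.visits else 0.0


class ToyModel:
    """Hypothetical stand-in for a learned model; not from the paper."""

    def actions(self, state):
        return ["safe", "risky"]

    def code_prior(self, state, action):
        # Discrete latent codes and their probabilities for this action.
        if action == "safe":
            return [0], [1.0]
        return [0, 1], [0.2, 0.8]

    def step(self, state, action, code):
        # Returns (next_state, reward) for the action and the sampled code.
        if action == "safe":
            return state, 0.6
        return state, (1.0 if code == 0 else 0.0)


def simulate(node, state, depth, model, rng, c=1.0):
    """Run one simulation from a decision node and return its total reward."""
    if depth == 0:
        return 0.0
    actions = model.actions(state)
    for action in actions:
        node.children.setdefault(action, Node())

    def ucb(action):
        child = node.children[action]
        if child.visits == 0:
            return float("inf")  # try every action at least once
        bonus = c * math.sqrt(math.log(node.visits + 1) / child.visits)
        return child.mean() + bonus

    # Decision node: the agent chooses an action.
    action = max(actions, key=ucb)
    chance = node.children[action]
    # Chance node: the environment's response is a sampled discrete code.
    codes, probs = model.code_prior(state, action)
    code = rng.choices(codes, weights=probs)[0]
    next_state, reward = model.step(state, action, code)
    next_node = chance.children.setdefault(code, Node())
    value = reward + simulate(next_node, next_state, depth - 1, model, rng, c)
    # Back up the sampled value through the chance and decision nodes.
    chance.visits += 1
    chance.value_sum += value
    node.visits += 1
    node.value_sum += value
    return value


def plan(model, state, simulations=500, depth=2, seed=0):
    """Return the root action with the most visits after the search."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(simulations):
        simulate(root, state, depth, model, rng)
    return max(root.children, key=lambda a: root.children[a].visits)


best_action = plan(ToyModel(), state=0)
```

In this toy setup the "risky" action pays off only for one of its two codes, so its average value is lower than that of "safe", and the search concentrates its visits on "safe". The point of the sketch is the structure: each level of the tree pairs a choice by the agent with a sampled discrete code for the environment's response.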
The team has evaluated the approach and shown that it beats an offline variant of MuZero, a well-known RL system, in a stochastic interpretation of chess. In this interpretation, the opponent is treated as part of the environment and is therefore a source of uncertainty. The approach has also been applied successfully to DeepMind Lab, which indicates that it scales. The favorable results in this setting suggest that the approach is flexible and effective in complex, dynamic environments beyond conventional board games.
In conclusion, this model-based reinforcement learning technique extends the success that planning has had in fully observed, deterministic environments to partially observable, stochastic settings. It does so by combining discrete autoencoders with a stochastic variant of Monte Carlo tree search to handle the difficulties presented by uncertain environments, and it improved performance in the reported chess and DeepMind Lab experiments.
Check out the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.