Google AI Introduces Menger, a Massively Large-Scale Distributed Reinforcement Learning (RL) Infrastructure

Reinforcement learning (RL) is a significant area of machine learning with the potential to solve real-world problems in fields such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics. An RL infrastructure is a loop of data collection and training: actors collect data samples, and learners use those samples to train and update the model.

RL techniques require many iterations over batches of millions of samples from the environment to learn a target task. For example, the agent trained to play Dota 2 learns from batches of 2 million frames every 2 seconds. An RL infrastructure must therefore not only scale efficiently to a large number of actors and the samples they produce, but also iterate quickly over that massive volume of samples during training.
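To put the Dota 2 figure in perspective, the short Python sketch below converts it into a sustained sample rate and estimates how many actors would be needed to supply it. The per-actor rate of 500 frames per second and the helper name actors_needed are hypothetical values chosen only for illustration; they do not come from the source.

```python
import math

# Figures quoted in the article: 2 million frames per batch, one batch every 2 seconds.
FRAMES_PER_BATCH = 2_000_000
SECONDS_PER_BATCH = 2

# Sustained rate the data-collection side has to deliver.
frames_per_second = FRAMES_PER_BATCH / SECONDS_PER_BATCH


def actors_needed(target_fps: float, fps_per_actor: float) -> int:
    """Smallest number of actors whose combined rate meets target_fps."""
    return math.ceil(target_fps / fps_per_actor)


# Hypothetical assumption: a single actor produces 500 frames per second.
ASSUMED_FPS_PER_ACTOR = 500

if __name__ == "__main__":
    print(f"required rate: {frames_per_second:,.0f} frames/s")
    print(f"actors needed: {actors_needed(frames_per_second, ASSUMED_FPS_PER_ACTOR)}")
```

Under that assumed per-actor rate, the learner would need thousands of actors feeding it at once, which is why actor scalability matters as much as training speed.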

Overview of an RL system in which an actor sends trajectories (e.g., multiple samples) to a learner. The learner trains a model using the sampled data and pushes the updated model back to the actor (e.g., TF-Agents, IMPALA).
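The actor-learner loop described above can be sketched in a few lines of Python. This is a toy, single-process illustration and not Menger's implementation: the two-armed bandit environment, the running-average update, and the names Model, Actor, Learner, and run are all assumptions made for the example. Each actor keeps its own copy of the model and chooses actions from it locally, sends its trajectory to the learner, and receives the updated model at the start of the next iteration.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy environment: a two-armed bandit whose second arm pays more often.
REWARD_PROB = [0.2, 0.8]


@dataclass
class Model:
    # Toy "policy": one estimated reward per action, plus a version counter.
    values: list = field(default_factory=lambda: [0.0, 0.0])
    version: int = 0


class Actor:
    """Collects samples using its own local copy of the model."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.model = Model()

    def sync(self, model: Model) -> None:
        # Receive the updated model pushed by the learner.
        self.model = Model(list(model.values), model.version)

    def collect(self, steps: int, epsilon: float = 0.2) -> list:
        trajectory = []
        for _ in range(steps):
            if self.rng.random() < epsilon:
                action = self.rng.randrange(2)  # explore
            else:
                # Exploit: pick the action the local model currently rates highest.
                action = max(range(2), key=lambda a: self.model.values[a])
            reward = 1.0 if self.rng.random() < REWARD_PROB[action] else 0.0
            trajectory.append((action, reward))
        return trajectory


class Learner:
    """Trains the model on trajectories received from the actors."""

    def __init__(self, lr: float = 0.05):
        self.model = Model()
        self.lr = lr

    def train(self, trajectories: list) -> Model:
        for trajectory in trajectories:
            for action, reward in trajectory:
                old = self.model.values[action]
                # Move the estimate a small step toward the observed reward.
                self.model.values[action] = old + self.lr * (reward - old)
        self.model.version += 1
        return self.model


def run(num_actors: int = 4, iterations: int = 50, steps: int = 20) -> Model:
    learner = Learner()
    actors = [Actor(seed=i) for i in range(num_actors)]
    for _ in range(iterations):
        for actor in actors:
            actor.sync(learner.model)  # learner -> actors: updated model
        batch = [actor.collect(steps) for actor in actors]  # actors -> learner
        learner.train(batch)
    return learner.model


if __name__ == "__main__":
    final = run()
    print(final.version, [round(v, 2) for v in final.values])
```

In a real distributed system the actors and the learner run on separate machines and exchange trajectories and model weights over the network; here they take turns in one process so the data flow stays easy to follow.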

To meet this demand, Google AI has introduced Menger, a massively large-scale distributed RL infrastructure. Menger reduces overall training time by up to 8.6x compared to a baseline implementation, which makes training iterations more efficient. It runs on Google TPU accelerators for fast training iterations, and it uses local inference (rather than centralized inference), which pushes actor scalability to a virtually unbounded limit. These capabilities play a major role in the complicated task of chip placement.

Menger is expected to advance not only the chip design process but also other sophisticated real-world tasks.

Source: https://ai.googleblog.com/2020/10/massively-large-scale-distributed.html

Shilpi is a contributor to Marktechpost.com. She is currently in the third year of her B.Tech in computer science and engineering at IIT Bhubaneswar. She has a keen interest in exploring the latest technologies and likes to write about different domains and their real-life applications.
