Reinforcement learning (RL) is a significant area of machine learning, with the potential to solve many real-world problems in fields such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics. An RL infrastructure is a closed loop of data collection and training: actors collect data samples from the environment, and learners use those samples to train and update the model.
RL techniques require many iterations over batches of millions of environment samples to learn a target task. For example, agents for games like Dota 2 learn from batches of 2 million frames every 2 seconds. An RL infrastructure therefore not only must scale efficiently to large numbers of actors and samples, but also must iterate smoothly and swiftly over those samples during training.
Figure: Overview of an RL system in which an actor sends trajectories (i.e., multiple samples) to a learner; the learner trains a model on the sampled data and pushes the updated model back to the actor (e.g., TF-Agents, IMPALA).
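The actor/learner loop described above can be sketched in a few lines. This is a minimal toy illustration, not Menger's or TF-Agents' actual API; the `Actor`, `Learner`, and the scalar "weights" are hypothetical stand-ins for a real policy model and training step.

```python
import random

class Learner:
    """Trains a model from collected trajectories and publishes updated weights."""
    def __init__(self):
        self.weights = 0.0   # toy stand-in for model parameters
        self.version = 0

    def train(self, trajectories):
        # Toy update: nudge the weights toward the mean observed reward.
        rewards = [r for traj in trajectories for (_, r) in traj]
        self.weights += 0.1 * (sum(rewards) / len(rewards) - self.weights)
        self.version += 1
        return self.weights, self.version

class Actor:
    """Collects trajectories from the environment with its current model copy."""
    def __init__(self):
        self.weights, self.version = 0.0, 0

    def collect(self, steps=5):
        # A trajectory here is a list of (observation, reward) samples.
        return [(random.random(), random.random()) for _ in range(steps)]

    def update(self, weights, version):
        self.weights, self.version = weights, version

learner = Learner()
actors = [Actor() for _ in range(4)]
for _ in range(3):                          # three collect/train iterations
    batch = [a.collect() for a in actors]   # actors send trajectories
    weights, version = learner.train(batch) # learner trains on the batch
    for a in actors:
        a.update(weights, version)          # updated model pushed back
```

In a real system such as IMPALA, the actors and learner run as separate distributed processes and the trajectories flow through a replay or queueing service rather than an in-process list.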
To serve this demand, Google AI introduced Menger, a large-scale distributed RL infrastructure. Menger reduces overall training time by up to 8.6x compared to a baseline implementation, and it is built on Google TPU accelerators for fast training iterations. Menger uses local inference on each actor (rather than a centralized inference service), which pushes actor scalability to a virtually unbounded limit. This plays a major role in handling the complicated task of chip placement.
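The scalability argument for local inference can be illustrated with a small sketch: each actor keeps its own copy of the model and only contacts a parameter store periodically to pull fresh weights, so adding actors adds no per-step load on a central inference server. The `ParameterServer` and `LocalInferenceActor` names, the refresh interval, and the scalar policy are all hypothetical; this is not Menger's implementation.

```python
import threading

class ParameterServer:
    """Holds the latest weights published by the learner."""
    def __init__(self):
        self._lock = threading.Lock()
        self._weights, self._version = 0.0, 0

    def publish(self, weights):
        with self._lock:
            self._weights = weights
            self._version += 1

    def pull(self):
        with self._lock:
            return self._weights, self._version

class LocalInferenceActor:
    """Runs inference locally, refreshing its model copy only occasionally."""
    def __init__(self, server, refresh_every=10):
        self.server = server
        self.refresh_every = refresh_every
        self.weights, self.version = server.pull()
        self.steps = 0

    def act(self, observation):
        # Refresh the local copy every `refresh_every` steps instead of
        # querying a central inference service on every single step.
        if self.steps % self.refresh_every == 0:
            self.weights, self.version = self.server.pull()
        self.steps += 1
        return observation * self.weights  # toy linear policy

server = ParameterServer()
actor = LocalInferenceActor(server)
server.publish(2.0)                        # learner publishes new weights
actions = [actor.act(1.0) for _ in range(12)]
```

With centralized inference, every `act` call would be a round trip to a shared server; here only one pull in every ten steps touches shared state, so throughput scales with the number of actors.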
Menger is expected to drive further progress not only in the chip design process but also in other sophisticated real-world tasks.