Facebook AI Introduces DrQ-v2, A Model-Free Reinforcement Learning Algorithm For Visual Continuous Control


One challenge in reinforcement learning (RL) is learning control policies from high-dimensional observations such as images. The last three years have seen substantial progress, with many new methods developed to improve sample efficiency and learn better low-dimensional representations. Techniques such as autoencoders, variational inference, contrastive learning, self-prediction, and data augmentation all offer hope for overcoming this obstacle in RL research.

However, current model-free methods are still limited in three ways. First, they cannot solve the more challenging visual control problems, such as quadruped and humanoid locomotion. Second, they often require significant computational resources, i.e., lengthy training times on distributed multi-GPU infrastructure. Third, it is unclear how different design choices affect overall performance, which makes outcomes hard to predict.

Facebook AI Research has unveiled DrQ-v2, a simple model-free algorithm that builds on the idea of using data augmentation to solve hard visual control problems. It is the first model-free method to solve complex humanoid locomotion tasks directly from pixel observations, and it significantly improves sample efficiency across tasks from the DeepMind Control Suite. It is also computationally efficient: most DeepMind Control Suite tasks can be solved within 8 hours on a single GPU.
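To make the data augmentation idea concrete, here is a minimal sketch of a random-shift augmentation, the kind of image augmentation the DrQ family of methods is known for. This article does not specify the augmentation, so the function name `random_shift`, the `pad` parameter, and the plain nested-list image format are all assumptions made for illustration, not the authors' implementation.

```python
import random


def random_shift(image, pad, rng):
    """Shift a 2D image by up to `pad` pixels along each axis.

    Equivalent to padding the image by repeating its edge pixels and
    then cropping a window of the original size at a random offset.
    `image` is a list of rows; all names here are illustrative.
    """
    height, width = len(image), len(image[0])
    # Offsets into the (conceptually) padded image, inclusive of both ends.
    dy = rng.randint(0, 2 * pad)
    dx = rng.randint(0, 2 * pad)
    shifted = []
    for y in range(height):
        # Clamp to the valid range to emulate edge-replication padding.
        src_y = min(max(y + dy - pad, 0), height - 1)
        row = []
        for x in range(width):
            src_x = min(max(x + dx - pad, 0), width - 1)
            row.append(image[src_y][src_x])
        shifted.append(row)
    return shifted
```

Each call returns a slightly shifted copy of the observation, so the agent sees a different view of the same state every time a transition is sampled for training.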


Recently, a model-based method called DreamerV2 was shown to solve visual continuous control problems, including humanoid locomotion from pixels. DrQ-v2 matches DreamerV2 in sample efficiency and performance while training four times faster in wall-clock time. According to the researchers, this makes DrQ-v2 a more accessible approach for research on these tasks, and it reopens the question of whether model-free or model-based methods are better suited to solving them.

DrQ-v2 is a new model-free, off-policy algorithm that builds upon DrQ, an actor-critic approach. The improvements over the original DrQ include:

  • Switch the base RL learner from SAC to DDPG.
  • Incorporate n-step returns to estimate TD error.
  • Introduce a decaying schedule for exploration noise.
  • Make the implementation 3.5 times faster.
  • Find better hyper-parameters.
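Two of the changes listed above, n-step returns and a decaying exploration-noise schedule, are easy to sketch in a few lines of Python. The following is a minimal illustration rather than the DrQ-v2 code: the function names, the linear shape of the decay, the clipping of actions to [-1, 1], and all numeric values are assumptions chosen for the example.

```python
import random


def n_step_return(rewards, bootstrap_value, discount):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    `rewards` holds r_t, ..., r_{t+n-1}; `bootstrap_value` is the
    critic's estimate for the state reached after those n steps.
    """
    target = bootstrap_value
    # Fold from the last reward backwards: G = r + discount * G_next.
    for reward in reversed(rewards):
        target = reward + discount * target
    return target


def td_error(q_value, rewards, bootstrap_value, discount):
    """TD error: n-step target minus the current Q estimate."""
    return n_step_return(rewards, bootstrap_value, discount) - q_value


def linear_schedule(initial, final, duration, step):
    """Decay linearly from `initial` to `final` over `duration` steps,
    then hold at `final` (the linear shape is an assumption)."""
    mix = min(max(step / duration, 0.0), 1.0)
    return (1.0 - mix) * initial + mix * final


def noisy_action(action, stddev, rng):
    """Add Gaussian exploration noise and clip each component to [-1, 1]."""
    return [min(1.0, max(-1.0, a + rng.gauss(0.0, stddev))) for a in action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only; they are not taken from the paper.
    for step in (0, 50_000, 100_000):
        stddev = linear_schedule(1.0, 0.1, 100_000, step)
        print(step, stddev, noisy_action([0.2, -0.4], stddev, rng))
```

With a deterministic policy such as DDPG's, exploration comes entirely from the added noise, so starting with a large standard deviation and shrinking it over time trades early exploration for later precision.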

More details in the paper: https://arxiv.org/pdf/2107.09645.pdf

PyTorch implementation of DrQ-v2 (GitHub): https://github.com/facebookresearch/drqv2

Asif Razzaq (http://www.marktechpost.com)
