DeepMind Introduces MuZero, Which Achieves Superhuman Performance In Tasks Without Being Told Their Underlying Dynamics

DeepMind has previously used reinforcement learning to teach programs to master games such as the Chinese board game ‘Go,’ the Japanese strategy game ‘Shogi,’ chess, and challenging Atari video games. For the board games, those earlier programs were given the rules up front.

DeepMind has introduced MuZero, an algorithm that combines a tree-based search with a learned model to achieve superhuman performance in several challenging and visually complex domains without being given their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function.
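To make "a model applied iteratively" concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the functions `represent`, `dynamics`, and `predict` are hypothetical hand-written stand-ins for what would be learned neural networks, and the toy arithmetic inside them is assumed purely for illustration. The point is only the shape of the loop: encode one observation, then step the model forward on imagined actions and read off a reward, a value, and a policy at every step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # hidden state: an abstract vector, not a game board


@dataclass
class Prediction:
    reward: float        # predicted immediate reward for the action just taken
    value: float         # predicted long-term return from the new hidden state
    policy: List[float]  # predicted preference over the next actions


def represent(observation: Sequence[float]) -> State:
    """Toy stand-in for a learned encoder: observation -> hidden state."""
    return tuple(float(x) for x in observation)


def dynamics(state: State, action: int) -> Tuple[State, float]:
    """Toy stand-in for a learned transition: (state, action) -> (next state, reward)."""
    shift = 1.0 if action == 1 else -1.0
    next_state = tuple(x + shift for x in state)
    reward = 1.0 if sum(next_state) > 0 else 0.0
    return next_state, reward


def predict(state: State) -> Tuple[List[float], float]:
    """Toy stand-in for a learned head: hidden state -> (policy, value)."""
    value = sum(state) / len(state)
    p_up = 0.75 if value < 0 else 0.25
    return [1.0 - p_up, p_up], value


def unroll(observation: Sequence[float], actions: Sequence[int]) -> List[Prediction]:
    """Apply the model iteratively along an imagined action sequence."""
    state = represent(observation)
    predictions = []
    for action in actions:
        state, reward = dynamics(state, action)  # no real environment is touched
        policy, value = predict(state)
        predictions.append(Prediction(reward, value, policy))
    return predictions


imagined = unroll([0.0, 0.0], actions=[1, 1, 0])
```

Note that the real environment is consulted only once, for the initial observation. Every later step happens inside the model, which is what lets a planner explore futures without a rulebook.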

The team relies on a principle called “look-ahead search.” With this approach, MuZero evaluates candidate moves by anticipating how the opponent is likely to respond. Although complicated games like chess allow a huge number of possible actions, MuZero prioritizes the most relevant moves, learning which tactics succeed and avoiding those that fail. It could even beat the earlier programs without first being given the rules.
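The idea of a look-ahead search that prioritizes promising moves can be sketched as a small Monte Carlo tree search. This is a simplified illustration under stated assumptions, not MuZero's actual search: the one-dimensional toy world, the `model_step` and `model_predict` stand-ins for learned networks, and the constants are all hypothetical, the scoring rule is a pared-down prior-weighted upper-confidence formula, and the toy is single-agent, so there is no opponent whose replies must be modeled.

```python
import math
from typing import Dict, List, Tuple

NUM_ACTIONS = 2  # hypothetical toy world: 0 = step left, 1 = step right
DISCOUNT = 0.9
GOAL = 3


def model_step(state: int, action: int) -> Tuple[int, float]:
    """Stand-in for a learned dynamics model: returns (next state, reward)."""
    nxt = state + (1 if action == 1 else -1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def model_predict(state: int) -> Tuple[List[float], float]:
    """Stand-in for a learned policy/value head: uniform prior, zero value."""
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS, 0.0


class Node:
    def __init__(self, state: int, prior: float, reward: float = 0.0):
        self.state, self.prior, self.reward = state, prior, reward
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value_sum = 0.0

    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def expand(node: Node) -> float:
    """Add one child per action using the model; return the leaf's value estimate."""
    priors, value = model_predict(node.state)
    for action in range(NUM_ACTIONS):
        nxt, reward = model_step(node.state, action)
        node.children[action] = Node(nxt, priors[action], reward)
    return value


def select_child(node: Node, c: float = 1.25) -> Node:
    """Pick the child with the best mix of estimated return and prior-weighted exploration."""
    def score(child: Node) -> float:
        explore = c * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        exploit = child.reward + DISCOUNT * child.value() if child.visits else 0.0
        return exploit + explore
    return max(node.children.values(), key=score)


def search(root_state: int, num_simulations: int = 50) -> Tuple[int, Dict[int, int]]:
    """Run simulations inside the model and return the most-visited root action."""
    root = Node(root_state, prior=1.0)
    for _ in range(num_simulations):
        node, path = root, [root]
        while node.children:            # walk down to an unexpanded leaf
            node = select_child(node)
            path.append(node)
        value = expand(node)
        for visited in reversed(path):  # back up the discounted return
            visited.value_sum += value
            visited.visits += 1
            value = visited.reward + DISCOUNT * value
    counts = {a: child.visits for a, child in root.children.items()}
    return max(counts, key=counts.get), counts


best_action, visit_counts = search(root_state=0)
```

Starting at position 0 with the reward at position 3, the search spends most of its simulations on the branch that steps right, so unpromising moves receive only a few visits. In a two-player game the backed-up value would also need to flip sign between plies, which this sketch omits.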

MuZero starts from nothing and, through trial and error, discovers the rules of its environment and uses them to achieve superhuman performance. According to the team, this is the first time a single system has both learned how its world works and applied the kind of look-ahead planning previously seen in programs for games like chess. On the Atari game Ms. Pac-Man, MuZero performed remarkably well even when it was restricted to considering only six or seven possible future moves.

DeepMind is also working towards real-world applications of MuZero, such as video compression, where the team has achieved a 5% improvement in compression to date. The task is challenging because of the vast number of video formats and compression modes. Researchers are also exploring robotics programming and protein architecture design for personalized drug production.

Professor Wendy Hall of the University of Southampton, who is also a member of the UK’s AI Council, believes that while the team continues to improve its algorithms’ performance and to apply the results for society’s benefit, the potential unintended consequences of the work are worrisome.

The U.S. Air Force has reported using early research papers on MuZero (made public in 2019) to design an AI system that could launch missiles from a U-2 spy plane against specified targets. DeepMind opposes the use of AI in lethal weapons and has signed the Lethal Autonomous Weapons Pledge, which asserts that lethal technology should always remain under human control rather than being left to AI algorithms.

The team recognizes that many challenges lie ahead in building algorithms as practical and powerful as the human brain. They believe the first step is to understand what it means to achieve intelligence. Because the real world does not come with a rulebook, they consider it essential to broaden what AI can do by building systems that can plan ahead on problems where no one supplies the rules.

Paper: https://www.nature.com/articles/s41586-020-03051-4

Full Paper: https://arxiv.org/pdf/1911.08265.pdf

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.
