MIT Researchers Introduce A Machine Learning Framework That Allows Cooperative Or Competitive AI Agents To Find An Optimal Long-Term Solution

Reinforcement learning is a machine learning method in which an artificial agent learns by trial and error. The agent receives a reward, defined by the researchers, when its actions lead to the desired outcome, and expert-level performance emerges as the agent adjusts its actions to maximize that reward.
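To make that trial-and-error loop concrete, here is a minimal single-agent sketch using tabular Q-learning on a toy corridor environment. Everything here (the environment, hyperparameters, and algorithm choice) is our own illustration, not the MIT team's setup:

```python
import random

# Toy 5-state corridor: the agent starts at state 0 and earns a
# reward of +1 only when it reaches the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Nudge the estimate toward reward + discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# The learned policy moves right (toward the reward) from every state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The mistakes the agent makes early on (wandering left, away from the goal) are gradually priced out of its value table, which is the "learns from its mistakes" idea in miniature.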

“Multiagent reinforcement learning” involves multiple agents learning to solve a problem together. Things grow more complicated when several cooperative or competitive agents are learning simultaneously: once each agent takes into account the future actions of its peers, and how its own actions influence theirs, the problem quickly becomes too computationally intensive to solve efficiently. This is why existing methods tend to ignore the long term in favor of quick fixes.

In a world where AI agents are all learning at once, it’s difficult to teach one agent to predict the actions of another. The major issue of multiagent reinforcement learning is learning useful policies in the presence of other agents that are also learning and whose changing behaviors jointly alter the transition and reward dynamics of the environment.

Because of this complexity, current approaches to the problem tend to be short-sighted: agents can only anticipate the actions of their teammates or competitors a few steps ahead, which results in subpar performance over the course of a game.

AI agents now have the benefit of foresight thanks to a novel method developed by researchers at MIT, the MIT-IBM Watson AI Lab, and other institutions. Their machine-learning framework allows AI entities, whether cooperative or competitive, to think about the actions of one another across infinitely many steps in the future. As a result, the agents can make the necessary adjustments to their actions to shape the future actions of other agents and find the best possible solution.

A swarm of autonomous drones might use this framework to locate a missing hiker in a dense forest, or self-driving cars might use it to keep their passengers safe while navigating a congested roadway.

Many short-term actions have little bearing on the outcome. Whether AI agents are working together or against one another, it is important that their actions converge in the future. This convergent behavior was the team’s primary interest, and they developed a mathematical method to achieve it.

Since infinity can’t be entered into an algorithm, the researchers designed their system so that agents focus on a point in the future when their actions will converge with those of other agents, a state known as equilibrium. When all agents influence one another in this way, the system approaches what the study’s authors call an “active equilibrium.” In a multiagent system, long-term performance is determined by the particular equilibrium point the agents settle into, and there may be more than one such point. A powerful agent therefore actively shapes the future actions of other agents so that they settle into an equilibrium that is optimal from the agent’s own point of view.

They devised a machine-learning framework called FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward) that teaches agents to adjust their actions in response to the actions of other agents until they reach a state of dynamic equilibrium.
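The “average reward” in the name refers to the average-reward criterion from reinforcement learning theory, in which an agent optimizes its long-run reward per step rather than a discounted sum. In a common textbook notation (not necessarily the paper’s exact formulation), agent $i$’s objective given the joint policies $\pi^1, \dots, \pi^n$ of all $n$ agents would be:

```latex
J_i(\pi^1, \dots, \pi^n)
  = \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}\!\left[\sum_{t=0}^{T-1}
      r_i\bigl(s_t,\, a_t^1, \dots, a_t^n\bigr)\right]
```

Because the limit averages over an unbounded horizon, optimizing this objective naturally emphasizes where the agents’ joint behavior eventually settles rather than any finite lookahead.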

FURTHER uses two machine-learning modules to accomplish this. The first, an inference module, lets an agent predict the behaviors of other agents, and the learning algorithms they employ, from their past actions alone. The second feeds these predictions into the agent’s own reinforcement learning, so that it can adapt its actions and social interactions to maximize its reward.
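The interplay between inferring other agents from their history and then adapting can be illustrated with a much simpler classic scheme, fictitious play, on a 2x2 coordination game. This is a toy stand-in for the idea, not the FURTHER algorithm itself (which learns far richer models of other agents’ learning dynamics):

```python
import collections

# Coordination game: both agents earn 1 if they pick the same action, else 0.
ACTIONS = [0, 1]

def best_response(opponent_counts):
    """Adapt: pick the action that maximizes expected reward against the
    inferred (empirical) distribution over the other agent's actions."""
    total = sum(opponent_counts.values()) or 1
    probs = {a: opponent_counts[a] / total for a in ACTIONS}
    # In a coordination game, matching the opponent's likeliest action wins.
    return max(ACTIONS, key=lambda a: probs[a])

history_a = collections.Counter({0: 1})  # agent B's model of A (arbitrary prior)
history_b = collections.Counter({1: 2})  # agent A's model of B (arbitrary prior)

for _ in range(50):
    # Inference step: each agent predicts the other from history alone.
    act_a = best_response(history_b)
    act_b = best_response(history_a)
    # Record the observed behavior for the next round's inference.
    history_a[act_a] += 1
    history_b[act_b] += 1

print(act_a, act_b)  # the two agents settle into a stable joint choice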

They compared their method to existing multiagent reinforcement learning frameworks in many scenarios, including a pair of robots engaging in sumo-style combat and a battle between two teams of 25 agents. In both cases, the agents using FURTHER won more often than not.

According to the researchers, their system is more scalable than alternatives that require a central computer to govern the agents, because FURTHER is decentralized: the agents learn to win the games independently.

Researchers tested FURTHER in a gaming context, but it has broad applicability for solving multiple multiagent issues. Economists could use it, for instance, while trying to develop a workable policy for a complex system in which the behaviors and interests of numerous entities are dynamic.

Check out the Paper, Project Website, and MIT Article. All credit for this research goes to the researchers on this project.

๐Ÿ Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...