From Deep Blue’s victory over chess grandmaster Garry Kasparov to AlphaGo becoming the first computer program to defeat a Go world champion, superhuman game-playing agents have driven remarkable advances in AI. A harder question, however, remains open: can AI produce agents that use language to negotiate and collaborate with others, achieving strategic goals in a manner comparable to humans?
Diplomacy has long been considered a near-impossible challenge for AI because it requires players to understand other people’s perspectives and to use that understanding to persuade them, strike agreements, and form alliances. The complexity of human motivation and emotion makes these diplomatic skills anything but simple to learn. Nevertheless, the question remains: can artificially intelligent machines achieve this level of understanding and persuasion?
To take on this open challenge, Meta AI recently announced CICERO, the first AI to play the well-known strategy game Diplomacy at a human level. The agent participated in online games at webDiplomacy.net, where it outperformed human players on average and placed in the top 10 percent of players who took part in multiple games. In Diplomacy, the players, rather than the pieces, are the focus of the game. To win, the agent must, like a human, communicate naturally, build relationships, spot bluffs, and display both empathy and game expertise.
The CICERO team takes pride in creating novel methods at the intersection of two largely separate fields of AI research: natural language processing, exemplified by models like GPT-3, and strategic reasoning, exemplified by agents like AlphaGo. CICERO can devise a plan of action on its own to win over a specific player, and it can even identify the opportunities and threats that player perceives from their unique perspective. The cornerstones of CICERO are a controllable dialogue model for Diplomacy and a strategic reasoning engine. At each stage of the game, the agent analyzes the conversation history and the state of the game board and predicts how the other players will act. It then uses the resulting plan to steer a language model that generates free-form dialogue, informing other players of its intentions and proposing courses of action that work well for them too.
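One way to picture intent-conditioned generation is as serializing the planner's output into the conditioning context of a seq2seq dialogue model. The sketch below is purely illustrative: the field names, separators, and order encoding are assumptions, not Meta's actual input format.

```python
def build_controlled_prompt(board_state, dialogue_history, intents):
    """Hypothetical sketch of intent-conditioned message generation.

    The planner's 'intents' (planned moves for the agent and its dialogue
    partner) are serialized into the text context that a seq2seq dialogue
    model conditions on, so the generated message discusses those moves.
    The format here is illustrative only.
    """
    # e.g. {"ITALY": "A VEN-TRI"} -> "ITALY:A VEN-TRI"
    intent_str = " ".join(f"{power}:{move}" for power, move in intents.items())
    history = " [SEP] ".join(dialogue_history)
    return (
        f"STATE {board_state} [SEP] HISTORY {history} "
        f"[SEP] INTENT {intent_str} [SEP] REPLY:"
    )
```

In a real pipeline, a prompt like this would be fed to the fine-tuned dialogue model; swapping in different intents changes what the generated message proposes.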
The controllable dialogue model was built on a 2.7-billion-parameter BART-like language model, pre-trained on publicly available text and then fine-tuned on more than 40,000 human games from webDiplomacy.net. The next phase involved creating methods for automatically annotating messages in the training data with the planned game moves they correspond to. At inference time, this makes it possible to control generation so that messages discuss specific intended actions for the agent and its conversation partners. This controlled generation serves CICERO well: it grounds the agent’s conversations in a continually updated set of plans, which makes it easier to coordinate with and persuade other players. Additionally, several filtering techniques were applied to improve conversational quality.
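The filtering step can be pictured as a pass over sampled candidate messages that discards those contradicting the agent's current intent. CICERO's actual filters are learned classifiers; the keyword-style check below is only a stand-in, and the order notation is an assumption.

```python
import re

def filter_candidates(candidates, intended_orders):
    """Illustrative sketch of message filtering: drop candidate messages
    that name a Diplomacy order inconsistent with the agent's intent.

    The real system uses several learned filters; this simple pattern
    match (orders like "A PAR-BUR") is a stand-in for demonstration.
    """
    order_pattern = re.compile(r"[AF] [A-Z]{3}-[A-Z]{3}")
    intended = set(intended_orders)
    kept = []
    for msg in candidates:
        mentioned = set(order_pattern.findall(msg))
        # Keep the message only if every order it names is actually planned
        if mentioned <= intended:
            kept.append(msg)
    return kept
```

Surviving candidates would then be reranked (e.g., by the dialogue model's own likelihood) before one is sent.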
Unbeatable AI agents for games like chess and Go are typically built with reinforcement learning, in which an agent learns strong strategies by playing millions of games against itself. Cooperative games, however, demand that agents model real human behavior. The conventional method for human modeling is supervised learning, but relying solely on supervised learning to decide how to respond to prior dialogue produces an agent that is comparatively weak and open to exploitation.
CICERO approaches this problem with an iterative planning method that balances rationality against dialogue consistency. Based on its conversations with the other players, the agent first predicts everyone’s policy for the upcoming turn, along with what the other players believe the agent’s own policy will be. A planning algorithm called piKL then iteratively refines these predictions, selecting new policies with higher expected value given the other players’ predicted policies while keeping the new policies close to the original predictions. Evaluations showed that, compared with supervised learning alone, piKL both models human play more accurately and produces stronger policies for the agent.
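The core trade-off can be sketched in a toy two-player matrix game: each player's policy is tilted toward higher expected value while a regularization weight keeps it close to an "anchor" policy that imitates human play. This is a loose caricature of the idea behind piKL, not the actual algorithm (which operates over Diplomacy's vast joint action space and averages values across iterations).

```python
import math

def pikl_policies(payoff, anchor1, anchor2, lam=1.0, steps=50):
    """Toy piKL-style iteration for a two-player zero-sum matrix game.

    payoff[i][j] is player 1's payoff when p1 plays i and p2 plays j.
    Each step computes expected action values against the opponent's
    current policy, then reweights the human-like anchor policy by
    exp(value / lam): small lam favors value, large lam stays human-like.
    Purely illustrative; not Meta's implementation.
    """
    n, m = len(anchor1), len(anchor2)
    p1, p2 = list(anchor1), list(anchor2)
    for _ in range(steps):
        q1 = [sum(payoff[i][j] * p2[j] for j in range(m)) for i in range(n)]
        q2 = [sum(-payoff[i][j] * p1[i] for i in range(n)) for j in range(m)]
        w1 = [anchor1[i] * math.exp(q1[i] / lam) for i in range(n)]
        w2 = [anchor2[j] * math.exp(q2[j] / lam) for j in range(m)]
        s1, s2 = sum(w1), sum(w2)
        p1 = [w / s1 for w in w1]
        p2 = [w / s2 for w in w2]
    return p1, p2
```

With a very large `lam`, the returned policies barely move from the anchors (maximally human-like); with a small `lam`, they approach best responses to the opponent's predicted play.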
Despite being a significant advance toward game-playing agents that combine cooperation and competition while aligning with human intentions and goals, CICERO is not flawless. The agent occasionally produces inconsistent messages that can undermine its own objectives. By using the dialogue model for zero-shot classification, the team has taken first steps toward detecting and filtering out such problematic messages.
The team also notes that while CICERO can currently only play Diplomacy, its underlying technology has many practical applications: controlling natural language generation through planning and RL could enable sustained conversations between humans and AI-powered agents. Meta researchers are enthusiastic about the potential for these areas to progress, and to help the community build on their work, they have open-sourced CICERO and published a paper outlining their findings. The CICERO website offers more information as well as a live demonstration of the agent.
Check out the Paper, Github, Project, and Reference. All credit for this research goes to the researchers on this project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in several challenges.