Most AI systems excel at generating specific responses to a particular problem, and today AI can outperform humans in various fields. But for an AI to handle any task it is presented with, it needs to generalize, learn, and understand new situations as they occur without supplementary guidance. While humans can recognize both chess and poker as games in the broadest sense, teaching a single AI to play both remains challenging.
Perfect-Information Games Versus Imperfect-Information Games
AI systems are relatively successful at mastering perfect-information games like chess, where nothing is hidden from either player: each player can see the entire board and all possible moves at all times. Bots like AlphaZero can even combine reinforcement learning with search (RL+Search) to teach themselves to master these games from scratch.
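To make the RL+Search idea concrete, here is a toy, hypothetical sketch (not AlphaZero itself, which uses deep networks and Monte Carlo tree search): a one-step lookahead search picks moves using the current value estimates, and those estimates are then trained toward the stronger search-derived targets, the way a self-play network would be.

```python
# Toy deterministic game: states are integers 0..4; from state s a player
# may move to s+1 or s+2, and reaching state >= 4 wins (+1 for the mover).
def legal_moves(s):
    return [s + 1, s + 2]

def is_terminal(s):
    return s >= 4

def search(s, value):
    """One-step lookahead: pick the move whose resulting state the current
    value estimate likes best, from the mover's perspective (after a
    non-terminal move, the opponent moves, hence the sign flip)."""
    best_move, best_val = None, -float("inf")
    for nxt in legal_moves(s):
        v = 1.0 if is_terminal(nxt) else -value[nxt]
        if v > best_val:
            best_move, best_val = nxt, v
    return best_move, best_val

# "Self-play" training loop: repeatedly run search and nudge the value
# estimates toward the search-derived targets (a tabular stand-in for
# the neural network these systems actually train).
value = {s: 0.0 for s in range(5)}
for _ in range(50):
    for s in range(4):
        _, target = search(s, value)
        value[s] += 0.5 * (target - value[s])
```

After training, the table reflects the game's true win/loss structure: state 1 is losing for the player to move (any move hands the opponent a win), while the other states are winning.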
Unlike perfect-information games and single-agent settings, imperfect-information games pose a critical challenge: an action’s value may depend on the probability with which it is chosen. Therefore, the team states, it is crucial to account not just for the sequences of actions, but also for the probability that different sequences of actions occurred.
Facebook has recently introduced Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that works in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms that have proved successful in perfect-information games. However, unlike past AIs, ReBeL makes decisions by factoring in the probability distribution over the different views each player might have of the game’s current state, called a public belief state (PBS). For example, ReBeL can assess the chances its poker opponent assigns to the hands it might be holding.
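The core of a belief state is a probability distribution over private information, updated as public actions are observed. The sketch below is a hypothetical illustration of that Bayesian update, not Facebook's implementation; the hand names and policy numbers are invented for the example.

```python
def update_belief(belief, action, policy):
    """Bayesian update: P(hand | action) is proportional to
    P(action | hand) * P(hand).

    belief: dict mapping each possible private hand to its probability
    policy: dict mapping each hand to the probabilities with which a
            player holding it takes each action
    """
    posterior = {hand: p * policy[hand].get(action, 0.0)
                 for hand, p in belief.items()}
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("action has zero probability under every hand")
    return {hand: p / total for hand, p in posterior.items()}

# Two possible opponent hands, initially equally likely.
belief = {"strong": 0.5, "weak": 0.5}
# Illustrative policy: a strong hand raises 80% of the time, a weak one 20%.
policy = {"strong": {"raise": 0.8, "call": 0.2},
          "weak":   {"raise": 0.2, "call": 0.8}}

belief = update_belief(belief, "raise", policy)
# After observing a raise, the belief shifts toward the strong hand.
```

Observing a raise quadruples the odds of the strong hand under this policy, so the belief moves from 50/50 to 80/20 in favor of "strong".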
Earlier RL+Search algorithms break down in imperfect-information games like poker, where complete information is not available (players keep their cards secret). These algorithms assign a fixed value to each action regardless of how often that action is chosen. In chess, for instance, a good move is good whether it is chosen frequently or rarely. But in games like poker, the more often a player bluffs, the less valuable bluffing becomes, because opponents can adapt their strategy to call more of those bluffs. This is why the Pluribus poker bot relied on an approach that performs search during actual gameplay rather than only beforehand.
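A small worked example (with hypothetical pot and bet sizes) shows why a bluff has no fixed value: its expected value depends on how often the opponent calls.

```python
def bluff_ev(call_prob, pot=10.0, bet=10.0):
    """Expected value of betting `bet` with a hand that loses if called:
    win the pot when the opponent folds, lose the bet when called."""
    fold_prob = 1.0 - call_prob
    return fold_prob * pot - call_prob * bet

# Against an opponent who rarely calls, bluffing is profitable:
#   bluff_ev(0.2) = 0.8 * 10 - 0.2 * 10 = 6.0
# But once the opponent adapts and calls often, the same bluff loses money:
#   bluff_ev(0.8) = 0.2 * 10 - 0.8 * 10 = -6.0
```

This is exactly the feedback loop described above: the act of bluffing more often changes the opponent's best response, which in turn changes the bluff's value.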
ReBeL can treat imperfect-information games like perfect-information games by accounting for each player’s beliefs. Facebook has developed a modified RL+Search algorithm that lets ReBeL handle the higher-dimensional state and action spaces of imperfect-information games.
Experiments show that ReBeL is effective in large-scale two-player zero-sum imperfect-information games such as Liar’s Dice and poker. ReBeL achieves superhuman performance, even defeating a top human professional in the benchmark game of heads-up no-limit Texas Hold ’em.
Earlier systems have pursued the same goal, but ReBeL accomplishes it using considerably less expert domain knowledge than any previous poker AI. This is a crucial step toward building a generalized AI that can solve complex real-world problems involving hidden information, such as negotiations, fraud detection, and cybersecurity.
ReBeL is the first AI to enable RL+Search in imperfect-information games. However, its current implementation has some limitations:
- Foremost, the computation ReBeL requires becomes prohibitive in certain games (such as Recon Chess) with great strategic depth but little common knowledge.
- ReBeL relies on knowing the exact rules of the game.
- ReBeL’s theoretical guarantees are restricted to two-player zero-sum games, which are relatively rare in real-world interactions.
Nevertheless, ReBeL achieves low exploitability in benchmark games and is a significant start toward creating more general AI algorithms. To promote further research, Facebook has open-sourced the implementation of ReBeL for Liar’s Dice.
GitHub (ReBeL for Liar’s Dice): https://github.com/facebookresearch/rebel
Related Paper: https://arxiv.org/pdf/2007.13544.pdf