Text-based games have recently become a popular testbed for developing and evaluating reinforcement learning (RL) agents. The goal is to build autonomous agents that act on a semantic understanding of the text, i.e., agents intelligent enough to “understand” the meanings of words and phrases as humans do.
According to a new study by researchers from Princeton University and Microsoft Research, current autonomous language-understanding agents can achieve high scores even in the complete absence of language semantics. This surprising discovery indicates that such RL agents for text-based games might not be sufficiently leveraging the semantic structure of the texts they encounter.
As a solution to this problem, the team proposes an inverse dynamics decoder designed to regularize the representation space and encourage the encoding of more game-related semantics. They aim to produce agents with more robust semantic understanding.
A wide range of language processing methods has been applied to text-based games, including word vectors, neural networks, pre-trained language models, open-domain question answering systems, knowledge graphs, and reading comprehension systems. These methods are built on an RL framework that treats a text game as an instance of a partially observable Markov decision process (POMDP). In a POMDP, the agent takes actions that affect the system, with the aim of maximizing a reward that depends on the sequence of system states and agent actions. Because both actions and observations are expressed in language, they carry semantics that an agent could, in principle, exploit.
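The interaction loop implied by this framing can be sketched in a few lines of Python. This is only an illustration: ToyTextGame, its two rooms, its reward values, and run_episode are hypothetical names invented here, not part of Jericho or of the study. The point is that the agent sees only text observations and rewards, while the full game state stays hidden.

```python
class ToyTextGame:
    """Hypothetical two-room text game (an assumption for illustration only)."""

    def __init__(self):
        self.room = "cellar"    # hidden state: current room
        self.has_lamp = False   # hidden state: inventory is never shown directly

    def reset(self):
        self.room, self.has_lamp = "cellar", False
        return "You are in a dark cellar. A lamp lies on the floor."

    def valid_actions(self):
        # Actions are text strings, just like observations.
        if self.room == "cellar":
            return ["take lamp", "go up"]
        return ["go down"]

    def step(self, action):
        """Apply a text action and return (text observation, reward, done)."""
        if self.room == "cellar" and action == "take lamp":
            if self.has_lamp:
                return "You already have the lamp.", 0, False
            self.has_lamp = True
            return "Taken.", 1, False
        if self.room == "cellar" and action == "go up":
            self.room = "attic"
            if self.has_lamp:
                return "The lamp reveals a treasure chest. You win.", 10, True
            return "It is too dark to see anything in the attic.", 0, False
        if self.room == "attic" and action == "go down":
            self.room = "cellar"
            return "You are back in the cellar.", 0, False
        return "Nothing happens.", 0, False


def run_episode(game, policy, max_steps=20):
    """Roll out one episode; the policy only ever sees text and valid actions."""
    obs, total = game.reset(), 0
    for _ in range(max_steps):
        action = policy(obs, game.valid_actions())
        obs, reward, done = game.step(action)
        total += reward
        if done:
            break
    return total


def scripted_policy(obs, actions):
    # A hand-written policy that reads the observation text to pick an action.
    return "take lamp" if "lamp lies" in obs else "go up"


if __name__ == "__main__":
    print(run_episode(ToyTextGame(), scripted_policy))
```

An RL agent such as DRRN would replace scripted_policy with a learned function that scores each valid action against the current observation.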
The researchers attempted to discover to what extent current RL agents leverage semantics in text-based games under three setups: Reducing Semantics via Minimizing Observation (MIN-OB), Breaking Semantics via Hashing (HASH), and Regularizing Semantics via Inverse Dynamics Decoding (INV-DY).
They employed a Deep Reinforcement Relevance Network (DRRN) as their baseline RL agent. In the MIN-OB setup, the researchers reduce each observation to only a location phrase in order to isolate the action semantics. To test whether semantic continuity is useful, the HASH setup breaks the semantics of the observation and action encoders by hashing the observation and action texts, so that different observations and actions can still be told apart but their representations no longer reflect similarity in meaning. Finally, the researchers regularize semantics via the INV-DY approach. INV-DY regularizes both action and observation representations to avoid degeneration by decoding back to the textual domain, and it also provides intrinsic motivation for exploration.
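The idea behind the HASH setup can be illustrated with a short Python sketch. It is one plausible reading of the description above, not the researchers' code: the function names hash_vector and cosine, the choice of SHA-256, and the vector dimension are all assumptions made here. Each text is hashed to an integer, and that integer seeds a random generator that produces a fixed vector, so identical texts always share a representation while even near-identical texts receive unrelated ones.

```python
import hashlib
import math
import random


def hash_vector(text, dim=16):
    """Map a text to a fixed pseudo-random vector determined only by its hash.

    Assumption: SHA-256 and a Gaussian vector are illustrative choices.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")  # stable across runs and machines
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    a = hash_vector("open the door")
    b = hash_vector("open the door")
    c = hash_vector("open door")
    print(a == b)        # identical text gives an identical representation
    print(cosine(a, c))  # similar wording carries no guaranteed similarity
```

Because the vectors depend only on the hash, an agent using them can distinguish states and actions but cannot generalize from one phrasing to a related one, which is what makes this setup a test of whether semantics are being used at all.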
The team ran the three setups on 12 interactive fiction games from the Jericho benchmark to probe the effects of different semantic representations. MIN-OB achieved maximum scores similar to those of the base DRRN on most games. Still, it failed to reach high episodic scores, suggesting the importance of identifying different observations using language details.
Surprisingly, HASH almost doubled the DRRN final score on PENTARI, indicating that the DRRN model can perform well without leveraging any language semantics. With INV-DY on the game ZORK I, the maximum observed score was 87, whereas the other models did not exceed 55. The study shows the potential benefits of developing RL agents with semantically richer representations and a better grasp of natural language.