Alchemy is a novel open-source benchmark for meta-reinforcement learning (meta-RL). Reinforcement learning (RL) has garnered much attention in the ML field over the past decade. The RL approach not only reduces the need for labeled data but has also yielded remarkable successes on a wide variety of specific tasks. However, generalization, sample efficiency, and transfer learning remain hurdles for RL, and researchers have been exploring meta-RL to overcome them.
In meta-RL, learning strategies can quickly adapt to novel tasks. They do so by using experience gained on a large set of tasks that share a common structure. Although many exciting meta-RL techniques have been developed, no ideal task benchmark exists for testing new algorithms.
Inspired by humans’ ability to produce and tackle new tasks, meta-learning works by drawing on experience from related learning tasks. It provides a learning paradigm in which agents gain experience over multiple learning episodes and use that experience to improve their learning performance.
Meta-RL environments present the learner with a task distribution instead of a single task. A benchmark task distribution for meta-RL should have two key properties: it should be accessible and it should be interesting.
Accessible means that complete information about the entire task distribution is available.
Interesting means that the distribution displays properties comparable to those of real-world tasks.
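To make the idea of a task distribution concrete, here is a minimal sketch that is not taken from Alchemy. It assumes a hypothetical family of two-armed bandit tasks in which one arm pays off far more often than the other, and which arm is the good one is redrawn for every episode. The names `sample_task`, `run_episode`, and `evaluate`, as well as the payoff probabilities, are illustrative assumptions. The hand-written explore-then-commit rule stands in for the kind of within-episode adaptation a meta-RL agent would have to learn on its own.

```python
import random


def sample_task(rng):
    """Draw one task: the payoff probability of each of two arms.

    Toy assumption: one arm pays with p=0.9, the other with p=0.1,
    and which is which is chosen uniformly at random per episode.
    """
    good = rng.randrange(2)
    return tuple(0.9 if arm == good else 0.1 for arm in range(2))


def run_episode(task, rng, steps=20, explore=4):
    """Explore-then-commit: sample both arms, then stick with the better one.

    All adaptation happens inside the episode, using only observed rewards.
    `explore` must be at least 2 so that each arm is pulled before committing.
    """
    totals = [0, 0]  # summed reward per arm
    pulls = [0, 0]   # number of pulls per arm
    reward_sum = 0
    for t in range(steps):
        if t < explore:
            arm = t % 2  # alternate arms while exploring
        else:
            # commit to the arm with the higher observed mean reward
            arm = max(range(2), key=lambda a: totals[a] / pulls[a])
        reward = 1 if rng.random() < task[arm] else 0
        totals[arm] += reward
        pulls[arm] += 1
        reward_sum += reward
    return reward_sum


def evaluate(num_episodes=1000, seed=0):
    """Average return over many tasks drawn from the distribution."""
    rng = random.Random(seed)
    returns = [run_episode(sample_task(rng), rng) for _ in range(num_episodes)]
    return sum(returns) / len(returns)
```

This toy distribution is accessible in the sense used here, because its generative rule is fully specified in `sample_task`. It is hardly interesting, though, since a few exploratory pulls are enough to solve every task.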
Previous meta-RL benchmarks have failed to be both accessible and interesting at the same time. Researchers agree that sustaining progress in meta-RL requires current work to be reproducible and accurately comparable, so that the performance of new methods can be assessed. Alchemy was created to address this.
Alchemy aims to check both boxes as a “best-of-both-worlds” benchmark for meta-RL research.
Alchemy is a 3D video game played in a series of trials. At the start, the player receives a set of stones, a set of containers filled with potions, and a central cauldron. The potions are used to transform the stones and boost their value, and the stones are then added to the cauldron to score as many points as possible within a fixed time limit.
The stones’ value is determined by their perceptual features, such as shape, size, and color. The task therefore amounts to learning a “chemistry” that governs how the various potions affect the different stones across the trials of an episode.
Alchemy involves a set of latent causal relationships and requires strategic experimentation and action sequencing. Its levels are created by an explicit generative process, which yields a structure that is accessible as well as interesting.
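The mechanics described above can be illustrated with a toy model. This is a simplified sketch, not the actual Alchemy implementation or its API. It assumes that each stone has three binary features, that each potion sets exactly one feature to one value, and that a stone's value is a signed sum of its features. The potion labels and every function name are hypothetical. The hidden "chemistry" is redrawn per episode by an explicit generative step, `sample_chemistry`.

```python
import random

FEATURES = ("shape", "size", "color")  # perceptual features named in the text
# Hypothetical potion labels, one per possible effect in this toy model.
POTIONS = ("red", "orange", "yellow", "green", "blue", "purple")


def sample_chemistry(rng):
    """Draw the hidden rules for one episode (toy generative process).

    Returns (effects, weights):
      effects maps each potion to (feature index, new value in {-1, +1});
      weights gives the sign with which each feature contributes to value.
    """
    moves = [(i, v) for i in range(len(FEATURES)) for v in (-1, 1)]
    rng.shuffle(moves)
    effects = dict(zip(POTIONS, moves))
    weights = tuple(rng.choice((-1, 1)) for _ in FEATURES)
    return effects, weights


def sample_stone(rng):
    """A stone is a tuple of binary feature values."""
    return tuple(rng.choice((-1, 1)) for _ in FEATURES)


def stone_value(stone, weights):
    """Points scored when the stone is placed in the cauldron."""
    return sum(w * f for w, f in zip(weights, stone))


def apply_potion(stone, potion, effects):
    """Dip the stone in a potion: one feature is set to a new value."""
    index, value = effects[potion]
    changed = list(stone)
    changed[index] = value
    return tuple(changed)


def best_potions(effects, weights):
    """Oracle that reads the hidden chemistry; a real agent must infer it."""
    wanted = set(enumerate(weights))  # the (feature, value) pairs that add value
    return [p for p, move in effects.items() if move in wanted]


def run_trial(stone, potions, effects, weights):
    """Apply the chosen potions in order, then score the stone."""
    for potion in potions:
        stone = apply_potion(stone, potion, effects)
    return stone_value(stone, weights)
```

With the oracle's potions, every stone reaches the maximum value of 3 in this toy. An agent without access to `effects` and `weights` has to experiment across trials to infer them, which is the latent-structure problem the benchmark is built around.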
The team evaluated two agents, IMPALA and V-MPO, on Alchemy. Although these agents have achieved outstanding performance in single-task RL environments, they displayed abysmal meta-learning performance in Alchemy even after extensive training. According to the team, the results reflect a failure of the structure learning and latent-state inference that meta-learning involves. This validates Alchemy as a practical benchmark task for meta-reinforcement learning research.