UC Berkeley Research Explains How Self-Supervised Reinforcement Learning Combined With Offline Reinforcement Learning (RL) Could Enable Scalable Representation Learning

Machine learning (ML) systems have excelled in fields ranging from computer vision to speech recognition and natural language processing. Yet, these systems fall short of human reasoning in terms of flexibility and generality. This has prompted machine learning researchers to look for the “missing component” that could improve these systems’ understanding, reasoning, and generalization abilities.

A new study by UC Berkeley researchers shows that combining self-supervised and offline reinforcement learning (RL) might lead to a new class of algorithms that understand the world through actions and enable scalable representation learning.

According to the researchers, RL can provide a general, principled, and powerful framework for using unlabeled data, allowing ML systems to better understand the real world by drawing on large datasets.

Several hypotheses have been proposed to address this “missing component” challenge in ML systems. These include causal reasoning, inductive bias, and better self-supervised or unsupervised learning algorithms. While the topic is complex and entails a lot of guesswork, some insights can be derived from recent advances in AI, such as:

  • The “unreasonable” effectiveness of large, generic models trained on vast amounts of data
  • How manual labeling and supervision fail to scale as well as unsupervised or self-supervised learning

An important step is to determine how to train large models without manual labeling or manually designed self-supervised objectives, so that the resulting models have a deep and meaningful understanding of their surroundings and can perform downstream tasks with robust generalization and some degree of common sense.

Autonomous agents will need a causal and generalizable grasp of their environment to accomplish this task. Such agents would go beyond the current RL paradigm, in which algorithms require a task goal (i.e., a reward function) to be specified by experts and are not inherently data-driven. These requirements limit both generalization and the ability to learn how the real world works.

Rather than focusing on a single user-specified task, the proposed algorithms aim to achieve whatever outcomes they infer are possible in the real world. Offline RL algorithms can effectively leverage previously gathered datasets. This would let systems spend their training time learning and executing user-specified tasks while also reusing their collected experience as offline training data to learn to achieve a broader range of outcomes.

The team believes that offline RL has the potential to greatly expand the applicability of self-supervised RL methods. It can also be combined with goal-conditioned policies to learn entirely from previously gathered data.
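To make the idea of learning goal-conditioned behavior from previously gathered data more concrete, here is a minimal sketch of hindsight goal relabeling, one common way to turn logged, unlabeled trajectories into goal-conditioned training examples. It is an illustration under stated assumptions, not the method from the paper: the toy grid states, the sparse reach-the-goal reward, and the names `Transition`, `GoalTransition`, and `relabel_with_future_goals` are all hypothetical.

```python
import random
from typing import List, NamedTuple, Optional, Tuple

State = Tuple[int, int]  # assumption: a toy (x, y) grid position


class Transition(NamedTuple):
    """One logged step from a previously gathered (offline) dataset."""
    state: State
    action: str
    next_state: State


class GoalTransition(NamedTuple):
    """A logged step relabeled with a goal and a self-supervised reward."""
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def relabel_with_future_goals(
    trajectory: List[Transition],
    goals_per_step: int = 2,
    rng: Optional[random.Random] = None,
) -> List[GoalTransition]:
    """Relabel each step with goals drawn from states reached later on.

    No human-specified reward is used: the reward is 1.0 only when the
    step's next state equals the sampled goal, and 0.0 otherwise.
    """
    if rng is None:
        rng = random.Random(0)
    relabeled = []
    for t, step in enumerate(trajectory):
        # Candidate goals: states actually reached at step t or later.
        future_states = [later.next_state for later in trajectory[t:]]
        for _ in range(goals_per_step):
            goal = rng.choice(future_states)
            reward = 1.0 if step.next_state == goal else 0.0
            relabeled.append(
                GoalTransition(step.state, step.action, step.next_state, goal, reward)
            )
    return relabeled


# A single logged trajectory, collected without any task label.
logged = [
    Transition((0, 0), "right", (1, 0)),
    Transition((1, 0), "right", (2, 0)),
    Transition((2, 0), "up", (2, 1)),
]
for example in relabel_with_future_goals(logged):
    print(example)
```

The relabeled tuples could then be passed to any offline RL learner that conditions its policy or value function on the goal. Sampling goals only from states the trajectory actually reached is a design choice that guarantees every sampled goal is known to be achievable from the step it is attached to.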

This research shows how scalable representation learning can be achieved by combining self-supervised RL with offline RL. Completing self-supervised RL objectives can help models understand how the world works. Offline RL, which allows the use of large, heterogeneous, previously gathered datasets, makes it possible to apply such techniques at scale to real-world data.

Paper: https://arxiv.org/pdf/2110.12543.pdf