Harnessing curiosity in Reinforcement Learning

Reinforcement learning is a specialized area of artificial intelligence in which an agent learns independently and in a goal-oriented way, based on rewards it receives from an environment. The core objective is to maximize cumulative reward over time; for example, scoring high to win a game. This resembles how humans learn a new skill through trial and feedback. In reinforcement learning algorithms, a positive reward acts as a motivator and a negative reward acts as a penalty.

Current reinforcement learning approaches mainly struggle in environments where the reward signal is vague or unclear. Such scenarios are frequent in the real world; for example, looking for a particular food item in a large supermarket. You can search for a long time without seeing the item, and there is no inherent “clue” that tells you whether you are heading in the right direction.

Curiosity as a motivator

AI researchers are exploring different curiosity models to tackle the reward problem in reinforcement learning, using curiosity as a motivator that keeps an agent striving in challenging tasks. Recent research by Google and DeepMind added an exploration reward that distinguishes known from unknown data held in a short-term memory. In this research, discovering unknown information is rewarded more than revisiting known information. The motivation gained from these rewards is then used to keep pursuing solutions.

Curiosity-driven learning

In research conducted at the University of California, Berkeley, a curiosity-driven exploration framework named the “Intrinsic Curiosity Module” was proposed, in which the agent's failure to accurately predict the outcome of its own actions is treated as a positive surprise. This surprise acts as an intrinsic reward, so maximizing reward pushes the agent toward situations it cannot yet predict.
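The idea of treating prediction failure as a reward can be sketched in a few lines. The example below is a toy illustration, not the Berkeley implementation: the published module predicts in a learned feature space with neural networks, whereas this sketch assumes a scalar state and a hypothetical linear `ForwardModel`. All names and parameters here are assumptions made for illustration.

```python
class ForwardModel:
    """Toy linear forward model (hypothetical): predicts the next state
    from the current state and action."""

    def __init__(self, lr=0.1):
        self.w_state = 0.0
        self.w_action = 0.0
        self.lr = lr

    def predict(self, state, action):
        return self.w_state * state + self.w_action * action

    def update(self, state, action, next_state):
        # One gradient step on 0.5 * error**2.
        error = self.predict(state, action) - next_state
        self.w_state -= self.lr * error * state
        self.w_action -= self.lr * error * action


def intrinsic_reward(model, state, action, next_state, scale=1.0):
    """Curiosity bonus: larger when the model's prediction is worse."""
    error = model.predict(state, action) - next_state
    return scale * 0.5 * error ** 2


def revisit_rewards(steps=20):
    """Repeat one transition and record the shrinking surprise."""
    model = ForwardModel()
    rewards = []
    for _ in range(steps):
        rewards.append(intrinsic_reward(model, 1.0, 1.0, 2.0))
        model.update(1.0, 1.0, 2.0)
    return rewards
```

Calling `revisit_rewards()` shows the bonus falling each time the same transition is seen again, which is what pushes a curious agent away from familiar situations and toward new ones.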

Surprise Maximization

A large-scale study of curiosity-based learning by OpenAI showed risks associated with intrinsic curiosity rewards: instead of effectively pursuing a solution, an agent may fall into procrastination-like behavior that adds no value.
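One commonly cited way this can happen is a source of pure noise, often described as the "noisy TV" problem: random outcomes can never be predicted, so the surprise reward never fades. The sketch below is a simplified illustration under that assumption, using a hypothetical running-average predictor rather than anything from the study itself.

```python
import random


def late_surprise(source, steps=500, lr=0.1, tail=100):
    """Average squared prediction error over the last `tail` steps."""
    prediction = 0.0
    squared_errors = []
    for _ in range(steps):
        outcome = source()
        error = outcome - prediction
        squared_errors.append(error ** 2)
        prediction += lr * error  # move the running estimate toward the outcome
    return sum(squared_errors[-tail:]) / tail


def static_wall():
    """A fully predictable observation."""
    return 1.0


def make_noisy_tv(seed=0):
    """An unpredictable observation: a seeded stream of coin flips."""
    rng = random.Random(seed)
    return lambda: rng.choice([0.0, 1.0])
```

Comparing `late_surprise(static_wall)` with `late_surprise(make_noisy_tv())` shows the difference: the predictable source stops being rewarding once it is learned, while the noisy source keeps paying out indefinitely, so an agent that maximizes surprise alone has a reason to keep watching it.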

Self-rewarding actions through episodic curiosity

According to other research from Cornell University, agents in reinforcement learning algorithms can create rewards for themselves, making the rewards more relevant and achievable. For example, discovering something new can be perceived as a reward. This approach uses episodic memory, in which the current observation is compared with the observations already stored in memory. The comparison is based on the number of steps required to reach the current observation, so observations that can be reached in only a few steps do not count as new, which prevents inaccurate and hurried results.
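The memory comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: observations are grid positions, and the hypothetical `steps_between` helper uses Manhattan distance as a stand-in for "steps required to reach", where the research itself would rely on a learned comparison. The threshold and bonus values are likewise assumptions.

```python
def steps_between(a, b):
    """Stand-in for 'steps needed to get from a to b' (assumption:
    Manhattan distance on a grid)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class EpisodicMemory:
    """Hypothetical episodic memory that pays a bonus for novel observations."""

    def __init__(self, novelty_threshold=2, bonus=1.0):
        self.items = []
        self.novelty_threshold = novelty_threshold
        self.bonus = bonus

    def reset(self):
        # Called at the start of each episode.
        self.items.clear()

    def reward(self, observation):
        # Distance, in steps, to the closest remembered observation.
        nearest = min(
            (steps_between(observation, m) for m in self.items),
            default=float("inf"),
        )
        if nearest > self.novelty_threshold:
            self.items.append(observation)  # remember only novel observations
            return self.bonus
        return 0.0


memory = EpisodicMemory()
trajectory = [(0, 0), (0, 1), (1, 1), (3, 1), (3, 2), (6, 2)]
bonuses = [memory.reward(obs) for obs in trajectory]
```

Here only the positions that lie more than two steps from everything already in memory earn the bonus, so small shuffles around a known spot are not rewarded.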

In a nutshell, the capability of reinforcement learning can be augmented with intrinsic curiosity-based rewarding methods. However, this approach is still in an early stage where an ideal, automated learning framework is yet to be discovered.

Vaibhavi is an AI-ML enthusiast, keen observer, and data storyteller. She writes about cutting-edge research areas and technologies in the field of data science. She also covers the ethical and social sides of AI as well as behavioral insights in technology. Her other areas of interest are digital marketing, business aspects of AI, design thinking, and content strategy.