Introduction to Reinforcement Learning

Reinforcement learning is a branch of machine learning in which an agent learns which actions to take in different situations of an environment in order to achieve a goal. The Agent learns to reach that goal in an uncertain, potentially complex environment through repeated trials, aiming to maximize a cumulative reward. It follows a trial-and-error approach: for every action it performs, it receives either a reward or a penalty. Reinforcement learning is therefore used to find the best possible behaviour, or path, to follow in a given situation.

Reinforcement learning differs from supervised learning in that no labeled input/output pairs need to be presented for the model to learn. There is no single correct answer; instead, the reinforcement agent itself decides what to do to perform the given task. In the absence of a training dataset, the focus is on learning from experience while balancing exploration (trying new actions) and exploitation (reusing actions already known to work well).

The essential elements of a Reinforcement learning model

The Agent is the learner and decision-maker of the model, responsible for doing the specified job. The Environment is the world with which the Agent interacts and in which it performs actions based on what it has learned. The Agent receives an initial state S⁰ from the Environment, and from then on the Agent and the Environment interact continually: the Agent selects actions, and the Environment responds to those actions and presents new situations (states) to the Agent.

An action is a move made by the Agent that causes a change of state in the Environment. The Agent can take any available action and move to a new state; the best action is the one expected to yield the maximum reward. A reward is the Environment's evaluation of an action and can be positive or negative. The fundamental goal of our model is to select actions that maximize the total reward.
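
To make "maximize the total reward" concrete, the rewards collected over an episode are usually combined into a discounted return, where a discount factor gamma (the same gamma = 0.95 used in the code below) weights immediate rewards more heavily than distant ones. A minimal sketch with a made-up list of rewards:

#A hypothetical list of rewards collected over one episode
rewards = [1.0, 1.0, 1.0, -10.0]
gamma = 0.95  #discount factor, the same role it plays in the Agent class below

#Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
discounted_return = sum((gamma ** t) * r for t, r in enumerate(rewards))
print(discounted_return)  # 1.0 + 0.95 + 0.9025 - 8.57375 = -5.72125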


Approaches to a Reinforcement Learning Problem

A reinforcement learning problem can be approached in many ways:

In a value-based approach there is no explicit policy function; the aim is to learn a value function V(s) (or an action-value function Q(s, a)) and greedily select the actions that maximize it.
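
As a small illustration, a value-based agent keeps value estimates for each state-action pair (a Q-table in the simplest, tabular case) and greedily picks the action with the highest estimate. The states and numbers below are made up purely for illustration:

import numpy as np

#Hypothetical Q-table: rows are states, columns are actions (made-up values)
Q = np.array([[0.1, 0.5],    #in state 0, action 1 currently looks better
              [0.7, 0.2]])   #in state 1, action 0 currently looks better

def greedy_action(state):
    #Value-based control: pick the action with the largest estimated value
    return int(np.argmax(Q[state]))

print(greedy_action(0))  # 1
print(greedy_action(1))  # 0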

In a policy-based approach, the action performed in each state to gain maximum future reward comes from a policy function; no value function is involved. The policy can be deterministic, always producing the same action A in a given state s, or stochastic, where each action A has a certain probability of being chosen in state s.
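
The difference between the two policy types can be sketched directly in code; the states, actions, and probabilities below are invented purely for illustration:

import random

#Deterministic policy: a given state always maps to the same action
def deterministic_policy(state):
    return 0 if state < 0 else 1   #e.g. push left for negative states, right otherwise

#Stochastic policy: each action has a certain probability of being chosen in a state
def stochastic_policy(state):
    p_right = 0.8 if state >= 0 else 0.2   #made-up probabilities
    return 1 if random.random() < p_right else 0

print(deterministic_policy(-0.3))  #always 0
print(stochastic_policy(-0.3))     #usually 0, occasionally 1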

A model-based approach builds a virtual model of each Environment, and the Agent learns to act by planning with that model. Since the model differs for each Environment, there is no single solution or algorithm for this approach.
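
As a rough sketch, a model-based agent can use its model of the Environment to look ahead before acting. transition_model below is a hypothetical, hand-written stand-in for whatever model the Agent has learned:

#Hypothetical learned model: maps (state, action) to (predicted next state, predicted reward)
def transition_model(state, action):
    next_state = state + (1 if action == 1 else -1)   #made-up dynamics
    reward = 1.0 if next_state == 0 else 0.0          #made-up reward: reaching state 0 is the goal
    return next_state, reward

def plan_one_step(state, actions=(0, 1)):
    #One-step lookahead: simulate each action with the model and pick the best predicted reward
    return max(actions, key=lambda a: transition_model(state, a)[1])

print(plan_one_step(-1))  # 1 (moving right is predicted to reach the goal state)
print(plan_one_step(1))   # 0 (moving left is predicted to reach the goal state)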

Implementing a Reinforcement Learning model

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It provides a variety of game environments in which the Agent can take actions. Each Environment has an initial state, and the state is updated every time the Agent takes an action.
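
Before building the Agent, it helps to see the basic Gym loop on its own. The sketch below uses the classic Gym API assumed throughout this article (env.step returning four values); it resets CartPole-v1 and plays one episode with purely random actions:

import gym

env = gym.make('CartPole-v1')
state = env.reset()                     #initial observation
done = False
total_reward = 0

while not done:
    action = env.action_space.sample()  #a random action: 0 (push left) or 1 (push right)
    state, reward, done, info = env.step(action)
    total_reward += reward

print("Episode reward:", total_reward)
env.close()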

Here we will implement our reinforcement learning model on the Cart-Pole game from OpenAI Gym and try to rack up a high score!

Write the following code to implement the model:

#importing essential libraries
import gym
import numpy as np
import matplotlib.pyplot as plt
import os
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import random

#Creating Agent
class Agent:
    def __init__(self,state_size,action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95 #Discount Factor
        self.epsilon = 1.0 # Exploration Rate: how much to act randomly
        self.epsilon_decay = 0.995
        self.epsilon_min = 0.01
        self.learning_rate = 0.001 
        self.model = self._create_model()
        
    #Creating Model
    def _create_model(self):
        #Neural Network To Approximate Q-Value function
        model = Sequential()
        #1st Hidden Layer
        model.add(Dense(24,input_dim=self.state_size,activation='relu')) 
        model.add(Dense(24,activation='relu')) #2nd Hidden Layer
        model.add(Dense(self.action_size,activation='linear'))
        model.compile(loss='mse',optimizer=Adam(lr=self.learning_rate))
        return model

    #Remembering previous experiences
    def remember(self,state,action,reward,next_state,done):
        self.memory.append((state,action,reward,next_state,done)) 


    #Creating an action function
    def act(self,state):
        # Exploration vs Exploitation
        if np.random.rand()<=self.epsilon:
            return random.randrange(self.action_size)
        # predict reward value based upon current state
        act_values = self.model.predict(state)
        return np.argmax(act_values[0]) #Left or Right

    #Method that trains the NN with experiences sampled from memory
    def train(self,batch_size=32): 
        minibatch = random.sample(self.memory,batch_size)
        for state,action,reward,next_state,done in minibatch:
            
            if not done: #boolean 
                target = reward + self.gamma*np.amax(self.model.predict(next_state)[0])
            else:
                target = reward
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state,target_f,epochs=1,verbose=0) 
            
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
    
    def load(self,name):
        self.model.load_weights(name)
    def save(self,name):
        self.model.save_weights(name)


#Running the model
env = gym.make('CartPole-v1')   #the cart-pole environment from OpenAI Gym
n_episodes = 1000
output_dir = "cartpole_model/"
os.makedirs(output_dir, exist_ok=True)   #make sure the checkpoint directory exists
state_size = 4    #cart position, cart velocity, pole angle, pole angular velocity
action_size = 2   #push the cart left or right
batch_size = 32
agent = Agent(state_size, action_size) # initialise agent
done = False
for e in range(n_episodes):
    state = env.reset()
    state = np.reshape(state,[1,state_size])
    
    for time in range(5000):
        env.render()
        action = agent.act(state) #action is 0 or 1
        next_state,reward,done,other_info = env.step(action) 
        reward = reward if not done else -10
        next_state = np.reshape(next_state,[1,state_size])
        agent.remember(state,action,reward,next_state,done)
        state = next_state
        
        if done:
            print("Game Episode :{}/{}, High Score:{},Exploration Rate:{:.2}".format(e,n_episodes,time,agent.epsilon))
            break
            
    if len(agent.memory)>batch_size:
        agent.train(batch_size)
    
    if e%50==0:
        agent.save(output_dir+"weights_"+'{:04d}'.format(e)+".hdf5")
        
env.close()
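
Once training has produced a checkpoint, the saved weights can be loaded back with agent.load for a purely greedy test run. A minimal sketch, assuming one of the checkpoints written above and a fresh environment; epsilon is set to 0 so the Agent stops exploring:

#Evaluating a trained agent
env = gym.make('CartPole-v1')
agent.load(output_dir + "weights_0950.hdf5")  #pick any checkpoint saved during training
agent.epsilon = 0.0                           #act greedily, no random exploration

state = np.reshape(env.reset(), [1, state_size])
done = False
score = 0
while not done:
    action = agent.act(state)
    next_state, reward, done, other_info = env.step(action)
    state = np.reshape(next_state, [1, state_size])
    score += reward
print("Evaluation score:", score)
env.close()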
            

Visit OpenAI Gym to find many more exciting environments to work on and to learn more about Reinforcement Learning.

Happy Learning!!

Nitish is a computer science undergraduate with a keen interest in the field of deep learning. He has worked on various deep learning projects and closely follows the new advancements taking place in the field.
