Introduction to Recurrent Neural Networks

In a typical feed-forward neural network, all inputs and outputs are treated as independent of each other, and each hidden layer has its own set of weights and biases. But in tasks such as predicting the next word of a sequence, the previous words matter, so the network needs a way to remember them. Recurrent Neural Networks (RNNs) are a kind of neural network that specializes in processing sequences, and they are often used in Natural Language Processing (NLP) tasks because of their effectiveness in handling text.

A Recurrent Neural Network (RNN) is a type of neural network in which the output of the previous step is fed as input to the current step. Its most important feature is the hidden state, which remembers information about the sequence seen so far and acts as the network's memory of everything it has computed. The same set of parameters is applied to every element of the sequence, which dramatically reduces the number of parameters the model has to learn.

How do RNNs work?

An RNN shares the same weights and biases across all time steps, turning what would otherwise be independent per-layer activations into dependent ones and keeping the number of parameters from growing with the length of the sequence. Each step's output is fed as input to the next hidden step, so the unrolled steps can be viewed as one layer applied repeatedly with the same weights and biases. RNNs handle variable-length sequences as both inputs and outputs by iteratively updating a hidden state h: at any step t, the next hidden state h_t is computed from the previous hidden state h_{t-1} and the current input x_t, and the output y_t is computed from h_t.
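
To make the update concrete, here is a minimal NumPy sketch of one recurrence step. The weight names (W_xh, W_hh, W_hy), the sizes, and the tanh activation are illustrative assumptions, not the exact formulation of any particular library:

import numpy as np

# illustrative sizes: 50-dimensional inputs, a 64-dimensional hidden state, 5 output classes
input_dim, hidden_dim, output_dim = 50, 64, 5

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.01   # input -> hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01  # hidden -> hidden (shared across all steps)
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.01  # hidden -> output
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def rnn_step(x_t, h_prev):
    """One step: compute h_t from h_{t-1} and x_t, then y_t from h_t."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y   # raw scores; a softmax would follow for classification
    return h_t, y_t

# unroll over a toy sequence of 10 random input vectors, reusing the same weights at every step
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((10, input_dim)):
    h, y_t = rnn_step(x_t, h)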

Training an RNN 

To train an RNN, a single time step of the input is provided to the network, and the current state is computed from the current input and the previous state. This h_t then becomes h_{t-1} for the next time step, and the process is repeated for as many time steps as the problem requires, accumulating information from all previous states. Once all the time steps are completed, the final state is used to compute the output. The predicted output is then compared with the actual (target) output to produce an error, and this error is back-propagated through the network to update the weights; this is how the RNN is trained.
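
A minimal Keras sketch of this training loop, assuming random placeholder data and illustrative hyperparameters, might look as follows; fit() takes care of unrolling the network over the time steps, comparing the prediction with the target, and back-propagating the error:

import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# toy data: 100 sequences of 10 time steps, each step a 50-dimensional vector, 5 target classes
X = np.random.randn(100, 10, 50).astype('float32')
y = np.eye(5)[np.random.randint(0, 5, size=100)]

model = Sequential([
    SimpleRNN(64, input_shape=(10, 50)),   # the final hidden state is used to compute the output
    Dense(5, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

# each training step runs the forward pass over all 10 time steps, measures the error against
# the target, and back-propagates it through time to update the shared weights
model.fit(X, y, batch_size=16, epochs=5, verbose=0)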

Other RNN architectures

RNNs are great, but they can become severely challenging to train when the number of parameters becomes enormous. Once the network is unrolled over many time steps, it becomes so massive that convergence is a challenge.

Long Short-Term Memory networks (LSTMs) are a special kind of RNN capable of learning long-term dependencies. They work well on a wide variety of problems and are widely used. LSTMs have a similar structure to RNNs, but the repeating module has a slightly different design: instead of a single neural network layer, several layers interact in a specific way through mechanisms known as the input gate, the forget gate, and the output gate.

Another great RNN architecture is the Gated Recurrent Unit (GRU). GRUs are a variant of LSTMs but are simpler in structure and easier to train. They work by sending gating signals that control how the current input and the previous memory are combined to update and produce the current state. A GRU has two gates, the reset gate and the update gate, each with its own set of weights that are updated adaptively.
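
In Keras, both cells can be used as drop-in replacements for a plain recurrent layer. The sketch below uses illustrative layer sizes just to show that only the recurrent layer itself changes between the two:

from keras.models import Sequential
from keras.layers import LSTM, GRU, Dense

# an LSTM-based classifier over sequences of 10 steps with 50 features each
lstm_model = Sequential([
    LSTM(64, input_shape=(10, 50)),   # input, forget and output gates are handled internally
    Dense(5, activation='softmax'),
])

# the same model with a GRU cell: only the recurrent layer changes
gru_model = Sequential([
    GRU(64, input_shape=(10, 50)),    # reset and update gates, fewer parameters than the LSTM
    Dense(5, activation='softmax'),
])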

Implementation of an RNN using Keras

We will implement an emoji predictor using the Keras Sequential model. Given an input sentence, the model outputs an emoji that matches it.

Write the following code to implement the model:

# installing the emoji module
!pip install emoji

import numpy as np
import pandas as pd
import emoji
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.models import Sequential
from keras.utils import to_categorical

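# load the dataset: column 0 holds a short sentence, column 1 an integer emoji label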
train = pd.read_csv('dataset/train_emoji.csv',header=None)
test = pd.read_csv('dataset/test_emoji.csv',header=None)

emoji_dictionary = {"0": "\u2764\uFE0F",    
                    "1": ":baseball:",
                    "2": ":beaming_face_with_smiling_eyes:",
                    "3": ":downcast_face_with_sweat:",
                    "4": ":fork_and_knife:",
                   }
XT = train[0]
Xt = test[0]

YT = to_categorical(train[1])
Yt = to_categorical(test[1])

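# build a word -> vector lookup table from the pre-trained 50-dimensional GloVe embeddings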
embeddings = {}
with open('glove.6B.50d.txt',encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coeffs = np.asarray(values[1:],dtype='float32')
        
        embeddings[word] = coeffs

def getOutputEmbeddings(X):
    # convert each sentence into a (10, 50) matrix of GloVe word vectors:
    # sentences are truncated to 10 words and shorter ones stay zero-padded
    embedding_matrix_output = np.zeros((X.shape[0],10,50))
    for ix in range(X.shape[0]):
        X[ix] = X[ix].split()
        for jx in range(min(len(X[ix]), 10)):
            embedding_matrix_output[ix][jx] = embeddings[X[ix][jx].lower()]

    return embedding_matrix_output

emb_XT = getOutputEmbeddings(XT)
emb_Xt = getOutputEmbeddings(Xt)

# two stacked LSTM layers followed by a 5-way softmax classifier over the emoji labels
model = Sequential()
model.add(LSTM(64,input_shape=(10,50),return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(64))
model.add(Dropout(0.3))
model.add(Dense(5))
model.add(Activation('softmax'))
model.summary()

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
model.fit(emb_XT,YT,batch_size=32,epochs=40,shuffle=True,validation_split=0.1)

model.evaluate(emb_Xt,Yt)
pred = np.argmax(model.predict(emb_Xt), axis=-1)   # predict_classes was removed in newer Keras versions

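# show each test sentence with the true emoji followed by the predicted one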
for i in range(30):
    print(' '.join(Xt[i]))
    print(emoji.emojize(emoji_dictionary[str(np.argmax(Yt[i]))]))
    print(emoji.emojize(emoji_dictionary[str(pred[i])]))

We have trained our first RNN model!!

Visit the TensorFlow RNN guide to learn more about RNNs.

Happy learning!!

Nitish is a computer science undergraduate with a keen interest in deep learning. He has worked on various deep learning projects and closely follows new advancements in the field.