In the previous article, we discussed Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) and applied them to detect fake news. This article explains what Recurrent Neural Networks (RNNs) are and how to use them. First, consider the problem with standard feed-forward networks: they cannot capture the sequential relationship between the words in a sentence. In natural text, the next word depends on the part of the sentence we have already seen. To capture this dependency, we use a sequential neural network, the recurrent neural network (RNN).
Sequential Models (RNNs):
Let us see how a Recurrent Neural Network (RNN) works. Suppose we are given a sentence and have to predict the next word. The RNN takes the first word of the sentence, passes it through a neural network, and predicts the second word. To predict the third word, it takes the hidden-state activation from the previous step together with the second word as input. This process continues along the sentence. The sequential relationship is captured because the hidden state from the previous step feeds into the next prediction, so an encoded version of the sentence so far is used to predict each new word. This is why RNNs are so powerful.
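The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation: the names `rnn_step`, `rnn_forward`, `W_xh`, `W_hh`, `b_h` and all sizes are assumptions made for this example, and the weights are random rather than trained.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: mix the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run the step over a whole sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])  # the hidden state starts at zero
    states = []
    for x_t in inputs:           # one word (or character) vector at a time
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes: 5 time steps, 8-dimensional inputs, 4 hidden units.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 8))
W_xh = rng.normal(scale=0.1, size=(8, 4))
W_hh = rng.normal(scale=0.1, size=(4, 4))
b_h = np.zeros(4)

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4)
```

Each row of `states` depends on all earlier inputs through `h`, which is the "encoded version of the sentence so far". A real model would add an output layer on top of the hidden state to predict the next word.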
Sequential Models (LSTMs and GRUs):
There are more effective architectures: Gated Recurrent Units (GRUs) and Long Short-Term Memory networks (LSTMs). The practical reason to use them instead of a plain RNN is as follows. A plain RNN uses the information from every previous word to predict the next one, but often only part of the sentence is needed. GRUs and LSTMs build on this idea: the network is designed so that the model itself learns which words to keep and which to ignore. This is the intuition behind gated sequential models. Now let us apply them to create something exciting: a model that writes like Shakespeare. Sounds excellent, right? Let us get into it.
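Before moving on, the gating idea can be made concrete with a single GRU step in NumPy. This is a sketch of one common formulation of the GRU (the sign convention of the final blend differs between references), with random untrained weights; the function name `gru_step` and the parameter names in `params` are assumptions made for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step. `p` is a dict of weight matrices and biases (hypothetical names)."""
    # Update gate: how much of the state to overwrite with new content.
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])
    # Reset gate: how much of the old state to consult when proposing new content.
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])
    # Candidate state built from the input and the reset-scaled old state.
    h_cand = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"] + p["b_h"])
    # Blend: z near 0 keeps the old state, z near 1 takes the candidate.
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_in, n_hidden))
    params[f"U_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h, params)
print(h.shape)  # (4,)
```

The update gate `z` is what lets the network select: when it is close to zero for a time step, that word barely changes the state, so earlier information passes through untouched.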
Before creating a model, we should preprocess the data so that it matches the input shape the model expects. Let’s do that first. We teach the model to predict the next word from the previous 12 words. Hence the input sequence must have length 12 and the output must have length 1. In this article, we look at two different implementations:
1. Character-level modelling
2. Using a word embedding
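The windowing described above (12 items in, 1 item out) can be sketched in plain Python. The helper name `make_windows` is an assumption made for this example, and the sample line is only there to have something to slice.

```python
def make_windows(tokens, window=12):
    """Pair every run of `window` tokens with the token that follows it."""
    inputs, targets = [], []
    for i in range(len(tokens) - window):
        inputs.append(tokens[i:i + window])
        targets.append(tokens[i + window])
    return inputs, targets

# A line from Sonnet 18, lower-cased and stripped of punctuation for the example.
line = "shall i compare thee to a summer's day thou art more lovely and more temperate"
words = line.split()

inputs, targets = make_windows(words, window=12)
print(len(inputs))  # 3
print(targets)      # ['and', 'more', 'temperate']
```

The same function works at character level if it is given a string instead of a list of words, since slicing behaves the same way on both.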
Character-level modelling:
Here the input at each time step is an encoded version of a single character. In this model, we use the previous characters to predict the next one. Since 12 characters is a very short context, let’s use 100 instead. The data here comes from Shakespeare’s writing; you can try it out with some other text.
In preprocessing, we build the data set so that each input is a sequence of 100 one-hot encoded characters and each output is the one-hot encoding of the character that follows. The implementation is as follows.
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys
import pathlib
import tensorflow as tf

# Download the Shakespeare corpus (cached under ./tmp after the first run).
cache_dir = './tmp'
dataset_file_name = 'shakespeare.txt'
dataset_file_origin = 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
dataset_file_path = tf.keras.utils.get_file(
    fname=dataset_file_name,
    origin=dataset_file_origin,
    cache_dir=pathlib.Path(cache_dir).absolute()
)
print(dataset_file_path)

with open(dataset_file_path, mode='r', encoding='utf-8') as f:
    text = f.read()

# Map every distinct character to an index and back.
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

def build_data(text, Tx=100, stride=1):
    """Slide a window of Tx characters over the text; the target is the next character."""
    X = []
    Y = []
    for i in range(0, len(text) - Tx, stride):
        X.append(text[i: i + Tx])
        Y.append(text[i + Tx])
    print('number of training examples:', len(X))
    return X, Y

# Only the first 10,000 characters are used here.
X, Y = build_data(text[:10000])

def vectorization(X, Y, n_x, char_indices, Tx=100):
    """One-hot encode the windows (x) and their target characters (y)."""
    m = len(X)
    x = np.zeros((m, Tx, n_x), dtype=bool)  # np.bool was removed in NumPy 1.24
    y = np.zeros((m, n_x), dtype=bool)
    for i, sentence in enumerate(X):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1
        y[i, char_indices[Y[i]]] = 1
    return x, y

x, y = vectorization(X, Y, len(chars), char_indices, Tx=100)

def sample(preds, temperature=1.0):
    """Draw a character index from the predicted distribution, rescaled by temperature."""
    preds = np.asarray(preds).astype('float64').ravel()  # accept shape (n,) or (1, n)
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)  # a single one-hot draw
    return int(np.argmax(probas))
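To see what the temperature argument of `sample` does, the rescaling step can be looked at on its own. The sketch below uses a made-up three-character distribution; `apply_temperature` is a hypothetical helper written for this illustration, not part of the code above.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability vector: low temperature sharpens it, high flattens it."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return weights / weights.sum()

probs = [0.6, 0.3, 0.1]  # made-up model output over three characters
for t in (0.5, 1.0, 2.0):
    print(t, np.round(apply_temperature(probs, t), 3))
```

With a temperature of 1.0 the distribution is unchanged. At 0.5 the most likely character's share rises to about 0.78, so the generated text becomes more conservative; at 2.0 it drops to about 0.47, so less likely characters are picked more often.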
Model Development:
Here I used LSTM layers, but you can try changing them to GRU or plain RNN layers.
# Two stacked LSTM layers followed by dense layers and a softmax over the characters.
model = Sequential()
model.add(LSTM(256, input_shape=(100, len(chars)), return_sequences=True))
model.add(LSTM(256))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(len(chars), activation='softmax'))

maxlen = 100

def on_epoch_end(epoch, _):
    # Generate sample text only every 150 epochs (skipping epoch 0).
    if epoch > 0 and epoch % 150 == 0:
        print()
        print('----- Generating text after Epoch: %d' % epoch)
        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.5]:
            print('----- diversity:', diversity)
            generated = ''
            sentence = text[start_index: start_index + maxlen]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            sys.stdout.write(generated)
            for i in range(500):
                # One-hot encode the current 100-character window.
                x_pred = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(sentence):
                    x_pred[0, t, char_indices[char]] = 1.
                # predict returns shape (1, n_chars); take the single row.
                preds = model.predict(x_pred, verbose=0)[0]
                next_index = sample(preds, diversity)
                next_char = indices_char[next_index]
                # Slide the window forward by one character.
                sentence = sentence[1:] + next_char
                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()

optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
model.fit(x, y, batch_size=512, epochs=200, callbacks=[print_callback])
We can observe that the results are pretty good. This code will probably not give you the optimal solution as it stands; to get there you need hyperparameter tuning, a skill that will help you a lot in the future. I suggest you play with this model by changing the architecture and hyperparameters such as sequence length, number of epochs, batch size, and optimizer. The difference between the word-level and character-level models is that the word-level model has an extra embedding layer. Try it out.
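As a starting point for the word-level variant, here is a small NumPy sketch of what an embedding layer does. The toy vocabulary, the embedding size, and the random matrix are assumptions made for this illustration; in a real model the matrix is learned during training.

```python
import numpy as np

vocab = ["to", "be", "or", "not"]  # toy vocabulary
word_index = {w: i for i, w in enumerate(vocab)}

embedding_dim = 3  # hypothetical; real models use far more dimensions
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

sentence = ["to", "be", "or", "not", "to", "be"]
ids = np.array([word_index[w] for w in sentence])

# An embedding layer is a table lookup: one row per word id.
embedded = embedding_matrix[ids]

# The same result via one-hot vectors, which is what the lookup replaces.
one_hot = np.eye(len(vocab))[ids]
assert np.allclose(one_hot @ embedding_matrix, embedded)

print(embedded.shape)  # (6, 3)
```

In Keras, this corresponds to placing an `Embedding` layer in front of the first LSTM and feeding the model integer word ids instead of one-hot vectors, which matters once the vocabulary grows to thousands of words.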