Personal names vary from country to country, and even within a country. Typically, a person's name can be broken into two halves: the first name, given at birth, and the last name (surname), which represents the family into which the child is born. However, a large majority of people from Tamil Nadu do not have a surname.
In the Chinese name Mao Ze Dong, the family name is Mao, that is, the first name when reading left to right. The given name is Ze Dong, and its first character, Ze, is a generational name shared by relatives of the same generation. Because of these inconsistencies, or rather the lack of naming standards, handling names is messy, and even sophisticated software often fails to deal with such varied naming conventions.
In this tutorial, we will build a recurrent neural network (RNN) model that classifies the nationality of a name from its character-level encodings.
Recurrent Neural Network
In feed-forward neural networks (FNNs), the output for one data point is completely independent of the previous input. For example, the predicted health risk of the second person does not depend on that of the first person, and so on. Similarly, in convolutional neural networks (CNNs) used for image classification, the output of the softmax layer is entirely independent of the previous input image.
Recurrent neural networks (RNNs) are a type of neural network in which the output (hidden state) from the previous step is fed as input to the current step.
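To make the difference concrete, here is a minimal sketch in plain Python (no PyTorch) of a toy recurrence h_t = 0.5 * h_{t-1} + x_t. The weight 0.5 is an arbitrary choice for illustration. The two sequences below end with the same input but produce different final states, because the state carries information from earlier steps. A feed-forward model that sees only the last input could not tell them apart.

```python
def run_recurrence(xs, decay=0.5):
    """Toy RNN-style update: the new state depends on the input AND the previous state."""
    h = 0.0  # initial hidden state
    for x in xs:
        h = decay * h + x  # the previous state is fed back in at every step
    return h

# Both sequences end with the same input (1.0) ...
a = run_recurrence([1.0, 0.0, 1.0])
b = run_recurrence([0.0, 0.0, 1.0])
print(a, b)  # 1.25 1.0 (... but the histories differ, so the states differ)
```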
Run this notebook in Colab
All the code discussed in this article is available on my GitHub. You can open the Jupyter notebook from GitHub directly in Colab, which runs on a Google virtual machine, so you do not need any particular local setup. We recommend opening the notebook and following along with this tutorial. To learn more about working with PyTorch tensors in Colab, read my blog post.
Before we start building our network, we need to import the required libraries.
```python
# import packages
from io import open
import os, string, random, time, math
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
# clearing output
from IPython.display import clear_output
```
The dataset is a text file in which each line contains a person's name and the nationality of that name, separated by a comma.
Here is a look at the data:
The input to the model is a person's name, and names vary in length, so we have to use a sequence model instead of a feed-forward neural network. To load the dataset, we iterate through each line of the file and build a list of (name, nationality) tuples that we can easily feed into our sequence model.
```python
languages = []
data = []
X = []
y = []

with open("name2lang.txt", 'r') as f:
    # read the dataset
    for line in f:
        line = line.split(",")
        name = line[0].strip()
        lang = line[1].strip()
        if lang not in languages:
            languages.append(lang)
        X.append(name)
        y.append(lang)
        data.append((name, lang))

n_languages = len(languages)
```
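You can check the parsing logic without the dataset file by running the same loop over an in-memory file. The three sample lines below are made up for illustration. The only assumption is the one stated above: each line has the form name,nationality. Splitting with maxsplit=1 is a small defensive choice of this sketch, not something the original code does.

```python
import io

# Hypothetical sample lines in the assumed "name,nationality" format.
sample = io.StringIO("Rossi,Italian\nMurphy,Irish\nBianchi,Italian\n")

languages, X, y, data = [], [], [], []
for line in sample:
    name, lang = (part.strip() for part in line.split(",", 1))
    if lang not in languages:
        languages.append(lang)  # first-seen order; the list index later becomes the class label
    X.append(name)
    y.append(lang)
    data.append((name, lang))

print(languages)  # ['Italian', 'Irish']
print(data)       # [('Rossi', 'Italian'), ('Murphy', 'Irish'), ('Bianchi', 'Italian')]
```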
The dataset contains more than 20k names and 18 unique nationalities, such as Portuguese, Irish, and Spanish.
We split the data into training and test sets in a 70:30 ratio. The dataset is imbalanced across nationalities, so we use stratified sampling, which keeps the proportion of each nationality the same in both sets.
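To see what stratify does, here is a small sketch with made-up labels: 70 of class "A" and 30 of class "B". The same kind of train_test_split call keeps the 70:30 class ratio in both the training and the test split.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 "A" and 30 "B".
X_toy = list(range(100))
y_toy = ["A"] * 70 + ["B"] * 30

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=123, stratify=y_toy
)
print(Counter(y_tr))  # 49 "A" and 21 "B"
print(Counter(y_te))  # 21 "A" and 9 "B"
```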
```python
# split the data 70:30
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 123, stratify = y)
print("Training Data: ", len(X_train))
print("Testing Data: ", len(X_test))
# Training Data: 14035
# Testing Data: 6015
```
Encoding Names and Nationalities
The sequence model we will build takes character encodings as input rather than raw text. We therefore encode each name at the character level. Once every character has an encoding, we stack the character-level encodings to get the encoding of the whole name. The nationalities, which are the labels, are encoded as class indices. We do this for all the names and nationalities.
```python
# get all the letters
all_letters = string.ascii_letters + ".,;"
n_letters = len(all_letters)
print("Number of letters: ", n_letters)
```
To encode names, we first collect all the ASCII letters, plus a few punctuation characters, into a single string. This gives us every character that we expect to appear in a person's name. We then iterate through each character in the name and find its index in that string. Using that index, we create a one-hot vector for the character. We repeat this for all the characters to get the final encoding.
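The index lookup can be sketched in plain Python. The sketch also shows an edge case. If a name contained a character outside this set, such as an apostrophe, str.find returns -1. In name_rep, that index would silently set the last position (the ";" slot), so such characters would not be distinguished. The article does not say whether name2lang.txt contains such characters. "O'Neil" is a hypothetical input used only to show the behavior.

```python
import string

all_letters = string.ascii_letters + ".,;"  # same character set as above
n_letters = len(all_letters)

def letter_indices(name):
    """Return the position of each character of `name` in all_letters (-1 if absent)."""
    return [all_letters.find(ch) for ch in name]

print(n_letters)                 # 55: 52 letters + 3 punctuation marks
print(letter_indices("Kumar"))   # [36, 20, 12, 0, 17]
print(letter_indices("O'Neil"))  # [40, -1, 39, 4, 8, 11]: the apostrophe is not found
```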
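```python
def name_rep(name):
    rep = torch.zeros(len(name), 1, n_letters)
    for index, letter in enumerate(name):
        pos = all_letters.find(letter)  # -1 if the character is not in all_letters
        rep[index][0][pos] = 1  # the middle dimension is a batch of size 1
    return rep

# sample encoding
name_rep("Kumar")
```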
The name_rep function above creates a one-hot encoding for a name. First, we declare a tensor of zeros with shape (length of the name, 1, total number of characters in our list). The middle dimension of size 1 acts as a batch of one. We then iterate through the characters, find the index of each letter, and set the value at that position to 1, leaving all the other values at 0.
Sample encoding would look like this.
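The following self-contained sketch reproduces name_rep and checks the encoding of "Kumar". The result has one row per character and a batch dimension of size 1, and each row contains exactly one nonzero entry at the character's position.

```python
import string
import torch

all_letters = string.ascii_letters + ".,;"
n_letters = len(all_letters)

def name_rep(name):
    """One-hot encode a name as a (len(name), 1, n_letters) tensor."""
    rep = torch.zeros(len(name), 1, n_letters)
    for index, letter in enumerate(name):
        pos = all_letters.find(letter)
        rep[index][0][pos] = 1
    return rep

rep = name_rep("Kumar")
print(rep.shape)                    # torch.Size([5, 1, 55])
print(rep.sum(dim=2).flatten())     # exactly one 1 per character
print(rep.argmax(dim=2).flatten())  # positions of K, u, m, a, r in all_letters
```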
The logic for encoding nationalities is much simpler: we find the index of the nationality in our list of nationalities and use that index as the encoding.
```python
# function to create lang representation
def nat_rep(lang):
    return torch.tensor([languages.index(lang)], dtype = torch.long)
```
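Here is a quick sketch of nat_rep with a hypothetical three-language list (the real list follows the order in which nationalities appear in the file). The label is a one-element LongTensor, which is the target format nn.NLLLoss expects for a batch of one. Calling .item() maps it back to the nationality.

```python
import torch

languages = ["Portuguese", "Irish", "Spanish"]  # hypothetical order, for illustration only

def nat_rep(lang):
    return torch.tensor([languages.index(lang)], dtype=torch.long)

label = nat_rep("Irish")
print(label)                    # tensor([1])
print(languages[label.item()])  # Irish
```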
Recurrent Neural Network Model
In this section, we will build an RNN model using PyTorch's nn.Module. We will write a class RNN_net for our model, which subclasses nn.Module:
```python
# define a basic rnn network
class RNN_net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN_net, self).__init__()
        # declare the hidden size for the network
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)  # input to hidden layer
        self.i2o = nn.Linear(input_size + hidden_size, output_size)  # input to output layer
        self.softmax = nn.LogSoftmax(dim = 1)  # log-softmax for classification

    def forward(self, input_, hidden):
        combined = torch.cat((input_, hidden), 1)  # concatenate tensors column-wise
        hidden = self.i2h(combined)  # generate the new hidden representation
        output = self.i2o(combined)  # generate the output representation
        output = self.softmax(output)  # log-probabilities over the languages
        return output, hidden

    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)
```
torch.nn.Linear(in_features, out_features) takes two mandatory parameters:
- in_features — The size of each input sample
- out_features — The size of each output sample
The __init__ function (the constructor) initializes the parameters of the network, such as the weights and biases of its layers. It takes the input size (the size of the representation of one character), the hidden layer size, and the output size (equal to the number of languages we have). The nn.Linear() module automatically creates the weights and biases for each layer, so we do not have to define them manually.
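To see what nn.Linear creates, this sketch builds the two layers with the sizes used in this article: 55 characters, a hidden size of 128, and 18 languages. PyTorch stores each weight matrix with shape (out_features, in_features). Both layers read the 55 + 128 = 183 values of the concatenated [input, hidden] vector.

```python
import torch
import torch.nn as nn

n_letters, n_hidden, n_languages = 55, 128, 18  # sizes used in this article

i2h = nn.Linear(n_letters + n_hidden, n_hidden)
i2o = nn.Linear(n_letters + n_hidden, n_languages)

# weight: (out_features, in_features); bias: (out_features,)
print(i2h.weight.shape, i2h.bias.shape)  # torch.Size([128, 183]) torch.Size([128])
print(i2o.weight.shape, i2o.bias.shape)  # torch.Size([18, 183]) torch.Size([18])

combined = torch.zeros(1, n_letters + n_hidden)  # one concatenated [input, hidden] row
print(i2h(combined).shape, i2o(combined).shape)  # torch.Size([1, 128]) torch.Size([1, 18])
```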
Let's see what's going on inside the network. The i2h layer computes the hidden representation at the current time step from the concatenation of the current input and the hidden representation from the previous time step. The i2o layer computes the output at the current time step from the same concatenation. The forward function takes the encoded representation of a character and the current hidden representation as input. It first concatenates the two. It then uses the result to compute the new hidden representation with i2h, and the output label with i2o followed by the log-softmax layer.
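In equation form, one step of RNN_net as written computes the following. Here $\mathbf{x}_t$ is the one-hot vector for the $t$-th character, $\mathbf{h}_{t-1}$ is the previous hidden state ($\mathbf{h}_0 = \mathbf{0}$ from init_hidden), and $[\cdot\,;\cdot]$ denotes concatenation (torch.cat along dimension 1). Unlike many textbook RNNs, this model applies no nonlinearity such as tanh to the hidden state. Also, the output at step $t$ is computed from $\mathbf{h}_{t-1}$ rather than $\mathbf{h}_t$. The prediction for a name of length $T$ is $\hat{\mathbf{y}}_T$.

$$
\begin{aligned}
\mathbf{c}_t &= [\mathbf{x}_t \,;\, \mathbf{h}_{t-1}] \\
\mathbf{h}_t &= W_{\text{i2h}}\,\mathbf{c}_t + \mathbf{b}_{\text{i2h}} \\
\hat{\mathbf{y}}_t &= \operatorname{LogSoftmax}\!\left(W_{\text{i2o}}\,\mathbf{c}_t + \mathbf{b}_{\text{i2o}}\right)
\end{aligned}
$$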
Inference on Recurrent Neural Network Model
Before we start training our first model, we will use the untrained model to make inferences on the data. This lets us confirm that the network architecture works as expected.
```python
# function to make inference
def infer(net, name):
    net.eval()
    name_ohe = name_rep(name)
    hidden = net.init_hidden()
    for i in range(name_ohe.size()[0]):  # one step per character
        output, hidden = net(name_ohe[i], hidden)
    return output

# declare the size of the hidden layer representation
n_hidden = 128
# create an object of the class
net = RNN_net(n_letters, n_hidden, n_languages)
# before training the network, make an inference to test the network
output = infer(net, "Adam")
index = torch.argmax(output)
print(output, index)
```
The infer function takes the network instance and a person's name as input parameters. In this function:
– Setting the network to evaluation mode.
– Computing the one-hot representation of the input name.
– Creating the initial hidden representation (all zeros) based on the hidden size.
– Iterating through all the characters and feeding the hidden representation from each step back into the network.
– Returning the output for the last character, which holds the log-probabilities of each nationality for that name.
Training Recurrent Neural Network
In this section, we will create a generic training setup that can also be used for other networks, such as LSTM and GRU. To train our network, we need to define a loss function and an optimization algorithm. We use NLLLoss (negative log-likelihood loss) to compute the loss of the network. It expects log-probabilities as input, which is why the model ends with a LogSoftmax layer. To minimize this loss, we use the SGD optimizer with momentum.
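The pairing of LogSoftmax and NLLLoss can be checked numerically. In this sketch the scores are random and the target index 3 is arbitrary. NLLLoss applied to log-softmax outputs gives the same value as CrossEntropyLoss applied to the raw scores, and for a single example it is simply the negative log-probability of the true class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(1, 18)  # raw scores for 18 classes (random, for illustration)
target = torch.tensor([3])   # a class index, like the output of nat_rep

log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)
ce = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(nll, ce))                # True: CrossEntropyLoss = LogSoftmax + NLLLoss
print(torch.allclose(nll, -log_probs[0, 3]))  # True: NLLLoss picks out -log p(target)
```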
Before we start training our network, let’s define a custom function to calculate the accuracy of our network.
```python
# create a function to evaluate the model
# (dataloader is not shown in this article)
def eval(net, n_points, k, X_, y_):
    data_ = dataloader(n_points, X_, y_)
    correct = 0
    for name, language, name_ohe, lang_rep in data_:
        output = infer(net, name)  # prediction
        val, indices = output.topk(k)  # get the top k predictions
        if lang_rep in indices:
            correct += 1
    accuracy = correct/n_points
    return accuracy
```
The eval function takes the network instance, the number of data points, k, and the test inputs and labels (X_ and y_) as parameters. A prediction counts as correct if the true nationality is among the k highest-scoring classes. In this function:
- Loading the data with the data loader.
- Iterating through all the person names in the loaded data.
- Invoking our model on each name and getting the output.
- Taking the top-k predicted classes.
- Counting the correctly predicted names and returning the accuracy as a fraction of n_points.
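The article does not show the dataloader function that eval relies on. The sketch below is a hypothetical version that matches only what eval needs: a list of (name, language, name_ohe, lang_rep) tuples. Sampling at random with replacement is an assumption of this sketch. With it, eval(net, len(X_test), ...) estimates the accuracy rather than visiting every test name exactly once. Iterating over zip(X_, y_) instead would give an exact figure. The two-language list and sample names are made up.

```python
import random
import string
import torch

all_letters = string.ascii_letters + ".,;"
n_letters = len(all_letters)
languages = ["Italian", "Irish"]  # hypothetical label list for this demo

def name_rep(name):
    rep = torch.zeros(len(name), 1, n_letters)
    for index, letter in enumerate(name):
        rep[index][0][all_letters.find(letter)] = 1
    return rep

def nat_rep(lang):
    return torch.tensor([languages.index(lang)], dtype=torch.long)

def dataloader(n_points, X_, y_):
    """Hypothetical sketch: draw n_points random (name, lang) pairs, with replacement,
    and return them with their encodings, in the tuple order that eval unpacks."""
    batch = []
    for _ in range(n_points):
        i = random.randrange(len(X_))
        name, lang = X_[i], y_[i]
        batch.append((name, lang, name_rep(name), nat_rep(lang)))
    return batch

random.seed(0)
sample = dataloader(4, ["Rossi", "Murphy"], ["Italian", "Irish"])
for name, lang, name_ohe, lang_rep in sample:
    print(name, lang, tuple(name_ohe.shape), lang_rep.item())
```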
We will write a simple train_setup function to train our network.
```python
def train_setup(net, lr = 0.01, n_batches = 100, batch_size = 10, momentum = 0.9, display_freq = 5):
    criterion = nn.NLLLoss()  # define a loss function
    opt = optim.SGD(net.parameters(), lr = lr, momentum = momentum)  # define an optimizer
    loss_arr = np.zeros(n_batches + 1)
    # iterate through all the batches
    for i in range(n_batches):
        # running average of the batch losses (train is not shown in this article)
        loss_arr[i + 1] = (loss_arr[i]*i + train(net, opt, criterion, batch_size))/(i + 1)
        if i%display_freq == display_freq - 1:
            clear_output(wait = True)
            print("Iteration number ", i + 1,
                  "Top - 1 Accuracy:", round(eval(net, len(X_test), 1, X_test, y_test), 4),
                  "Top-2 Accuracy:", round(eval(net, len(X_test), 2, X_test, y_test), 4),
                  "Loss:", round(loss_arr[i + 1], 4))
            plt.figure()
            plt.plot(loss_arr[1:i + 2], "-*")
            plt.xlabel("Iteration")
            plt.ylabel("Loss")
            plt.show()
            print("\n\n")

# declare all the parameters
n_hidden = 128
net = RNN_net(n_letters, n_hidden, n_languages)
train_setup(net, lr = 0.0005, n_batches = 100, batch_size = 256)
```
In our training loop (the per-batch steps take place inside a train function that is not shown in this article):
- For each batch, we sample training data using the data loader.
- We get the input data and labels.
- We reset any previous gradients in the optimizer before computing the gradients for the next batch.
- We execute the forward pass and get the output.
- We compute the loss from the predicted output and the actual label.
- We backpropagate the gradients and update the weights.
- Every display_freq batches, we print the top-1 and top-2 accuracy and the loss, and plot the loss curve.
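Because train_setup calls a train function that the article does not show, here is one hypothetical way to implement the per-batch steps above. Unlike the call in train_setup, which passes a batch size, this sketch takes the batch itself, so plugging it in would mean building the batch first (for example from dataloader output). Gradients are summed over the names in the batch before a single opt.step(). The function returns a Python float, which can be stored safely in train_setup's NumPy loss array. TinyRNN is a made-up stand-in with the same interface as RNN_net, included only so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(net, opt, criterion, batch):
    """Hypothetical one-batch update. `batch` is a list of (name_ohe, lang_rep) pairs.
    Returns the mean loss over the batch as a Python float."""
    opt.zero_grad()                        # clear gradients from the previous batch
    total_loss = 0.0
    for name_ohe, lang_rep in batch:
        hidden = net.init_hidden()
        for i in range(name_ohe.size(0)):  # forward pass, one character at a time
            output, hidden = net(name_ohe[i], hidden)
        loss = criterion(output, lang_rep) # compare the last-step output with the label
        loss.backward()                    # gradients add up across the batch
        total_loss += loss.item()
    opt.step()                             # one parameter update per batch
    return total_loss / len(batch)

# A tiny stand-in with the same interface as RNN_net, for demonstration only.
class TinyRNN(nn.Module):
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.hidden_size = n_hidden
        self.i2h = nn.Linear(n_in + n_hidden, n_hidden)
        self.i2o = nn.Linear(n_in + n_hidden, n_out)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input_, hidden):
        combined = torch.cat((input_, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)

torch.manual_seed(0)
net = TinyRNN(n_in=4, n_hidden=8, n_out=2)
opt = optim.SGD(net.parameters(), lr=0.1)
criterion = nn.NLLLoss()
eye = torch.eye(4)
# two toy "names" of length 3; each character is a one-hot row of shape (1, 4)
batch = [(eye[[0, 1, 2]].unsqueeze(1), torch.tensor([0])),
         (eye[[3, 2, 1]].unsqueeze(1), torch.tensor([1]))]

losses = [train(net, opt, criterion, batch) for _ in range(50)]
print(round(losses[0], 4), round(losses[-1], 4))  # the loss should go down
```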
The hyperparameters used in the training process are as follows:
- Learning rate: 0.0005
- Loss function: Negative Log-Likelihood Loss
- Optimizer: Stochastic Gradient Descent with momentum 0.9
- Number of batches: 100
- Batch size: 256
- Hidden size: 128
Visualization of Loss Plot
We can plot the running average of the network's loss against each iteration to monitor how training progresses.
After training the model for 100 batches, we are able to achieve a top-1 accuracy of 68% and a top-2 accuracy of 79% with the RNN Model.
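To make the top-1 and top-2 figures concrete, this sketch scores one name with made-up log-probabilities over a hypothetical four-language list. Top-1 counts a hit only if the true nationality has the highest score. Top-2 counts a hit if it is among the two highest. The check (indices == true_label).any() is what the expression lang_rep in indices in eval evaluates to for tensors.

```python
import torch

languages = ["Italian", "Irish", "Spanish", "Portuguese"]  # hypothetical label list
# Made-up log-probabilities for one name, shaped (1, n_languages) like the model output.
output = torch.log(torch.tensor([[0.10, 0.45, 0.40, 0.05]]))
true_label = torch.tensor([2])  # suppose the true nationality is "Spanish"

hits = {}
for k in (1, 2):
    _, indices = output.topk(k)  # indices of the k highest scores, shape (1, k)
    hits[k] = bool((indices == true_label).any())
    print(k, [languages[i] for i in indices[0].tolist()], hits[k])
# 1 ['Irish'] False
# 2 ['Irish', 'Spanish'] True
```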
Long Short Term Memory – LSTM Model
In this section, we will implement an LSTM model to classify the nationality of a person's name. We will use PyTorch's nn.LSTM module inside a custom class called LSTM_net:
```python
# LSTM class
class LSTM_net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTM_net, self).__init__()
        self.hidden_size = hidden_size
        self.lstm_cell = nn.LSTM(input_size, hidden_size)  # LSTM cell
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim = 2)

    def forward(self, input_, hidden):
        # nn.LSTM expects a 3D input: (seq_len, batch, input_size)
        out, hidden = self.lstm_cell(input_.view(1, 1, -1), hidden)
        output = self.h2o(hidden[0])  # hidden is a tuple (h, c); use the hidden state h
        output = self.softmax(output)
        return output.view(1, -1), hidden

    def init_hidden(self):
        return (torch.zeros(1, 1, self.hidden_size), torch.zeros(1, 1, self.hidden_size))
```
From an implementation standpoint, the main change in the __init__ function is that we use nn.LSTM in place of the two linear layers i2h and i2o, followed by a single linear layer h2o that maps the hidden state to the output. The nn.LSTM module handles all the necessary computations, including the updates of the hidden state and the cell state.
The init_hidden function initializes two tensors of zeros, each with shape (1, 1, hidden_size). One represents the hidden state and the other represents the cell state. As with the RNN, the forward function takes an encoded character and its hidden representation as parameters. PyTorch's nn.LSTM expects its input to be a 3D tensor, by default of shape (sequence length, batch size, input size). That is why we reshape the input with view(1, 1, -1).
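This sketch checks those shapes for a single character, using the sizes from this article. nn.LSTM returns its output together with a state tuple (h, c). A linear layer such as h2o therefore has to be applied to h, the first element of the tuple, not to the tuple itself. For one step of a single-layer LSTM, out and h hold the same values.

```python
import torch
import torch.nn as nn

n_letters, n_hidden = 55, 128
lstm = nn.LSTM(n_letters, n_hidden)  # default layout: (seq_len, batch, input_size)

x = torch.zeros(1, n_letters)        # one character, shaped like name_ohe[i]
x[0, 36] = 1                         # the one-hot position of "K"
state = (torch.zeros(1, 1, n_hidden), torch.zeros(1, 1, n_hidden))  # (h_0, c_0)

out, (h, c) = lstm(x.view(1, 1, -1), state)
print(out.shape, h.shape, c.shape)   # each torch.Size([1, 1, 128])
print(torch.allclose(out, h))        # True for a single step of a single-layer LSTM
```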
To train the LSTM network, we reuse our train_setup function.
```python
# create hyperparameters
n_hidden = 128
net = LSTM_net(n_letters, n_hidden, n_languages)
train_setup(net, lr = 0.0005, n_batches = 100, batch_size = 256)
```
The loss plot for the LSTM network looks like this:
There you have it: we have successfully built our nationality classification model using PyTorch. The entire code discussed in this article is available in this GitHub repository. Feel free to fork or download it.
Where to go from here?
In this article, we discussed the RNN and LSTM models. If you want to improve the performance of the network, you can try:
- Implementing a Gated Recurrent Unit (GRU) model (Bonus: I have already implemented GRU in my Git repo).
- Tuning the hyperparameters of the LSTM and GRU models.
- Speeding up training by moving it to a GPU.
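As a starting point for the GRU item, here is a hypothetical GRU_net that mirrors LSTM_net. It is not the implementation from the author's repository. The main difference is that nn.GRU carries a single hidden tensor instead of an (h, c) pair. The class exposes the same forward and init_hidden interface as RNN_net and LSTM_net, so it can be passed to train_setup in the same way.

```python
import torch
import torch.nn as nn

class GRU_net(nn.Module):
    """Hypothetical GRU variant following the LSTM_net pattern (not the author's code)."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.gru_cell = nn.GRU(input_size, hidden_size)  # single hidden state, no cell state
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, input_, hidden):
        out, hidden = self.gru_cell(input_.view(1, 1, -1), hidden)
        output = self.softmax(self.h2o(hidden))  # hidden: (1, 1, hidden_size)
        return output.view(1, -1), hidden

    def init_hidden(self):
        return torch.zeros(1, 1, self.hidden_size)

net = GRU_net(55, 128, 18)
hidden = net.init_hidden()
for i in range(5):  # feed five one-hot "characters"
    x = torch.zeros(1, 55)
    x[0, i] = 1
    output, hidden = net(x, hidden)
print(output.shape)                    # torch.Size([1, 18])
print(torch.exp(output).sum().item())  # probabilities sum to about 1
```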
If you are new to the PyTorch framework, these resources may help:
- Getting Started With Pytorch In Google Collab With Free GPU
- Building a Feedforward Neural Network using Pytorch NN Module
In this post, we discussed the need to classify the nationality of a person based on their name. We then saw how to load our custom dataset into a format suitable for training our model. After that, we discussed how to encode the names and nationalities before training. Finally, we implemented RNN and LSTM models and trained them on the data. If you have any issues or doubts while implementing the above code, feel free to ask in the comment section below or send me a message on LinkedIn citing this article.
Connect with Me
- LinkedIn – https://www.linkedin.com/in/niranjankumar-c/
- GitHub – https://github.com/Niranjankumar-c
- Twitter – https://twitter.com/Nkumar_n
- Medium – https://medium.com/@niranjankumarc
Note: This is a guest post, and the opinions in this article are those of the guest writer. If you have any issues with any of the articles posted at www.marktechpost.com, please contact us at firstname.lastname@example.org