In this blog post, we will discuss how to build a Convolutional Neural Network that classifies Fashion MNIST data using PyTorch on Google Colaboratory (free GPU). First, we will download the data with torchvision and load it with the PyTorch DataLoader class. Then we will use the LeNet-5 architecture to build our model. Finally, we will train the model on a GPU and evaluate it on the test data.
This tutorial assumes that the reader has basic knowledge of convolutional neural networks and knows the basics of PyTorch tensor operations with CUDA support.
Image classification is the task of assigning a class label to an input image from a given list of class labels. You are given an image that could belong to any one of several classes, and the task is to predict a single class label for it.
Fashion MNIST Dataset
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes.
Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and the same structure of training and testing splits. According to the creators of Fashion-MNIST, here are some good reasons to replace the MNIST dataset:
- MNIST is too easy. Convolutional nets can achieve 99.7% on MNIST.
- MNIST is overused. In this April 2017 Twitter thread, Google Brain research scientist and deep learning expert Ian Goodfellow calls for people to move away from MNIST.
- MNIST cannot represent modern CV tasks, as noted in this April 2017 Twitter thread by deep learning expert and Keras author François Chollet.
Run this notebook in Colab
All the code discussed in the article is available on my GitHub. You can open the notebook from any setup by opening my Jupyter Notebook on GitHub directly in Colab, which runs on Google's virtual machine. Click here if you just want to quickly open the notebook and follow along with this tutorial. To learn more about how to execute PyTorch tensors in Colab, read my blog post.
The rest of the article is structured as follows:
- Download dataset (Using
- Visualize dataset
- Setting the GPU device
- Train and Evaluate LeNet
- Visualization Loss Plot
- Where to go from here?
Before we start building our network, we need to import the required libraries. We import numpy because we need to convert tensor images to NumPy format so that matplotlib can display them, torch for everything related to PyTorch, and torchvision to download the Fashion MNIST dataset.
import torchvision
import torchvision.transforms as transforms
import torch
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize = (3,3)) #define the image size
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. The FashionMNIST dataset in torchvision returns its images in PIL format, so we use the transforms.ToTensor() transform to convert the input images to PyTorch tensors.
#transforming the PIL Image to tensors
trainset = torchvision.datasets.FashionMNIST(root = "./data", train = True, download = True, transform = transforms.ToTensor())
testset = torchvision.datasets.FashionMNIST(root = "./data", train = False, download = True, transform = transforms.ToTensor())
Once we have downloaded the training data, we will use torch.utils.data.DataLoader to load the dataset. DataLoader also gives us the ability to iterate over the dataset in batches.
#loading the training data from trainset
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle = True)
#loading the test data from testset
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False)
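To see what iterating over a DataLoader yields, the sketch below uses a small random dataset as a stand-in for FashionMNIST (fake_images, fake_labels, and fake_loader are hypothetical names, not part of the original notebook). With batch_size=4 and 10 samples, the loader produces two full batches and a final smaller one.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for FashionMNIST: 10 random "images" with random labels.
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=4, shuffle=False)

# Each iteration returns one batch of images and the matching labels.
batch_shapes = []
for images, labels in fake_loader:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))
    print(images.shape, labels.shape)
```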
To visualize the dataset, we will implement a custom function, imshow. The FashionMNIST dataset has 10 classes, represented as indices from 0 to 9.
classes = ('T-Shirt','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle Boot')

def imshow(img):
    npimg = img.numpy() #convert the tensor to numpy for displaying the image
    #for displaying the image, shape of the image should be height * width * channels
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
This function first converts the PyTorch tensor image to a NumPy array and then transposes it so that its axes are ordered height, width, and channels. After that, it uses matplotlib to display the image.
LeNet – 5
In this tutorial, we will use the LeNet-5 architecture (a 7-layer convolutional network) to classify fashion images. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner proposed LeNet-5 to automatically classify handwritten digits on bank cheques in the United States. The architecture is fairly simple: excluding the input, it has 7 layers, consisting of two pairs of alternating convolution and pooling layers followed by three fully connected layers. LeNet-5 uses average pooling to downsample features, and Tanh and Sigmoid activations are used in the network.
To create the LeNet-5 architecture in PyTorch, we will use the nn.Module class and the nn.Sequential API to create a custom class called LeNet.
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.cnn_model = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size = 5),    # (N, 1, 28, 28) -> (N, 6, 24, 24)
            nn.Tanh(),
            nn.AvgPool2d(2, stride = 2),         # (N, 6, 24, 24) -> (N, 6, 12, 12)
            nn.Conv2d(6, 16, kernel_size = 5),   # (N, 6, 12, 12) -> (N, 16, 8, 8)
            nn.Tanh(),
            nn.AvgPool2d(2, stride = 2))         # (N, 16, 8, 8) -> (N, 16, 4, 4)
        self.fc_model = nn.Sequential(
            nn.Linear(256, 120),                 # (N, 256) -> (N, 120)
            nn.Tanh(),
            nn.Linear(120, 84),                  # (N, 120) -> (N, 84)
            nn.Tanh(),
            nn.Linear(84, 10))                   # (N, 84) -> (N, 10), 10 classes

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)   # flatten (N, 16, 4, 4) -> (N, 256)
        x = self.fc_model(x)
        return x
An nn.Sequential object executes the series of transformations contained within it in order. In our LeNet class, we implement two functions: the __init__ function (the constructor) and the forward function. In the __init__ function, we define the convolution and downsampling operations inside self.cnn_model, and the fully connected network is defined inside self.fc_model.
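As a small illustration of what "in order" means for nn.Sequential, the sketch below builds a two-step block and checks that calling it gives the same result as applying the two layers by hand. The layer sizes and variable names are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(4, 3)
activation = nn.Tanh()
block = nn.Sequential(linear, activation)

x = torch.rand(2, 4)

# Calling the Sequential runs linear first, then activation.
out_sequential = block(x)
out_manual = activation(linear(x))

print(torch.allclose(out_sequential, out_manual))
```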
To define a convolution layer, we use nn.Conv2d() inside our custom class. nn.Conv2d() applies a 2D convolution over an input signal composed of several input planes. It takes a few input parameters, such as:
- in_channels (int) – Number of channels in the input image
- out_channels (int) – Number of channels produced by the convolution
- kernel_size (int or tuple) – Size of the convolving kernel
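A quick way to see how these parameters shape the output is to run a single layer on a random batch. This sketch uses the sizes of the first LeNet-5 convolution for grayscale 28×28 inputs; fake_batch is a hypothetical stand-in for real images.

```python
import torch
import torch.nn as nn

# 1 input channel (grayscale), 6 output feature maps, 5x5 kernels.
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

fake_batch = torch.rand(4, 1, 28, 28)  # (N, C, H, W)
features = conv(fake_batch)

print(features.shape)     # torch.Size([4, 6, 24, 24])
print(conv.weight.shape)  # torch.Size([6, 1, 5, 5])
```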
In the forward function, we pass the inputs through the convolution block, and its output is flattened (reshaped) using view() to match the input dimensions required by the fully connected block of the network.
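The number of input features of the first fully connected layer follows from the spatial size left after the convolution block. The sketch below traces that arithmetic in plain Python with the standard output-size formula for unpadded convolution and pooling; conv_out is a helper written for this example, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                  # FashionMNIST images are 28x28
size = conv_out(size, kernel=5)            # first convolution: 28 -> 24
size = conv_out(size, kernel=2, stride=2)  # first pooling:     24 -> 12
size = conv_out(size, kernel=5)            # second convolution: 12 -> 8
size = conv_out(size, kernel=2, stride=2)  # second pooling:      8 -> 4

# 16 feature maps of 4x4 are flattened by view() into one vector per image.
flat_features = 16 * size * size
print(size, flat_features)  # 4 256
```

This is why the first nn.Linear layer takes 256 input features.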
Setting the GPU device
To check how many CUDA-supported GPUs are connected to the machine, you can use the code snippet below. If you are executing the code in Colab, you will get 1, which means that the Colab virtual machine is connected to one GPU.
torch.cuda is used to set up and run CUDA operations. It keeps track of the currently selected GPU.
>> print(torch.cuda.device_count())
1
The important thing to note is that we can assign this CUDA-supported GPU to a variable and use that variable for any PyTorch operations. All CUDA tensors you allocate will be created on that device.
>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>> print(device)
cuda:0
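As a minimal sketch of how the device variable is used, the example below creates one tensor directly on the device and moves another one there with .to(device). It falls back to the CPU when no GPU is available, so it runs on any machine; the tensor names are arbitrary.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device.
on_device = torch.ones(2, 3, device=device)

# Or create it on the CPU first and move it afterwards.
moved = torch.zeros(2, 3).to(device)

# Operations require both operands to live on the same device.
total = on_device + moved
print(total.device)
```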
Train and Evaluate LeNet – 5
To train our convolutional neural network, we need to define the loss function and the optimization algorithm, i.e., a variant of gradient descent. In this case, we will use CrossEntropyLoss to calculate the loss of the network and the Adam optimizer to minimize it.
import torch.optim as optim

#create the model object and move it to GPU
net = LeNet().to(device)
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters())
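nn.CrossEntropyLoss takes the raw scores (logits) of the network, applies a log-softmax, and returns the negative log-probability of the correct class, which is why the last layer of LeNet has no softmax of its own. The plain-Python sketch below computes that quantity for a single sample; cross_entropy is a helper written for this illustration and the logits are made up.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# An untrained network that scores all 10 classes equally.
uniform = cross_entropy([0.0] * 10, target=3)

# A network that is confident in the correct class gets a much lower loss.
confident = cross_entropy([0.0, 5.0, 0.0], target=1)

print(round(uniform, 4), round(confident, 4))
```

The uniform case evaluates to ln(10), roughly 2.30, which is a useful reference value for the loss at the start of training on a 10-class problem.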
Before we start training our network, let’s define a custom function to calculate the accuracy of our network.
# function to do evaluation (calculate the accuracy) in gpu
def evaluation(dataloader):
    total, correct = 0, 0
    #keeping the network in evaluation mode
    net.eval()
    for data in dataloader:
        inputs, labels = data
        #moving the inputs and labels to gpu
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (pred == labels).sum().item()
    return 100 * correct / total
The evaluation function takes a data loader as an input parameter. In this function, we:
- Set the network to evaluation mode.
- Iterate through all the batches in the data loader.
- Invoke our model on the inputs and get the outputs.
- Compute the predicted class.
- Count the correctly predicted samples and return the final percentage.
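The prediction and counting steps can be checked by hand on a tiny example. The scores and labels below are made up; torch.max along dimension 1 returns the largest score in each row together with its index, and the index is the predicted class.

```python
import torch

# Hypothetical network outputs (scores) for 4 samples and 3 classes.
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.9],
                        [1.1, 0.4, 0.2]])
labels = torch.tensor([0, 1, 2, 1])

# The second return value holds the index of the highest score per row.
_, pred = torch.max(outputs, 1)

correct = (pred == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(pred.tolist(), accuracy)  # [0, 1, 2, 0] 75.0
```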
We will write a simple for loop to train the network.
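%%time
loss_arr = []
loss_epoch_arr = []
max_epochs = 10

for epoch in range(max_epochs):
    #iterate through all the batches in each epoch
    for i, data in enumerate(trainloader, 0):
        #keeping the network in training mode
        net.train()
        inputs, labels = data
        #moving the input and labels to gpu
        inputs, labels = inputs.to(device), labels.to(device)
        #clear the gradients
        opt.zero_grad()
        #forward pass
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        #backward pass and parameter update
        loss.backward()
        opt.step()
        loss_arr.append(loss.item())
    #bookkeeping at the end of each epoch
    loss_epoch_arr.append(loss.item())
    print('Epoch: %d/%d, Test acc: %0.2f, Train acc: %0.2f' % (epoch, max_epochs, evaluation(testloader), evaluation(trainloader)))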
In our training loop,
- For each epoch, we iterate through the data loader.
- Get the input data and labels, move them to GPU (if available).
- Reset any previous gradient present in the optimizer, before computing the gradient for the next batch.
- Execute the forward pass and get the output.
- Compute the loss based on the predicted output and actual output.
- Backpropagate the gradients and update the parameters with the optimizer.
- At the end of each epoch, we record the loss values for plotting and print the progress messages.
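The steps above can be sketched for a single batch as follows. To keep the example small and fast, a tiny linear model and a random batch stand in for LeNet and FashionMNIST (model, inputs, and labels are hypothetical), but the sequence of calls is the usual PyTorch pattern: clear gradients, forward pass, loss, backward pass, optimizer step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical tiny model and batch, standing in for LeNet and real images.
model = nn.Linear(8, 3)
inputs = torch.rand(4, 8)
labels = torch.tensor([0, 1, 2, 1])

loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(model.parameters())

weights_before = model.weight.detach().clone()

model.train()                    # training mode
opt.zero_grad()                  # reset gradients from any previous batch
outputs = model(inputs)          # forward pass
loss = loss_fn(outputs, labels)  # compare predictions with the labels
loss.backward()                  # backpropagate to compute gradients
opt.step()                       # update the parameters

weights_changed = not torch.equal(weights_before, model.weight.detach())
print(loss.item(), weights_changed)
```

Without the call to opt.step(), the gradients would be computed but the weights would never change.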
Hyperparameters used in the network are as follows:
- Learning rate: 0.001
- Loss function: CrossEntropyLoss
- Optimizer: Adaptive moment estimation (Adam)
- Epochs: 10
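The learning rate of 0.001 is not passed explicitly in the training code because it is the default of optim.Adam. A short check, using a hypothetical stand-in model, shows that the default and the explicit form are configured identically.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(8, 3)  # hypothetical stand-in for the LeNet model

default_opt = optim.Adam(model.parameters())             # relies on the default
explicit_opt = optim.Adam(model.parameters(), lr=0.001)  # spelled out

print(default_opt.defaults["lr"], explicit_opt.defaults["lr"])
```

Spelling the value out makes it easier to experiment with other learning rates later.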
Once you start the training of our model, you should see the progress messages:
Epoch: 0/10, Test acc: 79.09, Train acc: 80.22
Epoch: 1/10, Test acc: 82.65, Train acc: 83.46
Epoch: 2/10, Test acc: 84.83, Train acc: 85.87
Epoch: 3/10, Test acc: 85.75, Train acc: 87.00
Epoch: 4/10, Test acc: 85.93, Train acc: 87.48
Epoch: 5/10, Test acc: 86.57, Train acc: 88.28
Epoch: 6/10, Test acc: 87.10, Train acc: 88.87
Epoch: 7/10, Test acc: 87.35, Train acc: 89.31
Epoch: 8/10, Test acc: 87.75, Train acc: 89.61
Epoch: 9/10, Test acc: 88.05, Train acc: 90.23
To evaluate the model on the test dataset, just call our evaluation function and pass it the test data loader.
#test on testing data
>> print('Test acc: %0.2f, Train acc: %0.2f' % (evaluation(testloader), evaluation(trainloader)))
Test acc: 88.96, Train acc: 92.15
Just by running the network for 10 epochs, I was able to achieve 88.96% accuracy on the test data.
Visualization Loss Plot
We can plot the loss of the network against each epoch to check the model performance.
#plotting the loss chart
plt.plot(loss_epoch_arr)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
There you have it: we have successfully built our first image classification model for multi-class classification using PyTorch. The entire code discussed in the article is present in this GitHub repository. Feel free to fork it or download it.
Where to go from here?
"For the things we have to learn before we can do them, we learn by doing them." (Aristotle, The Nicomachean Ethics)
In this article, we have discussed the basics of image classification using Pytorch. If you want to improve the performance of the network you can try out:
- Modify LeNet to work with ReLU instead of Tanh: Compare the training time and final loss of network.
- Use L2 regularisation: In order to avoid overfitting, you can use weight_decay in torch.optim to add L2 regularisation.
- Different optimizer: Instead of using Adam Optimizer, you can use SGD with/without momentum.
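The suggestions above mostly come down to one-line changes. The sketch below shows what they could look like on a small stand-in model; the layer sizes, the weight_decay of 1e-4, and the SGD settings are illustrative values chosen for this example, not tuned recommendations.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical small model using ReLU in place of Tanh.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))

# Adam with L2 regularisation through the weight_decay argument.
adam_l2 = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

# SGD with momentum as an alternative optimizer.
sgd_momentum = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

print(adam_l2.defaults["weight_decay"], sgd_momentum.defaults["momentum"])
```

In the LeNet class, the ReLU change means replacing each nn.Tanh() with nn.ReLU(), and either optimizer can be swapped in for the one used in training.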
Using this framework, you can build a classifier for other popular datasets such as CIFAR10 or MNIST. The important point to note is that CIFAR10 images have 3 channels (RGB) instead of the 1 channel in MNIST and FashionMNIST.
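As a rough sketch of what changes for CIFAR10, the variant below takes 3 input channels and, because CIFAR10 images are 32×32 rather than 28×28, ends the convolution block with 16 feature maps of 5×5, so the first linear layer needs 400 inputs instead of 256. The class name LeNetRGB is made up for this example, and a random batch is used in place of real data.

```python
import torch
import torch.nn as nn

class LeNetRGB(nn.Module):  # hypothetical name for the 3-channel variant
    def __init__(self):
        super().__init__()
        self.cnn_model = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),    # (N, 3, 32, 32) -> (N, 6, 28, 28)
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2),         # -> (N, 6, 14, 14)
            nn.Conv2d(6, 16, kernel_size=5),   # -> (N, 16, 10, 10)
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2))         # -> (N, 16, 5, 5)
        self.fc_model = nn.Sequential(
            nn.Linear(400, 120),               # 16 * 5 * 5 = 400 input features
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, 10))                 # CIFAR10 also has 10 classes

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)              # flatten to (N, 400)
        return self.fc_model(x)

scores = LeNetRGB()(torch.rand(2, 3, 32, 32))
print(scores.shape)  # torch.Size([2, 10])
```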
- Getting Started With Pytorch In Google Collab With Free GPU
- Building a Feedforward Neural Network using Pytorch NN Module
In this post, we discussed the FashionMNIST dataset and the case for replacing the MNIST dataset. Then we saw how to download and visualize the FashionMNIST dataset. After that, we discussed the architecture of LeNet-5 and trained it on a GPU using the PyTorch nn.Module. If you have any issues or doubts while implementing the above code, feel free to ask in the comment section below or send me a message on LinkedIn citing this article.
Note: This is a guest post, and the opinions in this article are those of the guest writer. If you have any issues with any of the articles posted at www.marktechpost.com, please get in touch at [email protected]m