Introduction to Image Classification: Using PyTorch to Classify the FashionMNIST Dataset

In this blog post, we will discuss how to build a Convolutional Neural Network that can classify Fashion MNIST data using PyTorch on Google Colaboratory (free GPU). First, we will download the data with torchvision and load it with the PyTorch DataLoader class. Then we will use the LeNet-5 architecture to build our model. Finally, we will train our model on the GPU and evaluate it on the test data.


This tutorial assumes that the reader has basic knowledge of convolutional neural networks and knows the basics of PyTorch tensor operations with CUDA support.

Image Classification

Image classification is the task of assigning a class label to an input image from a given list of class labels. You are given an image, and there could be several classes that the image might belong to. The task is to predict the single class label that best fits the given image.

Fashion MNIST Dataset

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image associated with a label from 10 classes. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms: it shares the same image size and the same structure of training and testing splits. According to the creators of Fashion-MNIST, here are some good reasons to replace the MNIST dataset:

  • MNIST is too easy. Convolutional nets can achieve 99.7% accuracy on MNIST.
  • MNIST is overused. In this April 2017 Twitter thread, Google Brain research scientist and deep learning expert Ian Goodfellow calls for people to move away from MNIST.
  • MNIST cannot represent modern CV tasks, as noted by deep learning expert and Keras author François Chollet in this April 2017 Twitter thread.

Run this notebook in Colab

All the code discussed in the article is available on my GitHub. You can open the notebook without any local setup by opening my Jupyter Notebook from GitHub in Colab, which runs on Google's virtual machines. Click here if you just want to quickly open the notebook and follow along with this tutorial. To learn more about how to run PyTorch tensor operations in Colab, read my blog post.

The rest of the article is structured as follows:

  • Download dataset (using DataLoader)
  • Visualize dataset
  • LeNet
  • Setting the GPU device
  • Train and Evaluate LeNet
  • Visualization Loss Plot
  • Where to go from here?
  • Conclusion

Import Libraries

Before we start building our network, we need to import the required libraries. We import numpy because we need to convert tensor images to numpy arrays so that matplotlib can display them, torch for everything related to PyTorch (including torch.nn and torch.optim, which we use later to build and train the network), and torchvision to download the Fashion MNIST dataset.

import torchvision
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize = (3,3)) #define the image size

Download Dataset

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. The images in the FashionMNIST dataset are stored as PIL images, so we pass transforms.ToTensor() as the transform argument to convert each input image into a PyTorch tensor. ToTensor also rescales the pixel values from the range [0, 255] to [0.0, 1.0].

#transforming the PIL Image to tensors
trainset = torchvision.datasets.FashionMNIST(root = "./data", train = True, download = True, transform = transforms.ToTensor())
testset = torchvision.datasets.FashionMNIST(root = "./data", train = False, download = True, transform = transforms.ToTensor())
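To make the effect of transforms.ToTensor() concrete, here is a minimal numpy sketch of what it does to one 8-bit grayscale image: it moves the channel axis to the front, giving (C, H, W), and rescales the values to [0.0, 1.0]. The helper name to_tensor_like and the synthetic pixel values are illustrative assumptions; this is not torchvision's implementation.

```python
import numpy as np

def to_tensor_like(img_hw: np.ndarray) -> np.ndarray:
    """Hypothetical helper mimicking ToTensor() for an (H, W) uint8 grayscale image.

    (H, W) uint8 in [0, 255]  ->  (1, H, W) float32 in [0.0, 1.0]
    """
    chw = img_hw[np.newaxis, :, :]          # add a leading channel axis: (1, H, W)
    return chw.astype(np.float32) / 255.0   # rescale pixel values to [0.0, 1.0]

# Synthetic 28x28 image whose pixels cycle through 0..255 (stand-in for a FashionMNIST image)
fake_img = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)

t = to_tensor_like(fake_img)
print(t.shape, t.dtype, t.min(), t.max())
```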

Once we have downloaded the data, we use torch.utils.data.DataLoader to load the dataset in batches. DataLoader also gives us the ability to iterate over the dataset.

#loading the training data from trainset
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle = True)
#loading the test data from testset
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False)
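The following sketch shows how DataLoader batches a dataset, using a small TensorDataset of random tensors as a stand-in for FashionMNIST (the random data and the names toy_set and toy_loader are assumptions for illustration). Each iteration yields an (inputs, labels) pair, and with the default drop_last=False the last batch is smaller when the dataset size is not a multiple of the batch size.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 10 random "images" of shape (1, 28, 28) with labels in 0..9
images = torch.rand(10, 1, 28, 28)
labels = torch.randint(0, 10, (10,))
toy_set = TensorDataset(images, labels)

toy_loader = DataLoader(toy_set, batch_size=4, shuffle=True)

batch_sizes = []
for batch_images, batch_labels in toy_loader:
    # each batch is an (inputs, labels) pair, just like the batches from trainloader
    batch_sizes.append(batch_images.shape[0])
    print(batch_images.shape, batch_labels.shape)

print(batch_sizes)  # 10 samples with batch_size=4 -> two full batches and one of size 2

# With the real 60,000-image training set, batch_size=4 gives this many batches per epoch
print(math.ceil(60000 / 4))
```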

Visualize dataset

To visualize the dataset, we will implement a custom function imshow. The FashionMNIST dataset has 10 classes, which are represented as indices from 0 to 9.

classes = ('T-Shirt','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle Boot')
def imshow(img):
    npimg = img.numpy() #convert the tensor to numpy for displaying the image
    #for displaying the image, shape of the image should be height * width * channels
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

In this function, we first convert the PyTorch tensor image to a numpy array and then transpose it so that its dimensions are ordered as (height, width, channels), which is the layout matplotlib expects. After that, we use matplotlib to display the image.

Images from the training dataset

LeNet-5

LeNet-5 architecture as published in the original paper.

In this tutorial, we will use the LeNet-5 architecture (a 7-layer convolutional network) to classify fashion images. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner proposed the convolutional network LeNet-5 to automatically classify handwritten digits on bank cheques in the United States. The LeNet-5 architecture is fairly simple: excluding the input, it has 7 layers, consisting of two alternating convolution and pooling layers followed by three fully connected layers at the end. LeNet-5 uses average pooling to downsample the features, and it uses Tanh and Sigmoid activations.

To create the LeNet-5 architecture in PyTorch, we subclass nn.Module and use the nn.Sequential API to build a custom class called LeNet.

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.cnn_model = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size = 5),   #(N, 1, 28, 28) -> (N, 6, 24, 24)
            nn.Tanh(),
            nn.AvgPool2d(2, stride = 2),        #(N, 6, 24, 24) -> (N, 6, 12, 12)
            nn.Conv2d(6, 16, kernel_size = 5),  #(N, 6, 12, 12) -> (N, 16, 8, 8)
            nn.Tanh(),
            nn.AvgPool2d(2, stride = 2))        #(N, 16, 8, 8) -> (N, 16, 4, 4)
        self.fc_model = nn.Sequential(
            nn.Linear(256, 120),                #(N, 256) -> (N, 120)
            nn.Tanh(),
            nn.Linear(120, 84),                 #(N, 120) -> (N, 84)
            nn.Tanh(),
            nn.Linear(84, 10))                  #(N, 84) -> (N, 10), one score per class

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)   #flatten (N, 16, 4, 4) -> (N, 256)
        x = self.fc_model(x)
        return x

An nn.Sequential object applies the transformations it contains one after another, in order. In our LeNet class, we implement two functions: the __init__ function (the constructor) and the forward function. In the __init__ function, we define the convolution and downsampling operations inside self.cnn_model and the fully connected network inside self.fc_model.
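As a quick check of that "one after another" behaviour, this sketch builds a small nn.Sequential from a convolution and a pooling layer (the same kinds of layers LeNet uses) and compares its output with calling the two layers by hand. The names block, conv and pool and the random input are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(1, 6, kernel_size=5)
pool = nn.AvgPool2d(2, stride=2)
block = nn.Sequential(conv, pool)   # the same module objects, chained in order

x = torch.rand(2, 1, 28, 28)        # a batch of 2 single-channel 28x28 inputs
out_seq = block(x)                  # nn.Sequential runs conv, then pool
out_manual = pool(conv(x))          # the same layers applied by hand, in the same order

print(out_seq.shape)
print(torch.allclose(out_seq, out_manual))
```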

To define a convolution layer, we use nn.Conv2d() inside our custom class. nn.Conv2d() applies a 2D convolution over an input signal composed of several input planes. Its main parameters are:

  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
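Here is a small sketch of how those three parameters show up in the layer itself, using the first LeNet convolution for grayscale FashionMNIST images. The random input batch is an assumption standing in for one batch from trainloader.

```python
import torch
import torch.nn as nn

# 1 input channel (grayscale), 6 output channels, 5x5 kernels
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

print(conv1.weight.shape)  # (out_channels, in_channels, kernel_height, kernel_width)
print(conv1.bias.shape)    # one bias per output channel

x = torch.rand(4, 1, 28, 28)   # stand-in batch of 4 images
out = conv1(x)
print(out.shape)               # no padding, stride 1: 28 - 5 + 1 = 24
```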

In the forward function, we pass the inputs through the convolution block and then flatten its output with view() so that it matches the input dimensions required by the fully connected block of the network.
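The 256 in nn.Linear(256, 120) follows from the standard output-size formula for convolution and pooling layers, (size + 2·padding − kernel) // stride + 1. The plain-Python sketch below traces a 28×28 FashionMNIST image through the layers; conv_out is a hypothetical helper written for this illustration.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Hypothetical helper: spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # FashionMNIST images are 28x28
size = conv_out(size, kernel=5)             # Conv2d(1, 6, 5):   28 -> 24
size = conv_out(size, kernel=2, stride=2)   # AvgPool2d(2, 2):   24 -> 12
size = conv_out(size, kernel=5)             # Conv2d(6, 16, 5):  12 -> 8
size = conv_out(size, kernel=2, stride=2)   # AvgPool2d(2, 2):    8 -> 4

channels = 16
flat_features = channels * size * size      # per-image length after x.view(x.size(0), -1)
print(size, flat_features)                  # should match the input size of nn.Linear(256, 120)
```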

Setting the GPU device

To check how many CUDA-capable GPUs are connected to the machine, you can use the code snippet below. If you are running the code in Colab with a GPU runtime, you will get 1, which means that the Colab virtual machine is connected to one GPU. torch.cuda is used to set up and run CUDA operations; it keeps track of the currently selected GPU.

>> print(torch.cuda.device_count())

Importantly, we can store a reference to this CUDA-capable GPU in a variable and use that variable in our PyTorch operations: tensors and models moved with .to(device) are placed on that device. If no GPU is available, the code below falls back to the CPU.

>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>> print(device)
cuda:0
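A minimal sketch of how the device variable is used: both the model's parameters and the input tensors have to be moved to the same device before the forward pass. The single nn.Linear layer and the random input are stand-ins chosen for illustration; the sketch also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(256, 10).to(device)   # moves the layer's parameters to the device
x = torch.rand(4, 256).to(device)       # input tensors must be moved explicitly as well

y = layer(x)                            # parameters and inputs now live on the same device
print(x.device, next(layer.parameters()).device, y.device)
```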

Train and Evaluate LeNet-5

To train our convolutional neural network, we need to define a loss function and an optimization algorithm, i.e., a variant of the gradient descent algorithm. In this case, we use CrossEntropyLoss to compute the loss of the network and the Adam optimizer to minimize that loss.

#create the model object and move it to GPU
net = LeNet().to(device)
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters())
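nn.CrossEntropyLoss expects raw, unnormalized scores (logits), which is why the last layer of LeNet has no softmax. The sketch below, using made-up logits and targets, checks that it equals log-softmax followed by the negative log-likelihood of the true class, averaged over the batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])   # made-up raw scores: 2 samples, 3 classes
targets = torch.tensor([0, 1])            # index of the correct class for each sample

loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: -log(softmax probability of the true class), averaged over the batch
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```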

Before we start training our network, let’s define a custom function to calculate the accuracy of our network.

# function to do evaluation (calculate the accuracy) in gpu
def evaluation(dataloader):
    total, correct = 0, 0
    #keeping the network in evaluation mode
    net.eval()
    #gradients are not needed for evaluation
    with torch.no_grad():
        for data in dataloader:
            inputs, labels = data
            #moving the inputs and labels to gpu
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = net(inputs)
            _, pred = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (pred == labels).sum().item()
    return 100 * correct / total

The evaluation function takes a data loader as an input parameter. In this function,

  • We are setting the network to evaluation mode.
  • Iterating through all the batches present in the data loader.
  • Invoking our model on the inputs and getting the outputs.
  • Computing the predicted class.
  • Calculating the total number of correctly predicted classes and returning the final percentage.
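The accuracy steps can be traced by hand on a tiny made-up batch. torch.max(outputs, 1) returns a (values, indices) pair, and the indices are the predicted classes. The output scores and labels below are invented for illustration.

```python
import torch

# Made-up network outputs for a batch of 4 images and 3 classes
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [3.0, 0.5,  0.2],
                        [0.0, 0.1,  0.9],
                        [1.0, 1.5,  0.2]])
labels = torch.tensor([1, 0, 1, 1])

_, pred = torch.max(outputs, 1)          # index of the largest score in each row
correct = (pred == labels).sum().item()  # number of matching predictions
total = labels.size(0)
accuracy = 100 * correct / total
print(pred.tolist(), correct, total, accuracy)
```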

We will write a simple for loop to train the network:

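loss_arr = []
loss_epoch_arr = []
max_epochs = 10
for epoch in range(max_epochs):
    #keeping the network in training mode (evaluation() switches it to eval mode)
    net.train()
    #iterate through all the batches in each epoch
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        #moving the input and labels to gpu
        inputs, labels = inputs.to(device), labels.to(device)
        #clear the gradients
        opt.zero_grad()
        #forward pass
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        #backward pass
        loss.backward()
        #update the weights
        opt.step()
        loss_arr.append(loss.item())
    #bookkeeping: store the loss of the last batch and print progress
    loss_epoch_arr.append(loss.item())
    print('Epoch: %d/%d, Test acc: %0.2f, Train acc: %0.2f' % (epoch, max_epochs, evaluation(testloader), evaluation(trainloader)))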

In our training loop,

  • For each epoch, we iterate through the data loader.
  • Get the input data and labels, move them to GPU (if available).
  • Reset any previous gradient present in the optimizer, before computing the gradient for the next batch.
  • Execute the forward pass and get the output.
  • Compute the loss based on the predicted output and actual output.
  • Backpropagate the gradients and let the optimizer update the weights.
  • At the end of each epoch, record the loss value for plotting and print a progress message.
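The core of each iteration can be isolated as a single optimization step. In this sketch a tiny nn.Linear model and a random batch stand in for LeNet and FashionMNIST (both are assumptions for illustration); the point is the order zero_grad, forward, loss, backward, step, after which the weights have changed.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(8, 3)                  # toy stand-in for LeNet
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(model.parameters(), lr=0.001)

inputs = torch.rand(4, 8)                # fake batch of 4 samples
labels = torch.tensor([0, 2, 1, 0])

before = model.weight.detach().clone()

opt.zero_grad()                          # 1. reset gradients left over from the previous batch
outputs = model(inputs)                  # 2. forward pass
loss = loss_fn(outputs, labels)          # 3. loss between predictions and labels
loss.backward()                          # 4. backpropagate: fills .grad of each parameter
opt.step()                               # 5. update the parameters using the gradients

print(loss.item(), torch.equal(before, model.weight.detach()))
```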

Hyperparameters used in the network are as follows:

  • Learning rate: 0.001
  • Loss function: CrossEntropyLoss
  • Optimizer: Adaptive moment estimation (Adam)
  • Epochs: 10
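Adam's default learning rate in PyTorch is 0.001, which matches the value listed above, so constructing the optimizer without an lr argument uses it. The sketch below checks that default on a dummy parameter and works out how many optimizer steps 10 epochs take with batch_size=4 on the 60,000 training images.

```python
import math
import torch
import torch.optim as optim

params = [torch.nn.Parameter(torch.zeros(3))]   # dummy parameter, only to build the optimizer
opt = optim.Adam(params)                        # no lr passed: PyTorch's default is used
print(opt.defaults["lr"])

batches_per_epoch = math.ceil(60000 / 4)        # 60,000 training images, batch_size=4
total_steps = batches_per_epoch * 10            # 10 epochs
print(batches_per_epoch, total_steps)
```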

Once you start training the model, you should see progress messages like these:

Epoch: 0/10, Test acc: 79.09, Train acc: 80.22
Epoch: 1/10, Test acc: 82.65, Train acc: 83.46
Epoch: 2/10, Test acc: 84.83, Train acc: 85.87
Epoch: 3/10, Test acc: 85.75, Train acc: 87.00
Epoch: 4/10, Test acc: 85.93, Train acc: 87.48
Epoch: 5/10, Test acc: 86.57, Train acc: 88.28
Epoch: 6/10, Test acc: 87.10, Train acc: 88.87
Epoch: 7/10, Test acc: 87.35, Train acc: 89.31
Epoch: 8/10, Test acc: 87.75, Train acc: 89.61
Epoch: 9/10, Test acc: 88.05, Train acc: 90.23

To evaluate the model on the test dataset, just call our evaluation function and pass the test data loader.

#test on testing data
>> print('Test acc: %0.2f, Train acc: %0.2f' % (evaluation(testloader), evaluation(trainloader)))
Test acc: 88.96, Train acc: 92.15 

Just by training the network for 10 epochs, I was able to achieve 88.96% accuracy on the test data.

Visualization Loss Plot

We can plot the loss of the network against the epoch number to check how training progressed.

#plotting the loss chart
plt.plot(loss_epoch_arr)
plt.show()
Loss Plot

There you have it: we have successfully built our first image classification model for multi-class classification using PyTorch. The entire code discussed in the article is available in this GitHub repository. Feel free to fork or download it.

Where to go from here?

For the things we have to learn before we can do them, we learn by doing them.

Aristotle, The Nicomachean Ethics

In this article, we discussed the basics of image classification using PyTorch. If you want to improve the performance of the network, you can try the following:

  • Modify LeNet to use ReLU instead of Tanh: compare the training time and the final loss of the network.
  • Use L2 regularisation: to avoid overfitting, you can pass the weight_decay argument to an optimizer from torch.optim to add L2 regularisation.
  • Use a different optimizer: instead of the Adam optimizer, you can use SGD with or without momentum.
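The three suggestions can be sketched as follows. LeNetReLU is a hypothetical variant of the LeNet class with nn.ReLU in place of nn.Tanh; the weight_decay value (1e-4) and the SGD settings (lr=0.01, momentum=0.9) are illustrative assumptions, not tuned values, and in practice you would pick just one of the two optimizers.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class LeNetReLU(nn.Module):
    """Hypothetical LeNet variant using ReLU instead of Tanh."""
    def __init__(self):
        super().__init__()
        self.cnn_model = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),
            nn.AvgPool2d(2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.AvgPool2d(2, stride=2))
        self.fc_model = nn.Sequential(
            nn.Linear(256, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 10))

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)   # flatten (N, 16, 4, 4) -> (N, 256)
        return self.fc_model(x)

net = LeNetReLU()
# Option A: Adam with L2 regularisation via weight_decay (illustrative value)
opt_adam_l2 = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
# Option B: SGD with momentum (illustrative values)
opt_sgd = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

out = net(torch.rand(2, 1, 28, 28))
print(out.shape)
```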

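Using this framework, you can build classifiers for other popular datasets such as CIFAR10 or MNIST. The important point to note is that CIFAR10 images have 3 channels (RGB) instead of the single channel of MNIST and FashionMNIST, and they are 32×32 rather than 28×28, so the first convolution layer and the input size of the first fully connected layer must be adjusted accordingly.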
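A sketch of that adjustment, assuming 3-channel 32×32 inputs as in CIFAR10 and keeping the rest of the LeNet convolution block unchanged: only the first Conv2d's in_channels changes, but the flattened feature size grows from 256 to 16 × 5 × 5 = 400, so the first nn.Linear would need 400 inputs. The random input batch is a stand-in.

```python
import torch
import torch.nn as nn

# Convolution block adapted for RGB 32x32 inputs
cnn_rgb = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5), nn.Tanh(),    # (N, 3, 32, 32) -> (N, 6, 28, 28)
    nn.AvgPool2d(2, stride=2),                    # -> (N, 6, 14, 14)
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # -> (N, 16, 10, 10)
    nn.AvgPool2d(2, stride=2))                    # -> (N, 16, 5, 5)

features = cnn_rgb(torch.rand(2, 3, 32, 32)).view(2, -1)
print(features.shape)   # the first nn.Linear would need 400 inputs instead of 256
```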

Conclusion


In this post, we discussed the FashionMNIST dataset and why it should replace the MNIST dataset. Then we saw how to download and visualize the FashionMNIST dataset. After that, we discussed the LeNet-5 architecture and trained LeNet-5 on a GPU using PyTorch's nn.Module. If you run into any issues or have doubts while implementing the above code, feel free to ask in the comment section below or send me a message on LinkedIn citing this article.

Note: This is a guest post, and the opinions in this article are those of the guest writer. If you have any issues with any of the articles posted at please contact at

Niranjan Kumar is working as a Senior Consultant Data Science at Allstate India. He is passionate about Deep Learning and Artificial Intelligence. He writes about the latest tools and technologies in the field of Deep Learning. He is one of the top writers in Artificial Intelligence at Medium. A graduate of Praxis Business School, Niranjan Kumar holds a degree in Data Science. Feel free to contact him via LinkedIn for collaboration on projects.
