COCO – A Definitive Dataset For Deep Learning On Images


In deep learning, one of the biggest challenges is collecting data. Training a model requires a massive dataset. In a classification problem, for example, we need many images of each object along with their labels. Collecting and labeling images by hand is labor-intensive, and the effort grows for tasks such as image segmentation or object detection. COCO is a well-defined dataset that researchers and practitioners commonly use for these purposes.

What is COCO?

COCO stands for Common Objects in Context: the images in the dataset show common objects in everyday scenes.

It is a large-scale image dataset with annotations for object detection, image segmentation, image labeling, and keypoints (for locating parts of a person's body). The annotations for all the images, including segments, labels, keypoints, and more, are prepared by hand rather than generated automatically. That is why COCO is reliable to use and helps us build robust models.

To download the dataset, follow the link below.


What are annotations?

The annotations folder consists of .json files, each dedicated to a specific task. Each file contains sections such as info, licenses, images, and annotations. Below is a trimmed-down version of one such JSON file.
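As a rough illustration of that layout, here is a hand-written toy version of an instances-style file expressed as a Python dict. Every value is an invented placeholder, not a real COCO entry. The categories section is included because the instances files carry it alongside the sections named above.

```python
import json

# Hand-written toy example of the top-level layout; every value is a placeholder.
toy_coco = {
    "info": {"description": "toy dataset", "version": "1.0"},
    "licenses": [{"id": 1, "name": "placeholder license"}],
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,
            "iscrowd": 0,
            "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 80.0, 10.0, 80.0]],
            "bbox": [10.0, 10.0, 50.0, 70.0],
            "area": 3500.0,
        },
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}

# Round-trip through JSON to confirm the structure is serializable,
# then list the sections and how many entries each list-valued section holds.
loaded = json.loads(json.dumps(toy_coco))
section_sizes = {k: len(v) for k, v in loaded.items() if isinstance(v, list)}
print(sorted(loaded.keys()))
print(section_sizes)
```

In a real file the images and annotations lists hold many thousands of entries, which is why a helper library is normally used instead of walking the JSON by hand.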

The annotations section in the above snapshot contains a list of entries, each containing

  • image_id: id of the image the annotation belongs to,
  • category_id: id of the object's category,
  • iscrowd: 1 if the annotation covers a group of instances, else 0,
  • segmentation: the outline of the object as consecutive x and y coordinates,
  • bbox: the rectangle drawn around the object, given as [x, y, width, height].
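To make these fields concrete, the sketch below takes a single hypothetical annotation entry (the numbers are invented), pairs up the flat segmentation list into (x, y) points, and converts the bbox into corner coordinates. It uses plain Python only and is not part of the COCO API.

```python
# Hypothetical annotation entry; the numbers are invented for illustration.
ann = {
    "image_id": 1,
    "category_id": 1,
    "iscrowd": 0,
    "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 80.0, 10.0, 80.0]],
    "bbox": [10.0, 10.0, 50.0, 70.0],  # [x, y, width, height]
}

# A polygon is stored flat as x1, y1, x2, y2, ...; pair the values up as points.
flat = ann["segmentation"][0]
points = list(zip(flat[0::2], flat[1::2]))

# Convert the bbox from [x, y, width, height] to (x_min, y_min, x_max, y_max).
x, y, w, h = ann["bbox"]
corners = (x, y, x + w, y + h)

print(points)
print(corners)
```

In this toy entry the corners coincide with the extreme polygon points, which is what one would expect from a tight bounding box.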

The annotation file discussed above is for the object segmentation task. The other files each serve their own task.

If we apply these annotations to their respective images, we get the desired dataset. But how do we use the annotations on the images?

To achieve this, COCO provides us with two APIs……



To work with these APIs, we have to install a package named pycocotools.

Let's begin by using one of the APIs to create binary masks of persons:

Import the required packages.

from pycocotools.coco import COCO #COCO class for reading the annotation files
import numpy as np
from shutil import copy  #copies a file from a source path to a destination path
from tqdm import tqdm    #progress bar for tracking the processing
import os                #for building file and folder paths

import matplotlib.pyplot as plt  #used here to save the mask images

Create a root folder (minor_dataset in the code below) that consists of three sub-folders (train2017, annotations, and mrcnn_data), as shown.

mrcnn_data holds the dataset we are building and has two subfolders, 'input' and 'output'. Set up the folder paths for reading the images and annotations and for writing the mask images.

dataset_dir = os.path.join(os.getcwd(), 'minor_dataset')   #path of the root folder
train_dir   = os.path.join(dataset_dir,    'train2017')    #downloaded image folder inside the root folder
annot_dir   = os.path.join(dataset_dir,  'annotations')    #downloaded annotations folder inside the root folder
input_save_dir  = os.path.join(dataset_dir, 'mrcnn_data', 'input')  #mrcnn_data sub-folder for the input images
output_save_dir = os.path.join(dataset_dir, 'mrcnn_data', 'output') #mrcnn_data sub-folder for the output masks
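The paths above are only strings, so the folders still have to exist before anything is written to them. A minimal sketch of creating the same layout follows, assuming a temporary directory as a stand-in for the real root so that it can be run anywhere without touching existing data.

```python
import os
import tempfile

# Assumption: a temporary directory stands in for the real root folder.
root = os.path.join(tempfile.mkdtemp(), "minor_dataset")

subfolders = [
    "train2017",
    "annotations",
    os.path.join("mrcnn_data", "input"),
    os.path.join("mrcnn_data", "output"),
]
for sub in subfolders:
    # exist_ok=True makes the call safe to repeat if the folder already exists.
    os.makedirs(os.path.join(root, sub), exist_ok=True)

created = sorted(os.listdir(root))
print(created)
```

With the real dataset, the same os.makedirs calls could be applied to input_save_dir and output_save_dir directly.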

The code below uses the built-in functions of the COCO API to look up images by category name and id.

# function to copy a file from the image folder to the input folder under a new name
def copy_file(src, dst):
    src = os.path.join(train_dir, src)
    ext = src.split('.')[-1]  #keep the original file extension

    dst = '{}.{}'.format(dst, ext)
    dst = os.path.join(input_save_dir, dst)
    copy(src, dst)

#creating a COCO object from the required annotation .json file
coco = COCO(os.path.join(annot_dir, 'instances_train2017.json'))

filter_classes = ['person'] #names of the classes whose binary masks we need
category_ids = coco.getCatIds(catNms = filter_classes) #returns the category ids of the given class names
image_ids = coco.getImgIds(catIds = category_ids)     #returns the ids of the images containing the given category

images = coco.loadImgs(image_ids)   #returns the image records (file name, size, id) for the given image ids
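To build intuition for what the category and image lookups compute for a single class, here is a plain-Python approximation over toy data. It is not the pycocotools implementation, and the helper names category_ids_for and image_ids_with are hypothetical, as are all ids.

```python
# Toy stand-ins for the "categories" and "annotations" sections of an
# instances file; all ids are invented.
categories = [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}]
annotations = [
    {"id": 100, "image_id": 7, "category_id": 1},
    {"id": 101, "image_id": 7, "category_id": 18},
    {"id": 102, "image_id": 8, "category_id": 18},
    {"id": 103, "image_id": 9, "category_id": 1},
]

def category_ids_for(names, categories):
    """Return ids of the categories whose name is in `names` (hypothetical helper)."""
    return [c["id"] for c in categories if c["name"] in names]

def image_ids_with(cat_id, annotations):
    """Return sorted ids of images with at least one annotation of `cat_id`."""
    return sorted({a["image_id"] for a in annotations if a["category_id"] == cat_id})

person_ids = category_ids_for(["person"], categories)
person_images = image_ids_with(person_ids[0], annotations)
print(person_ids, person_images)
```

Image 8 is left out because it only contains a dog annotation, which mirrors why the real lookup returns only images that contain the requested class.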

For further processing, we need to load the annotations of each image for the given category.

#build a binary person mask for every image
for idx, image in enumerate(tqdm(images)):
    _id = image['id']
    ann_ids = coco.getAnnIds(imgIds = [_id], catIds = category_ids, iscrowd = None) #ids of this image's annotations for the given categories
    anns = coco.loadAnns(ann_ids)   #loading the annotations with the given ids
    if not anns:
        continue  #skip images that have no matching annotations

    masks = list(map(coco.annToMask, anns))  #converting every annotation of the image to its binary mask
    final_mask = np.zeros_like(masks[0])  #empty mask with the same shape as the image
    for mask in masks:
        final_mask = np.bitwise_or(final_mask, mask) #OR each instance mask into the final mask to get one binary mask
    # Save image
    copy_file(image['file_name'], idx) #user-defined function shown above, copies the image to the input folder

    # Save binary mask
    file_name = '{}.png'.format(idx)
    plt.imsave(os.path.join(output_save_dir, file_name), final_mask) #saving the person mask of the image
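The merging step in the loop above can be seen in isolation on toy arrays. The sketch below uses two invented 4x4 masks in place of real annToMask output and shows that OR-ing them produces the union of the marked pixels.

```python
import numpy as np

# Two invented 4x4 instance masks (1 = object pixel, 0 = background).
mask_a = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=np.uint8)
mask_b = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]], dtype=np.uint8)

# Same accumulation as in the loop: start from zeros and OR each mask in.
final_mask = np.zeros_like(mask_a)
for mask in (mask_a, mask_b):
    final_mask = np.bitwise_or(final_mask, mask)

# Equivalent one-liner using the ufunc's reduce method.
reduced = np.bitwise_or.reduce([mask_a, mask_b])

print(final_mask)
print(int(final_mask.sum()))
```

The two masks overlap in one pixel, so the union marks seven pixels rather than eight, and the result stays strictly 0/1.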

Input images :

Respective output masks :


Using COCO's powerful APIs, we can build the custom dataset we need. The COCO API has many more functions that can help create datasets for segmentation, detection, and other tasks.

To learn more about the COCO APIs, go through the links below:

Link (COCO API) :

Link (MASK API):


