Introduction to Naive Bayes Classifiers

Naive Bayes refers to a family of classification algorithms based on Bayes' theorem. It is a simple yet effective and commonly used machine learning classifier that assigns each input to a class using the maximum a posteriori (MAP) rule in a Bayesian setting. The algorithm is called naive because it assumes that, within a class, any two features are independent: once the class is known, the presence or value of one feature says nothing about another. Naive Bayes classifiers have long been popular for text classification, where they can perform on par with or better than some far more complicated algorithms, and they are a traditional solution for problems such as spam detection.

What Is Conditional Probability?

Conditional probability involves two or more events: it is the probability of one event occurring given that another event has already occurred. Taking two events, M and N, the conditional probability of event N is the probability that N will occur given the knowledge that M has already happened. It is written P(N|M) and, provided P(M) > 0, is given by the formula:

P(N|M) = P(M and N)/P(M)
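
As a rough illustration of this formula, the sketch below counts outcomes when two fair dice are rolled. The events chosen for M and N are hypothetical and not part of the original article; they are only there to make the arithmetic concrete.

```python
from fractions import Fraction

# Hypothetical sample space: the 36 equally likely outcomes of rolling two dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]


def probability(event):
    """Probability of an event (a predicate over outcomes), assuming all outcomes are equally likely."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def M(o):
    # Event M: the first die shows an even number.
    return o[0] % 2 == 0


def N(o):
    # Event N: the two dice sum to 8.
    return o[0] + o[1] == 8


p_m = probability(M)
p_m_and_n = probability(lambda o: M(o) and N(o))

# P(N|M) = P(M and N) / P(M)
p_n_given_m = p_m_and_n / p_m
print(p_n_given_m)  # 1/6
```

Here P(M) = 1/2 and P(M and N) = 1/12, so knowing that the first die is even raises the chance of a sum of 8 from 5/36 to 1/6.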

Bayes’ theorem builds on conditional probability and describes the probability of an event based on prior knowledge of conditions related to that event:

P(A|B) = (P(B|A) * P(A)) / P(B)

P(A|B) is the probability of hypothesis A given the data B, called the posterior probability. P(B|A) is the likelihood of the data B given that hypothesis A is true. P(A) is the prior probability of hypothesis A being true, and P(B) is the probability of observing the data B, also called the evidence.
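
To make the roles of these four quantities concrete, here is a small sketch that plugs numbers into Bayes' theorem for a spam-filter style question. All probabilities are made up for illustration; nothing here comes from a real dataset.

```python
from fractions import Fraction

# Hypothetical numbers (assumptions for illustration only):
p_spam = Fraction(1, 5)              # P(A): prior probability that a message is spam
p_word_given_spam = Fraction(1, 2)   # P(B|A): the word "offer" appears in a spam message
p_word_given_ham = Fraction(1, 20)   # P(B|not A): the word appears in a non-spam message

# P(B): total probability of seeing the word, summed over both hypotheses.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_word_given_spam * p_spam / p_word
print(posterior)  # 5/7
```

With these assumed numbers, seeing the word moves the probability of spam from the prior of 1/5 to a posterior of 5/7.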

Working of Naive Bayes Classifiers

The input features in our training set are known as evidence, and their respective labels are known as outcomes. Using conditional probability, we calculate the likelihood of the evidence given the outcomes, denoted as P(Evidence|Outcome). Our goal is to determine the probability of an outcome given the evidence, denoted as P(Outcome|Evidence). Let X denote the evidence and Y denote the outcome.

P(Evidence|Outcome) is therefore P(X|Y), and is expressed as:

P(X|Y) = (P(Y|X) * P(X)) / P(Y) (estimated from the training data)

P(Outcome|Evidence) is thus P(Y|X), and is expressed as:

P(Y|X) = (P(X|Y) * P(Y)) / P(X) (predicted for the test data)

Because of the naive independence assumption, for evidence made up of features x1, ..., xn the likelihood factorizes as P(X|Y) = P(x1|Y) * ... * P(xn|Y). Since P(X) is the same for every outcome, the predicted outcome is the Y that maximizes P(X|Y) * P(Y).
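
The two steps above (estimate P(Y) and P(X|Y) from training data, then compute P(Y|X) for new evidence) can be sketched from scratch as follows. This is a minimal illustration rather than how any particular library implements it: the toy weather data, the function names `fit` and `posterior`, and the add-one (Laplace) smoothing parameter `alpha` are all assumptions introduced here. The likelihood P(X|Y) is taken as the product of per-feature probabilities, which is the naive independence assumption.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: evidence (outlook, windy) -> outcome (play).
train = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "yes"), "no"),
]


def fit(samples, alpha=1.0):
    """Count outcomes and per-feature values; alpha is an assumed smoothing constant."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)  # (feature index, outcome) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for features, label in samples:
        for i, v in enumerate(features):
            feature_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha


def posterior(model, features):
    """Return P(Y|X) for every outcome Y, given evidence X = features."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(Y)
        for i, v in enumerate(features):
            # Smoothed estimate of P(x_i | Y); multiplying them gives P(X|Y).
            score *= (feature_counts[(i, label)][v] + alpha) / (n + alpha * len(values[i]))
        scores[label] = score  # P(X|Y) * P(Y)
    evidence = sum(scores.values())  # P(X), the normalizing constant
    return {label: s / evidence for label, s in scores.items()}


model = fit(train)
probs = posterior(model, ("sunny", "no"))
prediction = max(probs, key=probs.get)  # maximum a posteriori outcome
print(prediction, probs)
```

For the evidence ("sunny", "no") the outcome "yes" receives the higher posterior, so it is the prediction. Dividing by P(X) does not change which outcome wins; it only turns the scores into probabilities that sum to 1.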

Implementing Naive Bayes Classifiers

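We will use a multinomial naive Bayes classifier on scikit-learn's handwritten digits dataset (loaded with load_digits; a small set of 8x8 images similar in spirit to MNIST) and build a confusion matrix to assess the model’s performance.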

Write the following code to implement the model:

#importing important libraries
from sklearn.datasets import load_digits
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score

digits = load_digits()
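# The right-hand sides were missing from the original listing; the dataset's
# standard `data` and `target` attributes are assumed here.
X = digits.data    # flattened 8x8 images, shape (n_samples, 64)
Y = digits.target  # digit labels 0-9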


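from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
# The original listing never fits the model or uses cross_val_score; the two
# steps below (including cv=10) are assumptions so that predict() works later.
print("Mean cross-validation accuracy:", cross_val_score(mnb, X, Y, cv=10).mean())
mnb.fit(X, Y)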

import itertools
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):  # default colormap is an assumption
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')

#printing confusion matrix
from sklearn.metrics import confusion_matrix
ypred = mnb.predict(X)
cnf_matrix = confusion_matrix(Y,ypred)
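# A trailing argument was cut off in the original listing; the function's default colormap is used here.
plot_confusion_matrix(cnf_matrix,classes=np.arange(10),normalize=False,title="Confusion Matrix for the digits dataset")
plt.show()
from sklearn.metrics import classification_report
print(classification_report(Y,ypred))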
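
Note that the listing above predicts on the same samples the model was trained on, which tends to give an optimistic picture of performance. As a complementary sketch (not part of the original listing), the code below holds out part of the digits dataset for evaluation. The 25% split size, the `random_state` value, and the variable names are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

digits = load_digits()

# Hold out 25% of the samples (an arbitrary choice) and keep class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0, stratify=digits.target
)

model = MultinomialNB()
model.fit(X_train, y_train)     # estimate P(Y) and P(X|Y) from the training split only
y_pred = model.predict(X_test)  # predict outcomes for unseen evidence

test_accuracy = accuracy_score(y_test, y_pred)
test_cm = confusion_matrix(y_test, y_pred)
print(f"Held-out accuracy: {test_accuracy:.3f}")
print(test_cm)
```

The resulting `test_cm` can be passed to a plotting helper such as the `plot_confusion_matrix` function above in place of the matrix computed on the training data.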

Visit the official scikit-learn documentation on naive Bayes classifiers to learn more.

Happy Learning!!

Nitish is a computer science undergraduate with a keen interest in the field of deep learning. He has done various projects related to deep learning and closely follows new advancements in the field.
