In this article, we will learn how to use Deep Learning models for NLP. Since these models take numerical vectors as input, we first need to convert text into numbers. Let us walk through that process.
Dataset used :
I used the Fake News dataset from Kaggle Datasets. This dataset contains two CSV files, Fake.csv and True.csv, which contain fake and true news articles respectively. Our aim is to train a model that detects fake news.
Preprocessing Text :
The input to the model is the news text, and the target is a label (0 or 1). Our first task is to convert the text into numbers. There are standard steps for this; let us go through them.
- First, we clean the text. Cleaning includes converting the entire text to lowercase, removing punctuation marks, and other similar techniques.
- Tokenize the text and convert it to a sequence. Tokenizing means each word is replaced by a number: its index in the dictionary (vocabulary) of words.
- Pad the sequences. We pad them so that every input has the same length, because the model's input dimension must not vary.
- Finally, pass the sequence through a Word Embedding layer.
A Word Embedding turns each sequence into a two-dimensional array: a sequence of vectors, where each vector numerically represents the meaning of a word. Pre-trained Word Embeddings trained on very large datasets are also available, such as GloVe and Word2Vec.
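As a minimal sketch of this text-to-numbers pipeline (the toy corpus and all values below are purely illustrative, not from the dataset):
# Illustrative only: a tiny corpus to show tokenizing, padding, and embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers

toy_texts = ["fake news spreads fast", "true news is verified"]
toy_tk = Tokenizer(num_words=50)
toy_tk.fit_on_texts(toy_texts)
seqs = toy_tk.texts_to_sequences(toy_texts)   # each text becomes a list of word indices
padded = pad_sequences(seqs, maxlen=6)        # shape (2, 6), zero-padded on the left
emb = layers.Embedding(input_dim=50, output_dim=8)
vectors = emb(padded)                         # shape (2, 6, 8): one 8-d vector per token
print(padded.shape, vectors.shape)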
Fake News Detection with Artificial Neural Network :
Now let us train an ANN model that detects fake news using TensorFlow 2.0. The steps involved are:
- Preprocessing the Text
- Developing the Model
- Training the Model
Preprocessing the Text:
The Python implementation is as follows.
import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras import models,layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
df1 = pd.read_csv('Fake.csv')
df2 = pd.read_csv('True.csv')
df2['target'] = 1
df1['target'] = 0
frames = [df1,df2]
df = pd.concat(frames)
df['news'] = df['title']+df['text']
df.drop(labels=['title','text'],axis=1,inplace=True)
df.drop(labels=['subject','date'],axis=1,inplace=True)
df = df.sample(frac = 1)
X_train, X_test, y_train, y_test = train_test_split(df.news, df.target, test_size=0.1, random_state=37)
tk = Tokenizer(num_words=1000,
               filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=" ")
tk.fit_on_texts(X_train)
X_train_seq = tk.texts_to_sequences(X_train)
X_test_seq = tk.texts_to_sequences(X_test)
X_train_seq_trunc = pad_sequences(X_train_seq, maxlen=100)
X_test_seq_trunc = pad_sequences(X_test_seq, maxlen=100)
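A quick, optional sanity check is to confirm that each split is now a 2D integer array with 100 columns:
# Optional check: both splits should have shape (num_samples, 100)
print(X_train_seq_trunc.shape, X_test_seq_trunc.shape)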
Developing the Model:
We use a simple Deep Learning model, essentially Logistic Regression. Logistic Regression takes a vector as input and outputs a value between 0 and 1, which we interpret as the probability that the news is true.
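Conceptually, Logistic Regression computes a weighted sum of the input and squashes it with a sigmoid. A minimal NumPy sketch (the weights and input below are made-up values, only for illustration):
# Illustrative logistic regression: sigmoid(w . x + b) gives a probability in (0, 1)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, -1.3, 0.7])   # input vector (e.g. flattened embeddings); made-up values
w = np.array([0.5, 0.1, -0.4])   # learned weights; made-up values
b = 0.05                         # learned bias; made-up value
p = sigmoid(np.dot(w, x) + b)    # output between 0 and 1
print(p)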
Model Architecture is as follows.
- Input Layer
- Embedding Layer
- Flatten Layer
- Dense Layer with one unit (final output)

The Python implementation is as follows.
emb_model = models.Sequential()
emb_model.add(layers.Embedding(len(tk.index_word) + 1, 8, input_length=100))  # vocabulary size + 1 so every word index has a row
emb_model.add(layers.Flatten())
emb_model.add(layers.Dense(1, activation='sigmoid'))
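To confirm that the layer shapes and parameter counts line up as intended, the model summary can be printed:
# Optional: inspect output shapes and parameter counts of each layer
emb_model.summary()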

Training the Model:
The hyper-parameters used while compiling and fitting the model are:
- Optimizer – Adam
- Loss – BinaryCrossentropy
- Metrics – Accuracy
- Batch size – 256
- Epochs – 3
Python implementation is as follows:
emb_model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=[tf.keras.metrics.BinaryAccuracy()])
emb_model.fit(x=X_train_seq_trunc, y=y_train, batch_size=256, epochs=3,
              validation_data=(X_test_seq_trunc, y_test))
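If you want a final score on the held-out split after training, evaluate returns the loss and binary accuracy (the exact numbers will vary from run to run):
# Score the trained model on the held-out test sequences
test_loss, test_acc = emb_model.evaluate(X_test_seq_trunc, y_test, batch_size=256)
print(test_loss, test_acc)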
Results:

Fake News Detection with Convolutional Neural Network :
Now let us train a CNN model that detects fake news using TensorFlow 2.0. The steps involved are:
- Preprocessing the Text
- Developing the Model
- Training the Model
We use the same preprocessed text, so the first step is identical in both cases.
Developing the Model :
Here we use a CNN (Convolutional Neural Network) model. CNNs are usually illustrated with 2D convolution and 2D pooling over images; for text we instead perform 1D convolution and 1D pooling along the sequence dimension, as sketched below.
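As a rough sketch of the shapes involved (the layer sizes below mirror the model that follows, but the input is random and purely illustrative):
# Illustrative shapes for 1D convolution over an embedded sequence
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 100, 8))             # (batch, sequence length, embedding dim)
conv = layers.Convolution1D(16, 4, activation='relu')
pool = layers.AveragePooling1D()
y = pool(conv(x))
print(y.shape)                                # (1, 48, 16): 100-4+1=97 steps, pooled down to 48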
Model Architecture is as follows.
- Input Layer
- Embedding Layer
- Convolution 1D with 16 filters
- Average pooling 1D
- Convolution 1D with 32 filters
- Average pooling 1D
- Flatten Layer
- Dense Layer with one unit (final output)

The Python implementation of the CNN is as follows.
emb_model = models.Sequential()
emb_model.add(layers.Embedding(len(tk.index_word) + 1, 8, input_length=100))  # vocabulary size + 1 so every word index has a row
emb_model.add(layers.Convolution1D(16,4,activation='relu'))
emb_model.add(layers.AveragePooling1D())
emb_model.add(layers.Convolution1D(32,4,activation='relu'))
emb_model.add(layers.AveragePooling1D())
emb_model.add(layers.Flatten())
emb_model.add(layers.Dense(1, activation='sigmoid'))

Training the model:
The Python implementation is as follows.
emb_model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=[tf.keras.metrics.BinaryAccuracy()])
emb_model.fit(x=X_train_seq_trunc, y=y_train, batch_size=128, epochs=10,
              validation_data=(X_test_seq_trunc, y_test))
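Once trained, the same tokenizer and padding must be applied to any new text before calling predict. A quick example on a made-up headline:
# Classify a new (made-up) headline with the trained model
sample = ["Breaking: scientists discover water on the moon"]
sample_seq = pad_sequences(tk.texts_to_sequences(sample), maxlen=100)
prob_true = emb_model.predict(sample_seq)[0][0]   # closer to 1 means the model predicts true news
print(prob_true)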
Results:

Conclusion:
Looking at the results, we can conclude that they are quite good, but there is a limitation: the model does not consider the sequential relationship between words. More advanced models such as RNNs, GRUs, and LSTMs do capture this sequential relationship and can be used for more complex tasks such as machine translation, question answering, and chatbots.
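For instance, a sequence-aware variant of the same pipeline could swap the convolutional layers for an LSTM. This is only a sketch under the same preprocessing, not something trained in this article:
# Sketch of a sequence-aware model: Embedding followed by an LSTM layer
lstm_model = models.Sequential()
lstm_model.add(layers.Embedding(len(tk.index_word) + 1, 8, input_length=100))
lstm_model.add(layers.LSTM(32))                   # processes the sequence step by step
lstm_model.add(layers.Dense(1, activation='sigmoid'))
lstm_model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(),
                   metrics=[tf.keras.metrics.BinaryAccuracy()])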
Shivesh Kodali is a content writing consultant at MarktechPost. He is currently pursuing his B.Tech in Electronics and Communication Engineering from the Indian Institute of Technology (IIT), Kharagpur. He is a Deep Learning fanatic who loves understanding and implementing its complex algorithms.