Regression with Keras (Deep Learning with Keras – Part 3)

Image by Денис Марчук from Pixabay


After two introductory tutorials, it's time to build our first neural network! The network we are building solves a simple regression problem. Regression is a task in which a model learns to predict a continuous output value for given input data, e.g. a price, a length, or a width.

Problem Definition

Our objective is to build a prediction model that predicts housing prices from a set of house features. We will use the Boston Housing dataset, which was collected by the U.S. Census Service and concerns housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive and has been used extensively throughout the literature to benchmark algorithms.

The dataset is small, with only 506 cases. It contains 14 attributes (13 features plus the target value), described as follows:

  • CRIM: per capita crime rate by town
  • ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS: proportion of non-retail business acres per town.
  • CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
  • NOX: nitric oxides concentration (parts per 10 million)
  • RM: average number of rooms per dwelling
  • AGE: proportion of owner-occupied units built prior to 1940
  • DIS: weighted distances to five Boston employment centres
  • RAD: index of accessibility to radial highways
  • TAX: full-value property-tax rate per $10,000
  • PTRATIO: pupil-teacher ratio by town
  • B: 1000(Bk - 0.63)² where Bk is the proportion of blacks by town
  • LSTAT: % lower status of the population
  • MEDV: Median value of owner-occupied homes in $1000’s

The goal behind our regression problem is to use the 13 features to predict the value of MEDV (which represents the housing price).

Loading the Data

Fortunately, Keras ships with a set of ready-to-use datasets. You can access them from keras.datasets.

from keras.datasets import boston_housing
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

# let us view one sample from the features
print(X_train[0], y_train[0])
# output
# (array([  1.23247,   0.     ,   8.14   ,   0.     ,   0.538  ,   6.142  ,
#         91.7    ,   3.9769 ,   4.     , 307.     ,  21.     , 396.9    ,
#         18.72   ]), 15.2)

The data is returned as two tuples representing the training and testing splits. The X_train and X_test contain the feature columns, while the y_train and y_test contain the label/output column.

Now for the next step…


As discussed in the previous article, we need to preprocess our data before feeding it to the network. The features sit on very different scales (compare the values in the sample above), so the data needs to be rescaled. Time for our buddy StandardScaler from the scikit-learn package.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# first we fit the scaler on the training dataset
scaler.fit(X_train)

# then we call the transform method to scale both the training and testing data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# a sample output
print(X_train_scaled[0])
# array([-0.27224633, -0.48361547, -0.43576161, -0.25683275, -0.1652266 ,
#      -0.1764426 ,  0.81306188,  0.1166983 , -0.62624905, -0.59517003,
#       1.14850044,  0.44807713,  0.8252202 ])

Much better! Note that we only rescale the features and not the label column. This dataset is simple and no further preprocessing is needed. Time for the most exciting part…
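For intuition, here is a minimal pure-Python sketch of what the scaler does to a single feature column, assuming the usual standardization formula (value minus training mean, divided by training standard deviation). The column values and the helper names fit_column and transform_column are made up for illustration; they are not part of scikit-learn.

```python
import statistics

def fit_column(train_column):
    """Learn the mean and (population) standard deviation from training data only."""
    return statistics.mean(train_column), statistics.pstdev(train_column)

def transform_column(column, mean, std):
    """Rescale values using statistics that were learned beforehand."""
    return [(value - mean) / std for value in column]

# Hypothetical values for one feature, not real rows from the dataset
train_column = [6.0, 7.0, 5.0, 6.0]
test_column = [6.5, 8.0]

mean, std = fit_column(train_column)
train_scaled = transform_column(train_column, mean, std)
test_scaled = transform_column(test_column, mean, std)  # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Note that the test column is scaled with the training mean and standard deviation. This is why the scaler is fitted on the training split only: the test data should not leak into the preprocessing.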

Building the Model

We will build the model layer by layer in a sequential manner. To do so, we have to import two modules: 1) models, which holds the model class, and 2) layers, which holds the layer classes.

from keras import models, layers

Then, we create the model:

model = models.Sequential()

And we start adding the layers:

model.add(layers.Dense(8, activation='relu', input_shape=[X_train.shape[1]]))
model.add(layers.Dense(16, activation='relu'))

# output layer
model.add(layers.Dense(1))

Notice that we only specify the input shape for the first layer; every later layer infers its input shape automatically from the previous one.

The activation parameter specifies the function applied on top of the layer's weighted sum to calculate its output: output = activation(X * W + bias). ReLU (rectified linear unit) is an activation function that breaks the linearity of the model. There are many other activation functions, but ReLU is one of the most popular for this kind of network.

The output layer is simply a layer with one neuron and a linear activation function, since we are predicting a single continuous value.
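To make output = activation(X * W + bias) concrete, here is a small pure-Python sketch of a forward pass through one ReLU layer followed by a one-neuron linear output layer. The weights, biases, and the helper name dense are hypothetical values chosen for illustration; in the real model Keras learns the weights during training.

```python
def relu(z):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """Compute activation(inputs * W + bias); weights[j] belongs to neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

x = [1.0, -2.0]  # one sample with two (made-up) features

# hidden layer: two neurons with ReLU
hidden = dense(x, weights=[[0.5, -0.25], [-1.0, 1.0]], biases=[0.0, 0.5], activation=relu)

# output layer: one neuron, no activation (linear)
output = dense(hidden, weights=[[2.0, 3.0]], biases=[0.1])

print(hidden)  # [1.0, 0.0]
print(output)  # [2.1]
```

The second hidden neuron has a negative weighted sum (-2.5), so ReLU clips it to zero, while the linear output neuron passes its weighted sum through unchanged.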

Compiling the Model

After building the network we need to specify two important things: 1) the optimizer and 2) the loss function. The optimizer is responsible for navigating the space to choose the best model parameters, while the loss function is used by the optimizer to know how to move in the search space.

model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

Keras supports optimizers other than RMSprop, and choosing the best one for your problem usually takes some trial and error. Normally, though, RMSprop works fine with its default parameters.
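To see how an optimizer uses the loss to move through the parameter space, here is a minimal sketch with plain gradient descent on a one-parameter model y = w * x. This is deliberately not RMSprop (which also adapts the step size per parameter); the toy data, the learning rate, and the function names are assumptions made for illustration.

```python
# Toy data that follows y = 2 * x exactly (made up for illustration)
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def mse_loss(w):
    """Mean squared error of the model y = w * x on the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # starting guess
learning_rate = 0.05  # step size, chosen by hand
initial_loss = mse_loss(w)

for step in range(100):
    # move against the gradient, i.e. in the direction that lowers the loss
    w -= learning_rate * mse_gradient(w)

final_loss = mse_loss(w)
print(w, initial_loss, final_loss)
```

After 100 steps w ends up very close to 2, the value that generated the data, and the loss is close to zero.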

The loss function used is the Mean Squared Error (MSE), which is the average of the squared differences between the predicted values and the true values. Keras supports other loss functions as well, which are chosen based on the problem type.

The metric listed here plays no part in training the model. It is just a user-friendly value that is easier to interpret than the main loss value. For example, an absolute error is easier for us to evaluate and make sense of than a squared error.
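The difference between the two quantities can be sketched in a few lines of plain Python. The true and predicted values below are made up for illustration, and the two functions are hand-written stand-ins for the 'mse' loss and the 'mae' metric, not the Keras implementations.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical prices in $1000s
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]

mse = mean_squared_error(y_true, y_pred)   # (4 + 4 + 9) / 3
mae = mean_absolute_error(y_true, y_pred)  # (2 + 2 + 3) / 3
print(mse, mae)
```

The MAE (about 2.33) stays in the same unit as the target, so it reads directly as "off by about $2.3K on average", while the MSE (about 5.67) is in squared units and is harder to interpret.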

Model Training

Let the show begin… All is set, we just have to call the fit method to start training…

history = model.fit(X_train_scaled, y_train, validation_split=0.2, epochs=100)

The fit method takes both the features and the labels. The validation_split argument tells the model to hold out 20% of the training data as a validation set, and epochs sets the number of passes over the training data.

When we run the above code we will get an output like the following:

Train on 323 samples, validate on 81 samples
Epoch 1/100
323/323 [==============================] - 0s 516us/step - loss: 581.6925 - mean_absolute_error: 22.2193 - val_loss: 648.7472 - val_mean_absolute_error: 23.6661
Epoch 2/100
323/323 [==============================] - 0s 48us/step - loss: 570.4857 - mean_absolute_error: 21.9364 - val_loss: 639.6261 - val_mean_absolute_error: 23.4443
...
Epoch 100/100
323/323 [==============================] - 0s 42us/step - loss: 19.0644 - mean_absolute_error: 3.0359 - val_loss: 20.4928 - val_mean_absolute_error: 3.4002

Look how the validation loss decreased from 648 to 20. Impressive!
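The sample counts in the first line of the log follow directly from validation_split=0.2. The sketch below reproduces that arithmetic; the total of 404 rows is taken from the log (323 + 81), and rounding the split point down is an assumption that happens to match those numbers.

```python
n_samples = 323 + 81      # total rows passed to fit, per the log above
validation_split = 0.2

# Keras holds out the last fraction of the samples (before any shuffling)
split_at = int(n_samples * (1 - validation_split))
n_train = split_at
n_validation = n_samples - split_at

print(n_train, n_validation)  # 323 81
```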

Let us plot the training and validation error convergence according to the epoch number:
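One possible way to draw such a plot is sketched below, assuming matplotlib is installed. The fit method returns a History object whose history attribute is a dictionary of per-epoch values; the key name mean_absolute_error matches the log above, but newer Keras versions may use mae instead, so check history.history.keys() first. The helper names extract_curves and plot_history are made up for this sketch.

```python
def extract_curves(history_dict, metric="mean_absolute_error"):
    """Return (epochs, training values, validation values) for one metric."""
    train_values = history_dict[metric]
    val_values = history_dict["val_" + metric]
    epochs = list(range(1, len(train_values) + 1))
    return epochs, train_values, val_values

def plot_history(history_dict, metric="mean_absolute_error"):
    """Plot training and validation curves against the epoch number."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it

    epochs, train_values, val_values = extract_curves(history_dict, metric)
    plt.plot(epochs, train_values, label="training")
    plt.plot(epochs, val_values, label="validation")
    plt.xlabel("epoch")
    plt.ylabel(metric)
    plt.legend()
    plt.show()

# Stand-in dictionary holding only the first two epochs from the log above
sample_history = {
    "mean_absolute_error": [22.2193, 21.9364],
    "val_mean_absolute_error": [23.6661, 23.4443],
}
print(extract_curves(sample_history))

# With the trained model, the call would be:
# plot_history(history.history)
```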

We started with a mean absolute error of roughly 22, which means about $22K per prediction because MEDV is expressed in thousands of dollars, and went down to around $3K. This is an acceptable error value for a housing price.

Evaluation on Test Data

Model evaluation is super easy in Keras. Check the following:

model.evaluate(X_test_scaled, y_test)
# output
# [26.68399990306181, 3.7581424339144838]

The output values represent the loss (Mean Squared Error) and the metric (Mean Absolute Error).

Model Prediction

Using the model for prediction is simpler than you expect. Have a look:

# we take some sample data (the first 2 inputs from the training data)
to_predict = X_train_scaled[:2]
# we call the predict method
predictions = model.predict(to_predict)
# print the predictions
print(predictions)
# output
# array([[13.272537], [39.808475]], dtype=float32)
# print the real values
print(y_train[:2])
# output
# array([15.2, 42.3])

As shown in the graph, we are doing very well, with an error of only around $3K per house.
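To put the two predictions above into dollars, the short sketch below compares them with the real values, using the fact that MEDV is expressed in $1000s. The numbers are copied from the output above; keep in mind that these two samples come from the training data, so they are not an unbiased measure of accuracy.

```python
predictions = [13.272537, 39.808475]  # model outputs, in $1000s
real_values = [15.2, 42.3]            # true MEDV values, in $1000s

errors_usd = []
for predicted, real in zip(predictions, real_values):
    error_usd = abs(predicted - real) * 1000  # convert from $1000s to dollars
    errors_usd.append(error_usd)
    print(f"predicted {predicted:.1f}K, actual {real:.1f}K, off by ${error_usd:,.0f}")
```

Both errors come out at roughly two to two and a half thousand dollars.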

Final Thoughts

I hope you have enjoyed this article. The main objective was to show you how to build a regression model, evaluate it, and use it to predict new data values. In the next tutorial we will talk about classification, so stay tuned…
