# Regression using TensorFlow and multiple distinctive attributes

As in the previous tutorial, we will use the gradient descent optimization algorithm. Additionally, we will divide our data set into three slices: training, validation, and testing. In our example, the data is in CSV format with the columns height, weight, age, projects, and salary, where salary is the value we want to predict. In this tutorial, we will use multiple features to train our model. You can download the data using this link: https://drive.google.com/file/d/1Gx0riTlJHt9o_VyokrKNbj384AhwXpAW/view?usp=sharing
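The model trained below is a linear regression over several input features, roughly salary ≈ w·x + b, fitted by gradient descent. As a minimal sketch of that idea, the NumPy code below fits two weights and a bias by full-batch gradient descent on mean squared error. It uses made-up, noise-free data rather than the tutorial's CSV, and the function name `gradient_descent`, the learning rate, and the step count are illustrative assumptions, not TensorFlow's implementation.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, steps=1000):
    """Fit y ~ X @ w + b by full-batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        error = X @ w + b - y                    # residual for every example
        grad_w = 2.0 * X.T @ error / n_samples   # d(MSE)/dw
        grad_b = 2.0 * error.mean()              # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Synthetic data (not the tutorial's data set): y = 3*x0 - 2*x1 + 5, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5
w, b = gradient_descent(X, y)
print(np.round(w, 3), round(b, 3))
```

Each feature gets its own weight, which is what "using multiple features" means for a linear model.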

## Importing essential libraries

```python
from __future__ import print_function

import math  # basic mathematical operations

from IPython import display  # display helpers for IPython/Colab
from matplotlib import cm  # colormap reference
from matplotlib import gridspec  # plot layout
from matplotlib import pyplot as plt  # plotting
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf  # this tutorial uses the TensorFlow 1.x API
from tensorflow.python.data import Dataset
from google.colab import drive  # required for drive.mount() in Colab

drive.mount('/content/gdrive')  # mount Google Drive

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
```

```python
dataframe = pd.read_csv("/content/gdrive/My Drive/Colab Notebooks/TENSOR_FLOW/train_dataset.csv", sep=",")
#dataframe["height"] = dataframe["height"]*-1
# Shuffle the rows so that the head/tail split below is random.
dataframe = dataframe.reindex(
    np.random.permutation(dataframe.index))
dataframe.head()
```

```
       height  weight  age  projects  salary
1623    117.2    33.1    7      2015  279600
12851   121.8    37.4   37      1569  286200
10236   119.9    38.9   24       235  136800
2783    117.7    34.1   29      1216  134300
16170   122.5    37.8   40      1675  330000
```
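Shuffling with `reindex(np.random.permutation(index))` reorders whole rows, so each person's features stay attached to their salary; this matters because the split later takes the first and last rows of the frame. A small sketch on a toy frame (values copied from the preview rows above) shows that sorting the shuffled frame by its index recovers the original exactly. `DataFrame.sample(frac=1)` is a common alternative way to shuffle rows in pandas.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the salary CSV.
df = pd.DataFrame({"height": [117.2, 121.8, 119.9, 117.7],
                   "salary": [279600, 286200, 136800, 134300]})

# Same shuffling idiom as the tutorial: reorder rows by a random permutation of the index.
shuffled = df.reindex(np.random.permutation(df.index))

# Rows move as units: sorting by index gives back the original frame.
print(shuffled)
print(shuffled.sort_index().equals(df))  # True
```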

## Step2: Preprocess features

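This step is optional. It can be used to create synthetic features, which we will cover in upcoming posts. Here, `preprocess_features` simply selects the four input columns, and `preprocess_targets` scales the salary target to thousands of dollars.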

```python
def preprocess_features(dataframe):
    """Select the input features used by the model."""
    selected_features = dataframe[
        ["height",
         "weight",
         "age",
         "projects"]]

    processed_features = selected_features.copy()
    return processed_features

def preprocess_targets(dataframe):
    """Build the target frame, with salary in thousands of dollars."""
    output_targets = pd.DataFrame()
    # Scale the target to be in units of thousands of dollars.
    output_targets["salary"] = (
        dataframe["salary"] / 1000.0)
    return output_targets
```
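As a hedged illustration of what a synthetic feature could look like, the sketch below is a hypothetical variant of `preprocess_features` that adds one derived column, `projects_per_year` (projects divided by age). Neither the function `preprocess_features_with_synthetic` nor that feature comes from the original tutorial; the sample rows are taken from the preview shown earlier, and the minimum age in the data summary is 1, so the division is safe for this data set.

```python
import pandas as pd

def preprocess_features_with_synthetic(dataframe):
    """Hypothetical variant of preprocess_features that adds one synthetic feature."""
    selected = dataframe[["height", "weight", "age", "projects"]].copy()
    # Hypothetical synthetic feature: projects per year of age.
    selected["projects_per_year"] = selected["projects"] / selected["age"]
    return selected

# Two rows copied from the data preview.
df = pd.DataFrame({"height": [117.2, 121.8], "weight": [33.1, 37.4],
                   "age": [7, 37], "projects": [2015, 1569],
                   "salary": [279600, 286200]})
features = preprocess_features_with_synthetic(df)
print(features)
```

A derived column like this would then be listed among the features passed to the model, exactly like the four raw columns.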

## Step3: Split data

```python
# The rows were shuffled above, so the first 12,000 rows form the training set
# and the last 5,000 rows form the validation set.
training_examples = preprocess_features(dataframe.head(12000))
training_targets = preprocess_targets(dataframe.head(12000))

validation_examples = preprocess_features(dataframe.tail(5000))
validation_targets = preprocess_targets(dataframe.tail(5000))

print("Training examples summary:")
display.display(training_examples.describe())
print("Validation examples summary:")
display.display(validation_examples.describe())

print("Training targets summary:")
display.display(training_targets.describe())
print("Validation targets summary:")
display.display(validation_targets.describe())
```
```
Training examples summary:
        height   weight      age  projects
count  12000.0  12000.0  12000.0   12000.0
mean     119.6     35.6     28.6    1425.3
std        2.0      2.1     12.6    1112.5
min      114.5     32.5      1.0       3.0
25%      118.0     33.9     18.0     793.0
50%      118.5     34.2     29.0    1170.0
75%      121.8     37.7     37.0    1726.0
max      124.3     42.0     52.0   35682.0

Validation examples summary:
        height   weight      age  projects
count   5000.0   5000.0   5000.0    5000.0
mean     119.5     35.6     28.5    1439.9
std        2.0      2.1     12.5    1228.5
min      114.3     32.6      1.0      11.0
25%      118.0     33.9     18.0     779.0
50%      118.5     34.2     29.0    1159.0
75%      121.8     37.7     37.0    1713.0
max      124.3     42.0     52.0   28566.0

Training targets summary:
        salary
count  12000.0
mean     207.2
std      115.7
min       15.0
25%      119.7
50%      180.4
75%      264.6
max      500.0

Validation targets summary:
        salary
count   5000.0
mean     207.5
std      116.6
min       15.0
25%      119.0
50%      179.9
75%      265.7
max      500.0
```
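Using `head(12000)` and `tail(5000)` gives non-overlapping slices only if the file has at least 17,000 rows. A sketch that makes the slices disjoint by construction, and keeps whatever is left over as the test slice mentioned in the introduction, could use positional slicing with `iloc`. The helper name `split_frame` is hypothetical, and the 20-row frame is a stand-in for the shuffled data.

```python
import numpy as np
import pandas as pd

def split_frame(dataframe, n_train, n_validation):
    """Hypothetical helper: cut a shuffled frame into train/validation/test by position."""
    if n_train + n_validation > len(dataframe):
        raise ValueError("not enough rows for the requested training/validation sizes")
    train = dataframe.iloc[:n_train]
    validation = dataframe.iloc[n_train:n_train + n_validation]
    test = dataframe.iloc[n_train + n_validation:]  # remaining rows
    return train, validation, test

# Toy frame with 20 rows standing in for the shuffled salary data.
df = pd.DataFrame({"x": np.arange(20)})
train, validation, test = split_frame(df, n_train=12, n_validation=5)
print(len(train), len(validation), len(test))  # 12 5 3
```

On the real data this would be called as `split_frame(dataframe, 12000, 5000)`, with each slice then passed through `preprocess_features` and `preprocess_targets`.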

## Step4: Feature construction

```python
def construct_feature_columns(input_features):
    """Create a numeric feature column for every input column."""
    return set([tf.feature_column.numeric_column(my_feature)
                for my_feature in input_features])

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Feed pandas data to the Estimator in batches."""
    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    # Shuffle the data, if specified.
    if shuffle:
        ds = ds.shuffle(10000)

    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
```
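In `my_input_fn`, `shuffle()` is applied after `batch()` and `repeat()`, so the shuffle buffer holds whole batches: batches are reordered, but the examples inside each batch stay in their original order (the rows were already shuffled once with `reindex`, which limits the impact). The plain-Python sketch below roughly imitates that pipeline; `batch_repeat_shuffle` is a hypothetical helper, not part of `tf.data`, and unlike `repeat(None)` it requires a finite number of epochs. If per-example shuffling is wanted, `tf.data` allows calling `shuffle` before `batch`, for example `ds.shuffle(10000).batch(batch_size).repeat(num_epochs)`.

```python
import random

def batch_repeat_shuffle(examples, batch_size, num_epochs, buffer_size, seed=0):
    """Rough pure-Python imitation of ds.batch(b).repeat(n).shuffle(buffer_size)."""
    rng = random.Random(seed)
    # batch(): group consecutive examples; the last batch may be smaller.
    batches = [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]
    # repeat(): replay the same sequence of batches num_epochs times.
    stream = (batch for _ in range(num_epochs) for batch in batches)
    # shuffle(): keep a buffer of up to buffer_size batches and emit a random one.
    buffer = []
    for batch in stream:
        buffer.append(batch)
        if len(buffer) == buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:  # drain what is left once the stream ends
        yield buffer.pop(rng.randrange(len(buffer)))

out = list(batch_repeat_shuffle(list(range(10)), batch_size=3, num_epochs=2, buffer_size=100))
print(out)  # 8 batches in random order; each batch still holds consecutive examples
```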

## Step5: Training the model

```python
def train_model(
        learning_rate,
        steps,
        batch_size,
        training_examples,
        training_targets,
        validation_examples,
        validation_targets):
    """Train a LinearRegressor and plot training/validation RMSE per period."""
    periods = 10
    steps_per_period = steps // periods  # whole number of training steps per period

    # Create a gradient descent optimizer and clip gradients by norm for stability.
    my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

    # Create a linear regressor object.
    linear_regressor = tf.estimator.LinearRegressor(
        feature_columns=construct_feature_columns(training_examples),
        optimizer=my_optimizer
    )

    # Create input functions.
    training_input_fn = lambda: my_input_fn(training_examples,
                                            training_targets["salary"],
                                            batch_size=batch_size)
    predict_training_input_fn = lambda: my_input_fn(training_examples,
                                                    training_targets["salary"],
                                                    num_epochs=1,
                                                    shuffle=False)
    predict_validation_input_fn = lambda: my_input_fn(validation_examples,
                                                      validation_targets["salary"],
                                                      num_epochs=1,
                                                      shuffle=False)

    # Train the model in periods so we can assess the loss after each one.
    print("Training model...")
    print("RMSE (on training data):")
    training_rmse = []
    validation_rmse = []
    for period in range(0, periods):
        # Train the model, continuing from its previous state.
        linear_regressor.train(
            input_fn=training_input_fn,
            steps=steps_per_period,
        )

        # Compute predictions on the training and validation sets.
        training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
        training_predictions = np.array([item['predictions'] for item in training_predictions])

        validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
        validation_predictions = np.array([item['predictions'] for item in validation_predictions])

        # Compute RMSE on both sets.
        training_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(training_targets, training_predictions))
        validation_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(validation_targets, validation_predictions))

        print("  period %02d : %0.2f" % (period, training_root_mean_squared_error))

        training_rmse.append(training_root_mean_squared_error)
        validation_rmse.append(validation_root_mean_squared_error)
    print("Model training finished.")

    # Output a graph of loss metrics over periods.
    plt.ylabel("RMSE")
    plt.xlabel("Periods")
    plt.title("Root Mean Squared Error vs. Periods")
    plt.tight_layout()
    plt.plot(training_rmse, label="training")
    plt.plot(validation_rmse, label="validation")
    plt.legend()

    return linear_regressor
```
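`train_model` splits the requested steps into 10 periods and, after each period, reports the root mean squared error computed as `math.sqrt(metrics.mean_squared_error(...))`. The NumPy-only sketch below computes the same quantity; the function name `rmse` is an assumption for illustration. Flattening both inputs matters because the predictions and the target frame both come out as column-shaped arrays.

```python
import numpy as np

def rmse(predictions, targets):
    """Root mean squared error between predictions and targets."""
    predictions = np.asarray(predictions, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))

steps, periods = 500, 10
steps_per_period = steps // periods  # 50 training steps between evaluations
print(steps_per_period, rmse([200.0, 150.0], [210.0, 140.0]))  # 50 10.0
```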

## Supply features and train model

In this step we supply multiple features, “height”, “weight”, “age”, and “projects”, all at once.

```python
minimal_features = ["height", "weight", "age", "projects"]

assert minimal_features, "You must select at least one feature!"

minimal_training_examples = training_examples[minimal_features]
minimal_validation_examples = validation_examples[minimal_features]

train_model(
    learning_rate=0.001,
    steps=500,
    batch_size=5,
    training_examples=minimal_training_examples,
    training_targets=training_targets,
    validation_examples=minimal_validation_examples,
    validation_targets=validation_targets)
```
```
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Training model...
RMSE (on training data):
  period 00 : 173.38
  period 01 : 173.80
  period 02 : 187.86
  period 03 : 167.92
  period 04 : 186.66
  period 05 : 165.39
  period 06 : 165.32
  period 07 : 159.16
  period 08 : 166.23
  period 09 : 157.86
Model training finished.
```
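A quick sanity check on these numbers is to compare them with a baseline that always predicts the mean salary. For that baseline the RMSE equals the population standard deviation of the targets, so the standard deviation of about 115.7 that `describe()` reported for the training targets (pandas reports the sample standard deviation, which is practically identical for 12,000 rows) is a rough bar to beat. The final training RMSE of 157.86 is still above it, which suggests this run has not yet produced a useful model; the very different feature scales and the small learning rate are plausible things to tune, though that is an interpretation rather than something tested here. The sketch below (hypothetical function `baseline_rmse`, salaries in thousands of dollars from the preview rows) shows the baseline calculation.

```python
import numpy as np

def baseline_rmse(targets):
    """RMSE of a model that always predicts the mean of the targets."""
    targets = np.asarray(targets, dtype=float)
    return float(np.sqrt(np.mean((targets - targets.mean()) ** 2)))

salaries = np.array([279.6, 286.2, 136.8, 134.3, 330.0])  # thousands of dollars
print(baseline_rmse(salaries))
print(np.isclose(baseline_rmse(salaries), salaries.std()))  # True: equals the population std
```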

##### Nilesh Kumar

I am Nilesh Kumar, a graduate student at the Department of Biology, UAB, under the mentorship of Dr. Shahid Mukhtar. I joined UAB in Spring 2018 and am working on Network Biology. My research interests are network modeling, mathematical modeling, game theory, artificial intelligence, and their application in Systems Biology.

I graduated with a master’s degree, “Master of Technology, Information Technology (Specialization in Bioinformatics)”, in 2015 from the Indian Institute of Information Technology Allahabad, India, with a GATE scholarship. My master’s thesis was entitled “Mirtron Prediction through machine learning approach”. I worked as a research fellow at The International Centre for Genetic Engineering and Biotechnology, New Delhi, for two years.