As we did in the previous tutorial, we will use the gradient descent optimization algorithm. Additionally, we will divide our dataset into three slices: training, testing, and validation. In our example, we have data in CSV format with the columns “height”, “weight”, “age”, “projects”, and “salary”. In this tutorial, we will use multiple features to train our model. You can download the data using this link: https://drive.google.com/file/d/1Gx0riTlJHt9o_VyokrKNbj384AhwXpAW/view?usp=sharing
Importing essential libraries
from __future__ import print_function
import math ## For basic mathematical operations
from IPython import display ## Plot display helpers for IPython
from matplotlib import cm ## Colormap reference
from matplotlib import gridspec ## Plot setup
from matplotlib import pyplot as plt ## Plot setup
import numpy as np ## Numerical arrays
import pandas as pd ## Tabular data handling
from sklearn import metrics ## Evaluation metrics (RMSE)
import tensorflow as tf ## TensorFlow 1.x
from tensorflow.python.data import Dataset ## tf.data Dataset API
from google.colab import drive ## Loading data directly from Google Drive
drive.mount('/content/gdrive') ## Mounting drive
tf.logging.set_verbosity(tf.logging.ERROR) ## Silence TF info/warning logs
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
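Note that this code relies on TensorFlow 1.x APIs (tf.logging, tf.contrib, and the estimator-based LinearRegressor used later), so it will not run unmodified on TensorFlow 2.x. A quick, optional sanity check:
# Optional: confirm a TensorFlow 1.x build is active before proceeding.
print(tf.__version__) # this tutorial assumes a version starting with "1."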
Step 1: Loading the dataset and randomizing the data
dataframe = pd.read_csv("/content/gdrive/My Drive/Colab Notebooks/TENSOR_FLOW/train_dataset.csv", sep=",")
#dataframe["height"] = dataframe["height"]*-1
dataframe = dataframe.reindex(
    np.random.permutation(dataframe.index))
dataframe.head()
height weight age projects salary
1623 117.2 33.1 7 2015 279600
12851 121.8 37.4 37 1569 286200
10236 119.9 38.9 24 235 136800
2783 117.7 34.1 29 1216 134300
16170 122.5 37.8 40 1675 330000
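Before preprocessing, it is worth a quick sanity check on what was loaded; a small optional sketch (the 17000-row count is an assumption implied by the 12000/5000 split used later):
# Optional sanity check on the loaded data.
print(dataframe.shape) # e.g. (17000, 5) if the file matches the split below
print(dataframe.isnull().sum()) # missing values per column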
Step 2: Preprocess features
This step is optional and can also be used to create synthetic features; we will cover that in upcoming posts.
def preprocess_features(dataframe):
  selected_features = dataframe[
    ["height",
     "weight",
     "age",
     "projects"]]
  processed_features = selected_features.copy()
  return processed_features

def preprocess_targets(dataframe):
  output_targets = pd.DataFrame()
  # Scale the target to be in units of thousands of dollars.
  output_targets["salary"] = (
    dataframe["salary"] / 1000.0)
  return output_targets
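As an illustration of a synthetic feature, the sketch below derives a new column from two existing ones; the projects_per_year name and the feature itself are hypothetical and are not used elsewhere in this tutorial:
def preprocess_features_with_synthetic(dataframe):
  """Like preprocess_features, but adds one hypothetical synthetic feature."""
  processed_features = preprocess_features(dataframe)
  # Hypothetical synthetic feature: projects completed per year of age.
  processed_features["projects_per_year"] = (
      dataframe["projects"] / dataframe["age"])
  return processed_features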
Step 3: Split the data
training_examples = preprocess_features(dataframe.head(12000))
training_targets = preprocess_targets(dataframe.head(12000))
validation_examples = preprocess_features(dataframe.tail(5000))
validation_targets = preprocess_targets(dataframe.tail(5000))
print("Training examples summary:")
display.display(training_examples.describe())
print("Validation examples summary:")
display.display(validation_examples.describe())
print("Training targets summary:")
display.display(training_targets.describe())
print("Validation targets summary:")
display.display(validation_targets.describe())
Training examples summary:
height weight age projects
count 12000.0 12000.0 12000.0 12000.0
mean 119.6 35.6 28.6 1425.3
std 2.0 2.1 12.6 1112.5
min 114.5 32.5 1.0 3.0
25% 118.0 33.9 18.0 793.0
50% 118.5 34.2 29.0 1170.0
75% 121.8 37.7 37.0 1726.0
max 124.3 42.0 52.0 35682.0
Validation examples summary:
height weight age projects
count 5000.0 5000.0 5000.0 5000.0
mean 119.5 35.6 28.5 1439.9
std 2.0 2.1 12.5 1228.5
min 114.3 32.6 1.0 11.0
25% 118.0 33.9 18.0 779.0
50% 118.5 34.2 29.0 1159.0
75% 121.8 37.7 37.0 1713.0
max 124.3 42.0 52.0 28566.0
Training targets summary:
salary
count 12000.0
mean 207.2
std 115.7
min 15.0
25% 119.7
50% 180.4
75% 264.6
max 500.0
Validation targets summary:
salary
count 5000.0
mean 207.5
std 116.6
min 15.0
25% 119.0
50% 179.9
75% 265.7
max 500.0
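The introduction mentioned a testing slice as well, but the code above creates only training and validation sets. If you also want a held-out test slice, one option is sketched below; the 10000/3500/3500 sizes and the test_* names are illustrative assumptions, and the rest of this tutorial keeps the 12000/5000 split above.
# Sketch of a three-way split (sizes are assumptions, not from the original post).
train_df = dataframe.head(10000)
validation_df = dataframe[10000:13500]
test_df = dataframe.tail(3500)
test_examples = preprocess_features(test_df)
test_targets = preprocess_targets(test_df)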
Step 4: Construct feature columns and the input function
def construct_feature_columns(input_features):
  return set([tf.feature_column.numeric_column(my_feature)
              for my_feature in input_features])

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
  # Convert pandas data into a dict of np arrays.
  features = {key: np.array(value) for key, value in dict(features).items()}
  # Construct a dataset, and configure batching/repeating.
  ds = Dataset.from_tensor_slices((features, targets)) # warning: 2GB limit
  ds = ds.batch(batch_size).repeat(num_epochs)
  # Shuffle the data, if specified.
  if shuffle:
    ds = ds.shuffle(10000)
  # Return the next batch of data.
  features, labels = ds.make_one_shot_iterator().get_next()
  return features, labels
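To see what my_input_fn produces, you can pull a single batch and evaluate it; this is a sketch that runs under TensorFlow 1.x graph mode with a throwaway session:
# Quick peek at one batch from the input function (sketch, TF 1.x only).
example_batch = my_input_fn(training_examples,
                            training_targets["salary"],
                            batch_size=2,
                            num_epochs=1)
with tf.Session() as sess:
  feature_values, label_values = sess.run(example_batch)
  print(feature_values["height"], label_values) # two heights and two salary targets (in thousands)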
Step 5: Train the model
def train_model(
    learning_rate,
    steps,
    batch_size,
    training_examples,
    training_targets,
    validation_examples,
    validation_targets):

  periods = 10
  steps_per_period = steps / periods

  # Create a linear regressor object.
  my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
  my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
  linear_regressor = tf.estimator.LinearRegressor(
      feature_columns=construct_feature_columns(training_examples),
      optimizer=my_optimizer
  )

  # Create input functions.
  training_input_fn = lambda: my_input_fn(training_examples,
                                          training_targets["salary"],
                                          batch_size=batch_size)
  predict_training_input_fn = lambda: my_input_fn(training_examples,
                                                  training_targets["salary"],
                                                  num_epochs=1,
                                                  shuffle=False)
  predict_validation_input_fn = lambda: my_input_fn(validation_examples,
                                                    validation_targets["salary"],
                                                    num_epochs=1,
                                                    shuffle=False)

  # Train the model inside a loop so we can periodically assess loss metrics.
  print("Training model...")
  print("RMSE (on training data):")
  training_rmse = []
  validation_rmse = []
  for period in range(0, periods):
    # Train the model, starting from the prior state.
    linear_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period,
    )
    # Compute predictions on the training and validation sets.
    training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
    training_predictions = np.array([item['predictions'][0] for item in training_predictions])
    validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
    validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])
    # Compute training and validation RMSE for this period.
    training_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(training_predictions, training_targets))
    validation_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(validation_predictions, validation_targets))
    print("  period %02d : %0.2f" % (period, training_root_mean_squared_error))
    training_rmse.append(training_root_mean_squared_error)
    validation_rmse.append(validation_root_mean_squared_error)
  print("Model training finished.")

  # Output a graph of loss metrics over periods.
  plt.ylabel("RMSE")
  plt.xlabel("Periods")
  plt.title("Root Mean Squared Error vs. Periods")
  plt.tight_layout()
  plt.plot(training_rmse, label="training")
  plt.plot(validation_rmse, label="validation")
  plt.legend()

  return linear_regressor
Supply features and train the model
This is the step where we supply multiple features, namely “height”, “weight”, “age”, and “projects”, all at once.
minimal_features = ["height","weight","age","projects"]
assert minimal_features, "You must select at least one feature!"
minimal_training_examples = training_examples[minimal_features]
minimal_validation_examples = validation_examples[minimal_features]
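A quick note on the hyperparameters in the call below: train_model fixes periods = 10, so steps=500 gives steps_per_period = 500 / 10 = 50 training steps per period, and with batch_size=5 each step consumes 5 examples, so roughly 50 × 5 = 250 examples are processed between successive RMSE measurements.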
train_model(
    learning_rate=0.001,
    steps=500,
    batch_size=5,
    training_examples=minimal_training_examples,
    training_targets=training_targets,
    validation_examples=minimal_validation_examples,
    validation_targets=validation_targets)
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
Training model…
RMSE (on training data):
period 00 : 173.38
period 01 : 173.80
period 02 : 187.86
period 03 : 167.92
period 04 : 186.66
period 05 : 165.39
period 06 : 165.32
period 07 : 159.16
period 08 : 166.23
period 09 : 157.86
Model training finished.
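The call above does not capture the estimator that train_model returns. If you assign it, for example linear_regressor = train_model(...), you can compute one final RMSE on the validation slice outside the training loop; a minimal sketch reusing the helpers defined earlier:
# Assumes the estimator was captured, e.g. linear_regressor = train_model(...).
predict_validation_input_fn = lambda: my_input_fn(minimal_validation_examples,
                                                  validation_targets["salary"],
                                                  num_epochs=1,
                                                  shuffle=False)
validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])
final_rmse = math.sqrt(
    metrics.mean_squared_error(validation_predictions, validation_targets))
print("Final RMSE (on validation data): %0.2f" % final_rmse)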