Introduction to Boosting Machine Learning Algorithms: AdaBoost


AdaBoost algorithm

Boosting is a supervised ensemble learning technique used primarily to reduce bias and variance in a model's predictions. In recent years, boosting algorithms have gained enormous popularity in data science. Boosting algorithms combine multiple low-accuracy models (weak learners) to create a high-accuracy model. AdaBoost is one example of a boosting algorithm. Its main advantages are low generalization error, ease of implementation, compatibility with a wide range of classifiers, and few parameters to adjust. Special attention must be paid to data quality, because the algorithm is sensitive to outliers.
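To make the idea of combining weak models concrete, here is a minimal sketch of the textbook discrete AdaBoost procedure for two classes labelled -1 and +1. It is not scikit-learn's implementation: the helper names fit_adaboost and predict_adaboost and the toy data are made up for illustration, and depth-1 decision trees are assumed as the weak learners.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=3):
    """Train discrete AdaBoost on labels in {-1, +1}; returns (stumps, alphas)."""
    n_samples = len(y)
    weights = np.full(n_samples, 1.0 / n_samples)  # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        # Weak learner: a depth-1 tree trained on the weighted samples
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)

        # Weighted error of this learner, clipped to avoid division by zero
        err = np.clip(np.sum(weights[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # vote strength of this learner

        # Raise the weights of misclassified samples, lower the rest
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()

        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_adaboost(stumps, alphas, X):
    """Weighted vote of all weak learners."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)


# Toy 1-D data (hypothetical); no single threshold separates the classes
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

single_stump = DecisionTreeClassifier(max_depth=1).fit(X_toy, y_toy)
stumps, alphas = fit_adaboost(X_toy, y_toy, n_rounds=3)

stump_acc = np.mean(single_stump.predict(X_toy) == y_toy)
boosted_acc = np.mean(predict_adaboost(stumps, alphas, X_toy) == y_toy)
print("Single stump accuracy:", stump_acc)
print("Boosted accuracy:", boosted_acc)
```

Each round trains a new stump on reweighted data, so later stumps concentrate on the samples that earlier ones got wrong. This reweighting is also why outliers can dominate the training: they keep being misclassified and keep gaining weight.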

Install Sklearn

# For Linux (the package is published on PyPI as scikit-learn)
$ pip install scikit-learn
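A quick way to check that the installation worked is to import the package and print its version. This is only a sanity check; the version you see depends on your environment.

```python
import sklearn

# Print the installed scikit-learn version to confirm the import works
print(sklearn.__version__)
```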

Building Model in Python

Let’s first import the required scikit-learn modules.


from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics

Loading iris dataset

There are 4 features (sepal length, sepal width, petal length, petal width) and a target with three types of flower: Setosa, Versicolour, and Virginica.


iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X.shape)
(150, 4)
print(X)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 ...
 [6.3 2.5 5.  1.9]
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
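To see what the columns and labels stand for, the dataset object returned by load_iris also carries the feature and class names. A small sketch, with the class counts computed using NumPy:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Names of the four measured features (columns of iris.data)
print(iris.feature_names)

# Names of the flower classes encoded as 0, 1, 2 in iris.target
print(iris.target_names)

# Number of samples per class
class_counts = np.bincount(iris.target)
print(class_counts)
```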

Split the data set

To evaluate the model fairly, we need separate training and testing slices of the data.


# 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

print("X_train:", len(X_train), "; X_test:", len(X_test), "; y_train:", len(y_train), "; y_test:", len(y_test))

X_train: 105 ; X_test: 45 ; y_train: 105 ; y_test: 45
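The split above is random, so every run produces different slices. If you want repeatable results, one option is to fix the seed and keep the class proportions equal in both slices. The sketch below assumes random_state=42 (an arbitrary value chosen for illustration) and uses the stratify parameter of train_test_split.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Fixed seed (arbitrary) for repeatability; stratify keeps class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Samples per class in each slice
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts, "test:", test_counts)
```

With stratification, each of the three classes contributes the same share of samples to the training and testing slices, which matters more on small datasets like this one.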

Building the Model, AdaBoost

Let’s build the AdaBoost model with scikit-learn, using its default base classifier: a decision tree of depth 1 (a decision stump).


# Create the AdaBoost classifier object
Adbc = AdaBoostClassifier(n_estimators=50,
                          learning_rate=1.5)
# Train the AdaBoost classifier
model = Adbc.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = model.predict(X_test)
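Once the model is fitted, you can look inside the ensemble to see the weak learners it built. The sketch below refits the same classifier on a fresh split (random_state=0 is an arbitrary assumption, added only for repeatability) and inspects the estimators_, estimator_weights_ and feature_importances_ attributes.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.5)
model.fit(X_train, y_train)

# Boosting may stop early, so there can be fewer than n_estimators learners
n_fitted = len(model.estimators_)
print("Weak learners fitted:", n_fitted)

# Each weak learner is a decision tree; by default it is a depth-1 stump
first = model.estimators_[0]
print(type(first).__name__, "depth:", first.get_depth())

# Voting weight of each fitted learner, and the overall feature importances
print(model.estimator_weights_[:n_fitted])
print(dict(zip(iris.feature_names, model.feature_importances_)))
```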

Evaluation of the model

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Accuracy: 0.8888888888888888 (the exact value varies from run to run because the split is random)
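A single accuracy number on 45 test samples can swing quite a bit from split to split. As a rough sketch of two complementary checks, the code below prints a confusion matrix for one split and a 5-fold cross-validated accuracy. The seed random_state=0 and cv=5 are arbitrary choices made for illustration.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score, train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy on 5 different train/test folds of the full dataset
cv_scores = cross_val_score(
    AdaBoostClassifier(n_estimators=50, learning_rate=1.5), X, y, cv=5
)
print("CV accuracy: mean", cv_scores.mean(), "std", cv_scores.std())
```

The confusion matrix shows which flower types get mixed up, while the cross-validated mean gives a steadier estimate than one split.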

Done!

For more tutorials and details about AdaBoost, see the official scikit-learn AdaBoost page.

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html#sklearn.ensemble.AdaBoostClassifier

