Chapter 9. Classification Methods

Ensemble Methods

Table of Contents

Introducing ensemble methods: the bagging algorithm
The influence of the number of base classifiers on the performance of bagging
The influence of the number of base classifiers on the performance of the AdaBoost method
The influence of the number of base classifiers on the performance of the random forest

Introducing ensemble methods: the bagging algorithm

Description

The experiment introduces the use of ensemble methods, featuring the Bagging operator. The average classification error rate from 10-fold cross-validation on the Heart Disease data set is compared for a single decision stump and an ensemble of 10 decision stumps trained by bagging. The splitting criterion used for the decision stumps is the gain ratio.

Input

Heart Disease [UCI MLR]

Note

The data set was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].

Output

Figure 9.1. The average classification error rate of a single decision stump obtained from 10-fold cross-validation.

The average classification error rate of a single decision stump obtained from 10-fold cross-validation.

Figure 9.2. The average classification error rate of the bagging algorithm obtained from 10-fold cross-validation, where 10 decision stumps were used as base classifiers.

The average classification error rate of the bagging algorithm obtained from 10-fold cross-validation, where 10 decision stumps were used as base classifiers.

Interpretation of the results

An ensemble of 10 decision stumps trained by bagging gives an average classification error rate that is about 7% better than that of a single decision stump.

Workflow

ensemble_exp1.rmp

Keywords

bagging
ensemble methods
supervised learning
error rate
cross-validation
classification

Operators

Apply Model
Bagging
Decision Stump
Map
Multiply
Performance (Classification)
Read CSV
X-Validation