The influence of the number of base classifiers on the performance of bagging

Description

This process demonstrates the influence of the number of base classifiers on the classification error rate of bagging, using the Heart Disease data set. The base classifiers are decision stumps and the impurity measure used is the gain ratio. In the experiment, the number of base classifiers is increased from 1 to 20, and at each step the average classification error rate of bagging is determined by 10-fold cross-validation.

Input

Heart Disease [UCI MLR]

Note

The data set was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].

Output

Figure 9.3. The average classification error rate obtained from 10-fold cross-validation against the number of base classifiers.


Interpretation of the results

The figure shows that the best average classification error rate (21.4%) is achieved when the number of base classifiers is 14.

Workflow

ensemble_exp2.rmp

Keywords

bagging
ensemble methods
supervised learning
error rate
cross-validation
classification

Operators

Apply Model
Bagging
Decision Stump
Log
Loop Parameters
Map
Performance (Classification)
Read CSV
X-Validation