The influence of the number of base classifiers on the performance of the AdaBoost method

Description

The process demonstrates the influence of the number of base classifiers on the classification error rate of the AdaBoost method on the Heart Disease data set. The base classifiers are decision stumps, and the splitting criterion used is the gain ratio. The number of base classifiers is increased from 1 to 20 in the experiment, and at each step the average classification error rate of the AdaBoost method is determined by 10-fold cross-validation.
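
A minimal sketch of the same experiment in Python with scikit-learn is given below (the original workflow is a RapidMiner process; the file name, preprocessing steps, and binary label conversion here are assumptions). Note that scikit-learn's decision stumps split on Gini impurity or entropy rather than gain ratio, so the measured error rates may differ slightly from those in Figure 9.4.

    import pandas as pd
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical local copy of the UCI Heart Disease (Cleveland) data;
    # the file name and preprocessing are assumptions.
    data = pd.read_csv("processed.cleveland.data", header=None,
                       na_values="?").dropna()
    X = data.iloc[:, :-1].to_numpy()
    # Collapse the 0-4 disease severity label into a binary target,
    # as is customary for this data set.
    y = (data.iloc[:, -1] > 0).astype(int).to_numpy()

    stump = DecisionTreeClassifier(max_depth=1)  # decision stump
    for n in range(1, 21):
        # scikit-learn >= 1.2; older versions use base_estimator=
        model = AdaBoostClassifier(estimator=stump, n_estimators=n,
                                   random_state=42)
        scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV accuracy
        print(f"n = {n:2d}   avg. error rate = {1 - scores.mean():.3f}")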

Note

The experiment is the same as the previous one; the only difference is that the AdaBoost operator is used instead of the Bagging operator.

Input

Heart Disease [UCI MLR]

Note

The data set was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].

Output

Figure 9.4. The average classification error rate obtained from 10-fold cross-validation against the number of base classifiers.

Interpretation of the results

The figure shows that the best average classification error rate (22.7%) is achieved when the number of base classifiers is 3. It is also apparent that increasing the number of base classifiers further does not degrade the performance; instead, the error rate remains constant. Thus, surprisingly, model overfitting does not occur.

Note that the best error rate obtained is almost identical to that of bagging, but is achieved with fewer base classifiers. Moreover, the performance behaves more predictably than in the case of bagging.
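
To compare the two ensemble methods directly, the sketch above can be extended to plot the AdaBoost and bagging error curves side by side. Again, this is an illustration with scikit-learn rather than the original RapidMiner workflows, and the data preparation repeats the assumptions made earlier.

    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Same assumed data preparation as in the sketch above.
    data = pd.read_csv("processed.cleveland.data", header=None,
                       na_values="?").dropna()
    X = data.iloc[:, :-1].to_numpy()
    y = (data.iloc[:, -1] > 0).astype(int).to_numpy()
    stump = DecisionTreeClassifier(max_depth=1)

    ns = range(1, 21)
    ada_err, bag_err = [], []
    for n in ns:
        ada = AdaBoostClassifier(estimator=stump, n_estimators=n,
                                 random_state=42)
        bag = BaggingClassifier(estimator=stump, n_estimators=n,
                                random_state=42)
        ada_err.append(1 - cross_val_score(ada, X, y, cv=10).mean())
        bag_err.append(1 - cross_val_score(bag, X, y, cv=10).mean())

    # Plot both error curves against the ensemble size.
    plt.plot(ns, ada_err, marker="o", label="AdaBoost")
    plt.plot(ns, bag_err, marker="s", label="Bagging")
    plt.xlabel("number of base classifiers")
    plt.ylabel("avg. classification error rate")
    plt.legend()
    plt.show()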

Workflow

ensemble_exp3.rmp

Keywords

AdaBoost
ensemble methods
supervised learning
error rate
cross-validation
classification

Operators

AdaBoost
Apply Model
Decision Stump
Log
Loop Parameters
Map
Performance (Classification)
Read CSV
X-Validation