The process demonstrates the influence of the number of base classifiers on the classification error rate of the AdaBoost method in the case of the Heart Disease data set. The base classifiers are decision stumps, and the impurity measure used is the gain ratio. The number of base classifiers is increased from 1 to 20 in the experiment, and at each step the average classification error rate of the AdaBoost method is determined from 10-fold cross-validation.
The experiment is the same as the previous one; the only difference is that the AdaBoost operator is used instead of the Bagging operator.
Heart Disease [UCI MLR]: the data set was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].
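The following is a minimal sketch of an equivalent experiment in Python with scikit-learn, not the operator-based process described in the text. The file name heart.csv and the name of the class column are assumptions; also, scikit-learn's decision trees do not offer the gain ratio as a split criterion, so entropy (information gain) is substituted here.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical file and column names; adjust to the actual copy of the
# UCI Heart Disease data set.
data = pd.read_csv("heart.csv")
X, y = data.drop(columns="target"), data["target"]

# A decision stump: a tree of depth 1. Entropy stands in for the gain ratio.
stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")

# Increase the number of base classifiers from 1 to 20, estimating the
# average error rate at each step with 10-fold cross-validation.
for n in range(1, 21):
    model = AdaBoostClassifier(estimator=stump, n_estimators=n)
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{n:2d} base classifiers: error rate = {1 - scores.mean():.3f}")
```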
Figure 9.4. The average classification error rate obtained from 10-fold cross-validation against the number of base classifiers.
The figure shows that the best average classification error rate (22.7%) is achieved when the number of base classifiers is 3. It is also apparent that further increasing the number of base classifiers does not degrade the performance, which instead remains constant. Thus, somewhat surprisingly, model overfitting does not occur.
Note that the best performance obtained is almost identical to that of bagging, but it requires fewer base classifiers. Moreover, the performance behaves more predictably than in the case of bagging.
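To reproduce the comparison with the previous experiment, the same loop can be run with a bagging ensemble over identical stumps. This sketch reuses X, y, stump, and cross_val_score from the code above.

```python
from sklearn.ensemble import BaggingClassifier

# Same sweep as before, with bagging instead of boosting, so the two
# error-rate curves can be compared side by side.
for n in range(1, 21):
    model = BaggingClassifier(estimator=stump, n_estimators=n)
    err = 1 - cross_val_score(model, X, y, cv=10).mean()
    print(f"{n:2d} base classifiers: bagging error rate = {err:.3f}")
```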