This process demonstrates the influence of the number of base classifiers on the classification error rate of bagging for the Heart Disease data set. The base classifiers are decision stumps, and the impurity measure used is the gain ratio. The number of base classifiers is increased from 1 to 20 in the experiment, and at each step the average classification error rate of bagging is estimated by 10-fold cross-validation.
The Heart Disease data set [UCI MLR] was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].
Figure 9.3. The average classification error rate obtained from 10-fold cross-validation against the number of base classifiers.
The figure shows that the lowest average classification error rate (21.4%) is achieved when the number of base classifiers is 14.