The process demonstrates the influence of the number of base classifiers on the classification error rate of the random forest in the case of the Heart Disease data set. The number of base classifiers (i.e., decision trees) is increased from 1 to 20 in the experiment, and at each step the average classification error rate of the random forest is determined from 10-fold cross-validation. The impurity measure used for the decision trees is the gain ratio.
The experiment is the same as the previous two; the only difference is that the Random Forest operator is used here instead of the Bagging and AdaBoost operators.
Heart Disease [UCI MLR]: the data set was donated to the UCI Machine Learning Repository by R. Detrano [Detrano et al.].
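For readers who want to reproduce the experiment outside RapidMiner, a minimal sketch in Python with scikit-learn follows. The file name heart_disease.csv and its "target" column are assumptions about a local copy of the UCI data, and scikit-learn's trees do not implement the gain ratio, so entropy is used here as the closest available impurity measure.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # "heart_disease.csv" is a hypothetical local copy of the UCI Heart
    # Disease data with a binary "target" column; adjust to your own file.
    data = pd.read_csv("heart_disease.csv")
    X = data.drop(columns="target")
    y = data["target"]

    for n_trees in range(1, 21):
        # scikit-learn offers "gini" and "entropy" as impurity measures;
        # gain ratio is not available, so entropy stands in for it here.
        forest = RandomForestClassifier(
            n_estimators=n_trees, criterion="entropy", random_state=0
        )
        scores = cross_val_score(forest, X, y, cv=10, scoring="accuracy")
        print(f"{n_trees:2d} trees: avg error rate = {1 - scores.mean():.3f}")

The exact error rates will differ from the figure below, since the random seed, the impurity measure, and RapidMiner's tree-growing details all affect the result.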
Figure 9.5. The average error rate of the random forest obtained from 10-fold cross-validation against the number of base classifiers.
The figure shows that the best average classification error rate (19.1%) is achieved when the number of base classifiers is 10.
Note that the best performance obtained is slightly better than that of AdaBoost (22.7%), but requires more base classifiers. Moreover, the performance of AdaBoost behaves more predictably than that of the random forest.
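The AdaBoost baseline mentioned above can be estimated in the same way; a minimal sketch, reusing the X and y variables from the earlier snippet, is given below. The choice of 10 base classifiers mirrors the best-performing random forest and is an assumption; the book's AdaBoost setup is not specified here, so the resulting figure need not match the reported 22.7%.

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    # 10 base classifiers is an assumed setting chosen to match the
    # best random forest above, not the book's documented configuration.
    boost = AdaBoostClassifier(n_estimators=10, random_state=0)
    scores = cross_val_score(boost, X, y, cv=10, scoring="accuracy")
    print(f"AdaBoost avg error rate = {1 - scores.mean():.3f}")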