This experiment demonstrates the combined method of bagging, in which a better-fitting model is built from supervised data mining models using bootstrap aggregation. Bagging samples the original training dataset, obtaining several subsamples by the bootstrap method. A supervised model (a decision tree in this experiment) is fitted on each subsample, and a new model is obtained by aggregating the fitted models. In this experiment the size of the bagging cycle is set to 10, i.e. 10 decision trees are fitted on 10 different bootstrap subsamples.
The results are compared with a simple decision tree, which is fitted to the entire training dataset.
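To make the procedure concrete, the following sketch implements the same idea in Python with scikit-learn (an assumption on our part; the original experiment is built from graphical operators, not from this library). The function names, the cycle size of 10 and the random seed are illustrative choices, and class labels are assumed to be non-negative integers so they can be tallied in the majority vote.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_cycles=10, seed=42):
        """Fit one decision tree per bagging cycle on a bootstrap subsample."""
        rng = np.random.default_rng(seed)
        trees = []
        for _ in range(n_cycles):
            # Bootstrap: draw len(X) cases with replacement from the training set.
            idx = rng.integers(0, len(X), size=len(X))
            trees.append(DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx]))
        return trees

    def bagging_predict(trees, X):
        """Aggregate the predictions of the fitted trees by majority vote."""
        votes = np.stack([t.predict(X) for t in trees]).astype(int)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)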
In the bagging method the base classifier is determined by the operator enclosed between the Start Groups and End Groups operators. The size of the bagging cycle is set in the Start Groups operator.
Dataset: Spambase [UCI MLR]
In the preprocessing step the dataset is partitioned by the Data Partition operator according to the rates 60/20/20 into training, validation and test datasets.
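A hedged sketch of this partition in the same Python setting (the operator itself belongs to the graphical workflow; loading Spambase via OpenML and the stratified splits below are our assumptions):

    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split

    # Spambase from the UCI repository, served here through OpenML.
    X, y = fetch_openml("spambase", version=1, return_X_y=True, as_frame=False)
    y = y.astype(int)  # labels arrive as strings; 1 marks spam

    # 60% training; the remaining 40% is halved into validation and test (20/20).
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.6, random_state=42, stratify=y)
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)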
The tools available for evaluating bagging classifiers are similar to those available for other supervised data mining models: statistics (the number of incorrectly classified cases and the misclassification rate) and graphs (response and lift curves).
The only additional graph can be seen in the second figure below, where the errors of the classifiers obtained in the consecutive bagging cycles are plotted.
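Such a graph could be reproduced with the pieces sketched above, for instance as follows (again an illustrative assumption, reusing bagging_fit and the partitioned data from the earlier snippets): the tree fitted in each consecutive bagging cycle is scored on the validation set.

    import matplotlib.pyplot as plt

    trees = bagging_fit(X_train, y_train, n_cycles=10)
    # Misclassification rate of the tree fitted in each consecutive cycle.
    errors = [1.0 - t.score(X_valid, y_valid) for t in trees]

    plt.plot(range(1, len(errors) + 1), errors, marker="o")
    plt.xlabel("bagging cycle")
    plt.ylabel("misclassification rate")
    plt.show()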
The resulting bagging classifier is compared with a reference decision tree fitted on the whole training dataset. The statistical and graphical results are shown below.
Figure 20.14. Response curves of the bagging classifier and the decision tree, compared with the baseline and the optimal classifiers
The experiment shows that the bagging classifier yields a better-performing model than the simple decision tree when the models are compared on the first deciles. This is clear from the classification matrix, the response curve and the ROC curve.
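A first-decile comparison of this kind could be computed as follows (an illustrative sketch building on the snippets above; first_decile_response is a hypothetical helper, and the actual numbers depend on the random seeds): test cases are ranked by their predicted spam score, and the response rate, i.e. the fraction of actual spam, in the top 10% is compared between the two models.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def first_decile_response(scores, y):
        """Response rate (fraction of positives) among the top-scored 10% of cases."""
        top = np.argsort(scores)[::-1][: len(y) // 10]
        return float((y[top] == 1).mean())

    # Reference tree on the whole training set vs. the bagged trees.
    tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
    trees = bagging_fit(X_train, y_train, n_cycles=10)

    # Scores: predicted spam probability for the single tree, fraction of
    # spam votes across the cycles for the bagged model.
    tree_scores = tree.predict_proba(X_test)[:, list(tree.classes_).index(1)]
    bag_scores = np.mean([t.predict(X_test) for t in trees], axis=0)

    print("decision tree:", first_decile_response(tree_scores, y_test))
    print("bagging:      ", first_decile_response(bag_scores, y_test))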