This experiment presents ensemble classification using the Ensemble operator. The operator combines separate supervised data mining models into a single model that can perform better than its components. In the experiment, an ensemble classifier is constructed from a decision tree, a logistic regression and a neural network classifier using the average voting method. The resulting ensemble model is compared with a polynomial kernel SVM fitted with the SVM operator. The models are evaluated by their misclassification rate on the Spambase dataset.
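A minimal sketch of the average voting step, assuming that it averages the three models' posterior probabilities for the spam class and labels a case as spam when the average reaches 0.5. The function name `average_ensemble`, the 0.5 cut-off and the probability values are hypothetical illustrations, not output of the Ensemble operator or of the Spambase models.

```python
from typing import Sequence


def average_ensemble(prob_lists: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> tuple[list[float], list[int]]:
    """Average per-model spam probabilities case by case (hypothetical helper)
    and classify each case by thresholding the averaged probability."""
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_cases = len(prob_lists[0])
    if any(len(p) != n_cases for p in prob_lists):
        raise ValueError("all models must score the same cases")
    # Mean posterior probability of the positive (spam) class for every case
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(n_cases)]
    # 1 = spam, 0 = legitimate e-mail
    labels = [1 if a >= threshold else 0 for a in avg]
    return avg, labels


# Made-up spam probabilities from three fitted models for four e-mails
tree = [0.9, 0.2, 0.6, 0.4]
logreg = [0.8, 0.1, 0.3, 0.7]
nnet = [0.7, 0.3, 0.3, 0.6]
avg, labels = average_ensemble([tree, logreg, nnet])
print([round(a, 3) for a in avg], labels)
```

In this toy data the decision tree alone would flag the third e-mail as spam (0.6), but the two other models pull the average down to 0.4, below the cut-off. Averaging probabilities rather than counting hard votes lets a confident model outweigh uncertain ones.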
Spambase [UCI MLR]
Before fitting the models, the dataset is partitioned in a preprocessing step by the Data Partition operator into training, validation and test sets in the ratio 60/20/20.
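The 60/20/20 partition can be sketched as a plain random split of row indices. This is an illustration under assumptions: the helper `partition` and the seed are hypothetical, and the split is unstratified, whereas the Data Partition operator may offer other sampling methods. Spambase contains 4,601 e-mails, so the sketch yields 2,761 training, 920 validation and 920 test rows.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def partition(rows: Sequence[T],
              rates: Sequence[float] = (0.6, 0.2, 0.2),
              seed: int = 12345) -> tuple[list[T], list[T], list[T]]:
    """Randomly split rows into training, validation and test sets
    (hypothetical helper, simple random sampling without stratification)."""
    if len(rates) != 3 or abs(sum(rates) - 1.0) > 1e-9:
        raise ValueError("need three rates that sum to 1")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_train = round(rates[0] * len(rows))
    n_valid = round(rates[1] * len(rows))
    train = [rows[i] for i in idx[:n_train]]
    valid = [rows[i] for i in idx[n_train:n_train + n_valid]]
    test = [rows[i] for i in idx[n_train + n_valid:]]  # remainder goes to test
    return train, valid, test


rows = list(range(4601))   # stand-in for the 4,601 Spambase records
train_set, valid_set, test_set = partition(rows)
print(len(train_set), len(valid_set), len(test_set))
```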
Ensemble classifiers can be evaluated with the same tools as other supervised data mining models: statistics such as the number of incorrectly classified cases and the misclassification rate, and graphs such as lift and response curves.
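These statistics come from the classification (confusion) matrix. The sketch below, with a hypothetical helper `classification_stats` and made-up labels where 1 denotes spam, counts the four cells of the matrix and derives the number of incorrectly classified cases and the misclassification rate.

```python
from typing import Sequence


def classification_stats(actual: Sequence[int],
                         predicted: Sequence[int]) -> dict[str, float]:
    """Confusion-matrix counts and misclassification rate for a 0/1 target
    (hypothetical helper; 1 = spam)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # legitimate mail flagged as spam
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # spam that slipped through
    wrong = fp + fn
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "wrong": wrong, "misclassification_rate": wrong / len(pairs)}


actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
print(classification_stats(actual, predicted))
```

For a spam filter, a false positive is a legitimate e-mail marked as spam, which is generally the more undesirable error; this is why the false positive cell deserves attention when the models are compared.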
The resulting ensemble classifier is compared with the baseline polynomial kernel SVM. The statistics and graphs of this comparison are summarized below.
Figure 20.7. Cumulative lift curves of the ensemble classifier, the SVM and the best theoretical model
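Cumulative lift at a depth d is the response (spam) rate among the top d fraction of cases, ranked by predicted probability, divided by the overall response rate; the best theoretical model ranks every positive case first. A sketch with a hypothetical `cumulative_lift` helper and made-up scores; passing the true labels as scores reproduces the best theoretical curve.

```python
from typing import Sequence


def cumulative_lift(actual: Sequence[int], scores: Sequence[float],
                    depths: Sequence[float] = (0.1, 0.3, 0.5, 1.0)) -> list[float]:
    """Cumulative lift at each depth: response rate among the top-scored
    fraction of cases divided by the overall response rate (1 = spam)."""
    n = len(actual)
    overall_rate = sum(actual) / n
    if overall_rate == 0:
        raise ValueError("need at least one positive case")
    # Rank cases by descending score; ties keep their original order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    lifts = []
    for d in depths:
        k = max(1, round(d * n))  # number of cases in the top d fraction
        top_rate = sum(actual[i] for i in order[:k]) / k
        lifts.append(top_rate / overall_rate)
    return lifts


# Made-up labels and model scores for ten e-mails (1 = spam)
actual = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.8, 0.1, 0.4, 0.3, 0.05, 0.5, 0.15]
model_lift = cumulative_lift(actual, scores)
best_lift = cumulative_lift(actual, actual)  # perfect ranking: best theoretical model
print([round(x, 2) for x in model_lift])
print([round(x, 2) for x in best_lift])
```

At depth 0.3 the toy model finds two of the three spam e-mails in its top three cases, while the perfect ranking finds all three; at depth 1.0 every curve ends at a lift of 1.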
The experiment shows that combining simple classifiers can yield a model that is competitive with a more complex supervised model such as the polynomial kernel SVM. The classification matrix shows that the ensemble model performs better than the SVM, especially in the number of false positives. The cumulative lift curves slightly favor the combined model, and its ROC curve lies slightly above that of the SVM.
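The ROC curve plots the true positive rate against the false positive rate as the cut-off is lowered, and the area under it (AUC) summarizes the curve in one number. The sketch below uses hypothetical helpers `roc_points` and `roc_auc` on made-up data; the AUC uses the pairwise (Mann-Whitney) formulation, which is quadratic in the number of cases and meant only for illustration.

```python
from typing import Sequence


def roc_points(actual: Sequence[int],
               scores: Sequence[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, lowering the cut-off
    through every distinct score (1 = spam)."""
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes")
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= cut and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= cut and a == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case, ties counting one half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


actual = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.8, 0.1, 0.4, 0.3, 0.05, 0.5, 0.15]
print(roc_points(actual, scores))
print(round(roc_auc(actual, scores), 3))
```

A ROC curve that lies above another everywhere also has the larger AUC, although a larger AUC alone does not guarantee that one curve dominates the other at every cut-off.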