Chapter 20. Classification Methods 5

Ensemble methods

Table of Contents

Ensemble methods: Combination of classifiers
Ensemble methods: bagging
Ensemble methods: boosting

Ensemble methods: Combination of classifiers

Description

This experiment demonstrates ensemble classification with the Ensemble operator, which combines separate supervised data mining models into a single model that can outperform its members. Here, an ensemble classifier is built from a decision tree, a logistic regression and a neural network classifier using the average voting method. The resulting ensemble model is compared with a polynomial kernel SVM fitted by the SVM operator. The models are evaluated by misclassification rate on the Spambase dataset.
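The idea behind average voting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Ensemble operator's implementation: it assumes that each member model outputs a posterior probability of the spam class for every case and that the ensemble takes the unweighted mean of those probabilities. The function names, the 0.5 cutoff and the posterior values are hypothetical.

```python
from statistics import mean


def average_vote(posteriors):
    """Average the member models' posterior probabilities case by case.

    posteriors[m][i] is model m's estimated probability that case i is spam.
    """
    n_cases = len(posteriors[0])
    if any(len(model) != n_cases for model in posteriors):
        raise ValueError("every model must score the same cases")
    return [mean(model[i] for model in posteriors) for i in range(n_cases)]


def classify(probabilities, cutoff=0.5):
    """Label a case as spam (1) when its probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Hypothetical posteriors for four cases; not taken from the experiment.
tree = [0.90, 0.20, 0.60, 0.10]
logistic = [0.80, 0.40, 0.30, 0.20]
neural = [0.70, 0.60, 0.50, 0.00]

combined = average_vote([tree, logistic, neural])
labels = classify(combined)
print(combined, labels)
```

In this toy data the decision tree alone would label the third case as spam, but the averaged probability falls below the cutoff, so the ensemble does not.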

Input

Spambase [UCI MLR]

Before the models are fitted, the preprocessing step uses the Data Partition operator to split the dataset into training, validation and test sets in a 60/20/20 ratio.
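A 60/20/20 split can be illustrated as follows. The sketch assumes simple random sampling without stratification; the Data Partition operator may use a different sampling scheme, and the function name and seed are hypothetical. Row indices stand in for the 4601 Spambase cases.

```python
import random


def partition(rows, fractions=(0.6, 0.2, 0.2), seed=12345):
    """Shuffle the rows and cut them into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


rows = list(range(4601))  # one index per Spambase case
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))
```

The training set is used to fit each model, the validation set to tune and select models, and the test set is held back for the final comparison.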

Output

Ensemble classifiers can be evaluated with the same tools as other supervised data mining models: statistics such as the number of incorrectly classified cases and the misclassification rate, and graphs such as lift and response curves.
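The two quantities named above can be computed by hand, as in the following sketch. It assumes a binary target coded as 1 (spam) and 0 (not spam) and a 0.5 cutoff; the function names and the ten scored cases are hypothetical and are not results from the experiment.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def cumulative_lift(actual, scores, depth):
    """Event rate in the top `depth` fraction of cases (ranked by score),
    divided by the event rate in the whole dataset."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


# Hypothetical actual classes and model scores for ten cases.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]
predicted = [1 if s >= 0.5 else 0 for s in scores]

rate = misclassification_rate(actual, predicted)
lift_20 = cumulative_lift(actual, scores, 0.2)
lift_50 = cumulative_lift(actual, scores, 0.5)
print(rate, lift_20, lift_50)
```

A cumulative lift above 1 at a given depth means that the top-ranked cases contain a higher share of spam than the dataset as a whole; at a depth of 100% the lift is always 1.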

Figure 20.1. Fitting statistics of the ensemble classifier

Figure 20.2. The classification matrix of the ensemble classifier

Figure 20.3. The cumulative lift curve of the ensemble classifier

The resulting ensemble classifier is compared with a baseline polynomial kernel SVM. The statistics and graphs of this comparison are summarized below.

Figure 20.4. Misclassification rates of the ensemble classifier and the SVM

Figure 20.5. Classification matrices of the ensemble classifier and the SVM

Figure 20.6. Cumulative lift curves of the ensemble classifier and the SVM

Figure 20.7. Cumulative lift curves of the ensemble classifier, the SVM and the best theoretical model

Figure 20.8. ROC curves of the ensemble classifier and the SVM

Interpretation of the results

The experiment shows that combining simple classifiers can yield a model that is competitive with a supervised model such as the polynomial kernel SVM. The classification matrix shows that the ensemble model outperforms the SVM, most notably by producing fewer false positives. The cumulative lift curves slightly favor the combined model, and the ROC curve of the combined model lies slightly above that of the SVM.
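The two comparisons in this interpretation, false positives and ROC curves, can be reduced to single numbers: the false positive rate and the area under the ROC curve (AUC). The sketch below assumes a binary target coded as 1 (spam) and 0 (not spam) and computes the AUC by its rank formulation, that is, the probability that a randomly chosen spam case scores higher than a randomly chosen non-spam case. The function names and all scores are hypothetical and do not reproduce the figures above.

```python
def false_positive_rate(actual, predicted):
    """Share of non-spam cases (0) that were wrongly labelled as spam (1)."""
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    negatives = sum(1 for a in actual if a == 0)
    return false_pos / negatives


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; ties count as half a win."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical actual classes and scores from two models for eight cases.
actual = [1, 1, 0, 1, 0, 0, 1, 0]
model_a = [0.90, 0.80, 0.65, 0.70, 0.40, 0.20, 0.60, 0.10]
model_b = [0.90, 0.55, 0.65, 0.70, 0.60, 0.20, 0.45, 0.10]

for name, scores in (("model A", model_a), ("model B", model_b)):
    predicted = [1 if s >= 0.5 else 0 for s in scores]
    print(name, false_positive_rate(actual, predicted), auc(actual, scores))
```

A model whose ROC curve lies above another's over the whole range also has the larger AUC, so the AUC gives a compact summary of the visual comparison.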

Workflow

sas_ensemble_exp1.xml

Keywords

ensemble method
supervised learning
SVM
misclassification rate
ROC curve
classification

Operators

Data Source
Decision Tree
Ensemble
Model Comparison
Neural Network
Data Partition
Regression
Support Vector Machine