Chapter 17. Classification Methods 2

Rule induction for rare events

Table of Contents

Rule induction for the classification of rare events

Rule induction for the classification of rare events

Description

In this experiment, we use the Spambase dataset to show how the Rule Induction operator can improve on a baseline classifier in a binary classification task with rare events.

Input

Spambase [UCI MLR]

The input is prepared with the Sample operator. The number of records taken from the top of the dataset is chosen so that the event cases make up 5 percent of the sample. The dataset is then partitioned in the usual way.
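The preparation step can be illustrated outside the workflow with a minimal Python sketch. It is not the Sample or Data Partition operator itself. The function names, the random downsampling of the event class, the 70/30 split and the toy 0/1 data are all assumptions made for illustration; the experiment only states that the events should make up 5 percent and that the data is partitioned afterwards.

```python
import random


def downsample_events(records, is_event, event_rate=0.05, seed=0):
    """Keep every non-event and just enough events to reach event_rate.

    Hypothetical helper: random downsampling is an assumption, not
    necessarily what the Sample operator does.
    """
    rng = random.Random(seed)
    events = [r for r in records if is_event(r)]
    non_events = [r for r in records if not is_event(r)]
    # Solve n_events / (n_events + n_non_events) = event_rate for n_events.
    n_events = round(event_rate * len(non_events) / (1.0 - event_rate))
    n_events = min(n_events, len(events))
    sample = non_events + rng.sample(events, n_events)
    rng.shuffle(sample)
    return sample


def stratified_partition(records, is_event, train_fraction=0.7, seed=0):
    """Split into training and validation parts, class by class,
    so that both parts keep roughly the same event rate."""
    rng = random.Random(seed)
    train, valid = [], []
    for flag in (True, False):
        group = [r for r in records if is_event(r) == flag]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


# Toy stand-in for the data: 1 marks an event, 0 a non-event.
toy = [1] * 400 + [0] * 600
sample = downsample_events(toy, is_event=lambda r: r == 1)
train, valid = stratified_partition(sample, is_event=lambda r: r == 1)
print(len(sample), sum(sample) / len(sample))
```

Partitioning class by class matters with rare events: a purely random split of so few event cases could leave the validation part with almost none of them.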

Output

Two rule induction models are fitted to the dataset: the first is based on a decision tree model, the second on a logistic regression model. Both are compared with a baseline decision tree classifier. The figures below show the goodness of fit.

Figure 17.1. The misclassification rate of rule induction

Figure 17.2. The classification matrix of rule induction

Figure 17.3. The classification chart of rule induction

The left side of the model comparison figure shows a perfect ROC curve: the second rule induction model fits the training dataset perfectly.
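What a perfect ROC curve means can be checked numerically with a short Python sketch. The area under the ROC curve equals the fraction of (event, non-event) pairs in which the event receives the higher score, so a perfect curve corresponds to a value of 1. The function name `auc` and the toy scores are hypothetical and are not taken from the experiment.

```python
def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (event, non-event) pairs in which the event
    gets the higher score; ties count as half.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (assumed values, for illustration only).
actual = [1, 1, 0, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.3, 0.2, 0.2, 0.1]     # every event outranks every non-event
imperfect_scores = [0.9, 0.25, 0.3, 0.2, 0.2, 0.1]  # one event is outranked by a non-event

perfect_auc = auc(actual, perfect_scores)
imperfect_auc = auc(actual, imperfect_scores)
print(perfect_auc, imperfect_auc)
```

A perfect curve on the training data alone can also be a sign of overfitting, so the same quantity is worth comparing on the validation data.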

Figure 17.4. The ROC curves of rule inductions and decision tree

The next output window shows, besides the usual information, the number of misclassified cases reported by the Rule Induction operator. This is particularly important information in this case.

Figure 17.5. The output of the rule induction operator

Interpretation of the results

The experiment shows that when the class distribution is very uneven, i.e. one class has a very low frequency, the rule induction method can achieve a significant improvement over traditional classification models.
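Why rare events are hard for a plain classifier can be seen from a small worked example in Python. With 5 percent events, a model that never predicts the event already reaches a 5 percent misclassification rate while finding none of the events, which is why the classification matrix should be read together with the misclassification rate. The function name and the toy counts below are assumptions for illustration, not results of the experiment.

```python
def classification_summary(actual, predicted):
    """Summarize a 2x2 classification matrix for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "misclassification_rate": (fp + fn) / len(pairs),
        # Share of true events that the model finds.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of true non-events that the model leaves alone.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy data with a 5 percent event rate (assumed, for illustration only).
actual = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a "model" that never predicts the event

summary = classification_summary(actual, always_negative)
print(summary)
```

Here the misclassification rate is 0.05 and the specificity is 1.0, yet the sensitivity is 0.0: every rare event is missed.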

Workflow

sas_rules_exp1.xml

Keywords

rule induction
supervised learning
classification

Operators

Data Source
Decision Tree
Model Comparison
Data Partition
Rule Induction
Sample