In this experiment, using the Spambase dataset, we show how a baseline classifier
can be improved for a binary classification task with rare events by using the
Rule Induction operator.
Spambase [UCI MLR]
The corresponding input is prepared using the
Sample operator. The number of records
deposited on the top of the dataset is chosen such that the proportion of the cases to be
Then, we partition the dataset in the usual way.
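The sampling and partitioning steps can be illustrated outside the tool with a small sketch. It is not what the Sample operator does internally: it assumes one common way to create a rare-event dataset (keep all negatives, keep only a few positives) followed by a stratified train/test split. The function names, the 5% target rate, the 70/30 split, and the toy labels are all hypothetical choices made for illustration.

```python
import random

def downsample_positives(labels, target_rate, seed=0):
    """Return sorted row indices keeping all negatives and just enough
    positives so that positives make up about `target_rate` of the result."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Solve p / (p + n) = r for p:  p = r * n / (1 - r)
    keep = min(len(pos), round(target_rate * len(neg) / (1.0 - target_rate)))
    return sorted(neg + rng.sample(pos, keep))

def stratified_split(labels, indices, train_fraction=0.7, seed=0):
    """Split `indices` into train/test so both parts keep the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        rows = [i for i in indices if labels[i] == cls]
        rng.shuffle(rows)
        cut = round(train_fraction * len(rows))
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return sorted(train), sorted(test)

# Toy labels standing in for the spam flag (hypothetical, not Spambase itself).
labels = [1] * 400 + [0] * 600
kept = downsample_positives(labels, target_rate=0.05)  # assumed 5% rare class
train_idx, test_idx = stratified_split(labels, kept)

print(len(kept), sum(labels[i] for i in kept))  # rows kept, positives kept
print(len(train_idx), len(test_idx))            # partition sizes
```

Stratifying the split matters with rare events: a purely random split could leave the test partition with almost no positive cases.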
Two rule induction models are fitted to the dataset. The former is based on a decision tree model, the latter on a logistic regression model. The fitted models are compared to a baseline decision tree classifier. The figures below show the goodness of fit.
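To give a feel for what rule induction means in general, here is a minimal sketch of the generic separate-and-conquer idea: find a rule that covers only rare-class cases, remove the covered cases, and repeat. This is an assumption-laden illustration, not the algorithm of the Rule Induction operator used in the experiment, which builds on decision tree and logistic regression models. The rule form (`feature >= threshold`), the function names, and the toy data are hypothetical.

```python
def best_rule(rows, labels, min_precision=1.0):
    """Find the test `feature >= threshold` that covers the most positives
    among tests whose precision is at least `min_precision`."""
    best = None  # (positives covered, feature index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            covered = [y for r, y in zip(rows, labels) if r[f] >= threshold]
            pos = sum(covered)
            if pos and pos / len(covered) >= min_precision:
                if best is None or pos > best[0]:
                    best = (pos, f, threshold)
    return best

def induce_rules(rows, labels):
    """Learn rules one at a time, removing the cases each rule covers."""
    rows, labels = list(rows), list(labels)
    rules = []
    while any(labels):
        found = best_rule(rows, labels)
        if found is None:  # no sufficiently precise rule is left
            break
        _, f, threshold = found
        rules.append((f, threshold))
        keep = [i for i, r in enumerate(rows) if r[f] < threshold]
        rows = [rows[i] for i in keep]
        labels = [labels[i] for i in keep]
    return rules

def predict(rules, row):
    """Predict the rare class (1) if any rule fires, otherwise 0."""
    return int(any(row[f] >= t for f, t in rules))

# Toy data: three rare positives among five negatives.
rows = [(5, 0), (6, 1), (0, 9), (1, 1), (2, 3), (0, 2), (3, 0), (1, 4)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
rules = induce_rules(rows, labels)
print(rules)  # [(0, 5), (1, 9)]
```

Because each rule is fitted only to the cases the earlier rules left uncovered, small pockets of the rare class can get a rule of their own instead of being absorbed by the majority class.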
The left side of the model comparison figure shows a perfect ROC curve. This curve shows that the second rule induction model fits the training dataset perfectly.
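For readers who want to see what a perfect ROC curve means numerically, the sketch below computes ROC points and the area under the curve (AUC) from predicted scores. The scores and labels are made-up toy values, not results from the experiment, and the function names are hypothetical. A perfect curve rises straight to a true positive rate of 1 before any false positive occurs, which gives an AUC of 1.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the decision threshold over the sorted scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [1, 1, 0, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]    # every positive ranked first
imperfect_scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.05]  # one positive ranked low

print(auc(roc_curve_points(y_true, perfect_scores)))    # 1.0
print(auc(roc_curve_points(y_true, imperfect_scores)))  # 0.75
```

A perfect curve on the training partition only describes the fit to the training data; the same computation on the validation partition indicates how well the model generalizes.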
In the next output window, the output of the
Rule Induction operator also shows the number of misclassified cases, besides the usual information. This is very important
information in this case.
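A small sketch shows why the count of misclassified cases is so informative with rare events. The data are hypothetical (5 rare cases out of 100) and the classifier is a trivial one that always predicts the majority class; neither comes from the experiment.

```python
def misclassification_counts(y_true, y_pred):
    """Count the wrongly classified cases, split by error type."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"false_positive": fp, "false_negative": fn, "total_wrong": fp + fn}

# Hypothetical rare-event sample: 5 positives among 100 cases.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

counts = misclassification_counts(y_true, y_pred)
accuracy = 1 - counts["total_wrong"] / len(y_true)

print(counts)    # {'false_positive': 0, 'false_negative': 5, 'total_wrong': 5}
print(accuracy)  # 0.95
```

The accuracy looks high, yet every rare case is missed. Looking at the misclassified cases directly, and at which class they belong to, exposes this.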
The experiment shows that when the class distribution is very uneven, i.e. one class has a very low frequency, the rule induction method can achieve a significant improvement over traditional classification models.