## Chapter 17. Classification Methods 2

*Rule induction for rare events*

## Rule induction for the classification of rare events

In this experiment, using the *Spambase* dataset, we show how a baseline classifier
can be improved for a binary classification task with rare events by using the `Rule Induction` operator.

*Spambase* [UCI MLR]

The corresponding input is prepared using the `Sample` operator. The number of records
taken from the top of the dataset is chosen such that the proportion of the rare cases is `5` percent.
Then, we partition the dataset in the usual way.
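The arithmetic behind such a target proportion can be sketched in a few lines of Python. This is only an illustration, not the `Sample` operator itself: the function name `downsample_rare_class`, the toy labels, the choice of spam as the rare class, and the use of random sampling are all assumptions made for the example.

```python
import random


def downsample_rare_class(labels, rare_label, target_rate, seed=0):
    """Return sorted record indices in which `rare_label` makes up roughly
    `target_rate` of the sample. All other records are kept as they are."""
    rare = [i for i, y in enumerate(labels) if y == rare_label]
    common = [i for i, y in enumerate(labels) if y != rare_label]
    # Solve n_rare / (n_rare + n_common) = target_rate for n_rare.
    n_rare = round(target_rate * len(common) / (1.0 - target_rate))
    # Never ask for more rare records than the data contains.
    n_rare = min(n_rare, len(rare))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    kept_rare = rng.sample(rare, n_rare)
    return sorted(kept_rare + common)


# Hypothetical toy labels (not the real Spambase counts).
labels = ["spam"] * 400 + ["ham"] * 950
idx = downsample_rare_class(labels, "spam", 0.05)
n_spam = sum(1 for i in idx if labels[i] == "spam")
rate = n_spam / len(idx)
print(n_spam, len(idx), rate)
```

With 950 majority records, keeping 50 rare records gives 50 out of 1000, i.e. the `5` percent target.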

Two rule induction models are fitted to the dataset. The former is based on a decision tree model, the
latter on a logistic regression model. The fitted models are compared to a baseline decision
tree classifier. The figures below show the goodness of fit.

On the left side of the model comparison figure, a perfect ROC curve can be seen. This curve
clearly shows that the second rule induction model fits the training dataset perfectly.

In the next output window, the number of misclassified cases can also be seen in the output of
the `Rule Induction` operator, alongside the usual information. This is very important
information in this case.
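One reason the misclassification count matters with rare events is that overall accuracy can look good even when every rare case is missed. The following minimal sketch illustrates this with made-up labels; the helper `misclassified_by_class` and the data are hypothetical and are not output of the `Rule Induction` operator.

```python
from collections import Counter


def misclassified_by_class(y_true, y_pred):
    """Count wrong predictions separately for each true class."""
    errors = Counter()
    for t, p in zip(y_true, y_pred):
        if t != p:
            errors[t] += 1
    return dict(errors)


# Hypothetical toy data: 5 rare events (label 1) among 100 records.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority class (label 0).
y_pred = [0] * 100

errors = misclassified_by_class(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(errors, accuracy)
```

Here the accuracy is 95 percent, yet all five rare cases are misclassified, which is why the per-class error count is worth inspecting.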

### Interpretation of the results

The experiment shows that when the class distribution is highly imbalanced, i.e. one class has a very low frequency,
the rule induction method can achieve a significant improvement over traditional classification models.

rule induction | supervised learning | classification

Data Source | Decision Tree | Model Comparison | Data Partition | Rule Induction | Sample