Using the Spambase dataset, this process shows how a regression
model can be fitted to a dataset with a binary target. Conventional linear regression is
not suitable for this task, even though the
Regression operator offers it as an
option. Instead, we use logistic regression, which is the default option of this operator.
We can choose among the following link functions: logit, which gives the procedure its name, probit,
and complementary log-log. In practice, the choice among these link functions makes little difference to the results.
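To make the role of the link function concrete, the following minimal Python sketch evaluates the inverse of each of the three links, that is, the mapping from the linear predictor to a predicted probability. The third link is written here in its usual complementary log-log form. The function names and the sample values of the linear predictor are illustrative assumptions and are not part of Enterprise Miner.

```python
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: p = standard normal CDF at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Illustrative values of the linear predictor (assumed, not from Spambase).
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

All three functions map any real-valued linear predictor into the interval (0, 1). Logit and probit are symmetric around 0, whereas the complementary log-log link is not.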
Enterprise Miner™ provides another operator for fitting
regression models: the
Dmine Regression operator, which fits a forward stepwise
regression. In each step, the input variable that contributes most to explaining the
variability of the target is selected.
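The selection idea can be sketched in a few lines of Python. This is a generic greedy forward selection on a least-squares fit, where each step adds the candidate with the largest gain in R-squared. It is an illustration under stated assumptions, not the internal algorithm of the operator. The function name forward_stepwise, the min_gain stopping threshold, and the toy data are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def forward_stepwise(columns, y, min_gain=0.005):
    """Greedy forward selection by R-squared gain (least squares).

    columns: dict mapping variable name -> list of numbers
    y: list of target values (0/1 here)
    Returns a list of (name, cumulative R-squared) in selection order.
    """
    resid = center(y)
    sst = dot(resid, resid)
    if sst == 0.0:
        return []  # constant target: nothing to explain
    cands = {name: center(col) for name, col in columns.items()}
    selected, r2 = [], 0.0
    while cands:
        best_name, best_gain = None, 0.0
        for name, col in cands.items():
            norm = dot(col, col)
            if norm < 1e-12:  # constant or fully explained column
                continue
            gain = dot(col, resid) ** 2 / norm / sst
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None or best_gain < min_gain:
            break
        q = cands.pop(best_name)
        qq = dot(q, q)
        # Remove the chosen direction from the residual ...
        coef = dot(q, resid) / qq
        resid = [r - coef * a for r, a in zip(resid, q)]
        # ... and from the remaining candidates, so later gains are exact.
        for name in list(cands):
            c = dot(q, cands[name]) / qq
            cands[name] = [b - c * a for a, b in zip(q, cands[name])]
        r2 += best_gain
        selected.append((best_name, r2))
    return selected

# Toy data (made up for illustration, not taken from Spambase).
toy = {
    "x1": [0, 1, 2, 3, 4, 5, 6, 7],
    "x2": [1, 0, 1, 0, 1, 0, 1, 0],
    "x3": [2, 2, 2, 2, 2, 2, 2, 2],  # constant, never selected
}
target = [0, 0, 0, 1, 0, 1, 1, 1]
steps = forward_stepwise(toy, target)
print(steps)
```

Orthogonalizing the remaining candidates against each selected variable keeps every step's R-squared gain equal to the gain of a full least-squares refit, without solving a linear system at each step.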
Spambase [UCI MLR]
After fitting the logistic regression, the standard statistics and graphs are obtained in the same way as for binary classification tasks. Only the confusion matrix is shown here; the remaining comparison tools are left to the end of this experiment.
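For readers who want to see how a confusion matrix is derived from predicted probabilities, here is a minimal Python sketch. The 0.5 cutoff, the function name, and the labels and probabilities are assumptions chosen for illustration. They are not results from the Spambase experiment.

```python
def confusion_matrix(y_true, p_hat, threshold=0.5):
    """Count true/false positives and negatives at a probability cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up labels (1 = spam) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

cm = confusion_matrix(labels, probs)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)
print(cm, accuracy)
```

Changing the threshold trades false positives against false negatives, which is why the confusion matrix is always tied to a specific cutoff.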
In addition to the usual tools, the regression operators provide an effect plot, which shows the importance of the input variables in the regression model built during the process.
In addition to traditional regression analysis, Enterprise Miner™
offers another operator for fitting a forward stepwise regression. This is the
Dmine Regression operator.
The results can be seen in the figures below.
The two regressions can be compared in the usual way with the
Model Comparison operator.
The results of this comparison are presented in the following figures.
On the test set, the fit statistics and ROC curves clearly show that the logistic regression model performs better than the stepwise logistic regression model.
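A comparison of two ROC curves is often summarized by the area under each curve (AUC). The following Python sketch computes the AUC as the fraction of (positive, negative) pairs that a model ranks correctly, counting ties as one half. The function name and the two sets of scores are invented for illustration and do not reproduce the figures of this experiment.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up test-set labels and scores from two hypothetical models.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
scores_b = [0.6, 0.5, 0.3, 0.4, 0.7, 0.2, 0.8, 0.5]

auc_a = auc(y_test, scores_a)
auc_b = auc(y_test, scores_b)
print(auc_a, auc_b)
```

A model whose ROC curve lies above another's across the whole range also has the larger AUC. The converse does not hold in general, so the curves themselves remain worth inspecting.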