Prediction of discrete target by regression models

Description

The process shows, using the Wine dataset, that how can we fit a regression model to a dataset containing discrete but non-binary target variable. Moreover, how can we performe a classification task using the parameter estimates obtained from the model. The type of the fitted regression model depends on the measurement scale of the discrete target variable. If the target is nominal then the Regression operator fits binary logistic models separately, so that a selected event of the target is compared to the class of the other values of the discrete target variable. On the other hand, if the target is ordinal then a common logistic regression model are fitted, wherein only the constant parameters differ, but the parameters of the input variables are shared. (As opposed to the nominal case where these coefficients are different.)

Input

Wine [UCI MLR]

Output

A number of models can be choosen to fit a regression model, e.g., linear or logistic regression. Among them, the logistic regression used in the process. It is not needed to set in, the software recognizes the right type of the regression using the metadata on the target. Of course, it is possible to override this option and to enforce linear regression, but this does not make sense because in this case the Model Comparison operator can not be used to compare this model to other discrete supervised models. The fit of the model can be tested by the well-known statistics and charts.

Figure 18.9. Classification matrix of the logistic regression

Classification matrix of the logistic regression

The classification chart shows that the fitted model is perfect on the training dataset and has small error on the validation dataset.

Figure 18.10. The classification chart of the logistic regression

The classification chart of the logistic regression

Besides the standard goodness-of-fit tests the Regression operator presents a bar graph showing the importance of the input variables in the regression models. The higher the coefficient of an input variable, the more is its explanation power with respect to the target variable. In case of ordinal target, there is only one, while, in case of the nominal target, the number of different values ​​of the target variable minus 1 bar graphs are created. Since the Class variable has three values there are two such bar graphs below.

Figure 18.11. The effects plot of the logistic regression

The effects plot of the logistic regression

Interpretation of the results

Using the regression model created on the training set for the validation dataset the above results show that a model can be built with relatively high accuracy for multiple discrete target variable value by the Regression operator. It is noted that for the problem discussed here the Dmine Regression operator can not be applied.

Video

Workflow

sas_regr_exp2.xml

Keywords

classification
nominal and ordinal target
logistic regression

Operators

Data Source
Data Partition
Regression