Evaluation of performance for classification by regression model 2

Description

Using the Wine dataset, this process shows how to evaluate the quality, i.e. the precision, of a classification produced by a regression model fitted to a given dataset. Once the regression model has been built on the training set and the test set has been classified with it, the quality of that classification can be examined. In some cases more advanced validation is necessary; random subsampling, cross-validation, or a special case of the latter, the leave-one-out method, can then be used. Based on the resulting evaluation, it can be decided whether the classification is adequate for the goals of the process, whether the existing model should be improved further, or whether the model is of such poor quality that a completely new model is necessary.
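
As a rough illustration of what such an evaluation measures, the following minimal Python sketch computes the accuracy and the per-class precision and recall from true and predicted labels. The function name and the toy labels are hypothetical and are not taken from the process; the sketch only mirrors the kind of values a classification performance evaluation typically reports.

```python
from collections import Counter

def classification_performance(true_labels, predicted_labels):
    """Return accuracy plus per-class precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    # Number of correctly classified records per class
    correct = Counter(t for t, p in pairs if t == p)
    actual = Counter(true_labels)
    predicted = Counter(predicted_labels)
    classes = sorted(set(true_labels) | set(predicted_labels))
    accuracy = sum(correct.values()) / len(pairs)
    # Precision: share of records predicted as class c that really are c
    precision = {c: correct[c] / predicted[c] if predicted[c] else 0.0
                 for c in classes}
    # Recall: share of records of class c that were predicted as c
    recall = {c: correct[c] / actual[c] if actual[c] else 0.0
              for c in classes}
    return accuracy, precision, recall

# Hypothetical labels for six test records and three classes
true_labels = [1, 1, 2, 2, 3, 3]
predicted_labels = [1, 2, 2, 2, 3, 1]
accuracy, precision, recall = classification_performance(
    true_labels, predicted_labels)
print(accuracy, precision, recall)
```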

Input

Wine [UCI MLR]

Output

Instead of separate operators, the evaluation can also be carried out with a single complex validation operator. In this case the regression model has to be placed inside an operator that implements regression-based classification, and that operator in turn has to be placed inside the complex evaluation operator, so the resulting process contains operators nested on multiple levels:

Figure 7.10. The subprocess of the cross-validation by regression operator

Figure 7.11. The subprocess of the classification by regression operator

As when the operator is used on its own, it can be specified, for example, which method should be used for attribute selection or what the minimal tolerance should be. The linear regression model created this way can then be applied to the test set. The following regression model is created from the data of the training set:

Figure 7.12. The linear regression model yielded as a result
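
The idea of regression-based classification can be sketched in a few lines of Python. This is a simplified illustration, not the operator's actual implementation: it assumes that one regression model is fitted per class on a label encoded as +1 for that class and -1 otherwise, and that the class whose model gives the largest output is predicted. For brevity the sketch uses a single attribute and hypothetical toy data with two classes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one attribute: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_classifier(xs, labels):
    """Fit one regression model per class on a +1 / -1 encoded label."""
    models = {}
    for cls in sorted(set(labels)):
        # Assumed encoding: +1 for the current class, -1 for all others
        targets = [1.0 if label == cls else -1.0 for label in labels]
        models[cls] = fit_line(xs, targets)
    return models

def predict(models, x):
    """Return the class whose regression model gives the largest output."""
    return max(models, key=lambda cls: models[cls][0] + models[cls][1] * x)

# Hypothetical toy data: one attribute, two classes
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]
models = fit_classifier(xs, labels)
print(predict(models, 2.5), predict(models, 8.5))
```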

Interpretation of the results

If a deeper examination of the given classifier is necessary, subprocesses identical to the ones above can also be defined inside the operator responsible for cross-validation. The operator can be tuned using the following properties:

Figure 7.13. The customizable properties of the cross-validation operator

Here, it can be defined how many cross-validation iterations should be executed. The dataset is split into as many subsets of equal size as there are iterations. Each of these subsets is then selected in turn as the test set of one iteration, while the union of all other subsets serves as the training set of that iteration. A special case of this is the leave-one-out method, which can be used by ticking the appropriate checkbox (leave-one-out). With this method, one iteration is run for each record: the given record serves as the test set, and the training set consists of all other records. As can be seen in the figure, the following average performance values are yielded by cross-validation with 10 iterations:

Figure 7.14. The overall performance vector of the classifications done using the regression model defined in the cross-validation operator
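
The splitting scheme described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it splits the records in their original order, whereas a real validation operator may shuffle or stratify the data first.

```python
def cross_validation_splits(n_records, n_folds):
    """Yield (train_indices, test_indices) for each iteration."""
    indices = list(range(n_records))
    # Subsets are as equal in size as possible; the first `extra`
    # subsets receive one additional record
    base, extra = divmod(n_records, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        # The union of all other subsets forms the training set
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def leave_one_out_splits(n_records):
    """Leave-one-out: as many iterations as there are records."""
    return cross_validation_splits(n_records, n_records)

splits = list(cross_validation_splits(10, 5))
loo = list(leave_one_out_splits(4))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each record appears in exactly one test set, so every record is classified once by a model that was not trained on it.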

The following average performance values are yielded by the leave-one-out method:

Figure 7.15. The overall performance vector of the classifications done using the regression model defined in the cross-validation operator for the case of using the leave-one-out method

Note that in this case the standard deviation of the precision values obtained with the leave-one-out method is remarkably higher than that of standard cross-validation. This might indicate that the dataset contains irregular records whose classification is not necessarily accurate, even after learning on all other records.
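
When reading these deviations, it may help to keep in mind that in leave-one-out each test set contains a single record, so the accuracy of an iteration can only be 0 or 1. The following sketch uses hypothetical numbers, not the values from the figures, and assumes that the deviation is computed across the per-iteration accuracies (here as a population standard deviation). It shows that the same number of misclassified records gives a much larger spread under leave-one-out than under 10 iterations.

```python
from statistics import mean, pstdev

# Hypothetical case: 100 records, 5 of them misclassified.
# Leave-one-out: each iteration tests one record, accuracy is 0 or 1
loo_accuracies = [1.0] * 95 + [0.0] * 5
# 10 iterations with 10 records each, the 5 errors spread over 5 subsets
fold_accuracies = [1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9]

loo_mean, loo_std = mean(loo_accuracies), pstdev(loo_accuracies)
fold_mean, fold_std = mean(fold_accuracies), pstdev(fold_accuracies)

print(f"leave-one-out: {loo_mean:.3f} +/- {loo_std:.3f}")
print(f"10 iterations: {fold_mean:.3f} +/- {fold_std:.3f}")
```

Under these assumptions the average accuracy is the same in both cases, while the deviation differs considerably, so a higher deviation alone does not necessarily point to irregular records.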

Workflow

regr_exp4.rmp

Keywords

classification
regression
performance
cross-validation

Operators

Apply Model
Classification by Regression
Linear Regression
Performance (Classification)
Read AML
X-Validation