Evaluation of performance for classification by decision tree 2

Description

The process shows, using the Congressional Voting Records dataset, how the quality of a given classification can be evaluated. After the decision tree has been built on the training set and used to classify the test set, the quality of the resulting classification can be examined. In some cases a more advanced level of validation may be necessary; then, e.g., random subsampling, cross-validation, or a special case of the latter, the leave-one-out method, can be used. Based on the evaluation obtained this way, it can be decided whether the resulting classifier is adequate for the goals of the process, whether the existing model should be improved further, or whether, due to its poor performance, it should be replaced with a completely new model.
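The split-train-evaluate cycle described above can also be sketched outside RapidMiner. A minimal Python analogue using scikit-learn might look as follows; a synthetic stand-in is generated for the voting records (435 records, 16 attributes, two classes), since loading the actual UCI file is not part of this sketch:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the Congressional Voting Records dataset:
# 435 records, 16 "vote" attributes, two classes (democrat/republican).
X, y = make_classification(n_samples=435, n_features=16, random_state=0)

# Split into training and test sets (70/30 is an assumed, typical ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Build the decision tree on the training set...
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# ...then apply it to the test set and evaluate the classification.
y_pred = tree.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
```

The accuracy obtained here plays the same role as the performance vector produced by the Performance (Classification) operator in the workflow.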

Input

Congressional Voting Records [UCI MLR]

Output

Evaluation can also be done using a single complex validation operator instead of separate operators. In this case, the split ratio of the dataset and the form of sampling can be specified:

Figure 5.20. Settings for the sampling done in the validation operator

This is a complex operator that consists of two subprocesses, which can be defined as follows:

Figure 5.21. Subprocesses of the validation operator

Interpretation of the results

This case is completely identical to the process in the previous example: the dataset is split into training and test sets, the decision tree built on the training set is applied to the test set, and then its performance is evaluated. The following decision tree emerges, which classifies the records of the test set with the results shown below:

Figure 5.22. Graphic representation of the decision tree created

Figure 5.23. Performance vector of the classification based on the decision tree

If a deeper examination of the given classifier is necessary, subprocesses identical to those above can also be defined in the operator responsible for cross-validation. The operator can be tuned using the following settings:

Figure 5.24. Settings of the cross-validation operator

Here, it can be defined how many cross-validation iterations should be executed. The dataset is split into as many subsets of equal size as the number of iterations. Then each subset is selected in turn to be the test set of an iteration, while the union of all the other subsets serves as the training set of that iteration. A special case of this is the leave-one-out method, which can be used by ticking the appropriate checkbox (leave-one-out). In this case an iteration is run for each record, in which the given record serves as the test set and the training set consists of all the other records. As can be seen in the figure, cross-validation with 10 iterations yields the following average performance values:
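The iteration scheme just described can be sketched in Python with scikit-learn, which provides both k-fold cross-validation and leave-one-out as ready-made splitting strategies (again on synthetic stand-in data, not the actual voting records):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset standing in for the voting records.
X, y = make_classification(n_samples=100, n_features=16, random_state=0)
tree = DecisionTreeClassifier(random_state=0)

# 10-fold cross-validation: the data is split into 10 equal-sized subsets;
# each subset serves once as the test set, the union of the rest as training.
cv10_scores = cross_val_score(
    tree, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))

# Leave-one-out: one iteration per record; the single held-out record
# is the test set, all the other records form the training set.
loo_scores = cross_val_score(tree, X, y, cv=LeaveOneOut())

# Averaging the per-iteration scores gives the overall performance,
# analogous to the averaged performance vector of the X-Validation operator.
mean_cv10 = cv10_scores.mean()
mean_loo = loo_scores.mean()
```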

Figure 5.25. Overall performance vector of the classifications done in the cross-validation operator

The following average performance values are yielded by the leave-one-out method:

Figure 5.26. Overall performance vector of the classifications done in the cross-validation operator in the leave-one-out case

Note that in this case the standard deviation of the precision values of the leave-one-out method is remarkably higher than that of standard cross-validation. This might indicate the presence of irregular records whose classification is not necessarily accurate, even after learning on all the other records.
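The inflated deviation has a structural component as well: with leave-one-out, every iteration scores a single record, so each per-iteration accuracy is necessarily either 0 or 1, which spreads the scores out far more than the per-fold averages of 10-fold cross-validation. A small sketch of this effect, again on synthetic data with scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=16, random_state=0)
tree = DecisionTreeClassifier(random_state=0)

cv10 = cross_val_score(
    tree, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
loo = cross_val_score(tree, X, y, cv=LeaveOneOut())

# Each leave-one-out iteration scores exactly one record, so its
# accuracy is 0 or 1; the spread of these scores across iterations
# is therefore typically much larger than that of the 10-fold scores.
std_cv10 = np.std(cv10)
std_loo = np.std(loo)
```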

Video

Workflow

dtree_exp4.rmp

Keywords

classification
decision tree
performance
random subsampling
cross-validation

Operators

Apply Model
Decision Tree
Multiply
Performance (Classification)
Read AML
Split Validation
X-Validation