Evaluation of performance for classification by decision tree

Description

This process uses the Congressional Voting Records dataset to show how the quality of a classification can be evaluated. Once the decision tree has been built from the training set and used to classify the test set, the quality of the resulting classification can be examined. Based on this evaluation, one can decide whether the classification is adequate for the goals of the process, whether the existing model should be improved further, or whether its quality is so poor that a completely new model is needed.

Input

Congressional Voting Records [UCI MLR]

Output

The decision tree is built based on the data set using the following settings in the process:

Figure 5.13. Preferences for the building of the decision tree


In this case, the following decision tree emerges:

Figure 5.14. Graphic representation of the decision tree created


Interpretation of the results

Using the decision tree, the records of the test set can be classified. The original class labels can then be compared with those assigned by the decision tree, e.g. using the following figure:

Figure 5.15. Graphic representation of the classification of the records based on the decision tree
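
The comparison of original and assigned labels can also be sketched outside the tool. The following minimal Python example is an illustration only, not part of the workflow: the function name `confusion_counts` and the six label pairs are hypothetical and are not taken from the actual test set. It tallies each (original label, assigned label) pair and derives the number of correctly and incorrectly classified records.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (original label, assigned label) pairs over the test records."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels for six test records (NOT taken from the real data set).
true_labels = ["democrat", "democrat", "republican",
               "republican", "democrat", "republican"]
predicted_labels = ["democrat", "republican", "republican",
                    "republican", "democrat", "democrat"]

counts = confusion_counts(true_labels, predicted_labels)

# A record is classified correctly when both labels agree.
correct = sum(n for (t, p), n in counts.items() if t == p)
incorrect = sum(n for (t, p), n in counts.items() if t != p)

print("correct:", correct, "incorrect:", incorrect)
for (t, p), n in sorted(counts.items()):
    print(f"original={t:<10} assigned={p:<10} records={n}")
```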

Examining the performance of the classifier yields the number of correctly and incorrectly classified records, as well as the precision of the model's classification, displayed as percentages for the individual classes and overall:

Figure 5.16. Performance vector of the classification based on the decision tree

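
To make the figures in such a performance vector concrete, the sketch below computes the overall accuracy together with per-class precision and recall from a confusion matrix. It is a plain Python illustration under stated assumptions: the helper `performance_vector` is a hypothetical name, and the counts in `matrix` are invented for the example and do not reproduce the values shown in the figure.

```python
def performance_vector(matrix, classes):
    """Compute accuracy, per-class precision and per-class recall.

    matrix[t][p] holds the number of records whose original class is t
    and whose assigned class is p.
    """
    total = sum(matrix[t][p] for t in classes for p in classes)
    correct = sum(matrix[c][c] for c in classes)
    accuracy = correct / total if total else 0.0

    precision, recall = {}, {}
    for c in classes:
        predicted_as_c = sum(matrix[t][c] for t in classes)  # column sum
        actually_c = sum(matrix[c][p] for p in classes)      # row sum
        precision[c] = matrix[c][c] / predicted_as_c if predicted_as_c else 0.0
        recall[c] = matrix[c][c] / actually_c if actually_c else 0.0
    return accuracy, precision, recall


classes = ["democrat", "republican"]

# Hypothetical counts, chosen only to illustrate the arithmetic.
matrix = {
    "democrat": {"democrat": 50, "republican": 5},
    "republican": {"democrat": 3, "republican": 42},
}

accuracy, precision, recall = performance_vector(matrix, classes)

print(f"accuracy: {accuracy:.2%}")
for c in classes:
    print(f"{c}: precision {precision[c]:.2%}, recall {recall[c]:.2%}")
```

Class precision answers the question "of the records assigned to this class, how many really belong to it", while class recall answers "of the records that really belong to this class, how many were found".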

This raises the question of whether the performance of the model can be improved further. For example, the minimal required confidence for splits can be raised as follows:

Figure 5.17. The modification of preferences for the building of the decision tree

With the higher required confidence, the structure of the decision tree is completely different from that of the original, which also changes the number and distribution of correctly and incorrectly classified records. This model performs better than the original one, as the following figures show:

Figure 5.18. Graphic representation of the decision tree created with the modified preferences


Figure 5.19. Performance vector of the classification based on the decision tree created with the modified preferences

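
Comparing two models on the same test set comes down to comparing their accuracies and checking which records are now classified differently. The following Python sketch illustrates this under assumptions: `accuracy` and `compare_models` are hypothetical helper names, and the label lists are invented, so the printed numbers say nothing about the two trees built in this experiment.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of records whose assigned label equals the original label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def compare_models(true_labels, preds_original, preds_modified):
    """Return both accuracies and the indices of records labelled differently."""
    acc_original = accuracy(true_labels, preds_original)
    acc_modified = accuracy(true_labels, preds_modified)
    changed = [i for i, (a, b) in enumerate(zip(preds_original, preds_modified))
               if a != b]
    return acc_original, acc_modified, changed


# Hypothetical labels for six test records (NOT taken from the real data set).
true_labels = ["democrat", "democrat", "republican",
               "republican", "democrat", "republican"]
preds_original = ["democrat", "republican", "republican",
                  "republican", "democrat", "democrat"]
preds_modified = ["democrat", "democrat", "republican",
                  "republican", "democrat", "democrat"]

acc_original, acc_modified, changed = compare_models(
    true_labels, preds_original, preds_modified)

print(f"original model: {acc_original:.2%}")
print(f"modified model: {acc_modified:.2%}")
print("records classified differently:", changed)
```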


Workflow

dtree_exp3.rmp

Keywords

classification
decision tree
performance
evaluation

Operators

Apply Model
Decision Tree
Performance (Classification)
Read AML
Split Data