Chapter 16. Classification Methods 1

Decision trees

Table of Contents

Classification by decision tree
Comparison and evaluation of decision tree classifiers

Classification by decision tree


This process demonstrates classification with the Decision Tree operator when the target is a nominal attribute. Here the Wine dataset is used, and the target variable has three values. To build a decision tree classifier, it is worth dividing the dataset into training and validation sets. The algorithm then finds the current best splitting rule on the training set, and the growth of the tree is stopped, using the validation set, when the algorithm no longer finds a significant split. In the partitioning step a test set can also be separated in order to measure the generalization ability of the resulting tree, but this is not recommended here because of the limited size of the dataset. The resulting decision tree can be displayed, showing the decisions made at each split of the model. Using majority voting, the algorithm decides which class label to assign to each leaf (terminal node).
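The same workflow can be sketched with scikit-learn (not the tool shown in the chapter): load the Wine data, hold out a validation set, and fit a decision tree. The split proportion and random seed below are illustrative choices, not taken from the process above.

```python
# Minimal sketch of the workflow described above, using scikit-learn.
# Class labels at the leaves are assigned by majority vote automatically.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # 3 wine classes, 13 input attributes

# Divide the data into training and validation sets (70% / 30%).
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Accuracy on the held-out validation set.
valid_acc = tree.score(X_valid, y_valid)
```

Note that scikit-learn does not prune against the validation set automatically; here the validation set is only used to measure accuracy.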


Wine [UCI MLR]


In the case of a nominal target variable, each split can be evaluated on the basis of various impurity measures, such as the chi-square statistic, the Gini index, or the entropy. For these, and for the significance of a split, a parameter value can be specified depending on the chosen measure. In addition, the stopping condition of splitting can be controlled by specifying the minimum size of a set of records that may be divided further, or the maximal depth of the tree. We may also set the maximum number of branches of a node; the default is 2, that is, the algorithm builds a binary tree. It is also possible to decide whether missing values should be treated as a possible value in splitting, and whether an input attribute may be used only once or several times while the decision tree is produced.
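In scikit-learn, a sketch of these settings looks as follows; the parameter values are hypothetical choices for illustration. Note that scikit-learn always builds binary trees and offers the Gini index and entropy but not the chi-square measure.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Illustrative parameter choices mirroring the settings described above.
tree = DecisionTreeClassifier(
    criterion="entropy",   # impurity measure used to evaluate splits
    min_samples_split=10,  # smallest node that may be divided further
    max_depth=4,           # maximal depth of the tree
    random_state=0,
)
tree.fit(X, y)
depth = tree.get_depth()
```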

Figure 16.1. The settings of dataset partitioning

The settings of dataset partitioning

When partitioning the dataset, different sampling methods can be chosen, and the proportions of the training, validation, and test sets can be determined. The partitioning can be carried out simply by following the order of the records, randomly, or by stratifying with respect to the target variable. Stratified sampling ensures the same proportion of each class in the training, validation, and test sets.
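A stratified partition can be sketched with scikit-learn's `train_test_split` and its `stratify` parameter; the check at the end confirms that the class proportions are (nearly) the same in the full data and in the training part.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Stratified sampling: class proportions are preserved in both parts.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

full_props = np.bincount(y) / len(y)
train_props = np.bincount(y_train) / len(y_train)
```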

Figure 16.2. The decision tree

The decision tree

The results of the classification can be seen in the decision tree for both the training and the validation dataset, including the number of records of each class in every vertex of the tree. On the edges between vertices, the variables that define the splits and their splitting values are shown. The thickness of the lines is proportional to the number of records concerned.
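A textual version of such a tree display can be produced with scikit-learn's `export_text`, which prints each split with its variable and threshold, much like the edge labels in the figure; the depth limit here is an illustrative choice.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_wine()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Each line shows a splitting variable with its threshold,
# and leaves show the class assigned by majority vote.
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```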

Interpretation of the results

The evaluation of the resulting decision tree is supported by numerous statistical indicators and graphical tools. The most important ones are displayed in multiple windows at a time, where comparisons can be made; these windows can also be opened one by one from the View menu. With the help of these tools, wrong decisions can be filtered out and the modeling process can be tuned using further background information or domain knowledge. An interactive tree-building process also helps here.

Figure 16.3. The response curve of the decision tree

The response curve of the decision tree

The response curve above shows, for both the training and the validation dataset, what percentage of the records are classified correctly when the records are ranked by the goodness of their prediction. The curve is generally monotonically decreasing.
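A simple stand-in for such a curve can be computed by ranking the validation records by the model's confidence (its maximal predicted class probability) and tracking the cumulative proportion classified correctly; this is an assumption about how the ranking is done, not the chart's exact definition.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Rank validation records by confidence, most confident first,
# then compute the running proportion of correct classifications.
confidence = tree.predict_proba(X_va).max(axis=1)
correct = (tree.predict(X_va) == y_va).astype(float)
order = np.argsort(-confidence)
cum_accuracy = np.cumsum(correct[order]) / np.arange(1, len(y_va) + 1)
```

The last value of `cum_accuracy` equals the overall validation accuracy.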

Figure 16.4. Fitting statistics of the decision tree

Fitting statistics of the decision tree

The Fit Statistics table shows various indicators of the fit of the decision tree classifier produced by the algorithm. The simplest and most important among them is the misclassification rate (in the red circle), which gives the proportion of wrongly classified records.
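The misclassification rate is simply one minus the accuracy, as this sketch shows on a held-out validation set (the depth limit and seed are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Misclassification rate = proportion of wrongly classified records
# = 1 - accuracy.
misclassification_rate = 1.0 - tree.score(X_va, y_va)
```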

Figure 16.5. The classification chart of the decision tree

The classification chart of the decision tree

The classification bar chart shows in detail which classes the model handles well and which poorly.
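The same per-class breakdown can be computed from the confusion matrix: the diagonal of the row-normalized matrix gives the accuracy within each true class. This is a sketch of one way to obtain the chart's numbers, not its exact definition.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Per-class accuracy: fraction of each true class predicted correctly.
cm = confusion_matrix(y_va, tree.predict(X_va))
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
```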

Figure 16.6. The cumulative lift curve of the decision tree

The cumulative lift curve of the decision tree

From the figure, based on the cumulative lift values, we can judge how close the resulting decision tree comes to the best possible model.
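Cumulative lift for one class can be sketched as follows: rank the records by the predicted probability of that class, then compare the hit rate among the top k records to the baseline class rate. A lift of 2 means the top of the ranking contains twice as many records of that class as random selection would. The choice of class 0 here is illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Cumulative lift for class 0 on the validation set.
target = 0
score = tree.predict_proba(X_va)[:, target]
hits = (y_va == target).astype(float)
order = np.argsort(-score)           # highest predicted probability first
k = np.arange(1, len(y_va) + 1)
cum_lift = (np.cumsum(hits[order]) / k) / hits.mean()
```

At 100% of the records the cumulative lift is exactly 1, since the hit rate then equals the baseline rate.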

Figure 16.7. The importance of attributes

The importance of attributes

The variable importance table shows which variables take part in the decisions of the decision tree, and with what importance. This is a useful tool for users who possess some domain knowledge.
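In scikit-learn, impurity-based variable importances are available on the fitted tree; they sum to 1, and variables that never appear in a split get importance 0.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

data = load_wine()
tree = DecisionTreeClassifier(random_state=0)
tree.fit(data.data, data.target)

# Map each attribute name to its impurity-based importance,
# then pick the three most important variables.
importance = dict(zip(data.feature_names, tree.feature_importances_))
top3 = sorted(importance, key=importance.get, reverse=True)[:3]
```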




