Chapter 5. Classification Methods 1

Decision Trees

Table of Contents

Classification using a decision tree
Under- and overfitting in classification with a decision tree
Evaluation of performance for classification by decision tree
Evaluation of performance for classification by decision tree 2
Comparison of decision tree classifiers

Classification using a decision tree

Description

This process demonstrates, on the Wine dataset, how classification can be carried out by building a decision tree. To build the model, the dataset first has to be split into a training set and a test set. The splitting rules are then learned from the training set and organized into a decision tree, and the resulting model is applied to the test set. Afterwards, it can be inspected which decision conditions the model derived from the training set consists of, and to which class each record of the test set has been assigned based on these decisions.
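
For readers who prefer to see the steps in code, the following is a minimal sketch of the same process in Python with scikit-learn. The original exercise uses RapidMiner operators, so the library calls, the split ratio, and the random seed below are illustrative assumptions rather than part of the workflow.

    # Split the Wine data, train a decision tree on the training set,
    # and apply the resulting model to the test set.
    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)

    # Hold out 30% of the records as a test set (ratio chosen for illustration).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)        # learn the splitting rules
    predicted = model.predict(X_test)  # assign the test records to classes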

Input

Wine [UCI MLR]

Output

The individual splits can be decided based on measures such as the Gini index or information gain. For these, as well as for the confidence level of the splits, different parameter values can be set when the decision tree model is created. Furthermore, the stopping conditions of the splitting can also be defined, either by specifying the minimal size of a record set that may still be split further, or by specifying the maximal depth of the tree.
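
As a rough analogue in code, the scikit-learn parameters below correspond to these settings. This is a sketch under the assumption that scikit-learn is used; the library has no direct counterpart of the confidence parameter, and cost-complexity pruning (ccp_alpha) is named here only as a loosely comparable pruning control.

    from sklearn.tree import DecisionTreeClassifier

    model = DecisionTreeClassifier(
        criterion='gini',     # split quality by Gini index; 'entropy' uses information gain
        min_samples_split=4,  # stop condition: minimal size of a record set to split further
        max_depth=5,          # stop condition: maximal depth of the tree
        ccp_alpha=0.0)        # pruning strength, loosely comparable to a confidence level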

Figure 5.1. Preferences for the building of the decision tree

When splitting the dataset, various sampling methods can be chosen, and the ratio in which the data is divided into training and test sets can also be specified. The split can be done simply following the order of the records, at random, or in a stratified way, so that the records of each class occur in the training and test sets in the same ratio as in the original dataset.
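
The three sampling strategies can be sketched with scikit-learn's train_test_split as follows; the function and its arguments are assumptions made for illustration and do not come from the RapidMiner workflow.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split

    X, y = load_wine(return_X_y=True)

    # Linear sampling: take the records in their original order, no shuffling.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)

    # Shuffled sampling: draw the training records at random.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

    # Stratified sampling: keep the class ratios of the original dataset
    # in both the training and the test set.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)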

Figure 5.2. Preferences for splitting the dataset into training and test sets

Figure 5.3. Setting the relative sizes of the data partitions

Interpretation of the results

After it has been built, the model itself can also be directed to the output, so it can be inspected what decision tree has been constructed from the training set. Based on this, possible erroneous decisions can be filtered out using background information or domain knowledge, if either is available; if such decisions are found, the model-building process can be tuned further. In addition, by applying the model to the test set, it can be seen which classes the records of the test set have been assigned to by the model trained on the training set.
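
In scikit-learn terms, inspecting the trained model and its test-set predictions might look like the sketch below. The export_text helper is the library's own tool for printing decision conditions, while the split ratio and variable names are illustrative assumptions.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_wine()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=42)

    model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

    # Print the decision conditions of the trained tree for inspection.
    print(export_text(model, feature_names=list(data.feature_names)))

    # Show which class each test record is assigned to by the model.
    print(model.predict(X_test))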

Figure 5.4. Graphic representation of the decision tree created

Figure 5.5. The classification of the records based on the decision tree

Workflow

dtree_exp1.rmp

Keywords

classification
decision tree
splitting

Operators

Apply Model
Decision Tree
Multiply
Read AML
Split Data