This process shows, using the Wine dataset, how classification can be carried out by building a decision tree. To build the decision model, the dataset is first split into training and test sets. The splitting rules are then organized into a decision tree based on the training set, and the resulting model is applied to the test set. Finally, it can be examined which decision conditions the model consists of, and to which class each record of the test set has been assigned based on those decisions.
Wine [UCI MLR]
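The workflow described above can be sketched end to end. This is a minimal illustration using scikit-learn, which is an assumption on my part; the original text does not name a tool, and the bundled copy of the UCI Wine dataset is used for convenience.

```python
# Minimal sketch of the workflow: split the Wine data, train a decision
# tree on the training set, then apply it to the test set.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

The later paragraphs refine the two main choices made implicitly here: how the split decisions inside the tree are taken, and how the dataset itself is divided.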
The decisions about the individual splits can be made based on measures such as the Gini index or information gain. Different parameter values can be set for these measures, and for the confidence level of the splits, when the decision tree model is created. The stopping conditions of splitting can also be defined, either by specifying the minimal size of a record set that may be split further, or by specifying the maximal depth of the tree.
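In scikit-learn terms (again an assumption about the tooling), these choices map onto constructor parameters of `DecisionTreeClassifier`. Note that scikit-learn has no direct counterpart to a C4.5-style confidence level for pruning; cost-complexity pruning via `ccp_alpha` plays a comparable role.

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    criterion="entropy",   # information gain; use "gini" for the Gini index
    min_samples_split=4,   # minimal size of a record set that may be split further
    max_depth=5,           # maximal depth of the tree
    ccp_alpha=0.0,         # pruning strength; loosely analogous to a confidence level
)
```

Tightening `min_samples_split` or loosening `max_depth` trades off tree size against the risk of overfitting the training set.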
When the dataset is split, various sampling methods can be chosen, and the ratio of training to test records can be specified. The split can be made simply in the order of the records, at random, or in a stratified manner, so that each class occurs in the training and test sets in the same ratio as in the original dataset.
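As a sketch of these sampling options, again assuming scikit-learn: `train_test_split` covers the random and stratified cases directly, and `shuffle=False` would give a split in record order.

```python
from collections import Counter
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
# stratify=y keeps each class in the same ratio in both subsets;
# shuffle=False (without stratify) would split by record order instead.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print("full:", Counter(y))
print("test:", Counter(y_te))
```

Printing the class counts makes it easy to verify that the stratified split preserved the original class ratios.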
Once built, the model itself can also be directed to the output, so the decision tree constructed from the training set can be inspected. Based on this, any erroneous decisions can be filtered out using background information or domain knowledge, if available; if such decisions are found, the model-building process can be tuned further. In addition, by applying the model to the test set, it can be seen which class each test record has been assigned to by the model trained on the training set.
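Both inspection steps can be sketched as follows, under the same scikit-learn assumption: `export_text` prints the learned split conditions for review against domain knowledge, and `predict` shows the class assigned to each test record.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_wine()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, stratify=data.target, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# The decision conditions the model consists of, as readable rules:
rules = export_text(model, feature_names=list(data.feature_names))
print(rules)

# The class assigned to each record of the test set:
predictions = model.predict(X_te)
print(predictions)
```

If a printed rule contradicts known domain behaviour, the parameters from the earlier paragraph (split criterion, minimal split size, maximal depth) are the natural levers for retuning the model.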