Using the Titanic dataset, this process shows how to check the usability and efficiency of association rules extracted from a given dataset. Once the rules have been extracted, their support can be evaluated and, as in classification tasks, it can be checked how well the original values of the dataset can be predicted from the rules. These evaluations make it possible to decide whether the resulting rules suit the goals of the process, whether they should be improved further, or whether they reveal such weak connections that a completely new approach is necessary.
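To make the notion of evaluating a rule concrete, the following minimal Python sketch computes the support and confidence of one rule by hand. It is only an illustration: the records are invented toy data (not the real Titanic dataset), and the helper names `matches`, `support`, and `confidence` are hypothetical and do not correspond to any operator of the process.

```python
# Toy records invented for illustration; NOT the real Titanic data.
records = [
    {"sex": "female", "class": "1st", "survived": "yes"},
    {"sex": "female", "class": "3rd", "survived": "yes"},
    {"sex": "female", "class": "3rd", "survived": "no"},
    {"sex": "male", "class": "1st", "survived": "yes"},
    {"sex": "male", "class": "3rd", "survived": "no"},
    {"sex": "male", "class": "3rd", "survived": "no"},
    {"sex": "male", "class": "2nd", "survived": "no"},
    {"sex": "female", "class": "2nd", "survived": "yes"},
]


def matches(record, conditions):
    """True if the record has every attribute value listed in conditions."""
    return all(record.get(key) == value for key, value in conditions.items())


def support(rows, conditions):
    """Fraction of rows that satisfy all the given conditions."""
    return sum(matches(row, conditions) for row in rows) / len(rows)


def confidence(rows, premise, conclusion):
    """Support of premise and conclusion together, divided by support of the premise."""
    premise_support = support(rows, premise)
    if premise_support == 0:
        return 0.0  # the rule never fires, so no confidence can be claimed
    return support(rows, {**premise, **conclusion}) / premise_support


# Hypothetical rule: sex = female  ->  survived = yes
premise = {"sex": "female"}
conclusion = {"survived": "yes"}

rule_support = support(records, {**premise, **conclusion})
rule_confidence = confidence(records, premise, conclusion)
print(f"support={rule_support:.3f}, confidence={rule_confidence:.3f}")
```

A rule can have high confidence and still low support: it is then reliable where it applies, but it applies to only a small share of the records.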
With this dataset, it can be examined whether the age, sex, and class of the Titanic's passengers influenced their chances of survival. After the variables have been converted appropriately, the dataset can be split into a training set and a test set; applying the association rules derived from the training set to the test set then shows to what extent the rules are usable. To make the attributes created during the conversion referable, the following parameter has to be used:
Next, to evaluate the efficiency of applying the rules with the general performance evaluation operator, the original and predicted values of the attribute of interest (here the variable Surived_1, which indicates that the given passenger survived the shipwreck) have to be converted to nominal type, and it has to be ensured that both are coded using the same values:
After the appropriate roles have been set, the performance measurement operator performs the comparisons automatically and, based on them, evaluates how efficiently the rules can be applied. Running the process yields the following rules regarding the survival of the passengers:
Although many conclusions have been drawn regarding the survival of the passengers, the support of the rules is rather low. This suggests that the rules apply only in relatively special cases rather than generally, so in some cases no decision can be made based on them. The low value in the performance evaluation illustrates this as well:
One reason for this could be that some other factor that affects the connections revealed by the association rules was not taken into consideration when the rules were extracted. Once such factors have been identified, a better result might be obtainable in some cases.
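The effect of low-support rules on the measured performance can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the process itself: the rules and test rows are invented, the function `predict` is hypothetical, and rows matched by no rule are simply left without a prediction and counted as errors in the overall score.

```python
# Hypothetical rules as (premise, predicted outcome); invented for illustration.
rules = [
    ({"sex": "female", "class": "1st"}, "yes"),
    ({"sex": "male", "class": "3rd"}, "no"),
]

# Invented test rows; NOT the real Titanic data.
test_rows = [
    {"sex": "female", "class": "1st", "survived": "yes"},  # covered, correct
    {"sex": "male", "class": "3rd", "survived": "no"},     # covered, correct
    {"sex": "male", "class": "3rd", "survived": "yes"},    # covered, wrong
    {"sex": "female", "class": "2nd", "survived": "yes"},  # no rule applies
    {"sex": "male", "class": "1st", "survived": "no"},     # no rule applies
]


def predict(row, rule_list):
    """Return the outcome of the first rule whose premise matches, else None."""
    for premise, outcome in rule_list:
        if all(row.get(key) == value for key, value in premise.items()):
            return outcome
    return None  # no decision is possible for this row


predictions = [predict(row, rules) for row in test_rows]
actuals = [row["survived"] for row in test_rows]

# Keep only the rows for which some rule produced a decision.
covered = [(p, a) for p, a in zip(predictions, actuals) if p is not None]

coverage = len(covered) / len(test_rows)
accuracy_on_covered = (
    sum(p == a for p, a in covered) / len(covered) if covered else 0.0
)
# Rows without a prediction count as errors in the overall score.
overall_accuracy = sum(p == a for p, a in zip(predictions, actuals)) / len(test_rows)

print(f"coverage={coverage:.2f}, "
      f"accuracy on covered rows={accuracy_on_covered:.2f}, "
      f"overall accuracy={overall_accuracy:.2f}")
```

Reporting coverage separately from accuracy on the covered rows helps to tell apart two situations: rules that are wrong, and rules that are right but seldom applicable.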