Evaluation of performance for association rules

Description

Using the Titanic dataset, the process shows how the usability and efficiency of association rules extracted from a dataset can be checked. Once the rules have been extracted, their support can be evaluated, and, as in classification tasks, it can be checked to what extent the original values of the dataset can be predicted from the rules. These evaluations make it possible to decide whether the resulting rules are appropriate for the goals of the process, whether they should be improved further, or whether the connections they reveal are so weak that a completely new approach is necessary.

Input

Titanic [Titanic]

Output

Using this dataset, it can be examined whether the age, sex, and class of the Titanic's passengers influenced their chances of survival. After the variables have been converted appropriately, the dataset can be split into a training set and a test set; applying the association rules derived from the training set to the test set then shows to what extent the rules are usable. To make the attributes created during the conversion referable, the following parameter has to be used:

Figure 10.13. Operator preferences for the necessary data conversion
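
The conversion and the split described above can be sketched outside the workflow as follows. This is only an illustration under stated assumptions: the records are made-up toy data rather than the real Titanic file, the helper names to_binominal and split_data are hypothetical, and the attribute_value naming scheme is an assumption about how the converted attributes are named.

```python
import random

# Hypothetical toy records; NOT the real Titanic data.
passengers = [
    {"Class": "1st", "Sex": "Female", "Age": "Adult", "Survived": "1"},
    {"Class": "1st", "Sex": "Male", "Age": "Adult", "Survived": "0"},
    {"Class": "2nd", "Sex": "Female", "Age": "Child", "Survived": "1"},
    {"Class": "3rd", "Sex": "Male", "Age": "Adult", "Survived": "0"},
    {"Class": "3rd", "Sex": "Female", "Age": "Adult", "Survived": "0"},
    {"Class": "Crew", "Sex": "Male", "Age": "Adult", "Survived": "0"},
    {"Class": "Crew", "Sex": "Male", "Age": "Adult", "Survived": "1"},
    {"Class": "2nd", "Sex": "Male", "Age": "Adult", "Survived": "0"},
]


def to_binominal(records):
    """Create one true/false column per (attribute, value) pair.

    Columns are named attribute_value (assumed naming scheme) so that
    they can be referred to easily in later steps.
    """
    columns = sorted(
        {f"{attr}_{val}" for rec in records for attr, val in rec.items()}
    )
    converted = []
    for rec in records:
        true_columns = {f"{attr}_{val}" for attr, val in rec.items()}
        converted.append({col: col in true_columns for col in columns})
    return converted


def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle reproducibly and split into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


binominal = to_binominal(passengers)
train, test = split_data(binominal)
print(len(train), "training rows,", len(test), "test rows")
```

Rules would then be mined on train only and applied to test, so that the evaluation is not performed on the same rows the rules were derived from.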

After this, the efficiency of applying the rules can be evaluated with the general performance evaluation operator. To do so, the original and predicted values of the attribute of interest (in this case the variable Survived_1, which indicates that the given passenger survived the shipwreck) have to be converted to nominal type, and it has to be ensured that both are coded using the same values:

Figure 10.14. Label role assignment for performance evaluation

Figure 10.15. Prediction role assignment for performance evaluation

Figure 10.16. Operator preferences for the data conversion necessary for evaluation
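
Why the label and the prediction must share the same coding can be shown with a small sketch. The values below are hypothetical, and to_nominal and accuracy are illustrative helpers, not part of the workflow: the label is assumed to arrive as the strings "1" and "0", the prediction as true/false values.

```python
def to_nominal(value):
    """Map strings, booleans and 0/1 numbers to the nominal values 'true'/'false'."""
    if isinstance(value, str):
        return "true" if value.strip().lower() in {"1", "true", "yes"} else "false"
    return "true" if bool(value) else "false"


def accuracy(labels, predictions):
    """Share of positions where label and prediction are equal."""
    pairs = list(zip(labels, predictions))
    return sum(label == pred for label, pred in pairs) / len(pairs)


# Hypothetical values with two different codings.
labels_raw = ["1", "0", "1", "0", "0"]
predictions_raw = [True, False, False, False, True]

# Comparing the raw values finds no match at all, purely because of the coding.
naive = accuracy(labels_raw, predictions_raw)

# After both sides are converted to the same nominal coding,
# the comparison becomes meaningful.
labels = [to_nominal(v) for v in labels_raw]
predictions = [to_nominal(v) for v in predictions_raw]
aligned = accuracy(labels, predictions)

print(naive)    # 0.0
print(aligned)  # 0.6
```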

Interpretation of the results

Once the appropriate roles are set, the performance measurement operator performs the comparisons automatically and, based on them, evaluates how efficiently the rules can be applied. Running the process yields the following rules regarding the survival of the passengers:

Figure 10.17. Graphic representation of the association rules generated regarding survival

Figure 10.18. List of the association rules generated regarding survival

Although many conclusions have been drawn regarding the survival of the passengers, the support of the rules is rather low. The rules therefore apply only in relatively special cases rather than generally, so in some cases no decision can be made based on them. The low value in the performance evaluation illustrates this as well:

Figure 10.19. Performance vector for the application of association rules generated
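
The link between low support and undecided cases can be made concrete with a short sketch. The rows and the single rule below are hypothetical and do not reproduce the rules in the figures; each row is represented as the set of binominal attributes that are true for one passenger.

```python
def support(rows, items):
    """Fraction of rows that contain every item in `items`."""
    return sum(items <= row for row in rows) / len(rows)


def confidence(rows, premise, conclusion):
    """Support of premise and conclusion together, relative to the premise alone."""
    premise_support = support(rows, premise)
    if premise_support == 0:
        return 0.0
    return support(rows, premise | conclusion) / premise_support


# Hypothetical rows; NOT the real Titanic data.
rows = [
    {"Class_1st", "Sex_Female", "Survived_1"},
    {"Class_1st", "Sex_Female", "Survived_1"},
    {"Class_3rd", "Sex_Male"},
    {"Class_3rd", "Sex_Male"},
    {"Class_2nd", "Sex_Male"},
    {"Class_3rd", "Sex_Female", "Survived_1"},
    {"Class_Crew", "Sex_Male"},
    {"Class_Crew", "Sex_Male", "Survived_1"},
]

# Hypothetical rule: Class_1st and Sex_Female imply Survived_1.
premise = frozenset({"Class_1st", "Sex_Female"})
conclusion = frozenset({"Survived_1"})

rule_support = support(rows, premise | conclusion)
rule_confidence = confidence(rows, premise, conclusion)

# The rule can only make a statement about rows matching its premise;
# all other rows remain undecided.
undecided = 1 - support(rows, premise)

print(rule_support, rule_confidence, undecided)  # 0.25 1.0 0.75
```

In this toy case the rule is fully reliable where it fires, yet it leaves three quarters of the rows without a decision, which is the kind of effect that pulls the overall performance value down.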

One reason for this could be that some other factor affecting the connections disclosed by the association rules was not taken into consideration when the rules were extracted. Once such factors are identified, a better result might be obtainable in some cases.

Workflow

assoc_exp3.rmp

Keywords

frequent item sets
association rules
performance
support

Operators

Apply Association Rules
Create Association Rules
Discretize by User Specification
FP-Growth
Multiply
Nominal to Binominal
Performance
Read AML
Set Role
Split Data