Extracting association rules from a non-transactional dataset

Description

Using the Titanic dataset, this process shows how association rules can be extracted from a non-transactional dataset. Such a dataset first has to be transformed into a transactional one. Depending on the structure of the original database, either only the items actually present in a record matter, or the 0 values of the variables also carry meaning and have to be interpreted. In both cases, the dataset has to be transformed into an uncompressed sparse matrix representation in which every record contains a binomial value for each of the possible items. After this, association rules can be extracted without any further complex transformation: first the frequent item sets occurring in the dataset are extracted, and then the association rules valid for the dataset are generated from them.

Input

Titanic [Titanic]

Output

Using this dataset, it can be examined whether the age, sex, and class of the passengers of the Titanic had any influence on their survival chances. As the Class variable is not of binomial type, it first has to be converted into binomial form before the frequent item sets can be extracted:

Figure 10.4. Operator preferences for the necessary data conversion

Figure 10.5. Converted version of the dataset
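
The conversion described above can also be sketched outside the tool. The following Python fragment is a minimal illustration, not the operator used in the workflow: the toy records are invented, and the generated column names such as "Class = 1" are an assumption about naming. Only Class is split into one binomial column per value; the other variables are kept as plain 0/1 flags.

```python
# Toy records in the style of the Titanic dataset.
# The values are invented for illustration, not taken from the real data.
records = [
    {"Class": 1, "Age": 1, "Sex": 0, "Survived": 1},
    {"Class": 3, "Age": 1, "Sex": 1, "Survived": 0},
    {"Class": 0, "Age": 1, "Sex": 1, "Survived": 0},
    {"Class": 2, "Age": 0, "Sex": 0, "Survived": 1},
]

# Assumed value domain of the Class variable (hypothetical coding).
CLASS_VALUES = (0, 1, 2, 3)


def binarize_class_only(record):
    """Split Class into one true/false column per value.

    The variables that are already 0/1 are only turned into booleans,
    so a 0 value simply means "item absent" in the result.
    """
    row = {f"Class = {v}": record["Class"] == v for v in CLASS_VALUES}
    for name in ("Age", "Sex", "Survived"):
        row[name] = record[name] == 1
    return row


matrix = [binarize_class_only(r) for r in records]

for row in matrix:
    print(row)
```

Each row of the result holds a binomial value for every possible item, which is the form a frequent item set miner expects.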

From the converted dataset, the frequent item sets can now be extracted, and the association rules valid for the dataset can be generated from them:

Figure 10.6. List of the frequent item sets generated

Figure 10.7. List of the association rules generated
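
The two steps, finding frequent item sets and deriving rules from them, can be illustrated with a small brute-force sketch in Python. It is not FP-Growth, the operator listed for the workflow; it simply counts the support of every candidate item set, which is feasible only for a handful of items. The transactions, item names, and thresholds are invented for illustration. The support of an item set is the fraction of records that contain all of its items; the confidence of a rule is the support of the whole item set divided by the support of its premise.

```python
from itertools import combinations

# Invented toy transactions: each set lists the items whose column is true.
transactions = [
    {"Class = 0", "Age", "Sex"},
    {"Class = 0", "Age", "Sex"},
    {"Class = 1", "Age", "Survived"},
    {"Class = 3", "Age", "Sex"},
    {"Class = 2", "Survived"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the item set."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force search: test all candidates, size by size.

    Stops at the first size with no frequent item set, because every
    superset of an infrequent item set is infrequent as well.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
                found = True
        if not found:
            break
    return result


def association_rules(itemsets, min_confidence):
    """Split each frequent item set into premise -> conclusion pairs."""
    rules = []
    for itemset, supp in itemsets.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                premise = frozenset(combo)
                # Every subset of a frequent item set is frequent,
                # so its support is already in the dictionary.
                confidence = supp / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, supp, confidence))
    return rules


itemsets = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(itemsets, min_confidence=0.9)

for premise, conclusion, supp, conf in rules:
    print(sorted(premise), "->", sorted(conclusion), supp, conf)
```

Because an item appears in a transaction only where its column is true, a value encoded as 0 never shows up in any item set or rule produced this way.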

Interpretation of the results

Looking at the frequent item sets and association rules created, it is clear that the dataset has been handled inappropriately. The documentation of the dataset shows that for each variable, including the binomial ones, the 0 value has a meaning of its own (e.g. it denotes children for the age variable, and crew members for the class variable). Accordingly, to obtain appropriate transactional records, each of these variables also has to be split into two separate variables that represent the presence or absence of its two possible values. This yields the following dataset:

Figure 10.8. Operator preferences for the appropriate data conversion

Figure 10.9. The appropriate converted version of the dataset
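
A minimal Python sketch of this kind of conversion follows. The records are invented, and the value domains and column names are assumptions, not taken from the workflow. Every variable, binomial or not, is expanded into one column per possible value, so that a 0 value becomes an item of its own instead of being read as "absent".

```python
# Invented toy records; not rows of the real dataset.
records = [
    {"Class": 1, "Age": 1, "Sex": 0, "Survived": 1},
    {"Class": 3, "Age": 1, "Sex": 1, "Survived": 0},
    {"Class": 0, "Age": 1, "Sex": 1, "Survived": 0},
    {"Class": 2, "Age": 0, "Sex": 0, "Survived": 1},
]

# Assumed value domains (hypothetical coding of the variables).
DOMAINS = {
    "Class": (0, 1, 2, 3),
    "Age": (0, 1),
    "Sex": (0, 1),
    "Survived": (0, 1),
}


def binarize_all_values(record):
    """Create one true/false column for every (variable, value) pair."""
    row = {}
    for name, values in DOMAINS.items():
        for value in values:
            row[f"{name} = {value}"] = record[name] == value
    return row


matrix = [binarize_all_values(r) for r in records]

# Each record now has exactly one true column per original variable.
for row in matrix:
    print([item for item, present in row.items() if present])
```

With this encoding, a rule can refer to either value of a variable, for example to "Age = 0" as well as to "Age = 1".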

From this dataset, the frequent item sets and, based on them, the appropriate association rules can be extracted. These rules allow deeper conclusions to be drawn about the connections in the data, and the factors influencing the survival chances of the passengers can be identified. The table representation of the rules helps with this, among other things, because it offers different filters for selecting the rules considered interesting, for example by outcome or by confidence level:

Figure 10.10. Enhanced list of the frequent item sets generated

Figure 10.11. List of the association rules generated
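
Filtering rules by outcome and by confidence level can be sketched in a few lines of Python. The Rule class, the filter_rules helper, and all numbers below are hypothetical; the numbers are placeholders and not results obtained from the Titanic dataset. The outcome of a rule is its right-hand side, called conclusion here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premise: frozenset
    conclusion: frozenset
    support: float
    confidence: float


# Placeholder rules with made-up support and confidence values.
rules = [
    Rule(frozenset({"Class = 0"}), frozenset({"Age = 1"}), 0.40, 1.00),
    Rule(frozenset({"Sex = 0"}), frozenset({"Survived = 1"}), 0.15, 0.70),
    Rule(frozenset({"Class = 3"}), frozenset({"Survived = 0"}), 0.25, 0.75),
    Rule(frozenset({"Sex = 1", "Age = 1"}), frozenset({"Survived = 0"}), 0.60, 0.80),
]


def filter_rules(rules, conclusion_item=None, min_confidence=0.0):
    """Keep rules whose conclusion contains the given item (if any)
    and whose confidence reaches the threshold, best rules first."""
    selected = [
        r for r in rules
        if (conclusion_item is None or conclusion_item in r.conclusion)
        and r.confidence >= min_confidence
    ]
    return sorted(selected, key=lambda r: r.confidence, reverse=True)


# Rules that conclude "Survived = 0" with a confidence of at least 0.7.
for rule in filter_rules(rules, "Survived = 0", min_confidence=0.7):
    print(sorted(rule.premise), "->", sorted(rule.conclusion), rule.confidence)
```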

Besides the table representation, a graphic representation can also be used, which offers filtering conditions similar to those of the table view:

Figure 10.12. Graphic representation of the association rules generated

Workflow

assoc_exp2.rmp

Keywords

frequent item sets
association rules
non-transactional data
binomial attributes
data transformation

Operators

Create Association Rules
FP-Growth
Multiply
Nominal to Binominal
Read AML