Using the Titanic dataset, the process shows how association rules can be extracted from a non-transactional dataset. To obtain association rules from such a dataset, it first has to be transformed into a transactional dataset. How this is done depends on the structure of the original data: either only the items that are actually present in a record matter, or the 0 values of the variables also carry meaning and have to be interpreted. In both cases, the dataset has to be transformed into an uncompressed sparse matrix representation, in which every record contains a binomial value for each of the possible items. After this, association rules can be extracted without any further complex transformation: first the frequent item sets occurring in the dataset are extracted, and then, based on these, the association rules valid for the dataset are generated.
Using this dataset, it can be examined whether the age, sex, and class of the Titanic's passengers had any influence on their chances of survival. As the Class variable is not of a binomial type, it has to be converted into binomial form before the frequent item sets can be extracted:
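The conversion of a nominal variable into binomial form can be sketched in plain Python as follows. This is only an illustration of the idea, not the operator used in the process: the records are toy values rather than the real Titanic data, and the column names, the helper name binarize_nominal, and the numeric coding of the values are assumptions.

```python
# Toy records only; these are NOT the real Titanic data.
# Column names and numeric codes are assumptions made for this sketch.
records = [
    {"Class": 1, "Age": 1, "Sex": 1, "Survived": 1},
    {"Class": 3, "Age": 0, "Sex": 0, "Survived": 0},
    {"Class": 0, "Age": 1, "Sex": 1, "Survived": 0},
]


def binarize_nominal(rows, column):
    """Replace a nominal column with one 0/1 indicator column per observed value."""
    values = sorted({row[column] for row in rows})
    result = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: val for key, val in row.items() if key != column}
        # Add one indicator column per value of the nominal column.
        for value in values:
            new_row[f"{column}={value}"] = 1 if row[column] == value else 0
        result.append(new_row)
    return result


binary_records = binarize_nominal(records, "Class")
for row in binary_records:
    print(row)
```

After this step, every record holds exactly one 1 among the Class indicator columns, and all columns are binomial.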
Based on these, the frequent item sets can now be acquired, from which the association rules valid for the dataset can be generated:
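The two steps, frequent item set extraction followed by rule generation, can be illustrated with a small brute-force sketch. It enumerates every candidate item set instead of using an optimized algorithm such as FP-Growth or Apriori, so it is only suitable for a handful of items. The transactions, the item names, and the thresholds are hypothetical and do not come from the Titanic data.

```python
from itertools import combinations

# Toy transactions (hypothetical, not the real Titanic records).
transactions = [
    {"Adult", "Male", "Survived"},
    {"Adult", "Male"},
    {"Adult", "Survived"},
    {"Male"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the item set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty combination of the observed items."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(frozenset(candidate), transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent item set into premise -> conclusion pairs."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # Every subset of a frequent item set is frequent as well,
                # so the support of the premise is always available here.
                confidence = s / frequent[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, s, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
for premise, conclusion, s, confidence in rules:
    print(sorted(premise), "->", sorted(conclusion), s, round(confidence, 3))
```

The confidence of a rule is the support of the whole item set divided by the support of its premise, which is why the rules can be derived from the frequent item sets alone.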
Looking at the frequent item sets and association rules created, it is clear that the dataset has not been handled appropriately. It can be seen in the documentation of the dataset that for each variable, including the binomial variables, the 0 value has a meaning of its own (e.g. it represents children for the age variable, and crew membership for the class variable). Accordingly, to acquire the appropriate transactional records, these variables also have to be split into two separate variables each, which represent the presence or absence of the two possible values. In this case, the following dataset is yielded as a result:
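A minimal sketch of this value-aware encoding is given below. Every value of every variable, including 0, becomes its own item, so each record turns into a transaction with one item per variable. Only the meanings "0 = child" and "0 = crew" are taken from the text above; the remaining labels, the column names, and the toy records are assumptions.

```python
# Hypothetical value labels. Only "Age 0 = child" and "Class 0 = crew" are
# stated in the text; all other labels are assumptions for this sketch.
LABELS = {
    "Class": {0: "Class=Crew", 1: "Class=1st", 2: "Class=2nd", 3: "Class=3rd"},
    "Age": {0: "Age=Child", 1: "Age=Adult"},
    "Sex": {0: "Sex=Female", 1: "Sex=Male"},
    "Survived": {0: "Survived=No", 1: "Survived=Yes"},
}

# Toy records only; these are NOT the real Titanic data.
records = [
    {"Class": 1, "Age": 1, "Sex": 1, "Survived": 1},
    {"Class": 3, "Age": 0, "Sex": 0, "Survived": 0},
    {"Class": 0, "Age": 1, "Sex": 1, "Survived": 0},
]


def to_transaction(row):
    """Map each (variable, value) pair to an item, so 0 values are kept too."""
    return frozenset(LABELS[column][value] for column, value in row.items())


transactions = [to_transaction(row) for row in records]
for t in transactions:
    print(sorted(t))
```

With this encoding, a rule can refer to children, women, crew members, or non-survivors explicitly, which is not possible when only the 1 values are treated as items.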
From this dataset, the frequent item sets and, using those, the appropriate association rules can be extracted. With the resulting rules, deeper conclusions can be drawn regarding the connections between the data, and the factors influencing the survival chances of the passengers can be identified. Among other things, the table representation of the rules can aid this, as in this representation, different kinds of filters can be applied to select the rules considered interesting, for example by outcome or by confidence level:
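Filtering by outcome and by confidence can be sketched as follows. The Rule type and the filter_rules helper are hypothetical names, and the support and confidence numbers are placeholders chosen only to exercise the filter; they are not results obtained from the Titanic data.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    premise: frozenset
    conclusion: frozenset
    support: float
    confidence: float


# Placeholder rules with made-up numbers, NOT results from the real dataset.
rules = [
    Rule(frozenset({"Item A"}), frozenset({"Survived=Yes"}), 0.20, 0.90),
    Rule(frozenset({"Item B"}), frozenset({"Survived=No"}), 0.30, 0.80),
    Rule(frozenset({"Item C"}), frozenset({"Survived=Yes"}), 0.10, 0.55),
    Rule(frozenset({"Item D"}), frozenset({"Item A"}), 0.25, 0.95),
]


def filter_rules(rules, outcome=None, min_confidence=0.0):
    """Keep rules that conclude the given outcome and reach the confidence level."""
    selected = [
        rule
        for rule in rules
        if (outcome is None or outcome in rule.conclusion)
        and rule.confidence >= min_confidence
    ]
    # Show the most reliable rules first.
    return sorted(selected, key=lambda rule: rule.confidence, reverse=True)


survival_rules = filter_rules(rules, outcome="Survived=Yes", min_confidence=0.6)
for rule in survival_rules:
    print(sorted(rule.premise), "->", sorted(rule.conclusion), rule.confidence)
```

Restricting the conclusion to a single outcome corresponds to filtering the rule table by outcome, and the confidence threshold corresponds to the confidence filter.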
Besides the table representation, a graphical representation can also be used, which offers filtering conditions similar to those of the table view: