Chapter 23. Clustering 2

Advanced methods

Table of Contents

Clustering attributes before fitting SVM
Self-organizing maps (SOM) and vector quantization (VQ)

Clustering attributes before fitting SVM


The process demonstrates how to cluster the attributes with the Variable Clustering operator when the dataset contains a large number of attributes. The process uses the Spambase dataset. After the attributes have been clustered, further supervised data mining methods can be applied; for example, the e-mails can be classified as spam or non-spam.


Spambase [UCI MLR]

The dataset contains 4601 records and 58 attributes. The records are classified into 2 groups by the Class variable, which identifies the spam e-mails: its value is 1 if the record is spam and 0 otherwise. The challenge in the dataset is the relatively large number of attributes, which slows down the training process. The experiment shows that, after a suitable clustering of the attributes, a model can be obtained that is competitive with models fitted on the whole dataset.


During attribute clustering, the columns of the dataset are clustered by a hierarchical method to reduce the dimension of the dataset. The most important parameter of the Variable Clustering operator is Maximum Cluster, which sets the maximal number of clusters. Similar parameters are the maximal number of eigenvalues and the explained variance. You can also choose between the correlation and the covariance matrix in the analysis. One of the most important results is the dendrogram, which visualizes the process of the hierarchical clustering.

Figure 23.1. The dendrogram of attribute clustering
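
The idea of grouping correlated columns can be sketched in Python. This is only an illustration under stated assumptions, not the algorithm used by the Variable Clustering operator: it applies average-linkage agglomerative clustering from SciPy to the distance 1 - |correlation|. The function name `cluster_attributes`, the parameter `max_clusters` (which plays the role of Maximum Cluster) and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_attributes(X, max_clusters):
    """Group the columns of X into at most max_clusters clusters."""
    corr = np.corrcoef(X, rowvar=False)       # attribute-by-attribute correlation
    dist = 1.0 - np.abs(corr)                 # strongly correlated columns are close
    dist = (dist + dist.T) / 2.0              # remove tiny numerical asymmetries
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")   # merge history of the clustering
    labels = fcluster(tree, t=max_clusters, criterion="maxclust")
    return labels, tree


# Synthetic stand-in data (an assumption): three latent factors,
# each observed through two noisy copies, giving six attributes.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 3))
X = np.repeat(factors, 2, axis=1) + 0.1 * rng.normal(size=(500, 6))

labels, tree = cluster_attributes(X, max_clusters=3)
print(labels)  # attributes that copy the same factor share a label
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram of the merge steps.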

The relationship between the original attributes and the obtained clusters is depicted in the following graph.

Figure 23.2. The graph of clusters and attributes

The cluster membership list, i.e., the set of attributes belonging to each cluster, is shown in the following figure.

Figure 23.3. The cluster membership

The correlation (covariance) between the original attributes plays the most important role in forming the clusters: attributes that are highly correlated with each other are placed in the same cluster. This is shown in the following figure.

Figure 23.4. The correlation plot of the attributes

It is also possible to examine how strongly each variable correlates with the new cluster variables. The following figure shows the correlation bar chart for the variable representing the dollar sign character.

Figure 23.5. The correlation between clusters and an attribute
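
A correlation bar chart of this kind could be computed along the following lines. The sketch assumes that each cluster variable is the first principal component of the standardized attributes in the cluster; the text does not state how the operator builds its cluster variables, so this is an assumption. The function names, the cluster membership and the data are hypothetical.

```python
import numpy as np


def cluster_component(X_cluster):
    """First principal component score of the standardized columns of one cluster."""
    Z = (X_cluster - X_cluster.mean(axis=0)) / X_cluster.std(axis=0)
    # The rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]


def attribute_cluster_correlations(X, labels, attribute_index):
    """Correlation of one attribute with the component of every cluster."""
    result = {}
    for c in np.unique(labels):
        score = cluster_component(X[:, labels == c])
        result[int(c)] = float(np.corrcoef(X[:, attribute_index], score)[0, 1])
    return result


# Synthetic stand-in data (an assumption): two latent factors, two noisy copies each.
rng = np.random.default_rng(1)
factors = rng.normal(size=(400, 2))
X = np.repeat(factors, 2, axis=1) + 0.1 * rng.normal(size=(400, 4))
labels = np.array([1, 1, 2, 2])  # hypothetical cluster membership of the 4 attributes

correlations = attribute_cluster_correlations(X, labels, attribute_index=0)
print(correlations)  # large in absolute value for cluster 1, near zero for cluster 2
```

The sign of a principal component is arbitrary, so only the absolute value of each correlation is meaningful in this sketch.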

After the attribute clustering, an SVM model was fitted to the binary Class variable using the 19 new cluster attributes obtained. The results were then compared with those of a similar model fitted directly to the original 58 attributes. The results below show that the two models have similar performance. The classification bar charts show similar classification matrices.

Figure 23.6. Classification charts of SVM models
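
The comparison of a model fitted on cluster variables with one fitted on all attributes can be imitated with scikit-learn. The sketch below is not the process described in this chapter: it uses synthetic data instead of Spambase, a known (hypothetical) cluster membership, the mean of the member attributes as the cluster variable, and a linear kernel. All of these are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): 4 latent factors, 5 noisy copies each.
rng = np.random.default_rng(0)
n, n_factors, copies = 1000, 4, 5
factors = rng.normal(size=(n, n_factors))
X_full = np.repeat(factors, copies, axis=1) + 0.3 * rng.normal(size=(n, n_factors * copies))
y = (factors[:, 0] + factors[:, 1] - factors[:, 2] > 0).astype(int)

# Hypothetical cluster membership: attribute j belongs to cluster j // copies.
labels = np.arange(n_factors * copies) // copies
# One cluster variable per cluster: the mean of its member attributes.
X_clustered = np.column_stack(
    [X_full[:, labels == c].mean(axis=1) for c in range(n_factors)]
)


def holdout_accuracy(X, y):
    """Fit a scaled linear SVM on 70% of the rows and score it on the rest."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)


acc_full = holdout_accuracy(X_full, y)
acc_clustered = holdout_accuracy(X_clustered, y)
print(f"all attributes: {acc_full:.3f}, cluster variables: {acc_clustered:.3f}")
```

On data of this shape the two accuracies are expected to be close, because the cluster means retain most of the information carried by the correlated attributes.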

In some regions, the response curve of the model fitted on the clustered attributes is better than that of the model fitted on the original ones.

Figure 23.7. The response curve of SVM models

If the cumulative lift functions are compared, taking the baseline and the best lift functions into account, similar behavior can be seen.

Figure 23.8. The cumulative lift functions of SVM models
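
For reference, a cumulative lift function and the corresponding best (ideal) lift can be computed as in the following sketch. The baseline lift is 1 at every depth. The function name `cumulative_lift`, the number of bins and the toy scores are hypothetical and are not taken from the experiment.

```python
import numpy as np


def cumulative_lift(y_true, scores, n_bins=10):
    """Return depths, model lift and best possible lift at each depth."""
    y_true = np.asarray(y_true)
    n = len(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    sorted_y = y_true[order]
    base_rate = y_true.mean()
    n_pos = int(y_true.sum())
    ks = np.arange(1, n_bins + 1)
    cutoffs = (ks * n + n_bins - 1) // n_bins   # integer ceiling of k * n / n_bins
    depths = cutoffs / n
    # Response rate among the top-scored records, relative to the overall rate.
    model = np.array([sorted_y[:k].mean() for k in cutoffs]) / base_rate
    # Best case: every positive record is ranked before every negative one.
    best = np.array([min(int(k), n_pos) / k for k in cutoffs]) / base_rate
    return depths, model, best


# Toy example (an assumption): 10 records, 3 positives, scores already descending.
y_true = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
depths, model_lift, best_lift = cumulative_lift(y_true, scores, n_bins=5)
print(depths, model_lift, best_lift)
```

At full depth both curves reach the baseline value 1, so the differences between models appear at small depths.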

Finally, the ROC curves are very similar to each other.

Figure 23.9. The ROC curves of SVM models
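
Two ROC curves can also be compared numerically through the area under each curve. The following sketch computes the points of a ROC curve and its area with the trapezoidal rule. It assumes that all scores are distinct (ties are not handled), and the function names and the toy data are hypothetical.

```python
import numpy as np


def roc_points(y_true, scores):
    """False and true positive rates as the threshold sweeps from high to low."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    sorted_y = y_true[order]
    tpr = np.concatenate([[0.0], np.cumsum(sorted_y) / sorted_y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - sorted_y) / (1 - sorted_y).sum()])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example (an assumption): 3 positives and 3 negatives with distinct scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
fpr, tpr = roc_points(y_true, scores)
auc = area_under_curve(fpr, tpr)
print(round(auc, 4))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9.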

Interpretation of the results

If a supervised data mining model has so many input attributes that training becomes very slow, it is worth reducing the dimension by clustering the input attributes. The explanatory power of the resulting model is usually not much worse than that of the model fitted to the original attributes.





attribute clustering
hierarchical methods
ROC curve


Data Source
Model Comparison
Support Vector Machine