Comparison of clustering methods


This experiment compares automatic clustering with clustering in which the number of clusters is specified by the user, using the Maximum Variance (D31) dataset. The Cluster operator is used throughout the experiment.


Maximum Variance (D31)

The dataset consists of 3100 two-dimensional vectors, which are grouped into 31 clusters.

Figure 22.16. The Maximum Variance (D31) dataset
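To experiment outside the tool, a dataset with the same shape can be generated synthetically. The following is a minimal sketch, not the original D31 file: the function name make_d31_like, the 30 x 30 placement area, the spread of 0.7 and the split into 100 points per cluster are all assumptions chosen only to reproduce the 3100 vectors and 31 groups described above.

```python
import random


def make_d31_like(n_clusters=31, points_per_cluster=100, spread=0.7, seed=0):
    """Return (points, labels) for a synthetic, D31-style 2D dataset.

    Assumption: every group is an isotropic Gaussian blob around a randomly
    placed centre. The real D31 coordinates are not reproduced here.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label in range(n_clusters):
        # Hypothetical layout: centres drawn uniformly from a 30 x 30 square.
        cx, cy = rng.uniform(0, 30), rng.uniform(0, 30)
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


points, labels = make_d31_like()
print(len(points), len(set(labels)))  # 3100 31
```

Because the centres are random, some blobs may overlap more than in the original data, so results on this stand-in will not match the figures exactly.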



First, automatic clustering is performed with the Class attribute ignored. The algorithm finds 31 clusters, which agrees with the original number of clusters. The resulting clusters are shown in the following figure.

Figure 22.17. The result of automatic clustering
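One generic way to choose the number of clusters without user input is to merge a set of preliminary cluster centroids step by step with Ward's criterion and to stop where the merge cost rises sharply. The sketch below illustrates that idea only. It is not the Cluster operator's implementation, and the largest-jump rule in suggest_k is a simple stand-in for the CCC, which is not computed here. The function names and the six toy centroids are hypothetical.

```python
def ward_merge_costs(centroids, sizes):
    """Agglomerate weighted centroids with Ward's criterion.

    Returns a list of (clusters_remaining_after_merge, merge_cost), where the
    cost is the increase in within-cluster sum of squares caused by the merge.
    """
    clusters = [(list(c), n) for c, n in zip(centroids, sizes)]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                dist2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                cost = ni * nj / (ni + nj) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged_centre = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((merged_centre, ni + nj))
        history.append((len(clusters), cost))
    return history


def suggest_k(history):
    """Heuristic (assumption): stop just before the largest jump in merge cost."""
    jumps = [
        (history[t][1] - history[t - 1][1], history[t][0] + 1)
        for t in range(1, len(history))
    ]
    return max(jumps)[1]


# Toy input: six preliminary centroids that form three tight pairs.
centroids = [(0, 0), (0.2, 0), (10, 0), (10.2, 0), (0, 10), (0.2, 10)]
sizes = [50] * 6
history = ward_merge_costs(centroids, sizes)
print(suggest_k(history))  # 3
```

Working on centroids and cluster sizes rather than on all 3100 points keeps the quadratic pair search cheap, which is why the preliminary grouping step is assumed.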


The CCC plot clearly confirms that the resulting number of clusters is correct.

Figure 22.18. The CCC plot of automatic clustering


The schematic arrangement of the clusters is shown by the proximity graph below.

Figure 22.19. The proximity graph of automatic clustering

Based on the CCC chart, you can also try a cluster model with 9 clusters. This can be done with Ward's version of the K-means algorithm. The scatterplot and the proximity graph of the resulting clusters are shown in the following two figures.

Figure 22.20. The result of K-means clustering


Figure 22.21. The proximity graph of K-means clustering
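Clustering with a user-specified number of clusters can be illustrated with a plain K-means (Lloyd) iteration. This is a minimal sketch under stated assumptions: it is the textbook algorithm with random starting centres, not the Ward's variant used by the Cluster operator, and the helper names and the three-blob toy data are hypothetical. For the experiment above, k would be set to 9.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k data points as starting centres
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


def within_ss(points, centers, assignment):
    """Total within-cluster sum of squares."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))


# Toy data: three Gaussian blobs (illustrative only).
rng = random.Random(1)
blobs = [(0, 0), (8, 0), (4, 7)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in blobs for _ in range(40)]
centers, assignment = kmeans(points, k=3)
print(within_ss(points, centers, assignment))
```

With random starting centres the result can depend on the seed, so in practice several restarts are usually compared by their within-cluster sum of squares.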


Then, using so-called segment profiling, the resulting clusters can be examined to see how the input variables determine them.

Figure 22.22. The profile of the segments (clusters)
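The idea behind segment profiling can be sketched by comparing each cluster's mean on every input variable with the overall mean, measured in overall standard deviations, and ranking the variables by the size of that difference. This is a simplified stand-in, not the measure computed by the Segment Profile operator. The function name profile_segments and the four toy rows are hypothetical.

```python
from statistics import mean, pstdev


def profile_segments(rows, labels, names):
    """For each segment, rank variables by standardized mean difference.

    Returns {segment: [(variable, score), ...]} sorted by |score| descending,
    where score = (segment mean - overall mean) / overall standard deviation.
    """
    overall = {n: (mean(col), pstdev(col)) for n, col in zip(names, zip(*rows))}
    profiles = {}
    for seg in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == seg]
        scores = {}
        for n, col in zip(names, zip(*members)):
            mu, sd = overall[n]
            scores[n] = (mean(col) - mu) / sd if sd > 0 else 0.0
        profiles[seg] = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return profiles


# Toy data: x separates the two segments, y does not.
rows = [(0.0, 5.0), (1.0, 6.0), (10.0, 5.0), (11.0, 6.0)]
labels = [0, 0, 1, 1]
profiles = profile_segments(rows, labels, ["x", "y"])
for seg, ranking in profiles.items():
    print(seg, ranking)
```

A variable with a score near zero in every segment contributes little to separating the clusters, while a large positive or negative score marks a variable that characterizes that segment.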


Interpretation of the results

The experiment shows that automatic clustering is able to find the correct number of clusters when there is a relatively large number of closely spaced but spherical groups. If this number is considered too large, it can be reduced to a reasonable size by analyzing the CCC graph and looking for a suitable breakpoint.
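Reading candidate breakpoints off a criterion curve can be sketched as a search for local peaks. The example below assumes that a breakpoint means a cluster count whose criterion value is higher than that of its smaller neighbour and not lower than that of its larger neighbour. The function name local_peaks is hypothetical, and the values in the dictionary are made-up placeholders, not CCC values from this experiment.

```python
def local_peaks(criterion):
    """Return the cluster counts at which the criterion has a local peak.

    criterion: dict mapping number of clusters -> criterion value.
    """
    ks = sorted(criterion)
    peaks = []
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        if criterion[k] > criterion[prev] and criterion[k] >= criterion[nxt]:
            peaks.append(k)
    return peaks


# Placeholder values for illustration only.
toy_curve = {2: 1.0, 3: 2.5, 4: 2.0, 5: 3.0, 6: 4.5, 7: 4.0}
peaks = local_peaks(toy_curve)
print(peaks)  # [3, 6]
```

The end points of the curve are never reported, because a peak needs a neighbour on both sides.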





automatic clustering
cluster profiling
CCC graphs


Data Source
Graph Explore
Segment Profile