This process uses the Maximum Variance (R15) dataset to show how the appropriate number of clusters can be determined with the agglomerative hierarchical clustering method.
The dataset contains 600 two-dimensional vectors, which form 15 separate groups. The task is to determine the number of groups and to discover the groups themselves.
The result of agglomerative clustering is a so-called dendrogram: a tree whose leaves are the points themselves and whose intermediate nodes (the clusters) result from merging two points or subtrees (clusters). At the beginning of the process, each point forms a cluster on its own. The method always merges the two points (or clusters) closest to each other, building up the tree until, by the end of the process, all points belong to one single cluster. The length of the edges in the finished dendrogram is proportional to the distance between the merged clusters, so long edges indicate well-separated clusters, and the number of edges crossed by a horizontal cut at such a level gives the ideal number of clusters.
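The cutting idea described above can be sketched outside the process as well. The following is only an illustration under several assumptions: it uses SciPy's `linkage` function, average linkage (the text does not say which linkage criterion the process uses), and a synthetic stand-in for R15 made of 15 tight groups of 40 points on a grid. The helper names `make_grouped_points` and `clusters_at_height`, and the cut height 0.5, are hypothetical and chosen for this synthetic data only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def make_grouped_points(n_groups=15, per_group=40, spread=0.04, seed=0):
    """Synthetic stand-in for R15: tight 2-D blobs on a unit-spaced grid."""
    rng = np.random.default_rng(seed)
    centers = np.array([[g % 5, g // 5] for g in range(n_groups)], dtype=float)
    blobs = [c + spread * rng.standard_normal((per_group, 2)) for c in centers]
    return np.vstack(blobs)


def clusters_at_height(Z, height):
    """Number of clusters left when the dendrogram is cut at `height`."""
    # Each merge above the cut joins two clusters that the cut keeps apart.
    return 1 + int(np.sum(Z[:, 2] > height))


points = make_grouped_points()          # 600 x 2 array
Z = linkage(points, method="average")   # one row per merge; column 2 is the merge distance

print(clusters_at_height(Z, 0.0))       # cut below all merges: each point on its own
print(clusters_at_height(Z, 0.5))       # cut between within-group and between-group merges
print(clusters_at_height(Z, Z[-1, 2]))  # cut at the top: one single cluster
```

On real data the cut height is not known in advance; it would be read off the dendrogram at the level where the edges are longest.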
With the Flatten Clustering operator, the dendrogram can also be turned into a flat clustering; the desired number of clusters is given manually as the operator's single parameter. The figure shows the result of this cluster analysis.
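A comparable flattening step can be sketched with SciPy's `fcluster`, which cuts a linkage tree so that at most a given number of clusters remains. This is not the operator itself, only an analogous call. The data is again a synthetic stand-in for R15 (15 groups of 40 points), and average linkage is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for R15: 15 tight groups of 40 points on a 5 x 3 grid.
rng = np.random.default_rng(0)
centers = np.array([[g % 5, g // 5] for g in range(15)], dtype=float)
points = np.vstack([c + 0.04 * rng.standard_normal((40, 2)) for c in centers])

# Build the dendrogram, then flatten it by stating the number of clusters.
Z = linkage(points, method="average")
labels = fcluster(Z, t=15, criterion="maxclust")  # one label per point, starting at 1

# Size of each flat cluster (index 0 is unused because labels start at 1).
sizes = np.bincount(labels)[1:]
print(len(sizes), sizes.min(), sizes.max())
```

The only value supplied by hand is `t=15`, which plays the role of the manually stated number of clusters.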
As can be seen, the ideal number of clusters can be determined from the dendrogram, and the cluster analysis itself can then be performed with this number.