Agglomerative methods

Description

Using the Maximum Variance (R15) dataset, the process shows how to determine the appropriate number of clusters with the agglomerative hierarchical clustering method.

Input

Maximum Variance (R15) [SIPU Datasets] [Maximum Variance]

The dataset contains 600 two-dimensional vectors that form 15 separate groups. The task is to determine the number of groups and to discover the groups themselves.

Figure 11.10. The 15 groups

Output

Figure 11.11. The resulting dendrogram

The result of agglomerative clustering is a so-called dendrogram: a tree whose leaves are the points themselves and whose intermediate nodes (the clusters) result from merging two points or subtrees (clusters). At the beginning of the process, each point forms a cluster on its own. The method always merges the two points (or clusters) closest to each other, building up the tree until all points belong to one single cluster. The length of the edges in the finished dendrogram is proportional to the distance between the merged clusters, so the number of edges crossed at an appropriately chosen level gives the ideal number of clusters.
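The merging procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used by the Agglomerative Clustering operator: it assumes single linkage (the distance between two clusters is the distance between their closest points) and Euclidean distance, and the function name `agglomerate` and the toy points are hypothetical.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Naive single-linkage agglomerative clustering (assumed linkage).

    Returns the merge history as (members_a, members_b, distance) tuples,
    in the order in which the merges happened.
    """
    # Each point starts as its own cluster, stored as a tuple of point indices.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def single_link(a, b):
        # Cluster distance = distance between the closest pair of points.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the two clusters closest to each other and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        merges.append((a, b, single_link(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

# Hypothetical toy data: two tight pairs and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
merges = agglomerate(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

The recorded merge distances correspond to the heights at which the nodes of the dendrogram are drawn. The naive search over all cluster pairs is far too slow for large datasets and is meant only to make the steps visible.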

Figure 11.12. The clustering generated from the dendrogram

With the Flatten Clustering operator, the dendrogram can also be turned into a flat clustering; the desired number of clusters is stated manually as a single parameter. The figure shows the result of this cluster analysis.
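Conceptually, flattening a dendrogram into a given number of clusters amounts to replaying the merges in order and stopping once that many clusters remain. The sketch below illustrates this idea only; it is not the code of the Flatten Clustering operator. The function `flatten`, the layout of the merge history (tuples of member indices plus a merge distance), and the example history are all assumptions made for illustration.

```python
def flatten(n_points, merges, k):
    """Cut a merge history so that exactly k clusters remain.

    merges: list of (members_a, members_b, distance), ordered by merge time,
            where members_a and members_b are complete clusters at that time.
    Returns one cluster label per point.
    """
    # Start from singletons; replaying n_points - k merges leaves k clusters.
    labels = list(range(n_points))
    for a, b, _distance in merges[: n_points - k]:
        target = labels[a[0]]  # all members of a already share this label
        for i in b:
            labels[i] = target
    # Renumber labels to 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(label, len(remap)) for label in labels]

# Hypothetical merge history for five points (indices 0..4).
merges = [
    ((0,), (1,), 1.0),
    ((2,), (3,), 1.5),
    ((0, 1), (2, 3), 5.0),
    ((4,), (0, 1, 2, 3), 15.0),
]
labels_3 = flatten(5, merges, k=3)
labels_2 = flatten(5, merges, k=2)
print(labels_3)
print(labels_2)
```

Because the number of clusters is the only input besides the tree itself, the same dendrogram can be cut at several levels without repeating the clustering.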

Interpretation of the results

The dendrogram makes it possible to determine the ideal number of clusters, and the cluster analysis can then be performed with that number.
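One common way to read the number of clusters off a dendrogram is to look for the largest jump between consecutive merge distances and to cut the tree just before that jump. The sketch below assumes this heuristic; the text does not prescribe a specific rule, and the function name `suggest_cluster_count` and the sample distances are hypothetical. It also assumes that the merge distances are listed in merge order and do not decrease.

```python
def suggest_cluster_count(merge_distances):
    """Suggest k by cutting at the largest jump in merge distance.

    merge_distances: distances of successive merges, in merge order
                     (at least two values).
    """
    n_points = len(merge_distances) + 1
    gaps = [later - earlier
            for earlier, later in zip(merge_distances, merge_distances[1:])]
    # Index of the last merge performed before the largest jump.
    last_cheap_merge = max(range(len(gaps)), key=gaps.__getitem__)
    # Each merge reduces the number of clusters by one.
    return n_points - (last_cheap_merge + 1)

# Hypothetical merge distances for six points: three cheap merges,
# then two expensive ones that would join well-separated groups.
k = suggest_cluster_count([0.5, 0.6, 0.7, 4.0, 4.2])
print(k)
```

The heuristic is sensitive to outliers: a single distant point produces a large final jump, so the suggestion should be checked against the dendrogram itself.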

Workflow

clust_exp4.rmp

Keywords

Agglomerative method
agglomerative hierarchical clustering
cluster analysis

Operators

Agglomerative Clustering
Flatten Clustering
Multiply
Read CSV