The process shows, using the Maximum Variance (R15) dataset, how agglomerative hierarchical clustering algorithms work. These clustering algorithms
can be run by the
The dataset contains 600 two-dimensional vectors, which are concentrated into 15 clusters. The points are arranged around a center at the coordinates (10,10), at increasing distances from each other the farther they are from the center. This is what makes the task difficult: the clusters near the center come close to blending into each other.
First, the average linkage hierarchical method is applied. In this case, the algorithm calculates the distance between two clusters as the average of the pairwise distances between their elements. The results are shown in the following figure.
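The average linkage rule described above can be illustrated with a minimal Python sketch. This is not the SAS implementation: the function names `average_linkage` and `agglomerate` and the toy points are assumptions made for illustration, and the naive merge loop is only practical for very small inputs.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean Euclidean distance over all pairs (p, q) with p in a and q in b."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Naive agglomerative clustering: merge the two closest clusters
    (by average linkage) until only n_clusters remain."""
    clusters = [[p] for p in points]  # start with one singleton cluster per point
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical toy data (two well-separated groups), not the R15 dataset.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
result = agglomerate(toy, 2)
```

Each pass recomputes every pairwise cluster distance, which keeps the sketch short at the cost of speed; production tools update the distances incrementally instead.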
The goodness of the clustering can be assessed by plotting the original grouping against the Segment attribute, which contains the cluster membership obtained after clustering, on a spatial bar chart. It can be seen that, apart from a permutation, there is a one-to-one correspondence between the rows and the columns, except for two records.
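The comparison behind such a chart is a cross-tabulation of the original groups against the segments. The sketch below shows one way to build it and to count the records that break the correspondence. The function names and the label lists are hypothetical, and counting records outside the dominant segment of each original group is a simple heuristic assumed here, not necessarily the measure used in the original process.

```python
from collections import Counter


def crosstab(original, segment):
    """Count how often each (original label, segment id) pair occurs."""
    return Counter(zip(original, segment))


def mismatches(original, segment):
    """Number of records that fall outside the dominant segment
    of their original group."""
    table = crosstab(original, segment)
    best = {}  # best[label] = largest cell count in that label's row
    for (label, _seg), count in table.items():
        best[label] = max(best.get(label, 0), count)
    return len(original) - sum(best.values())


# Hypothetical labels: segment ids are a permutation of the original groups,
# and one record of group "b" landed in the segment that mostly holds group "c".
original = ["a", "a", "a", "b", "b", "b", "c", "c"]
segment = [2, 2, 2, 3, 3, 1, 1, 1]
table = crosstab(original, segment)
n_wrong = mismatches(original, segment)
```

If the clustering matched the original grouping perfectly up to a renaming of the segments, every row of the table would have exactly one non-zero cell and the mismatch count would be zero.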
Another hierarchical clustering method is the Ward method. Using it, we obtain the following results.
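For comparison with average linkage, Ward's method merges the pair of clusters whose union increases the total within-cluster sum of squares the least. The sketch below computes that merge cost from the cluster sizes and centroids and checks it against the direct definition. The helper names and the sample points are assumptions for illustration only.

```python
def centroid(cluster):
    """Coordinate-wise mean of a cluster of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def sse(cluster):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in the total within-cluster sum of squares if a and b merge:
    |a||b| / (|a| + |b|) times the squared distance between the centroids."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap


# Hypothetical sample clusters on a line.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 0.0), (7.0, 0.0)]
cost = ward_cost(a, b)
direct = sse(a + b) - sse(a) - sse(b)  # same quantity, computed from the definition
```

Because the cost grows with the cluster sizes as well as with the centroid distance, Ward's method tends to favor merging small clusters, which is one reason it can behave differently from average linkage.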
The process demonstrated that if the number of possible clusters is relatively large, it is worth choosing one of the automatic clustering procedures. In SAS® Enterprise Miner™, hierarchical clustering is available for this purpose in several different ways. The experiment also shows that the choice of the agglomerative method does not always affect the resulting clusters. SAS proposes the number of clusters by examining the cubic clustering criterion (CCC) graph; see the figure below.
In addition, a schematic display of the location of the clusters, the so-called proximity diagram, is also obtained; it is clearly similar to the scatterplot of the clusters obtained previously.