## Agglomerative hierarchical methods

Using the *Maximum Variance (R15)* dataset, this process shows how agglomerative hierarchical clustering algorithms work. These clustering algorithms can be run with the `Cluster` operator.

*Maximum Variance (R15)* [SIPU Datasets] [Maximum Variance]

The dataset contains `600` two-dimensional vectors, which are concentrated into `15` clusters. The points are arranged around a center at coordinates (10,10), and they lie at increasing distances from each other the further they are from the center. This is what makes the task difficult: the clusters near the center come close to blending into each other.

First, the average linkage hierarchical method is applied. In this case, the algorithm calculates the distance between two clusters as the average of the pairwise distances between their elements. The results are shown
in the following figure.
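The average linkage rule described above can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the implementation used by the `Cluster` operator: the function names `average_linkage` and `agglomerate`, the Euclidean distance, and the toy points are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster point pairs."""
    total = sum(dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters):
    """Naive agglomerative clustering: merge the closest pair of
    clusters repeatedly until only n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy data (not the R15 dataset): two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
toy_clusters = agglomerate(toy_points, n_clusters=2)
```

The loop re-evaluates every pair of clusters at each step, which is fine for a handful of points but far too slow for real data; it is kept naive here so that the merge rule stays visible.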

The quality of the clustering can be assessed by plotting the original grouping attribute `Class` against the `Segment` attribute, which contains the cluster membership obtained after clustering, in a spatial bar chart. It can be seen that, apart from a permutation, there is a one-to-one correspondence between the rows and the columns, with the exception of two records.
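The same comparison can be made numerically by cross-tabulating the two attributes. The sketch below is a minimal illustration under stated assumptions: the helper names (`crosstab`, `dominant_segments`, `mismatches`) and the toy labels are hypothetical and are not part of the process or of the R15 data.

```python
from collections import Counter


def crosstab(classes, segments):
    """Count the records for every (class, segment) pair."""
    return Counter(zip(classes, segments))


def dominant_segments(table):
    """Map each class to the segment that holds most of its records."""
    best = {}
    for (cls, seg), count in table.items():
        if cls not in best or count > table[(cls, best[cls])]:
            best[cls] = seg
    return best


def mismatches(table):
    """Number of records that fall outside their class's dominant segment."""
    best = dominant_segments(table)
    return sum(n for (cls, seg), n in table.items() if best[cls] != seg)


# Hypothetical labels: segment numbers are a permutation of the class
# numbers, and one record of class 2 ends up in the wrong segment.
toy_class = [1, 1, 1, 2, 2, 2, 3, 3, 3]
toy_segment = [2, 2, 2, 3, 3, 1, 1, 1, 1]

toy_table = crosstab(toy_class, toy_segment)
toy_mapping = dominant_segments(toy_table)
# One-to-one up to a permutation: no two classes share a dominant segment.
one_to_one = len(set(toy_mapping.values())) == len(toy_mapping)
toy_errors = mismatches(toy_table)
```

Here `one_to_one` checks that the class-to-segment mapping is a permutation, and `toy_errors` counts the records that break it, which corresponds to the off-pattern bars in the chart.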

Another hierarchical clustering method is the Ward method. Using it, we obtain the following results.
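For comparison with average linkage, Ward's criterion merges the two clusters whose union gives the smallest increase in the within-cluster sum of squared distances. The sketch below illustrates this merge cost only. The function names and sample points are assumptions made for this example, and it is not the SAS implementation.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def sse(cluster):
    """Sum of squared Euclidean distances from the points to their centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(cluster_a, cluster_b):
    """Increase in the within-cluster sum of squares caused by merging
    the two clusters, computed from sizes and centroids only."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * squared_gap


# Hypothetical clusters used only to check the identity below.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 1.0)]
direct_increase = sse(a + b) - sse(a) - sse(b)
shortcut = ward_cost(a, b)  # equals direct_increase up to rounding
```

Because the cost grows with the sizes of both clusters, Ward's method tends to favor compact clusters of similar size, whereas average linkage looks only at mean pairwise distances.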

### Interpretation of the results

The process demonstrates that if the number of possible clusters is relatively large, it is worth
choosing one of the automatic clustering procedures. In
SAS® Enterprise Miner™, hierarchical
clustering is available for this purpose in several different ways. The experiment also shows that
the choice of the agglomerative method does not always affect the resulting clusters. SAS proposes
the number of clusters by examining the CCC (cubic clustering criterion) graph; see the figure below.

In addition, a schematic display of the cluster locations, the so-called proximity diagram, is
also obtained; it is clearly similar to the scatterplot of the clusters obtained earlier.

Keywords: hierarchical methods, average linkage, Ward method, CCC graph, clustering

Operators: Cluster, Data Source, Graph Explore