Using the *Aggregation* dataset, this process shows how to gather and display cluster metrics.

*Aggregation* [SIPU Datasets] [Aggregation]

The dataset contains 788 two-dimensional vectors, which form 7 separate groups. In the present case, the aim is to evaluate the clusters created.

After the data is read, agglomerative clustering is run with different parameters, and clusters are created from its result. A similarity function is defined to measure cluster density, and the measurements are saved for each parameter setting.
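The steps described here can be illustrated with a minimal sketch in plain Python. It is not the process itself: the function names `linkage_distance`, `agglomerate`, and `cluster_density` are hypothetical, the six sample points are made up, and defining density as the negated mean pairwise distance within a cluster is an assumption chosen so that larger values mean denser clusters.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, points, mode):
    """Distance between clusters a and b (lists of point indices)."""
    pair = [dist(points[i], points[j]) for i in a for j in b]
    if mode == "single":      # closest pair of points
        return min(pair)
    if mode == "complete":    # farthest pair of points
        return max(pair)
    if mode == "average":     # mean over all cross-cluster pairs
        return sum(pair) / len(pair)
    raise ValueError(f"unknown linkage mode: {mode}")


def agglomerate(points, n_clusters, mode):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, mode
            ),
        )
        # a < b always holds, so deleting b does not shift a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_density(cluster, points):
    """Assumed measure: negated mean pairwise distance inside the cluster."""
    if len(cluster) < 2:
        return 0.0
    pairs = list(combinations(cluster, 2))
    return -sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)


# Made-up sample data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = agglomerate(points, n_clusters=2, mode="average")
densities = [cluster_density(c, points) for c in clusters]
```

This naive version recomputes all pairwise cluster distances at every merge, so it is only suitable for small inputs; it is meant to make the role of the linkage mode and the density measure visible, not to be efficient.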

Sixty different settings are tested: the number of clusters ranges from 2 to 20, and all three agglomeration strategies of agglomerative clustering (single link, complete link, and average link) are tried.

The cluster sizes, the cluster densities, the distribution of the points, and the agglomeration strategy are saved for each setting.
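A log of this kind could be sketched as follows. Everything here is illustrative: the field names, the helper functions `point_distribution` and `log_row`, and the sample measurements are hypothetical, and measuring point distribution as the sum of squared cluster shares is an assumption, not necessarily the measure the process uses.

```python
import csv
import io


def point_distribution(sizes):
    """Assumed measure: sum of squared cluster shares.

    Gives 1.0 for a single cluster and 1/k for k equally sized clusters.
    """
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)


def log_row(mode, sizes, densities):
    """Build one log record for a single parameter setting."""
    return {
        "mode": mode,
        "n_clusters": len(sizes),
        "sizes": " ".join(str(s) for s in sizes),
        "avg_density": sum(densities) / len(densities),
        "distribution": point_distribution(sizes),
    }


# Hypothetical measurements for two settings (illustrative values only).
rows = [
    log_row("single", [5, 1], [-2.0, 0.0]),
    log_row("average", [3, 3], [-1.0, -1.0]),
]

# Write the log as CSV text, then read it back to obtain the result table.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
log = list(csv.DictReader(buffer))
```

Under this assumed measure, a balanced split scores lower than an unbalanced one with the same number of clusters, which is why it can serve as a rough indicator of how evenly the points are spread.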

The final result can be obtained by reading the log.

The final result shows that, as the number of clusters increases, cluster density increases and point distribution decreases, at different rates for the three strategies. However, the single link strategy falls somewhat behind the complete link and average link methods.

cluster evaluation | agglomerative clustering | single link | complete link | average link | point density | point distribution