This process uses the Aggregation dataset to show how to gather and display cluster metrics.
The dataset contains 788 two-dimensional vectors, which form 7 separate groups. In the present case, the aim is to evaluate the clusters created.
After the data is read, agglomerative clustering is run with different parameters, and the resulting hierarchy is used to create the clusters. A similarity function is defined to measure cluster density, and the measurements are saved for each parameter setting.
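The clustering and density steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the process's actual operators: the function names (`agglomerative`, `linkage_distance`, `density`) are hypothetical, the toy points are made up, and density is assumed here to be the mean pairwise similarity inside a cluster, with similarity taken as negative Euclidean distance.

```python
import math
from itertools import combinations


def linkage_distance(c1, c2, points, strategy):
    """Distance between two clusters (lists of point indices) for a given strategy."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if strategy == "single":      # closest pair of members
        return min(dists)
    if strategy == "complete":    # farthest pair of members
        return max(dists)
    if strategy == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown strategy: {strategy}")


def agglomerative(points, n_clusters, strategy):
    """Naive bottom-up clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], points, strategy),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def density(cluster, points):
    """Assumed density: mean pairwise similarity (negative distance); 0.0 for singletons."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return -sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)


# Toy data (not the Aggregation dataset): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for strategy in ("single", "complete", "average"):
    clusters = agglomerative(points, 2, strategy)
    print(strategy, clusters, [round(density(c, points), 3) for c in clusters])
```

With this convention a density closer to zero means a tighter cluster. The naive merge loop is far too slow for large inputs and is meant only to make the three strategies concrete.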
60 different settings are tested: the number of clusters ranges from 2 to 20, and all three agglomeration strategies (single link, complete link, and average link) are tried.
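The parameter grid can be enumerated as a Cartesian product of cluster counts and strategies. The sketch below is an assumption about how such a grid might be built; `build_settings` and the strategy labels are hypothetical names. Note that a range of 2 to 20 inclusive combined with three strategies gives 57 combinations, while 20 distinct cluster counts would give 60, so the exact bounds used by the process may differ slightly from the ones written here.

```python
from itertools import product

# Hypothetical labels for the three agglomeration strategies.
STRATEGIES = ("single", "complete", "average")


def build_settings(min_clusters, max_clusters, strategies=STRATEGIES):
    """Return one dict per (number of clusters, strategy) combination, bounds inclusive."""
    return [
        {"n_clusters": k, "strategy": s}
        for k, s in product(range(min_clusters, max_clusters + 1), strategies)
    ]


settings = build_settings(2, 20)
print(len(settings))              # number of combinations for 2..20 inclusive
print(settings[0], settings[-1])  # first and last setting in the grid
```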
The cluster sizes, the cluster densities, the distribution of the points, and the agglomeration strategy are saved for each setting.
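One way to record these values is to write one log row per setting. The following is a sketch only: the column names, the helper functions, and the sample values are all hypothetical, and the point distribution is assumed to be the sum of squared cluster-size fractions (1.0 when every point is in one cluster, 1/k for k equally sized clusters), which may not be the measure the process actually uses.

```python
import csv
import io


def item_distribution(cluster_sizes):
    """Assumed measure: sum of squared cluster-size fractions."""
    total = sum(cluster_sizes)
    return sum((size / total) ** 2 for size in cluster_sizes)


def make_log_row(strategy, cluster_sizes, cluster_densities):
    """Collect the saved values for one parameter setting into a flat dict."""
    return {
        "strategy": strategy,
        "n_clusters": len(cluster_sizes),
        "cluster_sizes": " ".join(str(size) for size in cluster_sizes),
        "avg_density": sum(cluster_densities) / len(cluster_densities),
        "item_distribution": item_distribution(cluster_sizes),
    }


def write_log(rows, stream):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Illustrative values only, not results measured on the Aggregation dataset.
rows = [
    make_log_row("single", [5, 1], [-2.0, 0.0]),
    make_log_row("complete", [3, 3], [-1.0, -1.0]),
]
buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

Under this assumed measure, splitting the data into more clusters of similar size lowers the distribution value, since each cluster holds a smaller share of the points.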
The final result can be acquired by reading the log.
The final result shows that increasing the number of clusters increases the cluster densities and decreases the point distribution, at different rates for the three strategies. However, the single link strategy falls a bit behind the complete link and average link methods.