Cluster evaluation


This process uses the Aggregation dataset to show how cluster metrics can be gathered and displayed.


Aggregation [SIPU Datasets] [Aggregation]

The dataset contains 788 two-dimensional vectors, which form 7 separate groups. Here, the aim is to evaluate the clusters that are created from them.

Figure 12.11. The 788 vectors



Figure 12.12. The evaluating subprocess


After the data is read, agglomerative clustering is run with different parameter settings, and the resulting hierarchy is flattened into clusters. A similarity measure is then computed from the data to evaluate cluster density, and the results of the measurements are saved for each parameter setting.
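The clustering and density steps described above can be sketched in plain Python. This is only an illustrative sketch, not the operators used in the process: the function names `agglomerate` and `cluster_density`, the toy points, and the density definition (the mean negated Euclidean distance between points that share a cluster) are all assumptions made for the example. In practice a library routine would normally replace the naive merge loop.

```python
import math
from itertools import combinations


def agglomerate(points, k, strategy="average"):
    """Naive bottom-up clustering: merge the two closest clusters until k remain.

    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # All distances between a point of cluster a and a point of cluster b.
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if strategy == "single":
            return min(dists)
        if strategy == "complete":
            return max(dists)
        if strategy == "average":
            return sum(dists) / len(dists)
        raise ValueError(f"unknown strategy: {strategy}")

    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_density(points, clusters):
    """Assumed density: mean negated distance over within-cluster point pairs.

    Values closer to zero mean denser clusters; 0.0 if there are no pairs.
    """
    sims = [
        -math.dist(points[i], points[j])
        for cluster in clusters
        for i, j in combinations(cluster, 2)
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Toy data (not the Aggregation dataset): two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = agglomerate(points, k=2, strategy="single")
density = cluster_density(points, clusters)
print(clusters, density)
```

With this definition, splitting the data into more clusters can only remove long within-cluster distances, which is why a density of this kind tends to rise as the number of clusters grows.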

Figure 12.13. Setting up the parameters


60 different settings are tested: the number of clusters ranges from 2 to 20, and all three agglomeration strategies of the agglomerative clustering are tried.
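A parameter sweep of this kind amounts to enumerating every combination of cluster count and agglomeration strategy. The sketch below shows one way to build such a grid; the name `parameter_grid` and the strategy labels are hypothetical, and the range of k is deliberately shortened for illustration.

```python
from itertools import product

# Hypothetical labels for the three agglomeration strategies.
STRATEGIES = ("single", "complete", "average")


def parameter_grid(k_values, strategies=STRATEGIES):
    """Return every (k, strategy) pair to be tested."""
    return list(product(k_values, strategies))


# Shortened range for illustration only.
grid = parameter_grid(range(2, 5))
for k, strategy in grid:
    print(k, strategy)
```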

Figure 12.14. Parameters to log


The cluster sizes, the cluster densities, the distribution of the points, and the agglomeration strategy are saved for each setting.
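The logged values can be pictured as one record per parameter setting. The sketch below assumes that the item distribution is measured as the sum of squared cluster fractions, which is one common choice and not necessarily the one configured in the process; the names `item_distribution` and `log_row`, the toy clustering, and the placeholder density value are all made up for the example.

```python
def item_distribution(cluster_sizes):
    """Sum of squared cluster fractions (assumed measure).

    Equals 1.0 when all items fall into one cluster and 1/k when
    k clusters are equally sized, so it falls as items spread out.
    """
    total = sum(cluster_sizes)
    return sum((size / total) ** 2 for size in cluster_sizes)


def log_row(k, strategy, clusters, density):
    """Build one log record for a single parameter setting."""
    sizes = sorted((len(c) for c in clusters), reverse=True)
    return {
        "k": k,
        "strategy": strategy,
        "cluster_sizes": sizes,
        "cluster_density": density,
        "item_distribution": item_distribution(sizes),
    }


log = []
# Toy clustering of six items into two clusters; the density is a placeholder.
toy_clusters = [[0, 1, 2], [3, 4, 5]]
log.append(log_row(2, "single", toy_clusters, density=-1.0))
print(log[0])
```

Keeping the strategy and k in every record is what later allows the measures to be plotted against the number of clusters separately for each strategy.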

Figure 12.15. Cluster density against the number of clusters k


Figure 12.16. Item distribution against the number of clusters k


The final result can be obtained by reading the log.

Interpretation of the results

The final result shows that as the number of clusters increases, the cluster densities increase and the point distribution decreases, at different rates for the three strategies. The single link strategy, however, falls somewhat behind the complete link and average link methods.





cluster evaluation
agglomerative clustering
single link
complete link
average link
point density
point distribution


Agglomerative Clustering
Cluster Density Performance
Data to Similarity
Flatten Clustering
Item Distribution Performance
Log to Data
Loop Parameters
Read CSV