This process demonstrates, using the Aggregation dataset, how the K-means clustering algorithm works; it also shows the importance of choosing the distance function.
The dataset consists of 788 two-dimensional vectors that form 7 separate groups. The task is to discover these groups, i.e. the clusters. The difficulty lies in the spatial arrangement of the points: smaller and larger clouds of points are present, separated by varying distances.
After reading the data, the K-means node is connected, the algorithm is set to search for 7 clusters, and the process is run. The upper and right-hand point clouds are discovered successfully, but the algorithm performs poorly on the lower point cloud.
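The step above can be sketched in Python with scikit-learn. Note the assumptions: the original process was built in a visual workflow tool, and the Aggregation data are replaced here by a synthetic stand-in generated with `make_blobs` (788 points, 7 groups), since the file format of the real dataset is not given.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the Aggregation dataset: 788 two-dimensional
# points in 7 groups (in practice the real data would be loaded from file).
X, _ = make_blobs(n_samples=788, centers=7, random_state=0)

# K-means set to search for 7 clusters, using the default (Euclidean)
# distance, as in the process described above.
km = KMeans(n_clusters=7, n_init=10, random_state=0)
labels = km.fit_predict(X)
# labels[i] is the cluster index (0..6) assigned to point i
```

With the default Euclidean distance, clusters that are elongated or unevenly spaced tend to be split or merged incorrectly, which matches the poor result on the lower point cloud.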
Let us try another distance function, the Mahalanobis distance.
At the cost of minor sacrifices elsewhere, the result has become more precise: the clustering of the lower point cloud is now close to a perfect solution.
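A minimal sketch of the Mahalanobis variant, under the same assumptions as before (synthetic stand-in data, scikit-learn available): Mahalanobis distance with a single global covariance matrix is equivalent to Euclidean distance after whitening the data, so one way to approximate it is to whiten and then run ordinary K-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data again: 788 two-dimensional points, 7 groups.
X, _ = make_blobs(n_samples=788, centers=7, random_state=0)

# Mahalanobis distance with a global covariance Cov = L @ L.T reduces to
# Euclidean distance in the whitened space x -> inv(L) @ x.
cov = np.cov(X, rowvar=False)
L = np.linalg.cholesky(cov)
X_white = np.linalg.solve(L, X.T).T

# Ordinary K-means on the whitened data approximates K-means with a
# (global) Mahalanobis distance on the original data.
km = KMeans(n_clusters=7, n_init=10, random_state=0)
labels = km.fit_predict(X_white)
```

This whitening trick only captures a dataset-wide covariance; tools that support a per-cluster Mahalanobis distance re-estimate the covariance during clustering, which can improve results further on stretched clusters.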
It can be seen that even the simplest clustering algorithms can discover basic structure, and with a well-chosen distance function the results can be made more precise still.