Chapter 11. Clustering 1

Standard methods

Table of Contents

K-means method
K-medoids method
The DBSCAN method
Agglomerative methods
Divisive methods

K-means method


The process demonstrates, using the Aggregation dataset, how the K-means clustering algorithm works. It also shows the importance of choosing the distance function.


Aggregation [SIPU Datasets] [Aggregation]

The dataset consists of 788 two-dimensional vectors that form 7 separate groups. The task is to discover these groups, that is, the clusters. The difficulty of the task lies in the arrangement of the points: smaller and larger point clouds are present, with different distances between them.

Figure 11.1. The 7 separate groups



Figure 11.2. Clustering with default values


After the data is read, the node of the K-means method is connected and set to search for 7 clusters, and then the process is run. The upper and right-side point clouds are discovered successfully; however, the algorithm performs poorly on the lower point cloud.
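To make the steps behind the K-means node more concrete, the following is a minimal sketch of the standard iterative scheme (Lloyd's algorithm) in plain Python. It is not the implementation used by the tool in the figures; the function names, the seeding strategy, and the toy data are assumptions made for illustration only, and the toy points are not taken from the Aggregation dataset.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, distance=euclidean, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated blobs.
toy_points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
              (9.0, 9.0), (9.4, 8.7), (8.8, 9.5)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

The `distance` parameter is kept pluggable on purpose, since the choice of distance function is exactly what the rest of this process varies. With point clouds of unequal size and spacing, the result also depends on the initial centroids, which is one reason a run can split or merge groups.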

Figure 11.3. Set the distance function.


Let us try out another distance function, the Mahalanobis distance.
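For reference, the Mahalanobis distance between two points a and b is sqrt((a - b)^T S^-1 (a - b)), where S is a covariance matrix; with S equal to the identity matrix it reduces to the Euclidean distance. The sketch below computes it for two-dimensional points in plain Python. The helper names and the sample points are hypothetical, and estimating S from the whole dataset is an assumption here, not a statement about how the tool in the figures does it.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix ((sxx, sxy), (sxy, syy)) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def make_mahalanobis(cov):
    """Build d(a, b) = sqrt((a - b)^T S^-1 (a - b)) for a 2x2 covariance S."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a symmetric 2x2 matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det

    def distance(a, b):
        dx, dy = a[0] - b[0], a[1] - b[1]
        return math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)

    return distance


# Hypothetical sample that is stretched along the x axis.
sample = [(0.0, 0.0), (10.0, 1.0), (20.0, -1.0), (30.0, 0.5), (40.0, -0.5)]
mahalanobis = make_mahalanobis(covariance_2d(sample))
```

On this sample, a step of 10 units along the widely spread x axis counts as a much shorter distance than a step of 10 units along the narrow y axis. This rescaling by the spread of the data is what can change which centroid a point is assigned to.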

Figure 11.4. Clustering with Mahalanobis distance function


It can be seen that, at the cost of minor sacrifices, the result has become more precise; the clustering of the lower point cloud is now close to a perfect solution.

Interpretation of the results

It can be seen that even the simplest clustering algorithms can discover the basic structure of the data, and if the distance function is chosen well, the results can be made even more precise.





K-means method
distance functions
cluster analysis


Read CSV