Chapter 11. Clustering 1

Standard methods

Table of Contents

K-means method
K-medoids method
The DBSCAN method
Agglomerative methods
Divisive methods

K-means method

Description

The process demonstrates, using the Aggregation dataset, how the K-means clustering algorithm works. It also shows the importance of choosing the distance function.

Input

Aggregation [SIPU Datasets] [Aggregation]

The dataset consists of 788 two-dimensional vectors that form 7 separate groups. The task is to discover these groups (clusters). The difficulty lies in the arrangement of the points: the point clouds differ in size, and the distances between them vary.

Figure 11.1. The 7 separate groups

Output

Figure 11.2. Clustering with default values

After the data is read, the K-means operator is connected and set to search for 7 clusters, and the process is run. The upper and right-hand point clouds are discovered successfully; however, the algorithm performs poorly on the lower point cloud.
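To make the alternation between assigning points and moving centroids concrete, here is a minimal sketch of the textbook K-means loop (Lloyd's algorithm) in plain Python. It is not the implementation behind the k-Means operator; the function names, the seed, and the synthetic three-blob data are assumptions made for illustration, and the Aggregation dataset is not loaded here.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, distance=euclidean, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Synthetic stand-in data (an assumption): three blobs, not the Aggregation set.
data_rng = random.Random(42)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blob_centres
    for _ in range(30)
]
labels, centroids = kmeans(points, k=3)
```

The `distance` argument is passed in as a function so that another distance can be swapped in without touching the loop. Because the starting centroids are chosen at random, a different seed can lead to a different, possibly worse, partition.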

Figure 11.3. Set the distance function.

Let us try out another distance function, the Mahalanobis distance.
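As a reminder of what this distance measures, the sketch below computes the Mahalanobis distance between two 2-D points, sqrt((a - b)^T S^-1 (a - b)), where S is a covariance matrix. How the k-Means operator estimates and applies S internally is not covered here; the helper names `covariance_2d` and `mahalanobis_2d` and the example covariance are assumptions made for illustration.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix ((sxx, sxy), (sxy, syy)) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(a, b, cov):
    """sqrt((a - b)^T cov^-1 (a - b)) for 2-D points and a 2x2 covariance."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


# Hypothetical covariance: variance 4 along x, variance 1 along y.
stretched = ((4.0, 0.0), (0.0, 1.0))
d_along_x = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched)
d_along_y = mahalanobis_2d((0.0, 1.0), (0.0, 0.0), stretched)
```

With this covariance, a step of 2 along the widely spread x axis counts the same as a step of 1 along the y axis, so both distances are equal. With the identity matrix as covariance, the measure reduces to the Euclidean distance.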

Figure 11.4. Clustering with Mahalanobis distance function

It can be seen that, at the cost of minor sacrifices, the result has become more precise; the clustering of the lower point cloud is now close to a perfect solution.

Interpretation of the results

It can be seen that even the simplest clustering algorithms can discover basic connections, and that a well-chosen distance function can make the results more precise.

Workflow

clust_exp1.rmp

Keywords

K-means method
distance functions
cluster analysis

Operators

k-Means
Read CSV