Centroid method

Description

Using the Maximum Variance (D31) dataset, the process shows that cluster centers (centroids) can adequately represent their entire clusters.

Input

Maximum Variance (D31) [SIPU Datasets] [Maximum Variance]

The dataset contains 3100 two-dimensional vectors concentrated into 31 clusters. It is used here to illustrate the generalization power of centroids.

Figure 12.17. The vectors forming 31 clusters

Output

Figure 12.18. The extracted centroids

The centroids are obtained by cluster analysis of the data. To illustrate their representative power, they are then used as the training data for a k-NN classifier.
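The idea of clustering first and then classifying against the centroids can be sketched outside RapidMiner as well. The following Python example is only an illustration under stated assumptions: plain k-means stands in for the X-Means operator, a small synthetic dataset of four well-separated Gaussian blobs stands in for D31, and the helper names (make_blobs, nearest, k_means) are hypothetical. Each point is then labelled by its single nearest centroid, which corresponds to k-NN with k = 1 and the centroids as training data.

```python
import math
import random


def make_blobs(centers, points_per_cluster, spread, rng):
    """Draw Gaussian points around each given center; returns (points, labels)."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def nearest(point, candidates):
    """Index of the candidate closest to point (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def k_means(points, initial_centroids, iterations=20):
    """Plain Lloyd iterations: assign each point to its nearest centroid, then re-average."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]  # assumed toy data
points, labels = make_blobs(true_centers, 100, 1.0, rng)

# Assumption for a deterministic demo: start from one point of each blob,
# so that centroid i ends up describing blob i.
initial = [points[i * 100] for i in range(len(true_centers))]
centroids = k_means(points, initial)

# 1-NN classification with the centroids as the only training examples.
predicted = [nearest(p, centroids) for p in points]
accuracy = sum(p == l for p, l in zip(predicted, labels)) / len(points)
print(f"{len(points)} points represented by {len(centroids)} centroids, accuracy {accuracy:.3f}")
```

On data this cleanly separated, the handful of centroids is expected to label nearly every point correctly. The real D31 clusters lie closer together, so the figures of the experiment remain the reference for the actual result.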

Figure 12.19. The output of the k nearest neighbour method, using the centroids as prototypes

The effectiveness of k-NN classification depends primarily on the prototypes selected. The result suggests that the well-chosen prototypes have aided the classification.

Interpretation of the results

Clustering can be a good starting point for extracting the prototypes of a dataset, which in turn makes it possible to reduce the size of the training dataset.
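How much such a reduction saves can be made concrete with a small comparison. The Python sketch below is an assumption-laden illustration, not the workflow of the experiment: it uses hypothetical synthetic clusters, takes the per-cluster mean of labelled points as the prototype, and checks how often a 1-NN classifier trained on the prototypes agrees with one trained on the full data.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility
centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]  # hypothetical cluster centers


def sample(n_per_cluster):
    """Return (point, label) pairs drawn around each hypothetical center."""
    return [
        ((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
        for label, (cx, cy) in enumerate(centers)
        for _ in range(n_per_cluster)
    ]


def one_nn(query, training):
    """Label of the training pair whose point is closest to the query."""
    return min(training, key=lambda pair: math.dist(query, pair[0]))[1]


train = sample(100)   # 300 labelled training points
queries = sample(20)  # 60 fresh points to classify

# Reduced training set: one prototype (the mean) per cluster.
prototypes = []
for label in range(len(centers)):
    members = [p for p, l in train if l == label]
    mean = (
        sum(x for x, _ in members) / len(members),
        sum(y for _, y in members) / len(members),
    )
    prototypes.append((mean, label))

# Fraction of queries on which both classifiers give the same label.
agreement = sum(
    one_nn(q, train) == one_nn(q, prototypes) for q, _ in queries
) / len(queries)

# Each query needs len(train) distance evaluations on the full set,
# but only len(prototypes) on the reduced one.
reduction = len(train) / len(prototypes)
print(f"agreement {agreement:.2f}, training set {reduction:.0f}x smaller")
```

For the D31 setting described above, the same ratio is 3100 vectors to 31 centroids, that is, a hundredfold smaller training set for the k-NN classifier.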

Workflow

clust2_exp4.rmp

Keywords

centroids
X-means method
k-NN

Operators

Apply Model
Extract Cluster Prototypes
k-NN
Multiply
Read CSV
Set Role
X-Means