Using the Maximum Variance (D31) dataset, this process shows that cluster centers can represent their entire clusters.
The dataset contains 3100 two-dimensional vectors grouped into 31 clusters, and it is used here to illustrate the generalization power of centroids.
The centroids are obtained by cluster analysis of the data and, to illustrate their representative power, are then used as the training data for a k-NN classifier.
The efficiency of the k-NN classification method depends primarily on the prototypes selected. The result shows that well-chosen prototypes aid the classification.
Clustering can therefore be a good starting point for extracting the prototypes of a dataset, which makes it possible to reduce the size of the training dataset.
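The steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions, not the original process: the D31 file is not loaded, so a hypothetical `make_blobs` helper generates 31 well-separated Gaussian clusters of 100 points each as a stand-in; plain k-means with a farthest-point initialization is assumed as the clustering method; each centroid is assumed to take the majority class of its cluster; and k-NN is run with k = 1. All function names are introduced here for illustration.

```python
import math
import random
from collections import Counter


def make_blobs(n_clusters, points_per_cluster, spacing=10.0, std=0.5, seed=0):
    """Synthetic stand-in for D31 (assumption): Gaussian blobs on a grid."""
    rng = random.Random(seed)
    side = math.ceil(math.sqrt(n_clusters))
    points, labels = [], []
    for c in range(n_clusters):
        cx, cy = (c % side) * spacing, (c // side) * spacing
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            labels.append(c)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def k_means(points, k, iterations=20):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    dist = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, sq_dist(p, points[idx])) for d, p in zip(dist, points)]

    for _ in range(iterations):
        # Accumulate coordinate sums and member counts per cluster.
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for p in points:
            s = sums[nearest(p, centers)]
            s[0] += p[0]
            s[1] += p[1]
            s[2] += 1
        # An empty cluster keeps its previous center.
        new_centers = [
            (sx / n, sy / n) if n else centers[i]
            for i, (sx, sy, n) in enumerate(sums)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    assignment = [nearest(p, centers) for p in points]
    return centers, assignment


def label_centers(assignment, labels, k):
    """Give each centroid the majority class of its cluster (assumption)."""
    votes = [Counter() for _ in range(k)]
    for a, y in zip(assignment, labels):
        votes[a][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# 3100 points in 31 clusters, mirroring the size of D31.
points, labels = make_blobs(n_clusters=31, points_per_cluster=100)
centers, assignment = k_means(points, k=31)
center_labels = label_centers(assignment, labels, k=31)

# 1-NN classification of every point, using only the 31 centroids as prototypes.
predicted = [center_labels[nearest(p, centers)] for p in points]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(f"{len(points)} points reduced to {len(centers)} prototypes, "
      f"accuracy {accuracy:.3f}")
```

In this sketch the training set shrinks from 3100 vectors to 31 prototypes. Because the synthetic clusters are far better separated than those of the real D31 data, the accuracy printed here says nothing about the result on D31 itself.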