This experiment demonstrates the difference between automatic clustering and clustering in which the
number of clusters is specified by the user, using the Maximum Variance (D31)
dataset. The experiment uses the
Cluster operator.
Maximum Variance (D31)
The dataset consists of
3100 two-dimensional vectors, which are grouped into 31 clusters.
First, an automatic clustering is performed in which the
Class attribute is ignored. The automatic clustering finds
31 clusters, which agrees with the original number of clusters.
The resulting clusters are shown in the following figure.
The CCC (Cubic Clustering Criterion) plot clearly confirms that the resulting number of clusters is correct.
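The CCC is built on R², the proportion of the total variance explained by a partition (1 minus the within-cluster sum of squares divided by the total sum of squares). The sketch below computes only this R² for a labelled set of points; the CCC additionally compares it with the R² expected under a uniform reference distribution, which is not computed here. The function name `r_squared` is hypothetical and is not part of the Cluster operator.

```python
from typing import Sequence

Point = Sequence[float]


def r_squared(points: list[Point], labels: list[int]) -> float:
    """Proportion of total variance explained by a partition: 1 - W/T."""
    dim = len(points[0])
    n = len(points)
    # Overall centroid of all points.
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    # Total sum of squares (T) around the overall centroid.
    total_ss = sum(sum((p[d] - grand[d]) ** 2 for d in range(dim)) for p in points)
    # Group the points by cluster label.
    groups: dict[int, list[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # Within-cluster sum of squares (W) around each cluster centroid.
    within_ss = 0.0
    for members in groups.values():
        c = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within_ss += sum(sum((p[d] - c[d]) ** 2 for d in range(dim)) for p in members)
    return 1.0 - within_ss / total_ss
```

A partition into well-separated groups gives an R² close to 1, while putting every point into a single cluster gives 0.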
The proximity graph below shows the schematic arrangement of the clusters.
Based on the CCC chart, a cluster model with
9 clusters can also be tried. This can be done
with Ward's version of the K-means algorithm. The scatterplot and the proximity graph of the
resulting clusters are shown in the following two figures.
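To illustrate Ward's criterion with a user-specified number of clusters, here is a minimal sketch of plain agglomerative Ward clustering. It is not the Cluster operator's implementation, and it does not show how the tool combines Ward's criterion with K-means. At each step it merges the pair of clusters whose union increases the within-cluster sum of squares the least; that increase equals n_a·n_b/(n_a+n_b)·‖c_a − c_b‖². The function name `ward_clustering` is hypothetical.

```python
from typing import Sequence


def ward_clustering(points: Sequence[Sequence[float]], k: int) -> list[int]:
    """Naive agglomerative Ward clustering that stops at k clusters."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Each cluster is (size, centroid, indices of member points).
    clusters = [(1, list(p), [i]) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        # Find the pair whose merge increases the within-cluster SSE the least.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, ca, _ = clusters[a]
                nb, cb, _ = clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, ca, ma = clusters[a]
        nb, cb, mb = clusters[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        # Delete b first (b > a) so that index a stays valid.
        del clusters[b]
        clusters[a] = (na + nb, centroid, ma + mb)
    # Translate the final clusters into one label per input point.
    labels = [0] * len(points)
    for label, (_, _, members) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

The triple loop is roughly cubic in the number of points, so this sketch only suits small inputs; running `ward_clustering(points, 9)` on all 3100 D31 vectors would be very slow.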
Then, using so-called segment profiling, the resulting clusters can be examined to see how the input variables determine them.
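One simple way to see which input variables set a segment apart is to compare each cluster's mean of every variable with the overall mean. The sketch below does only this and is not what the profiling tool computes. The function name `profile_segments` and the variable names `x` and `y` in the usage example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def profile_segments(
    rows: Sequence[dict[str, float]], labels: Sequence[int]
) -> dict[int, dict[str, float]]:
    """Per cluster: cluster mean minus overall mean for every input variable."""
    variables = list(rows[0].keys())
    # Overall mean of each input variable across all rows.
    overall = {v: mean(r[v] for r in rows) for v in variables}
    profile: dict[int, dict[str, float]] = {}
    for label in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        # A large absolute deviation marks a variable that distinguishes this cluster.
        profile[label] = {v: mean(m[v] for m in members) - overall[v] for v in variables}
    return profile
```

For example, with rows `{"x": 0, "y": 0}`, `{"x": 0, "y": 2}`, `{"x": 10, "y": 0}`, `{"x": 10, "y": 2}` and labels `[0, 0, 1, 1]`, cluster 0 deviates by -5 in `x` and by 0 in `y`, so `x` alone separates the two clusters.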
The experiment shows that automatic clustering can find the correct number of clusters when the data contain a relatively large number of closely spaced but spherical groups. If this number is judged too large, it can be reduced to a reasonable size by analyzing the CCC graph and searching for a suitable breakpoint.
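One simple reading of a "suitable breakpoint" is a local peak of the CCC curve. Assuming the CCC values have been read off the chart for consecutive cluster counts, the sketch below lists the cluster counts at which the curve has a local maximum. The function name `local_peaks` is hypothetical, and the endpoints of the curve are deliberately not treated as peaks.

```python
from typing import Sequence


def local_peaks(k_values: Sequence[int], criterion: Sequence[float]) -> list[int]:
    """Cluster counts at which the criterion curve has an interior local maximum."""
    peaks = []
    for i in range(1, len(criterion) - 1):
        # Strict rise from the left, no rise to the right: on a plateau the
        # first point of the plateau is reported.
        if criterion[i] > criterion[i - 1] and criterion[i] >= criterion[i + 1]:
            peaks.append(k_values[i])
    return peaks
```

Among the reported peaks, a count smaller than the automatically chosen one would be a candidate for the reduced model.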