Chapter 12. Clustering 2

Advanced methods

Table of Contents

Support vector clustering
Choosing parameters in clustering
Cluster evaluation
Centroid method
Text clustering

Support vector clustering


Using the Jain dataset, this process shows how support vector clustering can be applied and how its parameters affect the result.


Jain [SIPU Datasets] [Jain]

The dataset contains 373 two-dimensional vectors organized into 2 groups. The point set is challenging because the two point clouds lie close to each other and have non-spherical shapes.

Figure 12.1. The two groups



During support vector clustering, the data are mapped into a transformed space using a kernel function, and a circle is then enlarged in that space until all points lie within it. Finally, the resulting boundary curve is transformed back into the original space along with the data, and the clusters are formed from it. The kernel functions and their parameters are the same as those described for support vector machines. In addition, support vector clustering has its own parameter r, which sets the radius of the circle in the transformed space.
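The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by the Support Vector Clustering operator. The names `rbf_kernel`, `make_sphere_distance`, `label_clusters`, `gamma`, `samples` and `tol` are hypothetical. The weights that define the center of the circle would normally come from an optimization step; uniform weights are used here only as a stand-in. The radius is taken as the largest distance of any data point from the center, so that all points lie inside. Cluster membership is decided by one commonly described rule: two points belong to the same cluster if every sampled point on the line segment between them stays inside the circle.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as coordinate tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def make_sphere_distance(data, weights, kernel):
    """Return a function giving the squared distance, in the transformed
    space, between the image of a point and the center
    a = sum_j weights[j] * Phi(data[j])."""
    center_term = sum(
        wi * wj * kernel(xi, xj)
        for wi, xi in zip(weights, data)
        for wj, xj in zip(weights, data)
    )

    def squared_distance(point):
        cross = sum(w * kernel(x, point) for w, x in zip(weights, data))
        return kernel(point, point) - 2.0 * cross + center_term

    return squared_distance

def label_clusters(data, weights, kernel, samples=10, tol=1e-9):
    """Assign cluster labels: two points share a cluster if the segment
    between them stays inside the circle at every sampled position."""
    dist = make_sphere_distance(data, weights, kernel)
    # Assumption: the radius is the largest distance of any data point,
    # so that no point falls outside the circle.
    radius_sq = max(dist(x) for x in data)

    def connected(p, q):
        for k in range(1, samples):
            t = k / samples
            y = tuple(a + t * (b - a) for a, b in zip(p, q))
            if dist(y) > radius_sq + tol:
                return False
        return True

    # Union-find over the data points to collect connected components.
    parent = list(range(len(data)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if find(i) != find(j) and connected(data[i], data[j]):
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(data))]

# Toy data (hypothetical, not the Jain dataset): two small, distant squares.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]
uniform = [1.0 / len(points)] * len(points)  # stand-in for optimized weights
labels = label_clusters(points, uniform, rbf_kernel)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy case the points between the two squares lie far from every data point, so their images fall outside the circle and the two squares end up in separate clusters.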

Figure 12.2. Support vector clustering with polynomial kernel and p=0.21 setup


First, let us test the polynomial kernel, allowing points to fall outside the boundary curve.

Figure 12.3. Unsuccessful clustering


The result is rather disappointing: the resulting clusters extend into each other, and the method treats the second cluster as noise.

Figure 12.4. Clustering with RBF kernel


Switching to the RBF kernel and not allowing points to fall outside the boundary curve gives a much more promising result. The upper cluster is split into multiple clusters, but the lower one remains in one piece and is separated from the other clusters.

Figure 12.5. More promising results


Interpretation of the results

As with support vector machines, the factors that most influence the effectiveness of support vector clustering (SVC) are the choice of an appropriate kernel function and finding the right level of generalization.
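To give a feel for why the kernel and its parameters matter, the following small sketch compares kernel values for a pair of nearby points and a pair of distant points. The function and parameter names (`rbf_kernel`, `polynomial_kernel`, `gamma`, `degree`, `coef0`) and the sample points are assumptions chosen for illustration; they are not taken from the operator's settings. The RBF kernel depends only on the distance between the points, while the polynomial kernel depends on their inner product.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: decreases as the distance between x and y grows."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: depends on the inner product, not on the distance."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

# Hypothetical sample pairs.
near = ((1.0, 1.0), (1.5, 1.0))  # squared distance 0.25
far = ((1.0, 1.0), (5.0, 4.0))   # squared distance 25

# A larger gamma makes the RBF kernel more local: even nearby points
# become less similar, and distant points are effectively unrelated.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(*near, gamma), rbf_kernel(*far, gamma))

# The polynomial kernel assigns the larger value to the distant pair here.
print(polynomial_kernel(*near), polynomial_kernel(*far))  # 12.25 100.0
```

With the RBF kernel, the nearby pair is always more similar than the distant pair, and `gamma` controls how quickly similarity falls off. With the polynomial kernel, the distant pair receives the larger value in this example, which suggests why it may be less suited to separating compact, closely aligned point clouds.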





Keywords: support vector clustering, cluster analysis, kernel functions

Operators: Read CSV, Support Vector Clustering