This process uses the Jain dataset to show how support vector clustering can be applied and how its parameters affect the result.
The dataset contains 373 two-dimensional vectors organized into two groups. The challenge posed by this point set is that the point clouds lie close to each other and have non-spherical shapes.
During support vector clustering, the data are transformed using a kernel function, and a circle in the transformed space is then enlarged until all points are located within it. Finally, the resulting boundary curve is transformed back into the original space along with the data, and the contours it forms there define the clusters.
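The procedure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under several assumptions, not the implementation used by the process: it supports the RBF kernel only, it always keeps every point inside the boundary (no points are allowed outside), it approximates the enclosing sphere with a simple Badoiu-Clarkson style iteration instead of solving the usual quadratic program, and it runs on a small synthetic point set rather than the Jain dataset. All function names (`rbf_gram`, `fit_sphere`, `svc_labels`) and the `gamma`, `n_iter`, `n_samples` parameters are hypothetical.

```python
import numpy as np


def rbf_gram(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def fit_sphere(x, gamma, n_iter=500):
    """Approximate the smallest enclosing sphere in the transformed space.

    The centre is a convex combination of the mapped points (weights beta).
    In every step it is nudged towards the point that is currently farthest
    away, with a step size that shrinks as 1 / (t + 1).
    """
    k = rbf_gram(x, x, gamma)
    beta = np.zeros(len(x))
    beta[0] = 1.0  # start with the centre on the first point
    for t in range(1, n_iter + 1):
        # K(x, x) = 1 for the RBF kernel, so the farthest point
        # is the one with the smallest value of (K beta)_i.
        farthest = np.argmin(k @ beta)
        beta *= 1.0 - 1.0 / (t + 1)
        beta[farthest] += 1.0 / (t + 1)
    return beta, beta @ k @ beta


def sq_dist_to_centre(y, x, beta, centre_norm, gamma):
    """Squared distance of each row of y from the centre, in the transformed space."""
    return 1.0 - 2.0 * rbf_gram(y, x, gamma) @ beta + centre_norm


def svc_labels(x, gamma, n_samples=10, tol=1e-9):
    """Cluster labels: two points share a cluster if the segment between them stays inside."""
    beta, centre_norm = fit_sphere(x, gamma)
    # The radius is chosen so that every training point lies inside the sphere.
    r2 = sq_dist_to_centre(x, x, beta, centre_norm, gamma).max()
    n = len(x)
    ts = np.linspace(0.0, 1.0, n_samples + 2)[1:-1, None]  # interior sample positions
    adjacent = np.eye(n, dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            segment = x[i] + ts * (x[j] - x[i])
            d2 = sq_dist_to_centre(segment, x, beta, centre_norm, gamma)
            adjacent[i, j] = adjacent[j, i] = bool((d2 <= r2 + tol).all())
    # The connected components of the adjacency graph are the clusters.
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adjacent[node] & (labels == -1)):
                labels[nb] = current
                stack.append(nb)
        current += 1
    return labels


# Synthetic toy data (not the Jain dataset): two small, well separated blobs.
rng = np.random.default_rng(0)
blob = rng.uniform(-0.2, 0.2, size=(15, 2))
x_demo = np.vstack([blob, blob + [10.0, 0.0]])
labels_demo = svc_labels(x_demo, gamma=1.0)
print(len(set(labels_demo.tolist())), "clusters found")
```

The pairwise segment test is quadratic in the number of points, which is acceptable for a few hundred vectors but would need a sparser connectivity check on larger data.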
The kernel functions are identical to those described for support vector machines, and their parameters are the same as well. Support vector clustering has one parameter of its own, r, which defines the radius of the circle in the transformed space.
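To make the role of the kernel more concrete, the following sketch writes out the two kernels used later in this process and the distance they induce in the transformed space. The parameter names `degree`, `coef0` and `gamma` are assumptions chosen for illustration and may not match the names used by the operator. The radius r is measured with this kind of distance.

```python
import numpy as np


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (np.dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


def feature_space_sq_dist(kernel, x, y):
    """Squared distance between x and y after the kernel transformation.

    Uses ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y), so the
    transformation itself never has to be computed explicitly.
    """
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


a = np.array([1.0, 2.0])
b = np.array([3.0, 0.0])
print(polynomial_kernel(a, b))                         # (3 + 1) ** 2
print(rbf_kernel(a, b))                                # exp(-8)
print(feature_space_sq_dist(polynomial_kernel, a, b))  # 36 - 2 * 16 + 100
```

With the RBF kernel every point has K(x, x) = 1, so all transformed points lie on a unit sphere. With the polynomial kernel the transformed points can lie at very different distances from the origin.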
First, let us try the polynomial kernel while allowing points to fall outside the boundary curve.
The result is rather disappointing: the resulting clusters extend into each other, and the method treats the second cluster as noise.
Switching to the RBF kernel and not allowing points to fall outside the boundary curve gives a much more promising result. The upper cluster is split into multiple clusters, but the lower one remains in one piece and is separated from the other clusters.
Just as with support vector machines, the factors that most influence the effectiveness of SVC are the choice of an appropriate kernel function and the tuning of the method's ability to generalize.