Chapter 12. Clustering 2

Advanced methods

Table of Contents

Support vector clustering
Choosing parameters in clustering
Cluster evaluation
Centroid method
Text clustering

Support vector clustering

Description

This process uses the Jain dataset to show how support vector clustering works and how its parameters affect the result.

Input

Jain [SIPU Datasets] [Jain]

The dataset contains 373 two-dimensional vectors organized into 2 groups. The point set is challenging because the two point clouds lie close to each other and have non-spherical shapes.

Figure 12.1. The two groups

Output

During support vector clustering, the data are first mapped into a transformed space using a kernel function. In that space, a circle is enlarged until it encloses all of the points. The resulting boundary curve is then mapped back into the original space together with the data, where it delimits the clusters. The kernel functions and their parameters are the same as those described for support vector machines. In addition, support vector clustering has a parameter of its own, r, which sets the radius of the circle in the transformed space.
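The steps described above can be illustrated with a minimal sketch. It is not the implementation behind the Support Vector Clustering operator: it assumes an RBF kernel, does not allow any point to fall outside the boundary, computes the radius from the data instead of taking it from r, and uses a simple Frank-Wolfe iteration as a stand-in solver. All function names and the toy points are hypothetical.

```python
import math

def rbf(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared distance)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def fit_sphere(points, gamma, iters=2000):
    """Approximate the smallest enclosing sphere in the transformed space.

    Assumption: hard-margin case (no point may lie outside). Minimises
    beta^T K beta over the probability simplex with Frank-Wolfe steps.
    Returns the weights beta and the constant beta^T K beta.
    """
    n = len(points)
    K = [[rbf(p, q, gamma) for q in points] for p in points]
    beta = [1.0 / n] * n
    Kb = [sum(row) / n for row in K]           # K times the uniform beta
    for _ in range(iters):
        i = min(range(n), key=Kb.__getitem__)  # point farthest from the centre
        bKb = sum(b * k for b, k in zip(beta, Kb))
        denom = bKb - 2.0 * Kb[i] + K[i][i]
        if denom <= 1e-15:
            break
        t = min(1.0, max(0.0, (bKb - Kb[i]) / denom))  # exact line search
        beta = [(1.0 - t) * b for b in beta]
        beta[i] += t
        Kb = [(1.0 - t) * kb + t * K[j][i] for j, kb in enumerate(Kb)]
    return beta, sum(b * k for b, k in zip(beta, Kb))

def dist2(x, points, beta, bKb, gamma):
    """Squared distance of x from the sphere centre in the transformed space."""
    cross = sum(b * rbf(p, x, gamma) for p, b in zip(points, beta))
    return 1.0 - 2.0 * cross + bKb             # uses rbf(x, x) == 1

def svc_labels(points, gamma, samples=10):
    """Label points: two points share a cluster if the straight segment
    between them stays inside the boundary at every sampled position."""
    beta, bKb = fit_sphere(points, gamma)
    # Radius just large enough to enclose every input point.
    r2 = max(dist2(p, points, beta, bKb, gamma) for p in points) + 1e-9

    def connected(a, b):
        for k in range(1, samples + 1):
            s = k / (samples + 1)
            y = [u + s * (v - u) for u, v in zip(a, b)]
            if dist2(y, points, beta, bKb, gamma) > r2:
                return False
        return True

    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                            # depth-first component search
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and connected(points[i], points[j]):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two small, well-separated toy groups (made-up data, not the Jain set).
group_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.25, 0.25)]
group_b = [(x + 5.0, y + 5.0) for x, y in group_a]
points = group_a + group_b
labels = svc_labels(points, gamma=1.0)
print(labels)
```

With a moderate gamma the two toy groups should receive different labels. A much larger gamma makes the boundary hug individual points, so the groups tend to break into many small clusters.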

Figure 12.2. Support vector clustering with polynomial kernel and p=0.21 setup

First, let us try the polynomial kernel, allowing some points to fall outside the boundary curve.

Figure 12.3. Unsuccessful clustering

The result is rather disappointing: the resulting clusters extend into each other, and the method treats the second cluster as noise.

Figure 12.4. Clustering with RBF kernel

Switching to the RBF kernel and not allowing points to fall outside the boundary curve gives a much more promising result. Although the upper cluster is split into multiple clusters, the lower one remains in one piece and is separated from the other clusters.

Figure 12.5. More promising results

Interpretation of the results

As with support vector machines, the factors that most influence how well SVC performs are the choice of an appropriate kernel function and finding the right setting for the method's ability to generalize.
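To see why the choice of kernel matters so much, the two kernels used in this experiment can be compared on the same pair of points. This is only an illustrative sketch: the kernel formulas are common textbook forms and may differ in detail from those in the operator, and the points, degree and gamma values are arbitrary assumptions.

```python
import math

def polynomial_kernel(a, b, degree):
    """Polynomial kernel in a common textbook form: (a . b + 1) ** degree."""
    return (sum(u * v for u, v in zip(a, b)) + 1.0) ** degree

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared distance between a and b)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

# A pair of points, and the same pair moved away from the origin.
a, b = (1.0, 0.0), (0.0, 1.0)
shift = (10.0, 10.0)
a2 = tuple(u + s for u, s in zip(a, shift))
b2 = tuple(u + s for u, s in zip(b, shift))

rbf_near = rbf_kernel(a, b, gamma=0.5)
rbf_shifted = rbf_kernel(a2, b2, gamma=0.5)
poly_near = polynomial_kernel(a, b, degree=2)
poly_shifted = polynomial_kernel(a2, b2, degree=2)

print("RBF:       ", rbf_near, rbf_shifted)
print("polynomial:", poly_near, poly_shifted)
```

The RBF value depends only on the distance between the two points, so it does not change when the pair is shifted, while the polynomial value depends on where the pair lies relative to the origin. This may help explain why the RBF kernel follows the local shape of the point clouds more closely in the experiment above.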

Workflow

clust2_exp1.rmp

Keywords

Support vector clustering
SVC
cluster analysis
kernel functions

Operators

Read CSV
Support Vector Clustering