Choosing parameters in clustering

Description

This process uses the Flame dataset to show how the optimal parameters of a clustering algorithm can be found automatically.

Input

Flame [SIPU Datasets] [Flame]

The dataset consists of 240 two-dimensional vectors that belong to two clusters. The clusters lie close to each other, and one of them has a non-spherical shape.

Figure 12.6. The two groups containing 240 vectors

Output

Figure 12.7. The subprocess of the optimization node

Parameter optimization requires a performance operator. Here this is the Cluster Distance Performance operator, which evaluates the clustering based on the distances between the examples and the cluster centroids.
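As a rough illustration of what a centroid-based cluster distance score computes, the sketch below implements two common measures in plain Python: the average within-centroid distance and the Davies-Bouldin index. Lower values are better for both. This is not RapidMiner's implementation. The function names, the exact averaging scheme, and the use of Euclidean distance are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_cluster(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    # Collect the members of each cluster label.
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    return clusters


def centroid(members: List[Point]) -> Point:
    # Coordinate-wise mean of the cluster members.
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def avg_within_centroid_distance(points: Sequence[Point], labels: Sequence[int]) -> float:
    # Mean distance of every point to the centroid of its own cluster.
    cents = {c: centroid(m) for c, m in group_by_cluster(points, labels).items()}
    return sum(euclidean(p, cents[l]) for p, l in zip(points, labels)) / len(points)


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    # Average over clusters of the worst ratio (spread_i + spread_j) / dist(centroid_i, centroid_j).
    # Assumes at least two clusters with distinct centroids.
    clusters = group_by_cluster(points, labels)
    cents = {c: centroid(m) for c, m in clusters.items()}
    spread = {c: sum(euclidean(p, cents[c]) for p in m) / len(m) for c, m in clusters.items()}
    ids = list(clusters)
    worst = [
        max((spread[i] + spread[j]) / euclidean(cents[i], cents[j]) for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(ids)
```

For example, two tight pairs of points far apart, such as `[(0, 0), (0, 2), (10, 0), (10, 2)]` labelled `[0, 0, 1, 1]`, give an average within-centroid distance of 1.0 and a small Davies-Bouldin value. A clustering that splits a compact group would score worse on the Davies-Bouldin index.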

Figure 12.8. The parameters of the optimization

The parameters to be optimized and their candidate values are specified in the parameter optimization operator (Optimize Parameters (Grid)). The system then evaluates the combinations and selects the best values.
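The grid search described above can be mimicked outside RapidMiner with a short sketch. For every combination in a hypothetical grid (`k` and `distance`), the sketch runs a plain k-means and scores the result with the Davies-Bouldin index. It then keeps the lowest-scoring combination. Several choices here are assumptions, not RapidMiner's behaviour:

- initialization is deterministic (farthest-first) so that results are reproducible;
- centroids are always updated as coordinate means, even for the Manhattan distance;
- scoring always uses the Euclidean distance, so that scores for different distance measures remain comparable.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical names for the candidate distance measures in the grid.
DISTANCES: Dict[str, Distance] = {"euclidean": euclidean, "manhattan": manhattan}


def mean(members: Sequence[Point]) -> Point:
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def k_means(points: List[Point], k: int, dist: Distance, max_iter: int = 100) -> List[int]:
    # Farthest-first initialization (an assumption, chosen for reproducibility).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center under the chosen distance.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: coordinate means (a simplification for non-Euclidean distances).
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    # Always scored in Euclidean distance; lower is better.
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return math.inf  # the index is undefined for a single cluster
    cents = {c: mean(m) for c, m in clusters.items()}
    spread = {c: sum(euclidean(p, cents[c]) for p in m) / len(m) for c, m in clusters.items()}

    def ratio(i: int, j: int) -> float:
        d = euclidean(cents[i], cents[j])
        return math.inf if d == 0 else (spread[i] + spread[j]) / d

    return sum(max(ratio(i, j) for j in clusters if j != i) for i in clusters) / len(clusters)


def grid_search(points: List[Point], grid: Dict[str, list]) -> Tuple[float, dict]:
    # Try every combination of grid values and keep the lowest score (first one wins ties).
    best_score, best_params = math.inf, {}
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        labels = k_means(points, params["k"], DISTANCES[params["distance"]])
        score = davies_bouldin(points, labels)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


data = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (9.0, 9.0), (9.4, 8.8), (8.9, 9.5)]
score, params = grid_search(data, {"k": [2, 3, 4], "distance": ["euclidean", "manhattan"]})
print(params, round(score, 3))
```

On this toy data, which contains two well-separated groups, the search selects `k = 2`. The choice of criterion matters when `k` itself is optimized. The average within-centroid distance typically keeps decreasing as `k` grows, so a grid search driven by it tends to favour large values of `k`. The Davies-Bouldin index also penalizes clusters that lie close together, which is why this sketch uses it.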

Figure 12.9. The report generated by the process

In this case, the best result was obtained by partitioning the data into 10 clusters and using the Euclidean distance as the distance measure.

Figure 12.10. Clustering generated with the optimal parameters

Interpretation of the results

For many parametrized clustering methods, it can be worthwhile to let a performance-measuring operator determine the appropriate number of clusters, and then to run the clustering with the values obtained.

Workflow

clust2_exp2.rmp

Keywords

Support vector clustering
SVC
cluster analysis
kernel functions

Operators

Cluster Distance Performance
k-Means
Optimize Parameters (Grid)
Read CSV