Self-organizing maps (SOM) and vector quantization (VQ)

Description

This process demonstrates Kohonen's vector quantization (VQ) and self-organizing map (SOM) algorithms on the Maximum Variance (R15) dataset. Both algorithms can be fitted with the SOM/Kohonen operator.

Input

Maximum Variance (R15) [SIPU Datasets] [Maximum Variance]

The dataset consists of 600 two-dimensional records that form 15 groups. The groups are arranged around the point with coordinates (10, 10), and the spacing between them increases with their distance from this center. The difficulty of the task is that the groups nearest the center almost merge. In the figure below, each group is shown in a different color.

Figure 23.10. The scatterplot of the Maximum Variance (R15) dataset

Output

First, Kohonen's vector quantization is applied, which yields 10 clusters. The result is shown in the figure below.

Figure 23.11. The result of Kohonen's vector quantization
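
The vector quantization step can be illustrated with a minimal sketch of online, winner-take-all quantization. It is not the SOM/Kohonen operator's implementation: the function name kohonen_vq, the linearly decaying learning rate, the number of epochs, and the synthetic stand-in data (600 points in 15 random groups, not the actual R15 file) are all assumptions made for illustration.

```python
import numpy as np

def kohonen_vq(data, n_clusters=10, n_epochs=50, learning_rate=0.5, seed=0):
    """Online winner-take-all vector quantization (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with randomly chosen training points.
    start = rng.choice(len(data), size=n_clusters, replace=False)
    codebook = data[start].astype(float)
    for epoch in range(n_epochs):
        # Assumed schedule: the learning rate decays linearly over the epochs.
        rate = learning_rate * (1.0 - epoch / n_epochs)
        for x in data[rng.permutation(len(data))]:
            # The winner is the codebook vector closest to the sample.
            winner = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            # Only the winner moves toward the sample.
            codebook[winner] += rate * (x - codebook[winner])
    # Assign every record to its nearest codebook vector.
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    return codebook, labels

# Synthetic stand-in data: 15 groups of 40 points each (NOT the R15 dataset).
rng = np.random.default_rng(1)
group_centers = rng.uniform(0.0, 20.0, size=(15, 2))
points = np.vstack([c + rng.normal(scale=0.4, size=(40, 2)) for c in group_centers])

codebook, labels = kohonen_vq(points, n_clusters=10)
print(np.bincount(labels, minlength=10))  # cluster sizes
```

Because only 10 codebook vectors are requested for 15 groups, some clusters necessarily cover more than one group.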

The cluster sizes can be displayed in a simple pie chart.

Figure 23.12. The pie chart of cluster size

A table lists the statistics that characterize the clusters, including the frequency of each cluster, its standard deviation, the maximum distance of its members from the cluster center, and the number of the nearest cluster together with the distance to it.

Figure 23.13. Statistics of clusters
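
The kinds of statistics listed in such a table can be computed with a short sketch like the one below. The function name cluster_statistics and the dictionary keys are hypothetical, and the spread is measured here as the root-mean-square distance from the cluster center, which is an assumption and may differ from the standard deviation the operator reports.

```python
import numpy as np

def cluster_statistics(data, labels, centers):
    """Per-cluster summary statistics (illustrative sketch)."""
    # Distances between all pairs of centers; a center is not its own neighbour.
    center_dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(center_dists, np.inf)
    stats = []
    for k in range(len(centers)):
        members = data[labels == k]
        dist = np.linalg.norm(members - centers[k], axis=1)
        nearest = int(np.argmin(center_dists[k]))
        empty = len(members) == 0
        stats.append({
            "cluster": k,
            "frequency": len(members),
            # Assumed spread measure: RMS distance of members from the center.
            "rms_distance": float("nan") if empty else float(np.sqrt(np.mean(dist ** 2))),
            "max_distance": float("nan") if empty else float(dist.max()),
            "nearest_cluster": nearest,
            "distance_to_nearest": float(center_dists[k, nearest]),
        })
    return stats

# Tiny hand-made example with two clusters.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, -2.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])

stats = cluster_statistics(data, labels, centers)
for row in stats:
    print(row)
```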

Next, the batch SOM algorithm is applied to the same dataset. In this case, the number of rows and the number of columns of the map must be specified; the value 6 was chosen. The results are shown in the following two figures. The first is the SOM/Kohonen operator's schematic graph of the resulting map, in which the color of each cell shows its frequency.

Figure 23.14. Graphical representation of the SOM

The second figure is a scatterplot that displays the resulting clusters in the coordinate system of the original input attributes.

Figure 23.15. Scatterplot of the result of SOM
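
The batch SOM step described above can be sketched as follows. This is a minimal illustration, not the operator's implementation: the function name batch_som, the Gaussian neighbourhood, its shrinking width, the iteration count, and the synthetic stand-in data are assumptions, and the 6 x 6 grid assumes that the chosen value 6 applies to both rows and columns.

```python
import numpy as np

def batch_som(data, rows=6, cols=6, n_iter=30, sigma_start=3.0, sigma_end=0.5, seed=0):
    """Batch self-organizing map on a rectangular grid (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Grid position (row, column) of every cell of the map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Squared grid distance between every pair of cells.
    grid_d2 = np.sum((grid[:, None, :] - grid[None, :, :]) ** 2, axis=2)
    # Initialise the cell prototypes with randomly chosen training points.
    weights = data[rng.choice(len(data), size=rows * cols, replace=False)].astype(float)
    for it in range(n_iter):
        # Assumed schedule: the neighbourhood width shrinks geometrically.
        frac = it / max(n_iter - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac
        # Best-matching unit (closest prototype) of every record.
        d2 = np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2)
        bmu = np.argmin(d2, axis=1)
        # h[j, i]: influence of record i on cell j via the grid distance to its BMU.
        h = np.exp(-grid_d2[:, bmu] / (2.0 * sigma ** 2))
        # Batch update: each prototype becomes a weighted mean of all records.
        weights = (h @ data) / h.sum(axis=1, keepdims=True)
    # Final assignment of every record to a cell.
    d2 = np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2)
    bmu = np.argmin(d2, axis=1)
    return weights, bmu

# Synthetic stand-in data: 15 groups of 40 points each (NOT the R15 dataset).
rng = np.random.default_rng(1)
group_centers = rng.uniform(0.0, 20.0, size=(15, 2))
points = np.vstack([c + rng.normal(scale=0.4, size=(40, 2)) for c in group_centers])

weights, bmu = batch_som(points, rows=6, cols=6)
# Frequency of each cell, arranged like the map.
frequencies = np.bincount(bmu, minlength=36).reshape(6, 6)
print(frequencies)
```

The frequencies array corresponds to the information conveyed by the cell coloring in the schematic graph, while the bmu labels can be used to color a scatterplot of the original attributes.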

Interpretation of the results

The experiment shows how to use two unsupervised data mining techniques, vector quantization and self-organizing maps. Both methods are particularly effective for examining two-dimensional data. As important prototype-based methods, however, they can also greatly simplify further analysis in higher dimensions.

Workflow

sas_clust2_exp2.xml

Keywords

vector quantization (VQ)
self-organizing map (SOM)
clustering

Operators

Data Source
Graph Explore
Self-organizing Map