Visualizing multidimensional data and dimension reduction by PCA

Description

This experiment presents visualization and dimension reduction methods using the Fisher-Anderson Iris dataset. Multidimensional datasets can be visualized with the Graph Explore operator, and dimension reduction can be performed with the Principal Components operator. After dimension reduction, a multidimensional dataset is much easier to display in the space of its principal components.

Input

Fisher-Anderson Iris

Output

The Graph Explore operator provides several graphical tools for displaying multidimensional datasets, a task that plays a key role in the preprocessing step of data mining. Some of these tools are extensions of well-known plots, such as two- and three-dimensional scatterplots and bar charts, supplemented by options such as the use of colors and symbols. Other techniques, such as the parallel axis plot and the radar plot, are more characteristic of data mining software tools.

Figure 15.6. Displaying the dataset by parallel axis
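
The idea behind a parallel axis plot can be sketched in a few lines of Python. This is only an illustration of the technique, not the Graph Explore operator's implementation: the function name parallel_axis_polylines and the sample records are hypothetical, and min-max rescaling of each attribute is an assumed choice. Each record becomes one polyline whose vertices lie on the vertical axes, one axis per attribute.

```python
def parallel_axis_polylines(records):
    """Rescale each attribute to [0, 1] and return one polyline per record.

    Each polyline is a list of (axis_index, scaled_value) vertices, which is
    what a parallel axis plot draws across its vertical axes.
    """
    columns = list(zip(*records))            # one tuple per attribute
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    polylines = []
    for record in records:
        line = []
        for axis, value in enumerate(record):
            span = highs[axis] - lows[axis]
            # A constant attribute has no spread; place it mid-axis.
            scaled = 0.5 if span == 0 else (value - lows[axis]) / span
            line.append((axis, scaled))
        polylines.append(line)
    return polylines


# Hypothetical four-attribute records (illustrative values, not the Iris data).
sample = [
    (5.0, 3.0, 1.0, 0.5),
    (6.0, 2.0, 4.0, 1.5),
    (7.0, 4.0, 6.0, 2.5),
]
lines = parallel_axis_polylines(sample)
for line in lines:
    print(line)
```

Rescaling every attribute to a common range keeps attributes measured on different scales comparable when the polylines are drawn side by side.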

Principal component analysis (PCA) can be performed with the Principal Components operator. The operator lets the user choose the dependency structure (the covariance or the correlation matrix) and the cut-off condition (a fixed number of eigenvalues or a cumulative eigenvalue ratio).

Figure 15.7. Cumulative explained variance plot of the PCA
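
The two kinds of settings described above, the dependency structure and the cut-off condition, can be illustrated with a minimal NumPy sketch. It is not the Principal Components operator itself: the function names pca_eigenvalues and components_to_keep, their parameters, and the synthetic data are assumptions made for the example. The cumulative eigenvalue ratios it prints are the quantities shown in a cumulative explained variance plot.

```python
import numpy as np


def pca_eigenvalues(X, use_correlation=False):
    """Eigen-decompose the covariance or correlation matrix of X (rows = records)."""
    X = np.asarray(X, dtype=float)
    if use_correlation:
        matrix = np.corrcoef(X, rowvar=False)
    else:
        matrix = np.cov(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)   # ascending order
    order = np.argsort(eigenvalues)[::-1]                # largest first
    return eigenvalues[order], eigenvectors[:, order]


def components_to_keep(eigenvalues, max_components=None, min_cumulative_ratio=None):
    """Apply one of the two cut-off rules: a fixed count or a cumulative ratio."""
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    if max_components is not None:
        return min(max_components, len(eigenvalues)), cumulative
    if min_cumulative_ratio is not None:
        # Smallest k whose cumulative eigenvalue ratio reaches the threshold.
        k = int(np.searchsorted(cumulative, min_cumulative_ratio) + 1)
        return min(k, len(eigenvalues)), cumulative
    return len(eigenvalues), cumulative


# Synthetic stand-in data (NOT the Iris dataset): 150 records, 4 attributes
# driven by two latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 2))
loadings = np.array([[1.0, 0.5, 2.0, 0.8],
                     [0.3, -1.0, 0.5, 0.2]])
X = latent @ loadings + 0.05 * rng.normal(size=(150, 4))

cov_values, _ = pca_eigenvalues(X, use_correlation=False)
corr_values, _ = pca_eigenvalues(X, use_correlation=True)

k_fixed, _ = components_to_keep(cov_values, max_components=2)
k_ratio, cumulative = components_to_keep(cov_values, min_cumulative_ratio=0.95)
print("cumulative eigenvalue ratios:", np.round(cumulative, 4))
print("components kept (count rule):", k_fixed)
print("components kept (ratio rule):", k_ratio)
```

Using the correlation matrix amounts to standardizing every attribute first, so its eigenvalues sum to the number of attributes; the covariance matrix instead lets attributes with larger variance dominate the leading components.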

The main result of principal component analysis is the set of principal component coordinates (scores) of the individual records, which can be used in further data analysis and visualization.

Figure 15.8. Scatterplot of the Iris dataset using the first two principal components
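
How the coordinates behind such a scatterplot arise can be sketched as follows. This is a hedged illustration rather than the operator's own computation: the function name principal_component_scores is hypothetical, the covariance matrix is assumed as the dependency structure, and the data are synthetic rather than the Iris records. Each record is mean-centred and projected onto the leading eigenvectors, and the two resulting columns are the x and y coordinates of the plot.

```python
import numpy as np


def principal_component_scores(X, n_components=2):
    """Project mean-centred records onto the leading covariance eigenvectors."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(eigenvalues)[::-1][:n_components]   # largest first
    return centred @ eigenvectors[:, order], eigenvalues[order]


# Synthetic stand-in data (NOT the Iris dataset): 150 records, 4 attributes.
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 2))
loadings = np.array([[1.0, 0.5, 2.0, 0.8],
                     [0.3, -1.0, 0.5, 0.2]])
X = latent @ loadings + 0.05 * rng.normal(size=(150, 4))

scores, variances = principal_component_scores(X, n_components=2)
print("shape of the score matrix:", scores.shape)
print("first record in principal component space:", scores[0])
```

The score columns are uncorrelated and their variances equal the corresponding eigenvalues. The sign of each eigenvector is arbitrary, so a plot produced by another tool may appear mirrored along an axis without being different in substance.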

Interpretation of the results

The experiment shows how high-dimensional datasets can be displayed and how dimension reduction can be performed. In our experiment, the original 4-dimensional dataset, which cannot be displayed in a standard scatterplot, is reduced to 2 dimensions while preserving 95 percent of the information (variance) contained in the data.

Workflow

sas_preproc_exp2.xml

Keywords

principal components analysis (PCA)
parallel axis

Operators

Data Source
Graph Explore
Principal Components