The process illustrates, using the Spambase dataset [UCI MLR], how to generate the metadata of a
dataset with the DMDB operator, and then how automatic variable selection can be obtained with the
Variable Selection operator. The Spambase dataset contains 58 attributes, one of which is the
binary target. In order to visualize a dataset, it may be necessary to determine the most important
input attributes, which can then be used in the graphical representation.
The DMDB operator produces metadata (descriptive statistics) such as the mean, variance,
minimum, maximum, skewness, and kurtosis. For discrete attributes, these are complemented by the mode.
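To make the listed statistics concrete, the following is a minimal Python sketch of the kind of per-attribute metadata described above. It is not how the DMDB operator is implemented: the function names are hypothetical, and skewness and kurtosis use simple moment-based formulas (excess kurtosis), which may differ slightly from the bias-corrected values SAS reports.

```python
from collections import Counter
from statistics import mean, variance


def describe_numeric(values):
    """Descriptive statistics for an interval (numeric) attribute.

    Assumption: skewness and kurtosis use simple moment-based formulas
    (kurtosis is reported as excess kurtosis); SAS may apply bias corrections.
    """
    n = len(values)
    m = mean(values)
    # Population central moments, used only for the shape statistics
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return {
        "mean": m,
        "variance": variance(values),  # sample variance (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


def describe_discrete(values):
    """For a discrete attribute, report the mode (most frequent level)."""
    level, count = Counter(values).most_common(1)[0]
    return {"mode": level, "mode_count": count}


if __name__ == "__main__":
    # Illustrative toy values, not taken from Spambase
    print(describe_numeric([1.0, 2.0, 3.0, 4.0]))
    print(describe_discrete(["spam", "ham", "spam"]))
```

For the symmetric toy column above, the skewness is 0 and the excess kurtosis is negative (a flat distribution), which is the kind of shape information the metadata is meant to summarize.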
The default settings of the
Variable Selection operator are applied, except that the
minimum R-square is increased in order to filter out unnecessary attributes.
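The effect of raising the minimum R-square can be illustrated with a simplified univariate screen: each input is scored by the squared correlation with the target (the R-square of a one-variable linear regression), and inputs below the threshold are rejected. This is a minimal sketch under that assumption only. The Variable Selection operator performs further steps beyond this screen, the function names are hypothetical, and the threshold passed in is illustrative, not the value used in the experiment.

```python
def r_square(x, y):
    """Squared Pearson correlation between one input column x and the target y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains none of the target's variance
    return sxy * sxy / (sxx * syy)


def select_variables(inputs, target, min_r_square):
    """Return {name: (r_square, decision)}; a higher threshold rejects more inputs."""
    result = {}
    for name, column in inputs.items():
        r2 = r_square(column, target)
        result[name] = (r2, "Input" if r2 >= min_r_square else "Rejected")
    return result


if __name__ == "__main__":
    # Hypothetical toy columns with a binary target
    target = [0, 0, 1, 1]
    inputs = {"x1": [0, 0, 1, 1], "x2": [1, 0, 1, 0], "x3": [5, 5, 5, 5]}
    for name, (r2, decision) in select_variables(inputs, target, 0.1).items():
        print(name, round(r2, 3), decision)
```

The returned dictionary mirrors the decision list described in the text: every variable keeps its score together with whether it stays in the data mining process.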
The result is, on the one hand, a list containing the decision for each variable, i.e., whether it remains in the data mining process or is rejected, and, on the other hand, several graphs showing the importance of the variables.
Based on the important variables, a number of graphical tools of Enterprise Miner™ can be used to display the records.
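The preparation behind a plot such as Figure 15.5 can be sketched as two steps: rank the inputs by an importance score and keep the top two, then group the records by target class so that each class can be drawn with its own marker. This is a minimal sketch, assuming importance scores are already available (for example, R-square values); the function names, attribute names, and records are hypothetical, and the actual drawing is left to whatever plotting tool is used.

```python
def top_k_inputs(importance, k=2):
    """Names of the k inputs with the highest importance score."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def scatter_points_by_class(records, x_name, y_name, target_name):
    """Group (x, y) pairs by target value, e.g. one marker per class in a scatter plot."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[target_name], []).append((rec[x_name], rec[y_name]))
    return groups


if __name__ == "__main__":
    # Hypothetical importance scores and records
    importance = {"a": 0.05, "b": 0.30, "c": 0.12}
    x_name, y_name = top_k_inputs(importance)
    records = [
        {"a": 0.1, "b": 1.0, "c": 2.0, "spam": 1},
        {"a": 0.4, "b": 0.0, "c": 0.5, "spam": 0},
        {"a": 0.2, "b": 3.0, "c": 1.0, "spam": 1},
    ]
    print(x_name, y_name, scatter_points_by_class(records, x_name, y_name, "spam"))
```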
Figure 15.5. The binary target variable as a function of the two most important input attributes after variable selection
The experiment shows how metadata can be extracted from SAS datasets and then passed on to other operators. Moreover, it demonstrates how variable selection can be performed when the number of attributes is large, and how to continue working with the important attributes.