Chapter 14. Data Sources

Table of Contents

Reading SAS dataset
Importing data from a CSV file
Importing data from an Excel file

Reading SAS dataset

Description

This experiment shows how an existing SAS dataset can be made available to SAS® Enterprise Miner™ through the Input Data operator. It reads a previously prepared SAS dataset; such a dataset can be created with the SAS® System or SAS® Enterprise Guide™. To load a SAS file, we need to know its path. The file may reside on the local machine or on a remote SAS server. A wizard guides you through the entire import process. Because the original data file is relatively large, the Sample operator then selects a part of it.

Figure 14.1. The metadata of the dataset

Input

Individual household electric power consumption [UCI MLR]

Output

A dataset that contains 10 percent of the original dataset. The sample size can be specified either as an absolute number of records or as a percentage of the original. The Random Seed parameter sets the starting state of the pseudo-random number generator, so using the same value on different machines yields the same random sample. The sampling method can also be chosen, e.g. simple random, cluster, or stratified.
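The effect of the seed and of the sampling method can be sketched outside Enterprise Miner. The following is a minimal illustration in plain Python, not the Sample operator's actual implementation; the function names, the `tariff` field, and the 1,000 stand-in records are assumptions made up for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(rows, fraction, seed):
    """Return a simple random sample holding `fraction` of `rows`."""
    rng = random.Random(seed)          # private generator: the seed fully determines the draw
    k = round(len(rows) * fraction)    # relative size converted to an absolute count
    return rng.sample(rows, k)

def stratified_sample(rows, key, fraction, seed):
    """Sample `fraction` of each stratum, where `key(row)` names the stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for name in sorted(strata):        # fixed stratum order keeps the result reproducible
        members = strata[name]
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

# Hypothetical stand-in data: 1,000 records, 750 "day" and 250 "night".
rows = [{"id": i, "tariff": "day" if i % 4 else "night"} for i in range(1000)]

first = simple_random_sample(rows, 0.10, seed=12345)
second = simple_random_sample(rows, 0.10, seed=12345)
by_tariff = stratified_sample(rows, lambda r: r["tariff"], 0.10, seed=12345)
```

Here `first` and `second` contain the same 100 records because both draws start from the same seed (assuming the same Python version on both runs), while `by_tariff` keeps the 3:1 ratio of the two strata by sampling 10 percent from each.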

Figure 14.2. Setting the Sample operator

Figure 14.3. The metadata of the resulting dataset and a part of the dataset

Interpretation of the results

Each time the process is rerun, the current state of the dataset is imported into the system. The Input Data operator can therefore be used to retrieve data files that are constantly updated by other SAS-based systems and to rerun the data mining process on their latest contents.

Workflow

sas_import_exp1.xml

Keywords

reading SAS dataset
sampling

Operators

Data Source
Sample