Replacement and imputation

Description

In this experiment, we use the Congressional Voting Records dataset to demonstrate how to modify attribute values with the Replacement operator and then how to fill in missing values with the Impute operator. Missing values can be replaced for each variable independently of the others, or, by fitting a decision tree, in a way that takes the relationship with the target variable into account.

Input

Congressional Voting Records [UCI MLR]

Output

The Replacement operator lets us configure the substitution of values for discrete and continuous variables separately.
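
The idea of treating discrete and continuous variables separately can be sketched in plain Python. This is only an illustration under assumptions, not the operator's actual behavior: the function names are hypothetical, discrete replacement is modeled as a lookup table, and continuous replacement is modeled as clamping to user-chosen limits.

```python
def replace_discrete(values, mapping):
    """Swap each discrete value for its replacement; unmapped values are kept."""
    return [mapping.get(v, v) for v in values]


def replace_continuous(values, lower, upper):
    """Clamp numeric values to [lower, upper]; None (missing) is left untouched."""
    return [v if v is None else min(max(v, lower), upper) for v in values]


# Toy data (invented for illustration, not taken from the dataset).
# Recode the "?" marker of the voting data as a true missing value.
print(replace_discrete(["y", "n", "?"], {"?": None}))
# Pull out-of-range numbers back to the limits 0 and 100.
print(replace_continuous([-5, 3, 120, None], 0, 100))
```

Keeping the two functions apart mirrors the separation described above: a lookup table has no meaning for a continuous variable, and limits have no meaning for a discrete one.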

Figure 15.9. The replacement wizard

Several imputation methods can be chosen in the Impute operator. Missing values can be filled in with a constant, with a distribution-based value (a random value generated by the system), or with a decision-tree-based method.
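
The three kinds of method named above can be sketched in plain Python. This is a minimal illustration under assumptions, not the Impute operator's actual algorithm: the function names are hypothetical, "?" is the marker the UCI file uses for an unknown vote, and the tree-based method is reduced to a single split on a grouping variable (here the party), with each branch filled by its most frequent observed value.

```python
import random
from collections import Counter

MISSING = "?"  # marker for an unknown vote in the UCI data file


def impute_constant(values, fill="n"):
    """Replace every missing entry with one fixed value (fill is an arbitrary choice)."""
    return [fill if v == MISSING else v for v in values]


def impute_distribution(values, seed=0):
    """Replace each missing entry with a random draw from the observed entries,
    so frequent values are drawn more often than rare ones."""
    rng = random.Random(seed)
    observed = [v for v in values if v != MISSING]
    return [rng.choice(observed) if v == MISSING else v for v in values]


def impute_by_group_mode(values, groups):
    """Simplified stand-in for tree imputation: one split on `groups`,
    then fill each branch with its most frequent observed value.
    Assumes at least one entry of `values` is observed."""
    observed = [v for v in values if v != MISSING]
    overall_mode = Counter(observed).most_common(1)[0][0]
    modes = {}
    for g in set(groups):
        branch = [v for v, gg in zip(values, groups) if gg == g and v != MISSING]
        # Fall back to the overall mode when a branch has no observed value.
        modes[g] = Counter(branch).most_common(1)[0][0] if branch else overall_mode
    return [modes[g] if v == MISSING else v for v, g in zip(values, groups)]


# Toy data (invented for illustration, not taken from the dataset).
votes = ["y", "y", "?", "n", "n", "?", "y", "n"]
party = ["democrat"] * 3 + ["republican"] * 3 + ["democrat", "republican"]

print(impute_constant(votes))
print(impute_distribution(votes))
print(impute_by_group_mode(votes, party))
```

The constant fill ignores the data entirely, the random draw preserves the observed proportions on average, and the grouped fill is the only one of the three that uses another variable.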

Figure 15.10. The output of imputation

The following two bar charts show the relationship between an input variable and the target variable before and after imputation.

Figure 15.11. The relationship of an input and the target variable before imputation

Figure 15.12. The relationship of an input and the target variable after imputation
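
A before-and-after comparison of this kind amounts to counting how often each input value occurs together with each target class. The sketch below shows one way to compute such counts in plain Python. The function name and the toy data are assumptions made for illustration and do not reproduce the figures.

```python
from collections import Counter


def crosstab(inputs, target):
    """Count how often each (input value, target class) pair occurs."""
    return Counter(zip(inputs, target))


# Toy data (invented for illustration, not taken from the dataset).
party = ["democrat", "democrat", "democrat",
         "republican", "republican", "republican"]
before = ["y", "y", "?", "n", "?", "n"]  # "?" marks a missing vote
after = ["y", "y", "y", "n", "n", "n"]   # the same column after imputation

for label, column in (("before", before), ("after", after)):
    print(label)
    for (vote, cls), count in sorted(crosstab(column, party).items()):
        print(f"  vote={vote} party={cls} count={count}")
```

If the imputation is reasonable, the counts for the observed values keep roughly the same proportions within each class, while the missing-value category disappears.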

Interpretation of the results

The experiment shows that if the imputation method is chosen appropriately, the values substituted for the missing data are not heavily distorted, and thus the model can be fitted more reliably on a larger dataset.

Workflow

sas_preproc_exp3.xml

Keywords

replacement
imputation

Operators

Data Source
Graph Explore
Impute
Replacement