Chapter 25. Anomaly detection

Detecting outliers

Description

This process uses the Concrete Compressive Strength dataset to show how the Filter operator can remove outliers according to various criteria. The available criteria include the standard deviation from the mean and the mean absolute deviation; another option is the modal center. In this experiment, records that differ from the mean by more than two standard deviations are filtered out.
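
The two-standard-deviation rule used in the experiment can be illustrated outside the Filter operator. The Python sketch below is an illustration only, not the operator's implementation: the function names `filter_by_std` and `filter_by_mad`, the sample values, and the choice of the sample standard deviation are all assumptions made for this example.

```python
from statistics import mean, stdev


def filter_by_std(values, n_std=2.0):
    """Keep values lying within n_std sample standard deviations of the mean."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (assumption)
    return [v for v in values if abs(v - center) <= n_std * spread]


def filter_by_mad(values, n_mad=2.0):
    """Keep values lying within n_mad mean absolute deviations of the mean."""
    center = mean(values)
    mad = mean([abs(v - center) for v in values])
    return [v for v in values if abs(v - center) <= n_mad * mad]


# Hypothetical strength values; 60.0 plays the role of an outlier.
strengths = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 12.0, 60.0]

print(filter_by_std(strengths))  # the value 60.0 is dropped
print(filter_by_mad(strengths))  # the value 60.0 is dropped here as well
```

Note that the outlier itself inflates both the mean and the spread estimate, so a single extreme value widens the acceptance band. This is one reason why several filtering criteria are worth comparing.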

Input

Concrete Compressive Strength [UCI MLR] [Concrete]

Output

As Figure 25.1 shows, this setting removes a considerable number of records as outliers.

Figure 25.1. Statistics before and after filtering outliers

Figure 25.2. The predicted mean based on the two decision trees

Figure 25.3. The tree map of the best model

Interpretation of the results

As Figure 25.4 shows, the decision tree fitted after filtering has a significantly smaller error than the decision tree fitted on the full dataset. Thus, in suitable cases, removing outliers can improve the predictive performance of supervised models.

Figure 25.4. Comparison of the two fitted decision trees
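
A comparison of this kind rests on an error measure computed for each model. The text does not state which measure is used, so the Python sketch below assumes the average squared error. The function name `average_squared_error` and all numbers are hypothetical and are not taken from the experiment.

```python
def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical target values and predictions of two models (made-up numbers).
actual = [30.0, 40.0, 50.0]
predicted_full = [33.0, 36.0, 55.0]      # tree fitted on the full dataset
predicted_filtered = [31.0, 39.0, 52.0]  # tree fitted after filtering

ase_full = average_squared_error(actual, predicted_full)
ase_filtered = average_squared_error(actual, predicted_filtered)

# The model with the smaller error is preferred.
print(ase_full, ase_filtered)
```

Because the squared error penalizes large deviations heavily, a few extreme records can dominate the measure, which is consistent with the improvement observed after filtering.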

Video

Workflow

sas_anomaly_exp1.xml

Keywords

outliers
preprocessing
data cleaning

Operators

Data Source
Decision Tree
Filter
Graph Explore
Model Comparison