This process uses the Concrete Compressive Strength dataset to show how the
Filter operator can remove outliers according to various criteria. The
available criteria include the standard deviation from the mean and the mean
absolute deviation; another option is the modal center. In this experiment,
records that differ from the mean by more than two standard deviations are
filtered out.
As shown, the above setting can filter out a significant number of outliers.
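The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Filter operator's actual implementation. The function name `filter_outliers`, its parameters, and the sample values are assumptions made for this example; the values are invented and are not taken from the Concrete Compressive Strength dataset. The sketch uses the population standard deviation, whereas the tool may use the sample standard deviation.

```python
from statistics import mean, pstdev


def filter_outliers(values, criterion="std", cutoff=2.0):
    """Split values into (kept, removed) by their distance from the mean.

    criterion="std": spread is the population standard deviation.
    criterion="mad": spread is the mean absolute deviation from the mean.
    A value is removed when |value - mean| > cutoff * spread.
    """
    center = mean(values)
    if criterion == "std":
        spread = pstdev(values)
    elif criterion == "mad":
        spread = mean([abs(v - center) for v in values])
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")

    limit = cutoff * spread
    kept = [v for v in values if abs(v - center) <= limit]
    removed = [v for v in values if abs(v - center) > limit]
    return kept, removed


# Invented, strength-like values with one obvious outlier (hypothetical data).
sample = [30.0, 32.0, 35.0, 33.0, 31.0, 34.0, 36.0, 29.0, 82.0]

kept, removed = filter_outliers(sample, criterion="std", cutoff=2.0)
print("kept:", kept)
print("removed:", removed)
```

With `cutoff=2.0` and `criterion="std"`, this corresponds to the setting used in the experiment: a record is dropped when it lies more than two standard deviations from the mean. Passing `criterion="mad"` switches to the mean absolute deviation, which is less inflated by the extreme values themselves and therefore tends to flag more records at the same cutoff.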
The following comparison shows that, after filtering, the error of the fitted decision tree is significantly smaller than the error of the decision tree fitted on the full dataset. Thus, in suitable cases, removing outliers can improve the predictive performance of supervised models.