The influence of the parameter C on the performance of the linear SVM (3)

Description

In this experiment, linear SVMs are trained on the Spambase data set while the value of the parameter C is varied. The values of C are integer powers of 2: C = 2^n, where -8 <= n <= 5. The data set is split into a training set and a test set: 60% of the examples form the training set, and the remaining 40% are used for testing. For each SVM, the classification error rates on both the training and the test sets, as well as the number of support vectors, are determined.
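The parameter grid and the 60/40 split described above can be sketched in a few lines of plain Python. This is only an illustration, not the workflow itself: the helper names c_grid and split_indices, the shuffling, and the fixed seed are assumptions, since the description does not say how the Split Data operator samples the examples. The example count of 4601 is the number of instances listed for Spambase in the UCI repository.

```python
import random


def c_grid(n_min=-8, n_max=5):
    """Return C = 2^n for every integer n with n_min <= n <= n_max."""
    return [2.0 ** n for n in range(n_min, n_max + 1)]


def split_indices(n_examples, train_fraction=0.6, seed=0):
    """Shuffle the example indices and split them into training and test parts.

    Assumption: a plain shuffled split with a fixed seed; the original
    workflow may sample differently (for example, stratified).
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_examples * train_fraction)
    return indices[:n_train], indices[n_train:]


grid = c_grid()                              # 14 values, from 2^-8 up to 2^5
train_idx, test_idx = split_indices(4601)    # 4601 = Spambase instance count
print(len(grid), grid[0], grid[-1])
print(len(train_idx), len(test_idx))
```

Each value in grid would then be passed to the SVM learner in turn, with the model fitted on the training indices and evaluated on both parts.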

Input

Spambase [UCI MLR]

Output

Figure 8.13. The classification error rate of the linear SVM on the training and the test sets against the value of the parameter C.

Figure 8.14. The number of support vectors against the value of the parameter C.

Interpretation of the results

The first figure shows that the classification error rate on the training set decreases as the value of the parameter C increases. The error rate on the test set also decreases as C is increased, until C reaches 2. Increasing C further causes a slight increase in the test error.

The second figure shows that the number of support vectors falls by about 50% as the value of the parameter C is increased from 2^-8 to 2^3 = 8. Increasing C further causes a slight increase in the number of support vectors.
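Reading the two figures amounts to locating minima in the logged values. The following sketch shows one way to do that, assuming the log has been exported as a list of records with the hypothetical field names C, test_error and n_sv. The numbers in toy_log are made up purely to exercise the function and are not the results of this experiment.

```python
def summarize_sweep(log):
    """Summarize a sweep over C.

    Returns the C with the lowest test error, the C with the fewest
    support vectors, and the relative drop in support vector count
    between the smallest C and that minimum.
    """
    best = min(log, key=lambda row: row["test_error"])
    first = min(log, key=lambda row: row["C"])
    fewest = min(log, key=lambda row: row["n_sv"])
    drop = 1.0 - fewest["n_sv"] / first["n_sv"]
    return best["C"], fewest["C"], drop


# Made-up values for illustration only; NOT measurements from the experiment.
toy_log = [
    {"C": 0.5, "test_error": 0.30, "n_sv": 100},
    {"C": 1.0, "test_error": 0.20, "n_sv": 80},
    {"C": 2.0, "test_error": 0.25, "n_sv": 50},
    {"C": 4.0, "test_error": 0.26, "n_sv": 60},
]

best_c, fewest_sv_c, sv_drop = summarize_sweep(toy_log)
print(best_c, fewest_sv_c, sv_drop)
```

The two minima need not coincide, as in the experiment above, where the test error is lowest at C = 2 while the support vector count bottoms out at C = 8.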

Workflow

svm_exp4.rmp

Keywords

SVM
supervised learning
error rate
classification

Operators

Apply Model
Log
Log to Data
Loop Parameters
Normalize
Performance (Classification)
Performance (Support Vector Count)
Read CSV
Split Data
Support Vector Machine (LibSVM)