The influence of the number of training examples on the performance of the linear SVM

Description

The process demonstrates the influence of the number of training examples on the performance of the linear SVM in the case of the Adult (LIBSVM) data set. The number of training examples is increased in the experiment, and an SVM is trained in each step. The following performance characteristics are determined for each of the SVMs:

  • the classification error rate on the training set,

  • the classification error rate on the corresponding test set,

  • the number of support vectors,

  • the CPU execution time needed to train the linear SVM.
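The procedure above can be sketched outside RapidMiner in plain Python. This is only an illustrative sketch under stated assumptions, not the workflow itself: the data are synthetic two-class points (a hypothetical stand-in for the Adult data set), a simple Pegasos-style subgradient trainer replaces the LibSVM operator, and the support vector count is approximated by the number of training examples lying on or inside the margin. All function names and parameters (`make_data`, `train_linear_svm`, `lam`, `epochs`) are made up for this example.

```python
import random
import time


def make_data(n, rng):
    """Synthetic stand-in for the Adult data: two noisy 2-D classes.

    A constant 1.0 is appended to every example so that the bias term
    is learned as an ordinary weight.
    """
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [rng.gauss(y * 1.0, 1.0), rng.gauss(y * 1.0, 1.0), 1.0]
        data.append((x, y))
    return data


def margin(w, x, y):
    """Signed margin y * <w, x> of one example."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(data, rng, lam=0.01, epochs=20):
    """Pegasos-style stochastic subgradient descent on the hinge loss.

    This is an assumption of the sketch; it is NOT the LibSVM solver.
    """
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            violated = margin(w, x, y) < 1.0
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def error_rate(w, data):
    """Fraction of examples that are not classified correctly."""
    return sum(1 for x, y in data if margin(w, x, y) <= 0.0) / len(data)


def count_margin_examples(w, data):
    """Approximate support vector count: examples with margin <= 1."""
    return sum(1 for x, y in data if margin(w, x, y) <= 1.0)


def run_experiment(sizes, n_test=500, seed=0):
    """Train one model per training set size and record four measures."""
    rng = random.Random(seed)
    pool = make_data(max(sizes), rng)
    test = make_data(n_test, rng)
    rows = []
    for n in sizes:
        train = pool[:n]
        start = time.process_time()  # CPU time, not wall-clock time
        w = train_linear_svm(train, rng)
        cpu_seconds = time.process_time() - start
        rows.append({
            "n_train": n,
            "train_error": error_rate(w, train),
            "test_error": error_rate(w, test),
            "n_sv": count_margin_examples(w, train),
            "cpu_seconds": cpu_seconds,
        })
    return rows


if __name__ == "__main__":
    for row in run_experiment((100, 200, 400, 800)):
        print(row)
```

Each row holds the four characteristics listed above for one training set size, so the rows can be plotted against `n_train` in the same way as the figures in the Output section.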

Input

A discretized and binarized version of the Adult data set [UCI MLR] available at the LIBSVM website [LIBSVM].

Output

Figure 8.15. The classification error rate of the linear SVM on the training and the test sets against the number of training examples.

Figure 8.16. The number of support vectors against the number of training examples.

Figure 8.17. CPU execution time needed to train the SVM against the number of training examples.

Interpretation of the results

The first figure shows that the classification error rates on the training and test sets are roughly the same, regardless of the number of training examples.

The second and the third figures show that both the number of support vectors and the CPU execution time increase linearly with the number of training examples.
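One way to check a claim of linear growth numerically is to fit a least-squares line to the logged values and look at the coefficient of determination. The sketch below is a generic illustration, not part of the workflow; the helper name `linear_fit` is hypothetical, and the numbers in the usage lines are placeholders rather than measurements from the experiment.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared). An r_squared close to 1
    indicates that a straight line describes the data well.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Placeholder values only, NOT results from the experiment:
sizes = [1000, 2000, 3000, 4000]
counts = [400, 800, 1200, 1600]
slope, intercept, r_squared = linear_fit(sizes, counts)
```

Applied to the logged training set sizes and the corresponding support vector counts or CPU times, a slope that stays stable across sizes together with an `r_squared` near 1 would be consistent with the linear trend described above.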

Workflow

svm_exp5.rmp

Keywords

SVM
supervised learning
error rate
classification
cross-validation

Operators

Apply Model
Extract Macro
Generate Attributes
Log
Log to Data
Loop Files
Normalize
Parse Numbers
Performance (Classification)
Performance (Support Vector Count)
Provide Macro as Log Value
Read Sparse
Remove Duplicates
Sort
Support Vector Machine (LibSVM)