svm-light multiclass micro-tutorial

The following is a brief tutorial on how to use SVMLight with DP2.

Download DP2 SVMLight binaries for your platform.

These can be found here: http://sicp.csail.mit.edu/6.034/spring06/projects/learn/code/SVM/.

Generate train and test input files for SVMLight

You can write code that generates the training-set and test-set files for SVMLight by using the write-svm-multiclass-file function in learn-utils.scm.
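write-svm-multiclass-file handles the file format for you. To make the generated files less opaque, here is a sketch of how a single line could be produced. It assumes the sparse SVM-light input format: an integer class label (1 to k), followed by index:value pairs with integer feature indices in increasing order. The helper svm-multiclass-line and its list-of-(index . value)-pairs feature representation are hypothetical and are not part of learn-utils.scm.

```scheme
;; Hypothetical helper (not part of learn-utils.scm): format one example
;; as "<class> <index>:<value> <index>:<value> ...".
;; CLASS is an integer 1..k; FEATURES is a list of (index . value) pairs
;; with positive integer indices in increasing order.
(define (svm-multiclass-line class features)
  (let loop ((fs features)
             (acc (number->string class)))
    (cond ((null? fs) acc)
          ((zero? (cdr (car fs)))          ; sparse format: omit zero values
           (loop (cdr fs) acc))
          (else
           (loop (cdr fs)
                 (string-append acc " "
                                (number->string (car (car fs)))
                                ":"
                                (number->string (cdr (car fs)))))))))

;; A class-2 document with counts for features 1 and 7 (feature 4 is zero).
(display (svm-multiclass-line 2 '((1 . 3) (4 . 0) (7 . 1))))   ; prints 2 1:3 7:1
(newline)
```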

Here is an example of such code for two classes:

```scheme
(define (load-learn-and-SVMlite data-directory category1 category2)
  (fluid-let ((*required-words* 50)               ; min required words for a doc
              (*data-directory* data-directory)   ; directory from which to load data
              (*documents-considered* 500)        ; how many documents to load
              (*dictionary-size* 100))            ; dictionary size
    (load (string-append *data-directory* "allfiles.scm")) ; defines countfiles
    (let* (;; load the raw datasets from file
           (category1-data (load-dataset category1 countfiles))
           (category2-data (load-dataset category2 countfiles))
           ;; split each category: first half for training, second half for testing.
           ;; quotient keeps the split index an integer when a length is odd.
           (c1-mid (quotient (length category1-data) 2))
           (c2-mid (quotient (length category2-data) 2))
           (c1-train (sublist category1-data 0 c1-mid))
           (c1-test (sublist category1-data c1-mid (length category1-data)))
           (c2-train (sublist category2-data 0 c2-mid))
           (c2-test (sublist category2-data c2-mid (length category2-data)))
           (train-docs (append c1-train c2-train))
           (test-docs (append c1-test c2-test))
           ;; turn documents into feature vectors (dictionary built from training docs)
           (dictionary (make-dictionary train-docs))
           (train-set (data-set-from-docs train-docs))
           (test-set (data-set-from-docs test-docs))
           ;; e.g. "train.svm-mc.w100.in.basketball.dance"
           (train-filename (string-append "train.svm-mc.w" (number->string *dictionary-size*)
                                          ".in." category1 "." category2))
           (test-filename (string-append "test.svm-mc.w" (number->string *dictionary-size*)
                                         ".in." category1 "." category2)))
      (display* "writing : " train-filename)
      (write-svm-multiclass-file train-filename train-set)
      (display* "writing : " test-filename)
      (write-svm-multiclass-file test-filename test-set))))
```
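The function above splits each category by position, using the first half for training and the second half for testing. In Scheme, `/` applied to an odd length returns a rational such as 5/2, which is not a valid list index, so `quotient` is the safer way to compute the midpoint. Here is a portable sketch of the split as a hypothetical helper, split-in-half, that uses only standard list operations rather than MIT Scheme's sublist:

```scheme
;; Hypothetical helper: split LST into (train . test) by position.
;; quotient keeps the split index an integer; with an odd length the
;; extra element goes to the test half.
(define (split-in-half lst)
  (let loop ((n (quotient (length lst) 2))
             (front '())
             (rest lst))
    (if (= n 0)
        (cons (reverse front) rest)
        (loop (- n 1) (cons (car rest) front) (cdr rest)))))

(define halves (split-in-half '(d1 d2 d3 d4 d5)))
(display (car halves)) (newline)   ; prints (d1 d2)
(display (cdr halves)) (newline)   ; prints (d3 d4 d5)
```

Because the split is purely positional, documents should not be sorted in a way that makes the two halves systematically different.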

Call your code with your dataset.
For example:
```
 (load-learn-and-SVMlite "../Data/" "basketball" "dance")
```
You will have to delete the previous output files manually before invoking the code again, because it refuses to overwrite existing files.
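If you regenerate the files often, you can delete the old ones from Scheme instead of by hand. This is a minimal sketch, assuming your Scheme provides `file-exists?` and `delete-file` (MIT Scheme and R7RS both do); `delete-if-exists` is a hypothetical helper name.

```scheme
;; Hypothetical helper: delete FILENAME if it exists so the output file
;; can be written again. Does nothing when the file is absent.
(define (delete-if-exists filename)
  (if (file-exists? filename)
      (delete-file filename)))

;; Remove the files produced by the example call above.
(for-each delete-if-exists
          '("train.svm-mc.w100.in.basketball.dance"
            "test.svm-mc.w100.in.basketball.dance"))
```

After that, call load-learn-and-SVMlite again as before.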

Train an SVM.

Now run svm_multiclass_learn with appropriate parameters, giving the training file as its argument. For a complete list of parameters, see http://svmlight.joachims.org/; these are the ones you will most likely want to start with:

```
Learning options:
         -c float    - C: trade-off between training error
                       and margin (default [avg. x*x]^-1)
Kernel options:
         -t int      - type of kernel function:
                        0: linear (default)
                        1: polynomial (s a*b+c)^d
                        2: radial basis function exp(-gamma ||a-b||^2)
         -d int      - parameter d in polynomial kernel
         -g float    - parameter gamma in rbf kernel
         -s float    - parameter s in sigmoid/poly kernel
         -r float    - parameter c in sigmoid/poly kernel
```
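To see what the kernel parameters control, here is a sketch of the polynomial and RBF kernels exactly as written in the option list, on dense feature vectors represented as lists of numbers. The function names are illustrative only; SVMLight computes its kernels internally. The arguments s, c, d and gamma correspond to the -s, -r, -d and -g options.

```scheme
;; Dot product of two equal-length numeric lists.
(define (dot a b)
  (apply + (map * a b)))

;; -t 1: polynomial kernel (s a*b + c)^d
(define (poly-kernel a b s c d)
  (expt (+ (* s (dot a b)) c) d))

;; -t 2: radial basis function exp(-gamma ||a-b||^2)
(define (rbf-kernel a b gamma)
  (let ((diff (map - a b)))
    (exp (- (* gamma (dot diff diff))))))

;; With d = 3 and s = c = 1, vectors whose dot product is 5 give (5 + 1)^3.
(display (poly-kernel '(1 2) '(3 1) 1 1 3))   ; prints 216
(newline)
```

A larger d makes the polynomial kernel more sensitive to feature interactions, while a larger gamma makes the RBF kernel fall off faster with distance.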

For example,

```
../../Svm/Bin/svm_multiclass_learn -c .01 -t 1 -d 3 train.svm-mc.w100.in.basketball.dance
```

trains a multiclass SVM with a cubic polynomial kernel (-t 1 -d 3) and C = 0.01.

Run classify on your test set

Training writes a model file named svm_struct_model to your current directory. Now feed this model, together with your test set, to svm_multiclass_classify:

```
../../Svm/Bin/svm_multiclass_classify test.svm-mc.w100.in.basketball.dance svm_struct_model
```

and voilà, you will see how your SVM performs. For example:

```
Reading model... (380 support vectors read) done.
Reading test examples..Scanning examples...done
Reading examples into memory...100..200..300..400..500..OK. (500 examples read)
 (500 examples) done.
Classifying test examples..99..199..299..399..499..done
Runtime (without IO) in cpu-seconds: 0.40
Average loss on test set: 0.0620
Zero/one-error on test set: 6.20% (469 correct, 31 incorrect, 500 total)
```
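The zero/one error is the fraction of test examples whose predicted class differs from the true class. In the run above, 31 of 500 examples are misclassified, and 31/500 = 0.062, i.e. 6.20%, which here matches the reported average loss of 0.0620. Below is a sketch of the computation from two lists of labels; zero-one-error is a hypothetical helper, and reading the predicted labels back from the classifier is not shown.

```scheme
;; Fraction of positions where the predicted class differs from the true class.
(define (zero-one-error true-labels predicted-labels)
  (let ((wrong (apply + (map (lambda (t p) (if (= t p) 0 1))
                             true-labels
                             predicted-labels))))
    (/ wrong (length true-labels))))

(display (zero-one-error '(1 1 2 2) '(1 2 2 2)))   ; prints 1/4
(newline)

;; The figures reported above, as a percentage: 31 incorrect out of 500.
(display (exact->inexact (* 100 (/ 31 500))))      ; prints 6.2
(newline)
```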
