SVMLight multiclass mini-tutorial

The following is a brief tutorial on how to use SVMLight with DP2.
  1. Download DP2 SVMLight binaries for your platform.

    These can be found here: http://sicp.csail.mit.edu/6.034/spring06/projects/learn/code/SVM/.
  2. Generate training and test input files for SVMLight.

    You can write code that generates the training set and test set files for SVMLight by using the write-svm-multiclass-file function in learn-utils.scm.
  3. Here is an example of what such code might look like for two classes:

    (define (load-learn-and-SVMlite data-directory category1 category2)
      (fluid-let ((*required-words* 50)              ; min required words for a doc
                  (*data-directory* data-directory)  ; directory from which to load data
                  (*documents-considered* 500)       ; how many documents to load
                  (*dictionary-size* 100))           ; default dictionary size
        (begin
          (load (string-append *data-directory* "allfiles.scm")) ;; defines countfiles
          (let*
              ;; load up the raw datasets from file
              ((category1-data (load-dataset category1 countfiles))
               (category2-data (load-dataset category2 countfiles))
               ;; split each category in half: training set and testing set
               ;; (quotient keeps the split index an integer for odd lengths)
               (c1-half (quotient (length category1-data) 2))
               (c2-half (quotient (length category2-data) 2))
               (c1-train (sublist category1-data 0 c1-half))
               (c1-test (sublist category1-data c1-half (length category1-data)))
               (c2-train (sublist category2-data 0 c2-half))
               (c2-test (sublist category2-data c2-half (length category2-data)))
               (train-docs (append c1-train c2-train))
               (test-docs (append c1-test c2-test))
               ;; turn documents into feature vectors
               (dictionary (make-dictionary train-docs))
               (train-set (data-set-from-docs train-docs))
               (test-set (data-set-from-docs test-docs))
               (train-filename (string-append "train.svm-mc.w" (number->string *dictionary-size*) ".in." category1 "." category2))
               (test-filename (string-append "test.svm-mc.w" (number->string *dictionary-size*) ".in." category1 "." category2)))
            (display* "writing : " train-filename)
            (write-svm-multiclass-file train-filename train-set)
            (display* "writing : " test-filename)
            (write-svm-multiclass-file test-filename test-set)))))
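
    The files written by this code are meant to be read by the SVMLight multiclass tools, which expect one example per line: an integer class label followed by index:value pairs with increasing feature indices. The sketch below is a hypothetical shell helper (check_svm_file is not part of DP2 or SVMLight) that flags lines that do not have that shape. It assumes write-svm-multiclass-file emits this standard sparse format, which is worth confirming against a file you generate.

```shell
# check_svm_file FILE
# Hypothetical helper: prints every line that does not look like
#   <integer label> <index>:<value> <index>:<value> ...
# with strictly increasing feature indices starting at 1.
# Returns non-zero if any such line is found. Values are only
# checked for being non-empty, not for being valid numbers.
check_svm_file() {
  awk '
    function bad(line) { printf "line %d: %s\n", NR, line; errors++ }
    {
      sub(/#.*/, "")                       # ignore trailing comments
      if (NF == 0) next                    # skip blank lines
      if ($1 !~ /^[0-9]+$/) { bad($0); next }
      prev = 0
      for (i = 2; i <= NF; i++) {
        n = split($i, kv, ":")
        if (n != 2 || kv[2] == "" || kv[1] !~ /^[0-9]+$/ || kv[1] + 0 <= prev) {
          bad($0)
          next
        }
        prev = kv[1] + 0
      }
    }
    END { exit (errors > 0) }
  ' "$1"
}

# Usage:
#   check_svm_file train.svm-mc.w100.in.basketball.dance
```

    A silent run with exit status 0 means every line passed this shape check; it says nothing about whether the labels or feature values are sensible.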
    
  4. Call your code with your dataset.

    For example:
     (load-learn-and-SVMlite "../Data/" "basketball" "dance")
    Before invoking the code again, you must manually delete the previous output files, because it refuses to overwrite existing files.
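
    The cleanup can be scripted. The following is a minimal sketch, assuming the files were written to the current directory and follow the naming scheme used in the example code (train.svm-mc.w<size>.in.<category1>.<category2> and the matching test file). The helper name remove_svm_inputs is made up for this illustration.

```shell
# remove_svm_inputs DICT_SIZE CATEGORY1 CATEGORY2
# Hypothetical helper: deletes the train/test files that
# load-learn-and-SVMlite would write for these arguments.
# rm -f stays quiet if a file is already gone.
remove_svm_inputs() {
  suffix="w$1.in.$2.$3"
  rm -f "train.svm-mc.$suffix" "test.svm-mc.$suffix"
}

# Usage, run from the directory that holds the files:
#   remove_svm_inputs 100 basketball dance
```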
  5. Train an SVM.

    Now run svm_multiclass_learn with appropriate parameters and the training file as its argument. Which parameters, you ask? Consult http://svmlight.joachims.org/ for a complete list; these are the ones you will most likely want to start with:
    Learning options:
             -c float    - C: trade-off between training error
                           and margin (default [avg. x*x]^-1)  
    Kernel options:
             -t int      - type of kernel function:
                            0: linear (default)
                            1: polynomial (s a*b+c)^d
                            2: radial basis function exp(-gamma ||a-b||^2)
             -d int      - parameter d in polynomial kernel
             -g float    - parameter gamma in rbf kernel
             -s float    - parameter s in sigmoid/poly kernel
             -r float    - parameter c in sigmoid/poly kernel
        

    For example,

    ../../Svm/Bin/svm_multiclass_learn -c .01 -t 1 -d 3 train.svm-mc.w100.in.basketball.dance
    
    trains a multiclass SVM with a cubic polynomial kernel and C = 0.01.
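
    Since C usually needs some tuning, it can help to train several models in one go. The sketch below loops over a list of C values and passes a model file name as the second positional argument, so each run keeps its own model instead of writing to the default name. The function name sweep_c and the model.c<C> naming are made up for this illustration; the binary path is the one used in the example.

```shell
# Path to the learner; override SVM_LEARN if your binaries live elsewhere.
SVM_LEARN=${SVM_LEARN:-../../Svm/Bin/svm_multiclass_learn}

# sweep_c TRAIN_FILE C1 [C2 ...]
# Trains one cubic-kernel model per C value and saves each one under
# the hypothetical name model.c<C>. Stops at the first failing run.
sweep_c() {
  train_file=$1
  shift
  for c in "$@"; do
    "$SVM_LEARN" -c "$c" -t 1 -d 3 "$train_file" "model.c$c" || return 1
  done
}

# Usage:
#   sweep_c train.svm-mc.w100.in.basketball.dance 0.01 0.1 1
```

    Each resulting model file can then be evaluated separately on the test set to compare the C values.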
  6. Run classify on your test set.

    The training procedure writes a file named svm_struct_model to your current directory. Now all you have to do is pass your test set and this model file to svm_multiclass_classify:
      ../../Svm/Bin/svm_multiclass_classify test.svm-mc.w100.in.basketball.dance svm_struct_model
    

    And voila! You will see the performance of your SVM. For example:

    Reading model... (380 support vectors read) done.
    Reading test examples..Scanning examples...done
    Reading examples into memory...100..200..300..400..500..OK. (500 examples read)
     (500 examples) done.
    Classifying test examples..99..199..299..399..499..done
    Runtime (without IO) in cpu-seconds: 0.40
    Average loss on test set: 0.0620
    Zero/one-error on test set: 6.20% (469 correct, 31 incorrect, 500 total)
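
    The last line is the one to compare across runs: the zero/one error is the fraction of misclassified test examples, here 31 out of 500, or 6.20%. When comparing several runs it can be handy to pull out just that number. The following is a minimal sketch that assumes the output line has exactly the wording shown above; zero_one_error is a made-up helper name.

```shell
# zero_one_error
# Reads svm_multiclass_classify output on stdin and prints the
# zero/one error as a percentage, without the "%" sign.
# Prints nothing if no matching line is found.
zero_one_error() {
  sed -n 's/^ *Zero\/one-error on test set: \([0-9.]*\)%.*/\1/p'
}

# Usage (paths as in the example above):
#   ../../Svm/Bin/svm_multiclass_classify test.svm-mc.w100.in.basketball.dance svm_struct_model | zero_one_error
```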