3. Running nFOIL on the mutagenesis data set

To illustrate the use of the C implementation of nFOIL, we will reconsider Srinivasan et al.'s mutagenesis data set. After installing nFOIL, you will find two data sets, muta.pl and muta_ruf_all.pl, in the nfoil/datain directory. They are two versions of the regression-unfriendly case. Check the previous sections and the ".nFOIL language" section to understand their content.

The C implementation of nFOIL is a command-line tool. You call it as
        nfoil KnowledgeBaseFile [options]
  

The results of each run of nFOIL are written to standard output. The options are:

   -h : help
   -b$width, -b $width : beam size; default : 1 (greedy search)
   -P : use post-pruning
   -t$value, -t $value : convergence threshold; default : 0.001
   -H$number, -H $number : maximum number of clauses to be learned; default : 25
   -C$number, -C $number : maximum number of literals in a clause; default : 10
   -c$number, -c $number : perform $number-fold cross-validation
        If $number = 0, perform leave-one-out cross-validation.
        If $number = 1, do not perform cross-validation.
      default : 1
   -k$value, -k $value : set the random seed
        If $value = 0, the seed is initialized from the timer.
      default : 0

Now it is time to run a quick test:

     nfoil/bin/nfoil muta_ruf_all.pl -P -c3   

The output looks as follows. First, an overview of the parameters used and of the input data file, the knowledge base, is given:

-------------------------------------------------------------------------------
Parameters :
beam size                                                         : 1
maximum number of clauses to be learned                           : 25
maximum number of literals per clause                             : 10
convergence threshold                                             : 0.001
Number of folds in cross-validation                               : 3
random seed                                                       : 1156422997
Post-pruning?                                                     : yes
input data : muta_ruf_all.pl

Reading input file.

-------------------------------------------------------------------------------
Number of classes: 2
-------------------------------------------------------------------------------
There are 5131 ground facts in the background theory and 42 classified examples
-------------------------------------------------------------------------------
...

Then the folds used for the cross-validation (if any) are presented. Afterwards, the actual learning phase starts. It consists of two nested loops. In the outer loop, nFOIL learns one clause at a time and thereby refines the current best model (refinement iteration). In the inner loop, a single clause is refined literal by literal. For both types of iteration, the old and the new score (the conditional log-likelihood) are shown, together with the current model or clause, respectively. Since you can use beam search to find the best next clause, the whole beam is shown as well.
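The interplay of the two loops can be sketched in a few lines of Python. This is a toy illustration, not the actual nFOIL code: the data set, the candidate literals, and the plain training-accuracy score (standing in for nFOIL's conditional log-likelihood of a naive Bayes model over the learned clauses) are all invented for the example. Only the control structure, an outer covering loop that adds clauses while the score improves by at least the convergence threshold, and an inner beam search that grows one clause literal by literal, mirrors the behaviour described above.

```python
from itertools import product

# Toy examples: (features, label); positives are x > 2 and y < 3.
DATA = [((x, y), x > 2 and y < 3) for x, y in product(range(5), repeat=2)]

# Candidate literals the inner loop may add to a clause.
LITERALS = [(f"x>{t}", lambda ex, t=t: ex[0] > t) for t in range(4)] + \
           [(f"y<{t}", lambda ex, t=t: ex[1] < t) for t in range(1, 5)]

def covers(clause, ex):
    # A clause is a conjunction of literals.
    return all(test(ex) for _, test in clause)

def score(clauses):
    # Training accuracy of "covered by some clause => positive";
    # a stand-in for nFOIL's conditional log-likelihood.
    hits = sum(any(covers(c, f) for c in clauses) == y for f, y in DATA)
    return hits / len(DATA)

def refine(clauses, beam_size, max_literals):
    # Inner loop: beam search over clauses of growing length.
    beam, best, best_score = [[]], [], score(clauses)
    for _ in range(max_literals):
        candidates = [c + [lit] for c in beam for lit in LITERALS if lit not in c]
        if not candidates:
            break
        candidates.sort(key=lambda c: score(clauses + [c]), reverse=True)
        beam = candidates[:beam_size]
        top = score(clauses + [beam[0]])
        if top > best_score:
            best, best_score = beam[0], top
    return best

def nfoil_sketch(max_clauses=25, max_literals=10, threshold=0.001, beam_size=3):
    # Outer loop: add one clause at a time while the score still improves
    # by at least the convergence threshold.
    clauses, current = [], score([])
    for _ in range(max_clauses):
        clause = refine(clauses, beam_size, max_literals)
        new = score(clauses + [clause])
        if new - current < threshold:
            break
        clauses.append(clause)
        current = new
    return clauses, current
```

The defaults of nfoil_sketch echo the command-line defaults listed above (25 clauses, 10 literals per clause, threshold 0.001); on this toy data the search stops after a single clause.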

Of course, the output of this quick run will not be that interesting. Try a more interesting run:

     nfoil/bin/nfoil muta_ruf_all.pl -P -c42 -b10 -t 0.01

You may also want to have a look at the other data set, muta.pl.