Things to remember:


Labeling: 

After labeling, make sure all substrokes have been created.
If they weren't, use ContextUtils.addMissingSubStrokes to create them.
Note that as of 5/6/2006 this method left small gaps: there is
(of course) a gap between consecutive points, and no point was included
in 2 substrokes.  There are arguments going both ways on this one.

Methods for picking wires out of the data files (without wires
labeled) are in BuildNoiseModel.

Data Prep:

Size normalization: 
Individual shapes should be size normalized to 50.  This is done in
ContextUtils.sizeNormalizeTeplates() or
ShapeContextCalculator.getSizeNormalized(size).
It works by finding the major axis of the arbitrarily oriented bounding
box and scaling the image so that axis has the given size in the
scaled version.
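
A minimal sketch of the scaling step, not the actual ContextUtils or
ShapeContextCalculator code.  SizeNormalizeSketch and normalize() are
hypothetical names.  It assumes the major axis of the oriented bounding
box can be approximated by the principal axis of the points (a true
minimum-area box may differ slightly), and it returns points centered
on the centroid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; illustrates scaling a shape by its major-axis extent. */
final class SizeNormalizeSketch {

    /**
     * Scales points (each a double[]{x, y}) about their centroid so that the
     * extent along the principal axis equals targetSize.
     */
    static List<double[]> normalize(List<double[]> pts, double targetSize) {
        List<double[]> out = new ArrayList<>();
        if (pts.isEmpty()) {
            return out;
        }
        int n = pts.size();
        double cx = 0, cy = 0;
        for (double[] p : pts) { cx += p[0]; cy += p[1]; }
        cx /= n;
        cy /= n;

        // Entries of the 2x2 scatter matrix of the centered points.
        double sxx = 0, sxy = 0, syy = 0;
        for (double[] p : pts) {
            double dx = p[0] - cx, dy = p[1] - cy;
            sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
        }
        // Direction of the principal (major) axis.
        double theta = 0.5 * Math.atan2(2 * sxy, sxx - syy);
        double ux = Math.cos(theta), uy = Math.sin(theta);

        // Extent of the shape along that axis.
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] p : pts) {
            double t = (p[0] - cx) * ux + (p[1] - cy) * uy;
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        double extent = max - min;
        // Degenerate shapes (a single point) are left at their original scale.
        double scale = extent > 0 ? targetSize / extent : 1.0;

        for (double[] p : pts) {
            out.add(new double[] {(p[0] - cx) * scale, (p[1] - cy) * scale});
        }
        return out;
    }
}
```

For a 100 x 20 axis-aligned rectangle and a target size of 50, the long
side comes out 50 wide and the short side 10.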

SRL (serialized) files:
Create them from MultiModalActionHistory XML files.  Lots of options
in Trainer.

Choose prototypes and do the initial selection on them using, for
example, every 15th point.
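
A sketch of what "every 15th point" selection could look like.
PointSelectSketch and everyNth() are hypothetical names, and starting
at index 0 is an assumption; the selection done by Trainer may start
elsewhere or treat stroke boundaries differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; keeps every n-th element of a point list. */
final class PointSelectSketch {

    /** Returns elements 0, n, 2n, ... of points (assumed starting offset: 0). */
    static <T> List<T> everyNth(List<T> points, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<T> out = new ArrayList<>();
        for (int i = 0; i < points.size(); i += n) {
            out.add(points.get(i));
        }
        return out;
    }
}
```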

All of that is done with:
bigjava edu.mit.sketch.messy.Trainer -size 50 -srl -arff \
     -select 15 \
     -codebook templates/new/*labeled.xml \
     -train circuits/noisy/{adfa,bank,bean,bowwow}*.xml \
     -test circuits/noisy/{Chin,elsie}*.xml \
     -o weka-data/50-noisy


bigjava edu.mit.sketch.messy.Trainer \
     -codebook weka-data/50-noisy-codebook.srl \
     -train weka-data/50-noisy-train.srl \
     -o isolated-classifier-data/50-noisy \
     -smo 500 .007

Output the WEKA data.

Run SMO or LibSVM in WEKA to train a model.


======================================================================
How to search for shapes:

Temporal + spatial
 * start with a stroke
 * add strokes that are spatially nearby
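
A rough sketch of the grouping idea above, under stated assumptions:
strokes are arrays of {x, y} points listed in drawing order, "temporal"
means within `window` strokes of the seed, and "spatially nearby" means
the smallest point-to-point distance to any stroke already in the group
is at most `maxDist`.  StrokeGroupSketch and both parameters are
hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper; grows a group of strokes outward from a seed stroke. */
final class StrokeGroupSketch {

    /** Smallest point-to-point distance between two strokes. */
    static double minDistance(double[][] a, double[][] b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : a) {
            for (double[] q : b) {
                best = Math.min(best, Math.hypot(p[0] - q[0], p[1] - q[1]));
            }
        }
        return best;
    }

    /** Returns the indexes of the strokes grouped with the seed. */
    static TreeSet<Integer> grow(List<double[][]> strokes, int seed,
                                 int window, double maxDist) {
        TreeSet<Integer> group = new TreeSet<>();
        group.add(seed);
        int lo = Math.max(0, seed - window);
        int hi = Math.min(strokes.size() - 1, seed + window);
        boolean changed = true;
        while (changed) {               // repeat until no stroke is added
            changed = false;
            for (int i = lo; i <= hi; i++) {
                if (group.contains(i)) {
                    continue;
                }
                boolean near = false;
                for (int g : group) {
                    if (minDistance(strokes.get(i), strokes.get(g)) <= maxDist) {
                        near = true;
                        break;
                    }
                }
                if (near) {             // add only after iteration over group ends
                    group.add(i);
                    changed = true;
                }
            }
        }
        return group;
    }
}
```

Growing through chains of neighbors (rather than only from the seed) is
a design choice here; it lets a multi-stroke shape be collected even if
its far end is not close to the first stroke.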

How to train the classifier:
 * Tuned to whole shapes
 * Detect an x in a scene and then apply the whole shape identifier in
   that region until you find it
 

Regression based: e.g. full shapes are 1, and parts of shapes get
decreasing values, down to 0, as less of the shape is included, for the
different classes.


======================================================================
Journal:

2006-05-16

1-vs-all classifiers:  Don't seem to work very well.  Probably too
many different things to try to generalize across.

class-vs-noise classifier?  Advantage is that it can be biased towards
false positives.  This seems to be reasonable.  One big question is
whether the performance is a result of the "noise" being shifted BBoxes,
which often have nothing in them, so that it is really just selecting big
things over small ones.

Can we build a continuous classifier with the shifted versions of
things?  E.g. a bbox shifted to the side is less of a resistor, whereas a
capacitor is a non-resistor.


LibSVM Noise+wire vs all:
LibSVM -S 0 -K 2 -D 3.0 -G 0.0 -R 0.0 -N 0.5 -M 40.0 
TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
  0.99      0.067      0.988     0.99      0.989      0.962    neg_3-last
  0.933     0.01       0.943     0.933     0.938      0.962    pos_3-last

=== Confusion Matrix ===
    a    b   <-- classified as
 1056   11 |    a = neg_3-last
   13  182 |    b = pos_3-last

Weighting the second class more:
LibSVM -S 0 -K 2 -D 3.0 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 50.0 -E 0.0010 -P 0.1 -Z -W "1.0 10.0"

TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
  0.977     0.046      0.991     0.977     0.984      0.965    neg_3-last
  0.954     0.023      0.882     0.954     0.916      0.965    pos_3-last

=== Confusion Matrix ===

    a    b   <-- classified as
 1042   25 |    a = neg_3-last
    9  186 |    b = pos_3-last
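
The per-class rates in the WEKA tables above follow directly from the
confusion matrices.  A small sketch for recomputing them by hand;
BinaryMetricsSketch is a hypothetical name and not part of WEKA.  It
treats pos_3-last as the positive class.

```java
/** Hypothetical helper; recomputes per-class rates from a 2x2 confusion matrix. */
final class BinaryMetricsSketch {

    /**
     * Returns {precision, recall, fMeasure, fpRate} for the positive class.
     * tp/fn come from the positive row, fp/tn from the negative row.
     * Results are NaN if a denominator is zero.
     */
    static double[] positiveClassMetrics(int tp, int fn, int fp, int tn) {
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);      // same as TP rate
        double fMeasure = 2 * precision * recall / (precision + recall);
        double fpRate = fp / (double) (fp + tn);
        return new double[] {precision, recall, fMeasure, fpRate};
    }

    public static void main(String[] args) {
        // Unweighted run: 182 hits, 13 misses, 11 false alarms, 1056 rejections.
        double[] m = positiveClassMetrics(182, 13, 11, 1056);
        System.out.printf("precision=%.3f recall=%.3f f=%.3f fp=%.3f%n",
                m[0], m[1], m[2], m[3]);
    }
}
```

Weighting the positive class moves 4 positives from missed to found
(13 -> 9 false negatives) at the cost of 14 more false positives
(11 -> 25), which is why recall rises while precision drops.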

======================================================================
2006-05-22:

SMO with the logistic seems to work well too.

Had some trouble with the order of the class indexes; maybe they should be
stored with the codebook somehow.


======================================================================
2006-05-25

Scanning a fixed sized box around every n-th point, scaling the
enclosed region to the canonical size and classifying it seems
promising.  Plotting each point and coloring by the classification of
that region gives reasonable results.  Scale is an issue since
different objects are at different scales in the same image.
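
A sketch of the scan loop described above.  WindowScanSketch is a
hypothetical name, and the classifier is passed in as a plain function
because the real classifier interface is not shown in these notes.  It
assumes the box is mapped onto the canonical size by a uniform scale of
canonicalSize / boxSize; normalizing by the major axis of the enclosed
ink instead would be a different choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper; classifies a fixed-size box around every n-th ink point. */
final class WindowScanSketch {

    /** Returns one label per scanned point (points 0, step, 2*step, ...). */
    static List<String> scan(List<double[]> ink, int step, double boxSize,
                             double canonicalSize,
                             Function<List<double[]>, String> classifier) {
        if (step <= 0 || boxSize <= 0) {
            throw new IllegalArgumentException("step and boxSize must be positive");
        }
        List<String> labels = new ArrayList<>();
        double half = boxSize / 2.0;
        double scale = canonicalSize / boxSize;
        for (int i = 0; i < ink.size(); i += step) {
            double cx = ink.get(i)[0], cy = ink.get(i)[1];
            List<double[]> region = new ArrayList<>();
            for (double[] p : ink) {
                if (Math.abs(p[0] - cx) <= half && Math.abs(p[1] - cy) <= half) {
                    // Shift into box-local coordinates, then scale to canonical size.
                    region.add(new double[] {(p[0] - cx + half) * scale,
                                             (p[1] - cy + half) * scale});
                }
            }
            labels.add(classifier.apply(region));
        }
        return labels;
    }
}
```

The inner loop over all ink points is quadratic; a spatial index would
be the obvious next step if the scan is too slow on full sketches.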

It may be possible to look at the classification of each point at each
scale and do some form of clustering/voting to get shapes.  Graphical model?
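
One very simple form of the voting idea, as a starting point only:
each point takes the label it received most often across scales.
ScaleVoteSketch is a hypothetical name; ties go to the label seen
first, and no spatial clustering of neighboring points is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; majority vote per point over labels from several scales. */
final class ScaleVoteSketch {

    /**
     * labelsByScale[s][p] is the label of point p at scale s.
     * Returns the most frequent label for each point.
     */
    static String[] vote(String[][] labelsByScale) {
        int nPoints = labelsByScale.length == 0 ? 0 : labelsByScale[0].length;
        String[] winners = new String[nPoints];
        for (int p = 0; p < nPoints; p++) {
            // LinkedHashMap keeps first-seen order, which decides ties.
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String[] scale : labelsByScale) {
                counts.merge(scale[p], 1, Integer::sum);
            }
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            winners[p] = best;
        }
        return winners;
    }
}
```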

What scales:
The codebook and training data are:
* hand segmented and labeled
* noise model consists of shifted bboxes of shapes; should be
consistent with search method (fixed sized box at every n-th point)
* scaled so longest axis is n pixels (50-75)
* resampled by adding points between ink points so that max distance
between points is 0.5 pixel (I hope we are doing double arithmetic...)
* calculated with fixed sized context features: radius = 25 (that
seems small...)
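
A sketch of the resampling step from the list above, done in double
arithmetic.  ResampleSketch is a hypothetical name; it assumes linear
interpolation between consecutive ink points and keeps every original
point, which may not match the actual resampler.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; inserts points so no gap exceeds maxGap. */
final class ResampleSketch {

    /** Points are double[]{x, y}; original points are always kept. */
    static List<double[]> resample(List<double[]> pts, double maxGap) {
        if (maxGap <= 0) {
            throw new IllegalArgumentException("maxGap must be positive");
        }
        List<double[]> out = new ArrayList<>();
        if (pts.isEmpty()) {
            return out;
        }
        out.add(pts.get(0));
        for (int i = 1; i < pts.size(); i++) {
            double[] a = pts.get(i - 1), b = pts.get(i);
            double d = Math.hypot(b[0] - a[0], b[1] - a[1]);
            // Split the segment into equal pieces no longer than maxGap.
            int pieces = Math.max(1, (int) Math.ceil(d / maxGap));
            for (int k = 1; k <= pieces; k++) {
                double t = k / (double) pieces;
                out.add(new double[] {a[0] + t * (b[0] - a[0]),
                                      a[1] + t * (b[1] - a[1])});
            }
        }
        return out;
    }
}
```

With maxGap = 0.5, a shape normalized to 50 pixels along its major axis
ends up with at least 100 points along that axis, so point counts grow
quickly with the canonical size.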

======================================================================
2006-06-02

Things are not working well for the full image scan.  I changed the
noise model to sample every n-th point instead of the slightly shifted
bbox used previously.  I may have been testing on the training data
before, which would explain some of the good performance.

* check the training
* cross validate with libsvm
* bias noise model away from saying yes to noise
* use all of the sampled "hits" as positive training examples
