
Last updated $Id: README,v 1.5 2002/02/18 23:25:01 bilmes Exp $

This README file provides a simple demonstration of using GMTK, the
Graphical Models Toolkit.  Working through these instructions should
give you an idea of what it is like to build a simple graphical-model
trainer and decoder. We will work through this in three steps.

1) First, you will run a Viterbi decoder using some already-existing
model parameters.

2) Then, you will run a script to train a set of model parameters, but 
in order to get a fast result, the training set will be tiny, so the results
will be lousy.

3) Then, you will modify the decoder so that it decodes no more than 7
words and, as a further exercise, at least one non-silence word.

Here are the steps.

1) ------------------------------------------------


Type "vitcommand" and watch the decoded words roll by. Take a look at
the contents of the script vitcommand to see how the program was
called.

If you want to see the relationship these words have with "reality"
(i.e., the reference transcript), take a look at DATA/AllCleanTr.mlf,
which has the correct transcriptions of the utterances for this
data. We are working only with the first 100 of them. As you read
through the script 'vitcommand', start getting used to the command
line parameters of the program. Can you work out what each one is
for?
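
To compare decoded words against the reference by program rather than
by eye, the transcript file can be read into a table. The following
Python sketch assumes that DATA/AllCleanTr.mlf follows the usual
HTK-style master label file layout (a "#!MLF!#" header, a quoted
pattern per utterance, one word per line, and a lone "." closing each
utterance). That layout, the function name parse_mlf, and the sample
words are assumptions made for illustration; check them against the
actual file.

```python
def parse_mlf(text):
    """Return {utterance_name: [word, ...]} from HTK-style MLF text.

    Assumes one word label per line with no timing fields.
    """
    transcripts = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the file header
        if line.startswith('"') and line.endswith('"'):
            # A quoted pattern such as "*/utt1.lab" opens a new utterance.
            name = line.strip('"')
            transcripts[name] = []
        elif line == ".":
            name = None  # a lone period closes the current utterance
        elif name is not None:
            transcripts[name].append(line)
    return transcripts


# Made-up sample in the assumed layout; not taken from the tutorial data.
SAMPLE = '''#!MLF!#
"*/utt1.lab"
sil
one
two
sil
.
"*/utt2.lab"
sil
nine
sil
.
'''

refs = parse_mlf(SAMPLE)
for utt_name, words in refs.items():
    print(utt_name, " ".join(words))
```

To use it on the real data, pass the contents of DATA/AllCleanTr.mlf
to parse_mlf in place of SAMPLE.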

Do not change any parameters in the script (or at least not just yet),
or your computer will explode.

2) ------------------------------------------------

Now, we will do a miniature training run. 

First, it is necessary to build the decision trees that control the
deterministic relationships in the network -- specifically, decision
trees are used to deterministically map from a collection of parent
random variable values to the current value of a child random
variable.  The script "generate" generates all required decision
trees. It is a shell script that calls several other perl scripts,
each one of which writes a specialized decision tree file. These
decision tree files are put in the PARAMS subdirectory. First, a bit
more background information.

In this tutorial, the PARAMS directory contains all the parameters
used by the programs. These parameters come in two varieties: those
that are hand-specified and those that are created by a script.  The
hand-created ones are the graphical structure files
"aurora_training.str" and "aurora_decode.str" and the master-file with
pointers to all the other files, named "masterFile.params".  The
script-generated files include aurora.InitialGMParams, which has the
initial Gaussian mixture parameters, and the various ".dts" files.

If you peruse the various decision-tree files, you will notice that some
have one decision tree for each word in the vocabulary 
(like wordWordPos2WholeWordState.dts), others have one decision tree per
training utterance, and others have still different numbers of decision
trees. Depending on what you are trying to do, you will need to choose
the appropriate number. Keep in mind that it is quite easy to read in 
decision trees on an utterance-by-utterance basis, and that this is 
necessary for many operations. 
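
To make the idea of a deterministic mapping concrete, here is a small
Python sketch of the kind of function a per-word decision tree such as
the one in wordWordPos2WholeWordState.dts might encode: given the
parent values (word, position within the word), it returns exactly one
child value (a whole-word state index). This is not GMTK decision tree
syntax, and the vocabulary, the number of states per word, and the
function name are all hypothetical.

```python
# Hypothetical vocabulary and per-word state counts; not from the tutorial.
STATES_PER_WORD = {"sil": 3, "one": 4, "two": 4}

# Give each word a contiguous block of whole-word state indices.
OFFSET = {}
next_free = 0
for word, n_states in STATES_PER_WORD.items():
    OFFSET[word] = next_free
    next_free += n_states


def whole_word_state(word, word_pos):
    """Deterministically map parents (word, word_pos) to a state index."""
    if not 0 <= word_pos < STATES_PER_WORD[word]:
        raise ValueError("position %d is out of range for %r" % (word_pos, word))
    return OFFSET[word] + word_pos


print(whole_word_state("sil", 0))
print(whole_word_state("one", 2))
print(whole_word_state("two", 0))
```

The point of the sketch is that the child has no randomness of its
own: the same parent values always give the same child value.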

Actually, the decision trees already exist, and one of the files (having
to do with the vocabulary) was used in the previous decoding experiment.
But we should generate them again.
Type "generate" and you will get a bunch of ".dts" files in the PARAMS
subdirectory. Verify that they were just created by typing "ls -l PARAMS".

Now you can do training. 

Run the script 'traincommand' and the graphical model will train on a
small number of utterances starting from scratch.  This means that
training will start with some initial parameters. The starting
parameters are in the file PARAMS/aurora.InitialGMParams, and they are
loaded into the program via the file PARAMS/masterFile.master. Notice
that all the means and variances in this file are identical. Why does
it work to start training with all means and variances identical?

Let the training script finish (it might take anywhere from 5 to 10
minutes, depending on what machine you are on). Once it has finished,
you can see how these parameters perform in a Viterbi decoding.  To do
this, modify the vitcommand script to comment out the first command
and uncomment the second command in that file (also see the comments
therein). After you've made the modifications, re-run vitcommand.

Question: Why are the likelihoods different (i.e., better) than they
were the first time you ran vitcommand above?

3) ------------------------------------------------

In this step, you will modify the structure file so that the decoding
behaves in a different way. To begin, create a new subdirectory and
copy the following files to your new directory:

         PARAMS/aurora_decode.str
         PARAMS/nonTrainable.master
         PARAMS/commonParams
         PARAMS/learned_parms.gmp

Normally, the decoding structure will allow any number of words to be
decoded. What we would like to do is add a constraint to the network
so that no more than 7 words (counting silence) will be
decoded. Anything more than that will not be accepted. Your goal is to
modify PARAMS/aurora_decode.str and PARAMS/learned_parms.gmp so that
this is the case.

To do this step, modify aurora_decode.str by adding a
"numDecodedWords" variable to each frame.  Add a "countTheWords"
decision tree to the nonTrainable.master file.  Look carefully at the
file format, as you will need to modify it in two places, once for the
decision tree and once for the deterministic CPT.
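
Before editing the files, it may help to write down the counting logic
on its own. The Python sketch below shows one way a deterministic
counter could behave: the count in each frame is the previous frame's
count, plus one if a word ended in the previous frame. It is not GMTK
syntax and is not necessarily how the solution in FOR_CHEATERS works.
The per-frame word-transition flag, the saturating overflow value, and
all function names are assumptions made for illustration.

```python
MAX_WORDS = 7
OVERFLOW = MAX_WORDS + 1  # saturating value meaning "too many words"


def next_count(prev_count, prev_word_transition):
    """Deterministic child value given its two parents in the previous frame."""
    if prev_word_transition:
        # Saturate so the counter only ever takes a finite set of values.
        return min(prev_count + 1, OVERFLOW)
    return prev_count


def is_accepted(word_transitions):
    """word_transitions[t] is 1 if a word ends at frame t, else 0."""
    count = 1  # the first frame already belongs to the first word
    # A transition in the final frame does not start another word.
    for ended in word_transitions[:-1]:
        count = next_count(count, ended)
    return count <= MAX_WORDS


print(is_accepted([0, 1, 0, 1, 0, 0]))   # three words
print(is_accepted([1] * 6 + [0]))        # seven words
print(is_accepted([1] * 7 + [0]))        # eight words
```

Capping the count at one past the limit keeps the variable's
cardinality small while still letting the network tell "7 or fewer"
apart from "too many".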

If you absolutely have to, the solution is given in the FOR_CHEATERS
directory, but we encourage you to find a solution first without
looking there.

Once you have made the modifications, modify the 'vitcommand' script
as appropriate and see if your solution works. The third command in
the 'vitcommand' script runs the solution given in FOR_CHEATERS.

After you have finished this step, you might also try the following
further structural modifications:

a) Insist on at least 1 non-silence word.  
b) Allow up to 7 decoded words, *excluding* silences.

--------------- END OF FILE -----------------------------
