       phState
       /     |
      /      |   ...
     V       V
    dg1     pl1
     |       |
     |       |
     V       V
  VE_dg1  VE_pl1
where VE_<F> is the virtual evidence given by MLP activations
for feature <F>.
Svitchboard, monophone, hybrid
This system uses the 8 ANNs to provide virtual evidence about the 8 features. Each of the 8 hidden feature RVs depends on the phone state through a
DenseCPT. If a DenseCPT has only one non-zero entry per row, it is deterministic.
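The determinism condition can be checked mechanically. The following is a minimal sketch, assuming a CPT is stored as a list of rows (one row per phone state, one column per feature value); the function names and the layout are illustrative and are not part of GMTK.

```python
def is_deterministic(cpt, tol=1e-12):
    """Return True if every row of `cpt` has exactly one non-zero entry.

    `cpt` is assumed to be a list of rows: one row per parent value
    (phone state), one column per child value (feature value).
    """
    for row in cpt:
        nonzero = [p for p in row if abs(p) > tol]
        if len(nonzero) != 1:
            return False
    return True


def deterministic_mapping(cpt, tol=1e-12):
    """For a deterministic CPT, return the child value chosen by each row."""
    if not is_deterministic(cpt, tol):
        raise ValueError("CPT is not deterministic")
    return [max(range(len(row)), key=lambda j: row[j]) for row in cpt]
```

A dense row such as `[0.7, 0.3, 0.0]` fails the check, while `[0.0, 1.0, 0.0]` passes and maps to column 1.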
| Models trained using original word alignments |
| Vocab size | Task | WER validation (%) | WER test (%) | VE scale dg1 | VE scale pl1 | LM scale | LM penalty | Divide by prior? | Det. CPTs? | Notes |
| 10 | 1 | 26.0 | | 1.5 | 1.5 | 25 | -2 | no | no | (A) |
| 10 | 1 | 32.0 | | 1.5 | 1.5 | 27 | -4 | yes | no | (B) |
| 10 | 1 | 32.9 | | 0.5 | 0.5 | 28 | -5 | no | yes | (A) |
| 10 | 1 | | | | | | | yes | yes | |
(A) weight search over all combinations of 0.5/1.0/1.5 for dg1 and pl1
(B) No weight search (yet)
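The weight search in note (A) is an exhaustive search over a small grid. A minimal sketch follows, assuming a hypothetical `evaluate(dg1, pl1)` callback that decodes the validation set with the given VE scale factors and returns the word error rate; the callback is a stand-in and not an existing script.

```python
from itertools import product


def weight_search(evaluate, weights=(0.5, 1.0, 1.5)):
    """Try every (dg1, pl1) pair from `weights` and return the best one.

    `evaluate` is a hypothetical callback: it takes the two VE scale
    factors and returns a validation WER (lower is better).
    Returns a tuple (best_wer, best_dg1, best_pl1).
    """
    best = None
    for dg1, pl1 in product(weights, repeat=2):
        wer = evaluate(dg1, pl1)
        if best is None or wer < best[0]:
            best = (wer, dg1, pl1)
    return best
```

With three candidate weights per feature this costs 9 decoding runs; the cost grows as the grid size raised to the number of features being tuned, which is why only dg1 and pl1 are searched here.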
The table below is WRONG
| Results for ANN outputs NOT divided by the prior, and without using word alignments |
| Vocab size | Task | WER validation (%) | WER test (%) | VE scale dg1 | VE scale pl1 | LM scale | LM penalty | Notes |
| 10 | 1 | 33.7 | | 1.0 | 1.0 | 20 | -3 | full D set |
| 10 | 1 | 32.5 | | 0.5 | 1.0 | 20 | -2 | full D set |
| 10 | 1 | 29.0 | 35.1 | 0.5 | 1.5 | 22 | -2 | searching over 0.1, 0.5, 1, 1.5, 2, 4, 8, 16 for each of the dg1 and pl1 scale factors |
| 500 | 1 | 84.6?? | | 0.5 | 1.5 | 20 | -1 | ckbeam 10000, NOT TUNED recipe 1 |
| Results for ANN outputs divided by the prior, using word alignments |
| Vocab size | Task | WER validation (%) | WER test (%) | VE scale dg1 | VE scale pl1 | LM scale | LM penalty | Notes |
| 10 | 1 | 24.6 | 29.2 | 1.5 | 1.5 | 24 | -4 | Searched wide range of dg1, pl1 weights |
| 500 | 1 | 74.7 (1) | * | 1.5 | 1.5 | 22/24 | -2 | No weight search, recipe 2, trained to 0.5 tolerance, decode ckbeam 25000 |
| 500 | 1 | 74.7 (1) | 78.0 | 1.5 | 1.5 | 22/24 | -2 | Weight search (0.5/1.0/1.5 for dg1 and pl1), recipe 2, 0.2 tol, decode ckbeam 25000 |
| 500 | 1 | (1) | | 1.5 | 1.5 | ?? | ?? | No weight search, recipe 3, trained to 0.5 tolerance, decode ckbeam 25000 |
Validation means the D_short set, unless noted.
(1) Validation on only the first 100 utterances of D_short
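The tables above distinguish ANN outputs that are divided by the prior from those that are not, and list a VE scale factor per feature. As a rough sketch of what those two settings could mean, the function below assumes that the scale factor acts as an exponent on the virtual evidence (a multiplier in the log domain) and that dividing by the prior turns a posterior into a scaled likelihood. Both are assumptions made for illustration; the exact way the toolkit applies them is not stated on this page.

```python
import math


def virtual_evidence(posteriors, priors=None, scale=1.0):
    """Turn ANN posteriors for one feature into log-domain VE scores.

    posteriors: strictly positive ANN outputs, one per feature value.
    priors:     optional class priors; if given, each posterior is
                divided by its prior (assumed "divide by prior" setting).
    scale:      assumed VE scale factor, applied as scale * log(score).
    """
    scores = []
    for i, p in enumerate(posteriors):
        log_score = math.log(p)
        if priors is not None:
            log_score -= math.log(priors[i])
        scores.append(scale * log_score)
    return scores
```

Under these assumptions a scale factor above 1 sharpens the evidence from that feature's ANN relative to the rest of the model, and a factor below 1 flattens it.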
-- SimonKing - 25 Jul 2006
Recipes for the 500 word task
Very slow to train starting with uniform DCPTs (unless I can find a better triangulation), so:
Recipe 1
Train on 1000 utterances for 2 iterations
Take the DCPTs and make them more sparse by zeroing all entries less than 0.1
Using these parameters, run the genetic triangulation script to find a fast triangulation, given this particular sparsity of the DCPTs.
Starting from these parameters, train to 0.5% tolerance (takes 8 its) on full training set
Find a decoding graph triangulation using the final trained parameters.
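The sparsification step in Recipe 1 can be sketched as follows. This is a minimal illustration, assuming the DCPT is held as a list of rows; whether the rows were renormalised after zeroing is not stated above, so renormalisation is an optional assumption exposed as a flag.

```python
def sparsify_cpt(cpt, threshold=0.1, renormalize=True):
    """Zero all entries of `cpt` that are less than `threshold`.

    cpt:         list of rows, each a list of probabilities.
    renormalize: assumption, not confirmed by the recipe: rescale each
                 row so that it sums to 1 again after zeroing.
    """
    sparse = []
    for row in cpt:
        kept = [p if p >= threshold else 0.0 for p in row]
        total = sum(kept)
        if total == 0.0:
            # Every entry fell below the threshold; the row would be unusable.
            raise ValueError("row has no entry >= threshold")
        if renormalize:
            kept = [p / total for p in kept]
        sparse.append(kept)
    return sparse
```

The sparser the rows, the fewer state and feature combinations survive during inference, which is what makes the later triangulation search and full-set training cheaper.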
-- SimonKing - 01 Aug 2006
Recipe 2
Found a better triangulation using the genetic algorithm. Then manually re-triangulated the epilogue and prologue (because they were "completed") using heuristic "S".
This model is easily trainable with fully dense CPTs.
However, decoding takes serious memory (ckbeam of 25000, because anything smaller led to different decodings on one test sentence), although it is fast enough (~20 secs per utt).
To make this decode in reasonable amounts of memory, all state_to_FEAT DCPTs were made sparser by zeroing all entries smaller than 0.1
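The memory cost is tied to the ckbeam value. As a rough sketch, assuming ckbeam caps the number of entries kept per clique at the k best-scoring ones, pruning looks like the function below; the dictionary layout and function name are illustrative and do not reflect the decoder's internals.

```python
import heapq


def k_beam_prune(entries, k):
    """Keep only the k highest-scoring clique entries.

    entries: dict mapping a variable assignment (any hashable key)
             to its log score.
    k:       beam width (the assumed meaning of ckbeam).
    """
    kept = heapq.nlargest(k, entries.items(), key=lambda kv: kv[1])
    return dict(kept)
```

A small k saves memory but can discard the entry that the best path needs, which is consistent with the observation above that smaller beams changed the decoding of one test sentence. Sparser DCPTs reduce the number of non-zero entries competing for those k slots.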
-- SimonKing - 04 Aug 2006
Recipe 3
As recipe two, but zeroing entries smaller than 0.? (TO DO)
-- SimonKing - 08 Aug 2006