Results
The EM code has been fixed so that it actually runs EM instead of MM. The new
results are posted here [gif, ps].
These results are worse than the results for MM.
This might be because of the cutoff we now apply: a result is not shown if
its probability of irrelevance is greater than its probability of relevance.
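For reference, the cutoff rule can be sketched roughly as follows. This is only an illustration, not our actual code: the function name `apply_cutoff` and the (doc_id, p_rel) input format are made up for the example, and it assumes relevance and irrelevance are the only two classes, so that the probability of irrelevance is 1 minus the probability of relevance.

```python
def apply_cutoff(scored_docs):
    """Drop results judged more likely irrelevant than relevant.

    scored_docs: list of (doc_id, p_rel) pairs, where p_rel is the
    posterior probability of relevance (hypothetical input format).
    Assumes two classes only, so P(irrelevant) = 1 - P(relevant).
    """
    kept = []
    for doc_id, p_rel in scored_docs:
        p_irrel = 1.0 - p_rel
        # The rule from the notes: if irrelevance is more probable
        # than relevance, don't show the result. Ties are kept.
        if p_irrel > p_rel:
            continue
        kept.append((doc_id, p_rel))
    # Rank the surviving results by probability of relevance, best first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.7)]
    print(apply_cutoff(docs))
```

Under the two-class assumption this amounts to a fixed threshold at 0.5 on the probability of relevance, so any relevant document scored below 0.5 is dropped from the list entirely instead of just being ranked low.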
Tests without the cutoff show that with no iterations, ranking by the
posterior probability of relevance is about as good as tf.idf. Each
subsequent iteration, however, makes the results slightly worse. See
these results [gif, ps].
We hypothesised that this was due to topic drift, and that setting a higher
query weight would control it. However, it seems that while a higher query
weight may help the pre-iteration posterior probability of relevance, it
makes the iterations worse [gif].
How odd.
Log Frequency Counts: Log frequency graph
[gif], and log frequency graph with prior and
query weight set high [gif].