Results

EM has been fixed so that it now performs EM rather than MM. The new results are posted here [gif, ps]. They are worse than the MM results. This may be due to the new cutoff rule: a result is not shown if its probability of irrelevance is greater than its probability of relevance.
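As a rough illustration of the cutoff rule, here is a minimal sketch. It assumes each result carries a single posterior probability of relevance `p`, so that the probability of irrelevance is `1 - p`. The function name `apply_cutoff` is hypothetical.

```python
def apply_cutoff(ranked):
    """Drop results whose probability of irrelevance exceeds their probability
    of relevance. Assumes a two-class model, so P(irrelevant) = 1 - P(relevant).

    ranked: list of (doc_id, posterior_probability_of_relevance) pairs.
    """
    # Keep a result only when P(rel) >= P(irrel), i.e. p >= 0.5.
    return [(doc, p) for doc, p in ranked if p >= 1.0 - p]
```

For example, `apply_cutoff([("d1", 0.9), ("d2", 0.55), ("d3", 0.4)])` keeps `d1` and `d2` and drops `d3`. If an iteration pushes the posteriors of relevant documents below 0.5, this rule removes them from the list entirely rather than just lowering them in the ranking.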

Tests without the cutoff show that, with no iterations, ranking by the posterior probability of relevance performs about as well as tf.idf. Each subsequent iteration, however, makes the results slightly worse. See these results [gif, ps].
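To make "posterior probability of relevance" and "iterations" concrete, below is a minimal EM sketch. It assumes a two-class mixture of additively smoothed multinomial unigram models (relevant vs. irrelevant), with the relevant model initialised from the query and the irrelevant model from the whole collection. The model actually used in these experiments may differ, and all function names and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def unigram(counts, vocab, alpha=0.1):
    """Additively smoothed unigram model over vocab from (possibly fractional) counts."""
    total = sum(counts.get(w, 0.0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0.0) + alpha) / total for w in vocab}


def log_likelihood(doc, model):
    """Multinomial log-likelihood of a document (Counter of term counts),
    ignoring the multinomial coefficient, which cancels in the posterior."""
    return sum(c * math.log(model[w]) for w, c in doc.items())


def posteriors(docs, rel, irr, prior):
    """E-step: P(R | d) for each document by Bayes' rule, computed in log space."""
    out = []
    for d in docs:
        a = math.log(prior) + log_likelihood(d, rel)
        b = math.log(1.0 - prior) + log_likelihood(d, irr)
        m = max(a, b)  # subtract the max to avoid underflow
        out.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
    return out


def em(docs, query, iterations, prior=0.5, alpha=0.1):
    """Return P(R | d) for each doc after the given number of EM iterations.

    iterations=0 gives the posterior from the initial (query-based) models.
    """
    vocab = set(query)
    collection = Counter()
    for d in docs:
        vocab |= set(d)
        collection.update(d)
    rel = unigram(query, vocab, alpha)       # initial relevance model: the query
    irr = unigram(collection, vocab, alpha)  # initial irrelevance model: the collection
    post = posteriors(docs, rel, irr, prior)
    for _ in range(iterations):
        # M-step: re-estimate both models from posterior-weighted term counts.
        rel_counts, irr_counts = Counter(), Counter()
        for d, p in zip(docs, post):
            for w, c in d.items():
                rel_counts[w] += p * c
                irr_counts[w] += (1.0 - p) * c
        rel = unigram(rel_counts, vocab, alpha)
        irr = unigram(irr_counts, vocab, alpha)
        # Re-estimate the prior, clamped away from 0 and 1 so the logs stay finite.
        prior = min(max(sum(post) / len(post), 1e-6), 1.0 - 1e-6)
        post = posteriors(docs, rel, irr, prior)
    return post
```

Documents are then ranked by descending posterior. Because the M-step re-estimates the relevant model from every document in proportion to its posterior, terms from loosely matching documents can gain weight with each pass. This is one way topic drift can arise.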

We hypothesised that this was due to topic drift, and that setting a higher query weight would control it. However, while a higher query weight may help the pre-iteration posterior probability of relevance, it makes the results after each iteration worse [gif]. How odd.
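One common way to apply a query weight is to interpolate the re-estimated relevance model with the query model after each M-step. The sketch below shows that form. It is an assumption: these notes do not say whether the query weight is applied by interpolation or by adding query pseudo-counts, and `anchor_to_query` is a hypothetical name.

```python
def anchor_to_query(feedback_model, query_model, query_weight):
    """Interpolate a re-estimated relevance model with the query model:
        p(w) = query_weight * p_query(w) + (1 - query_weight) * p_feedback(w)

    query_weight = 1.0 ignores feedback; 0.0 uses feedback only.
    Both models are assumed to be distributions over the same vocabulary.
    """
    if not 0.0 <= query_weight <= 1.0:
        raise ValueError("query_weight must be in [0, 1]")
    return {
        w: query_weight * query_model[w] + (1.0 - query_weight) * feedback_model[w]
        for w in feedback_model
    }
```

With `feedback = {"apple": 0.2, "pie": 0.2, "stock": 0.6}` and `query = {"apple": 0.5, "pie": 0.5, "stock": 0.0}`, a weight of 0.8 pulls the probability of the drifting term `stock` down to about 0.12 while the result still sums to 1.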

Log Frequency Counts: a log frequency graph [gif], and a log frequency graph with the prior and query weight set high [gif].