The Effect of Length on log(1+x)

Let's look at the sum of the log(1+x) for each term in documents of two different lengths [gif]. Since there is a lot of noise, let's bin the terms [gif]. The result is a roughly linear relationship.

NEW: The same comparison, but between two different values of sum(log(1+x)) instead of two different document lengths [gif].

Now let's look at how document length affects the average term's log(1+x) in documents of that length [by source, all].
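
For concreteness, here is a minimal sketch of how a length-versus-average curve like this could be computed. It assumes that x is the raw count of a term within a document (these notes don't define x), that each document is a dict of term counts, and that document length is the total token count. The function name and the toy data are hypothetical, not taken from the actual analysis.

```python
import math
from collections import defaultdict

def avg_log1p_by_length(docs):
    """Map document length -> mean log(1+x) over all terms in documents of that length.

    Assumption: each doc is a dict {term: count}, x is the count, and the
    length of a document is the sum of its counts.
    """
    totals = defaultdict(float)  # length -> running sum of log(1+x)
    n_terms = defaultdict(int)   # length -> number of (document, term) pairs
    for counts in docs:
        length = sum(counts.values())
        for x in counts.values():
            totals[length] += math.log1p(x)
            n_terms[length] += 1
    return {length: totals[length] / n_terms[length] for length in totals}

# Toy data: two documents of length 2 and one of length 5.
docs = [
    {"a": 1, "b": 1},
    {"a": 2},
    {"a": 3, "b": 1, "c": 1},
]
result = avg_log1p_by_length(docs)
```

Splitting `docs` by source before calling the function would give the per-source version of the same curve.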

We also have the same graphs for the effect that a document's sum of log(1+x) values has on the average term's log(1+x) in documents with that sum [by source, all]. There's a weird hump, so I looked at the bin size to see whether there were enough documents in those high-end bins to draw conclusions from [gif]. The bin size drops into the low hundreds by the time we reach documents whose sum of log(1+x) values is above 500, and stays that way until the end. That still seems to be a reasonable number of documents. Anyway, I also made some graphs focusing on the pre-hump portion of the graph [log(1+x), bin size].
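
The bin-size check can be sketched roughly as follows. As before, this assumes x is a term's raw count in a document and that documents are dicts of term counts; the function name, the `bin_width` parameter, and the toy data are all hypothetical. Each bin reports how many documents fall into it alongside the average term's log(1+x), so sparsely populated bins are easy to spot.

```python
import math
from collections import defaultdict

def bin_docs_by_log1p_sum(docs, bin_width=50):
    """Group documents by their sum of log(1+x), in bins of width bin_width.

    Returns {bin_start: (number_of_documents, mean log(1+x) per term)}.
    Assumption: each doc is a dict {term: count} and x is the count.
    """
    n_docs = defaultdict(int)     # bin -> number of documents
    log_sums = defaultdict(float) # bin -> total log(1+x) over all terms
    n_terms = defaultdict(int)    # bin -> number of (document, term) pairs
    for counts in docs:
        if not counts:
            continue  # skip empty documents to avoid dividing by zero
        s = sum(math.log1p(x) for x in counts.values())
        b = int(s // bin_width) * bin_width
        n_docs[b] += 1
        log_sums[b] += s
        n_terms[b] += len(counts)
    return {b: (n_docs[b], log_sums[b] / n_terms[b]) for b in sorted(n_docs)}

# Toy data with a bin width of 1 so the small sums land in different bins.
docs = [{"a": 1}, {"a": 1, "b": 1}, {"a": 2}]
bins = bin_docs_by_log1p_sum(docs, bin_width=1)
```

Bins whose document count falls below some chosen threshold could then be dropped or flagged before reading anything into the shape of the curve.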