Linear Time Clustering
Postulate good, slow clustering (!)
Random sampling:
- choose a small sample
- cluster in linear time
- move all documents to nearest cluster
Divide and conquer
- divide documents into small groups
- cluster each group in constant time
- recurse, treating each group as an individual
O(kn) time to put n documents in k groups