Tentative idea: precluster
In advance, partition corpus into 1000 groups
treat each group as one “metadocument”
- term vector: documents’ centroid
- title: title of the document nearest the centroid
Scatter/Gather metadocuments
Expand metadocuments to their documents when few remain
only a few (meta)documents clustered at one time
extends interactive-time clustering to 1M documents
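
A minimal sketch of how a metadocument might be built from one precomputed group, under stated assumptions. Documents are (title, term vector) pairs with sparse dict vectors, the "centroid document" is taken to be the member with the highest cosine similarity to the centroid, and the names build_metadocument and expand are hypothetical. The preclustering step that produces the groups is not shown.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Mean of sparse term vectors, each a dict mapping term -> weight."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_metadocument(group):
    """group: non-empty list of (title, term_vector) pairs from one precluster."""
    center = centroid([vec for _, vec in group])
    # Assumption: the metadocument borrows the title of the member
    # closest to the centroid.
    title, _ = max(group, key=lambda doc: cosine(doc[1], center))
    return {"title": title, "vector": center, "members": list(group)}


def expand(metadocs):
    """Replace metadocuments with their member documents."""
    return [doc for meta in metadocs for doc in meta["members"]]
```

Scatter/Gather would then cluster the metadocuments' centroid vectors in place of individual documents, and call something like expand once the user's selection has narrowed to a small number of metadocuments. The cutoff for "few" is left to the caller here because the slide does not give one.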