How Does It Work?
Scatter/Gather clusters similar documents
Clustering presupposes a similarity measure
- we used the vector space model
- relevance/similarity computed as a dot product of document vectors
- works surprisingly well
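Here is a minimal Python sketch of vector space similarity. It assumes raw term-frequency weights and a crude regex tokenizer, because the slide does not say how terms were weighted. The vectors are L2-normalized, so the dot product equals the cosine of the angle between two documents. The names `to_vector` and `dot` are illustrative only.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Term-frequency vector, L2-normalized (assumed weighting, not from the slide)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def dot(u, v):
    """Sparse dot product; iterates over the smaller vector."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


docs = [
    "star galaxy telescope orbit",
    "galaxy star cluster telescope",
    "stock market price trading",
]
vecs = [to_vector(d) for d in docs]
print(round(dot(vecs[0], vecs[1]), 2))  # 0.75 (three shared terms)
print(round(dot(vecs[0], vecs[2]), 2))  # 0.0 (no shared terms)
```

Documents that share vocabulary score high. Documents with no terms in common score 0.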
Scatter/Gather has 2 phases
- preprocessing: 1 GB/day (can be incremental)
- interaction: 5 seconds, regardless of corpus size
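The next sketch shows how the work divides between the two phases. Several assumptions in it do not come from the slide. It uses term-frequency vectors. Clustering is done by a plain spherical k-means with farthest-first seeding, which stands in for the Buckshot and Fractionation algorithms the original system used. The names `Corpus`, `scatter`, and `gather` are hypothetical. The sketch does not reproduce the slide's corpus-independent interaction time, which in the real system relies on more precomputation than is done here.

```python
import math
import re
from collections import Counter, defaultdict


def to_vector(text):
    """L2-normalized term-frequency vector (assumed weighting)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def dot(u, v):
    """Sparse dot product; equals cosine similarity for normalized vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum member vectors and renormalize (spherical k-means centroid)."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


class Corpus:
    """Preprocessing phase: each document is vectorized once, when it is added."""

    def __init__(self):
        self.vectors = []

    def add(self, text):
        # Incremental: adding a document never touches existing vectors.
        self.vectors.append(to_vector(text))
        return len(self.vectors) - 1

    def scatter(self, doc_ids, k, iterations=10):
        """Interaction phase: cluster the selected documents into at most k groups."""
        ids = list(dict.fromkeys(doc_ids))  # dedupe, keep order
        if not ids:
            return []
        # Farthest-first seeding: next seed is least similar to all current seeds.
        seeds = [ids[0]]
        while len(seeds) < min(k, len(ids)):
            rest = [i for i in ids if i not in seeds]
            seeds.append(min(rest, key=lambda i: max(
                dot(self.vectors[i], self.vectors[s]) for s in seeds)))
        centers = [self.vectors[s] for s in seeds]
        groups = []
        for _ in range(iterations):
            # Assign each document to its most similar center.
            groups = [[] for _ in centers]
            for i in ids:
                best = max(range(len(centers)),
                           key=lambda c: dot(self.vectors[i], centers[c]))
                groups[best].append(i)
            groups = [g for g in groups if g]  # drop empty clusters
            centers = [centroid([self.vectors[i] for i in g]) for g in groups]
        return groups


def gather(groups, chosen):
    """Gather: pool the documents of the clusters the user selected."""
    return sorted(i for c in chosen for i in groups[c])


corpus = Corpus()
for text in ["star galaxy telescope", "galaxy telescope orbit",
             "stock market trading", "market price trading"]:
    corpus.add(text)

groups = corpus.scatter(range(4), k=2)
print(groups)                 # [[0, 1], [2, 3]]
subset = gather(groups, [0])  # user picks the astronomy cluster
print(subset)                 # [0, 1]
```

In a browsing session, the user would call `scatter` again on `subset` to refine the selection. Each round clusters only the documents gathered so far.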