Challenges
Prove the SVD works:
- model topics, e.g., as distributions over terms
- prove SVD finds them
- is there a different query-document inner product that works better?
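
To make the inner-product question concrete, here is a minimal sketch comparing two candidate query-document scores in the rank-k space: the raw inner product with the rank-k approximation, and the cosine after projecting both query and documents. The toy matrix, the choice k = 2, and the function names `score_projected` and `score_cosine` are illustrative assumptions, not part of the original material.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); counts are made up.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

k = 2  # assumed number of topics
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

def score_projected(q):
    """Raw inner product of q with each column of the rank-k approximation A_k."""
    return ((q @ U_k) * s_k) @ Vt_k

def score_cosine(q):
    """Cosine between q and each document after projecting both onto span(U_k)."""
    q_hat = U_k.T @ q                 # query in the k-dimensional space
    D_hat = s_k[:, None] * Vt_k       # equals U_k.T @ A: documents in the same space
    norms = np.linalg.norm(q_hat) * np.linalg.norm(D_hat, axis=0)
    return (q_hat @ D_hat) / norms

q = np.array([1.0, 0.0, 0.0, 0.0])    # query containing only the first term
print(score_projected(q))
print(score_cosine(q))
```

The two scores share the same numerator and differ only in normalization, so they can rank documents differently when document lengths vary; which one "works better" is exactly the open question above.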
Implement efficiently:
- fast SVD
- fast “nearest neighbor” computation
- speed matters more than exactness
- fast incremental update for new documents
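
One way to read the implementation items above, as a hedged sketch: new documents can be "folded in" by projecting them onto the existing top-k singular vectors instead of recomputing the SVD, and nearest neighbors can be found with a single matrix-vector product over normalized k-dimensional vectors. The synthetic data, the helper names `embed`, `fold_in`, and `nearest`, and the use of a dense `np.linalg.svd` as a stand-in for a faster truncated solver are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(50, 20)).astype(float)  # synthetic term-document counts
k = 5                                               # assumed number of topics

# Dense SVD used here only as a stand-in; a truncated or randomized
# solver would be the natural choice at scale.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

def embed(X):
    """Project term-space column vectors into the k-dim space and L2-normalize."""
    Z = U_k.T @ X
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    return Z / np.maximum(norms, 1e-12)

def fold_in(docs, new_doc):
    """Approximate incremental update: append a new document without a new SVD."""
    return np.hstack([docs, embed(new_doc[:, None])])

def nearest(docs, query, top=3):
    """Brute-force cosine nearest neighbors: one k x n matrix-vector product."""
    sims = embed(query[:, None])[:, 0] @ docs
    return np.argsort(-sims)[:top]

docs = embed(A)                      # k x 20 matrix of document vectors
new_doc = A[:, 7].copy()             # pretend a copy of document 7 arrives later
docs2 = fold_in(docs, new_doc)       # now k x 21
print(nearest(docs2, new_doc, top=2))
```

Folding in trades accuracy for speed: the singular vectors are not updated, so the basis slowly drifts away from the true SVD of the growing collection and would need periodic recomputation.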