Rationalization
Suppose the corpus contains k “topics” (k << t, where t is the number of terms)
Each topic is a combination of terms
- basis vector in term space
Each document is a combination of topics
- linear combination of topic vectors
- so the corpus lies in a k-dimensional subspace of term space
But user word choice adds noise
SVD gives the best k-dimensional projection (in the least-squares sense)
- remove noise, keep meaning
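
A minimal sketch of this argument in NumPy, using synthetic data. The matrix sizes, the additive Gaussian noise model, and the function name lsi_demo are all assumptions made for illustration; they are not taken from a real corpus. It builds a term-document matrix whose documents are exact combinations of k topic vectors, adds noise, and checks that the rank-k SVD truncation lands closer to the noise-free matrix than the noisy one does.

```python
import numpy as np

def lsi_demo(t=50, d=30, k=3, noise=0.01, seed=0):
    """Hypothetical demo: t terms, d documents, k topics (sizes are assumptions)."""
    rng = np.random.default_rng(seed)

    # Each topic is a vector in term space (one column per topic).
    topics = rng.random((t, k))
    # Each document is a linear combination of the k topic vectors.
    mixtures = rng.random((k, d))
    clean = topics @ mixtures  # t x d matrix of rank at most k

    # Word-choice variation modeled as small additive noise (an assumption).
    noisy = clean + noise * rng.standard_normal((t, d))

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    return clean, noisy, approx, s

clean, noisy, approx, s = lsi_demo()
k = 3

# Frobenius-norm distance to the noise-free matrix, before and after truncation.
err_noisy = np.linalg.norm(noisy - clean)
err_approx = np.linalg.norm(approx - clean)
print(err_noisy, err_approx)

# Error of the rank-k approximation equals the energy in the dropped singular values.
residual = np.linalg.norm(noisy - approx)
print(residual, np.sqrt(np.sum(s[k:] ** 2)))
```

The last two printed numbers should agree: the least-squares error of the rank-k projection is exactly the root of the sum of the squared singular values that were discarded. In this toy setting the truncation removes much of the noise only because the topic signal is far stronger than the noise; real corpora need not separate so cleanly.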