Vector space model
Each document is a vector
Each term is a coordinate
- 0-1 for presence/absence of term (quorum)
- real valued to represent frequency of term
- similarity via dot-product
- often normalize documents to unit norm (dot product then equals cosine similarity)
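A minimal sketch of the points above, assuming whitespace tokenization and raw term counts as coordinates. The helper names (`term_vector`, `unit_normalize`, `dot`) and the toy documents are made up for illustration, not taken from any library.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """One coordinate per vocabulary term, holding that term's count in text."""
    counts = Counter(text.lower().split())
    return [float(counts[t]) for t in vocabulary]

def unit_normalize(v):
    """Scale v to Euclidean length 1; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Toy corpus (hypothetical data).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
vocabulary = sorted({w for d in docs for w in d.lower().split()})

# Each document becomes a unit-length vector over the vocabulary.
vectors = [unit_normalize(term_vector(d, vocabulary)) for d in docs]

# The query is treated as a short document in the same space.
query = unit_normalize(term_vector("cat mat", vocabulary))
scores = [dot(query, v) for v in vectors]

for d, s in zip(docs, scores):
    print(f"{s:.3f}  {d}")
```

Only the first document shares terms with the query, so it is the only one with a nonzero score. Note that "cats" and "cat" are different coordinates here because no stemming is applied.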
Smoother than Boolean search: documents get a graded score instead of match/no match
Real values let us account for
- term frequency in document
- term importance in corpus
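One common way to combine the two factors is tf-idf weighting: term count in the document times the log of inverse document frequency in the corpus. The notes do not name a specific weighting, so this scheme, the function name `tf_idf_vectors`, and the toy corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vocabulary, vectors) with weight = tf * log(N / df).

    tf: count of the term in the document.
    df: number of documents containing the term.
    This is one of several tf-idf variants; others dampen tf or smooth idf.
    """
    tokenized = [d.lower().split() for d in docs]
    vocabulary = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Count each term once per document it appears in.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in vocabulary}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocabulary])
    return vocabulary, vectors

# Toy corpus (hypothetical data): "the" occurs in every document.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]
vocab, tfidf = tf_idf_vectors(corpus)
weights = dict(zip(vocab, tfidf[0]))
print(weights["the"], weights["cat"])
```

In the first document "the" is the most frequent term, yet its weight is zero because it appears in every document and so does nothing to distinguish them. "cat" occurs once but only in this document, so it gets a positive weight.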