Vector Space Model
Corpus forms a term-document matrix A
- rows are terms
- columns are documents
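A minimal sketch of building such a matrix from a toy corpus. It assumes lowercase whitespace tokenization and raw term counts as the entries; the slide does not specify a weighting scheme (tf-idf is a common alternative), and the function name `term_document_matrix` is hypothetical.

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a dense term-document matrix: rows are terms, columns are documents.

    Assumption: lowercase whitespace tokenization, raw term counts as weights.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))  # vocabulary = row labels
    # A[i][j] = number of times term i occurs in document j
    A = [[c[term] for c in counts] for term in terms]
    return terms, A

docs = ["the cat sat", "the dog sat", "the cat ate the fish"]
terms, A = term_document_matrix(docs)
```

Here `terms` is `["ate", "cat", "dog", "fish", "sat", "the"]`, and the row for "the" is `[1, 1, 2]`.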
Modern “large” corpus:
- $d = 10^7$ documents (10 GB)
- $t = 10^5$ terms
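A quick size check for these numbers, assuming each entry is stored as a 4-byte float. A dense $t \times d$ matrix has $10^{12}$ cells, or roughly 4 TB, which is far larger than the 10 GB of text. Since each document contains only a small fraction of the vocabulary, $A$ is stored sparsely in practice.

```python
t = 10**5  # terms (rows)
d = 10**7  # documents (columns)

entries = t * d            # cells in a dense t x d matrix
bytes_dense = entries * 4  # assumption: 4-byte (float32) weights
print(entries, bytes_dense / 10**12, "TB")
```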
A query is also a term vector $q$
Vector of query-document similarities: $q^T A$
Document-document similarities: $A^T A$
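A minimal sketch of computing $q^T A$ on a toy matrix. Entry $j$ of the result is the dot product of the query with document column $j$. The helper name `query_scores` and the three-term vocabulary are illustrative assumptions.

```python
def query_scores(q, A):
    """Return q^T A: the dot product of query vector q with each column of A."""
    n_docs = len(A[0])
    return [sum(q[i] * A[i][j] for i in range(len(q))) for j in range(n_docs)]

# Rows are terms ["cat", "dog", "sat"]; columns are 3 documents (raw counts)
A = [
    [1, 0, 1],  # cat
    [0, 1, 0],  # dog
    [1, 1, 0],  # sat
]
q = [1, 0, 1]  # query "cat sat"
scores = query_scores(q, A)
```

Here `scores` is `[2, 1, 1]`, so document 0 ranks first. This uses the raw dot product; dividing each score by the vector lengths gives cosine similarity, a common variant the slide does not specify.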
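A minimal sketch of $A^T A$ on the same kind of toy matrix. Entry $(j, k)$ is the dot product of document columns $j$ and $k$, so the result is a symmetric $d \times d$ matrix. The function name `doc_doc_similarities` is hypothetical.

```python
def doc_doc_similarities(A):
    """Return A^T A: entry (j, k) is the dot product of document columns j and k."""
    n_terms, n_docs = len(A), len(A[0])
    return [
        [sum(A[i][j] * A[i][k] for i in range(n_terms)) for k in range(n_docs)]
        for j in range(n_docs)
    ]

# Rows are terms ["cat", "dog", "sat"]; columns are 3 documents
A = [
    [1, 0, 1],  # cat
    [0, 1, 0],  # dog
    [1, 1, 0],  # sat
]
S = doc_doc_similarities(A)
```

Here `S` is `[[2, 1, 1], [1, 2, 0], [1, 0, 1]]`. With $d = 10^7$ documents, a dense $A^T A$ would have $10^{14}$ entries, so it is rarely materialized in full.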
Improves significantly on the Boolean model