Information Retrieval: Interactive-Time Manipulation of Large Text Collections
David Karger

4/3/97



Table of Contents

Information Retrieval: Interactive-Time Manipulation of Large Text Collections David Karger

Information Retrieval

The Classic IR Model

General Problems

Boolean Keyword Search

Implementing Boolean Search

Problems with Model

Semantics vs. Syntax

Fixing Problems

Vector space model

Vector Space Model

Implementing Vector Space

Limits of Vector Space

Topics

Latent Semantic Indexing

Singular Value Decomposition

Truncated SVD

Truncated SVD

Rationalization

Challenges

Keyword Search has Limits

Scatter/Gather [Cutting, Karger, Pedersen, Tukey, Xerox PARC]

A Scatter/Gather Session: NY Times, August 1990

Actual Output

Why Scatter/Gather?

How does it Work?

Implementation Requirements

Clustering

Describing Clusters

Clustering Algorithms

Linear Time Clustering

Implementation Results

Tentative idea: precluster

Generalize: Hierarchical clustering

Modified Scatter/Gather

Implementation Details

Summary

Challenges

Conclusion

Author: David Karger

Email: karger@mit.edu

Home Page: http://theory.lcs.mit.edu/~karger