(0) Cover
(1) Today
(2) Corpora
(3) Word Counts
(4) Most Common Words
(5) Most Common Words (Cont.)
(6) How Many Words Are There?
(7) Frequencies of Frequencies
(8) Zipf's Law in Tom Sawyer
(9) Zipf's Law
(10) Zipf's Law
(11) Mandelbrot's Refinement
(12) Zipf's Law and Principle of Least Effort
(13) Other Laws
(14) Examples of collections approximately obeying Zipf's Law
(15) Is Zipf's Law unique to human language?
(16) Sparsity
(17) Sparsity
(18) Very Very Large Data
(19) The Brown Corpus
(20) Recent Corpora
(21) Corpus Content
(22) Example of Annotations: POS Tagging
(23) Issues in Annotations
(24) Tokenization
(25) What's a word?
(26) Word Segmentation
(27) Motivation for Statistical Segmentation
(28) Word Segmentation
(29) Algorithm for Word Segmentation
(30) Algorithm for Word Segmentation (Cont.)
(31) Experimental Framework
(32) Evaluation Measures
(33) Evaluation Measures (Cont.)
(34) Conclusions
Postscript