Jaime Teevan
jaime@teevan.org
13109 NE 38th Place
Bellevue, WA 98005
(425) 556-9753

Hyper-Learning

Many models of text documents impose a particular family of distributions on each term's occurrence distribution. The term occurrence distribution describes how likely the term is to occur a given number of times in any document. For example, a term may have one probability of occurring exactly once in a document and a different probability of occurring exactly twice. Rather than imposing a family of distributions in advance, we propose that the family be determined experimentally, directly from the data. We call this process of examining the entire corpus to set the "hyper-parameters" hyper-learning.
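To make the idea of a term occurrence distribution concrete, here is a minimal Python sketch that measures one empirically from a toy corpus. It is an illustration only, not the MATLAB code distributed on this page: the function name `term_occurrence_distribution`, the whitespace tokenization, and the sample corpus are all assumptions made for the example.

```python
from collections import Counter


def term_occurrence_distribution(documents, term):
    """Return {k: fraction of documents in which `term` occurs exactly k times}.

    Hypothetical helper for illustration: documents are plain strings,
    tokenized by lowercasing and splitting on whitespace.
    """
    if not documents:
        raise ValueError("need at least one document")
    # Count how many documents contain the term exactly k times, for each k.
    per_doc_counts = Counter(
        doc.lower().split().count(term) for doc in documents
    )
    total = len(documents)
    return {k: per_doc_counts[k] / total for k in sorted(per_doc_counts)}


# Toy corpus (made up for this example).
corpus = [
    "the cat sat on the mat",
    "a dog barked",
    "the dog chased the cat and the bird",
    "the end of the story",
]

dist = term_occurrence_distribution(corpus, "the")
for k, p in dist.items():
    print(f"P('the' occurs {k} times) = {p:.2f}")
```

An empirical distribution like this is the kind of evidence from which a family of distributions could be chosen, rather than assuming one up front.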

To facilitate the use of hyper-learned models by others, here we provide MATLAB code that, given a set of training document sets, will hyper-learn the best family of distributions.

Download hyper-learning code (hyper-learning.zip, 26k).

Learn More About Hyper-Learning
  1. Jaime Teevan and David R. Karger. Empirical Development of an Exponential Probabilistic Model for Text Retrieval: Using Textual Analysis to Build a Better Model. In Proceedings of the 26th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '03), Toronto, Canada, 2003 (Presenter). [ps, pdf, slides: ppt]

  2. Jaime Teevan and David R. Karger. Empirical Development of an Exponential Probabilistic Model for Text Retrieval. MIT Laboratory for Computer Science Abstract, 2003. [pdf]

  3. Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger. Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In Proceedings of the Twentieth International Conference on Machine Learning (ICML '03), 2003. [ps, pdf]

  4. Jaime Teevan and David R. Karger. Finding an Exponential Model for Text Retrieval through Textual Analysis. MIT Artificial Intelligence Laboratory Abstract, 2002. [pdf]

  5. Jaime Teevan. Improving Information Retrieval with Textual Analysis: Bayesian Models and Beyond. Master's thesis, Massachusetts Institute of Technology, 2001. [ps, pdf]

  6. Jaime Teevan. Bayesian Model for Information Retrieval. MIT Artificial Intelligence Laboratory Abstract, 2000. [pdf, ps]


Email Jaime Teevan if you have code or papers relevant to hyper-learning that would be appropriate to link from this site.