Download: ps.gz (130 pages, 1.2M) or pdf (0.9M)
The purpose of this thesis is to propose and to study a class of
models that admits tractable inference and learning algorithms yet is
rich enough for practical applications: the class of
mixture-of-trees models. Mixtures of trees inherit the excellent
computational properties of tree distributions (themselves a subset
of graphical models) but combine several trees in order to augment
their modeling power, thereby going beyond the standard graphical
model framework.
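In symbols, a mixture of trees is a convex combination of tree distributions; the notation below is a generic sketch, not necessarily the thesis's own:

```latex
Q(x) \;=\; \sum_{k=1}^{m} \lambda_k \, T_k(x),
\qquad \lambda_k \ge 0, \quad \sum_{k=1}^{m} \lambda_k = 1,
```

where each component $T_k$ is a distribution that factors over its own spanning tree of the variables. Because the component trees may have different structures, the mixture can represent dependencies that no single tree (and no single sparse graph) captures.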
The thesis demonstrates the performance of mixtures of trees in
density estimation and classification tasks. At the same time it
deepens the understanding of the properties of the tree distribution
as a multivariate density model. Among other results, it shows that the tree
classifier implements an implicit variable selection mechanism.
Learning mixtures of trees from data is a central subject of this
thesis. The learning algorithm introduced here is based on the EM and
Minimum Weight Spanning Tree algorithms and is quadratic in the
dimension of the domain.
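The tree-structure step inside such a learning loop is the classical Chow-Liu construction: estimate pairwise mutual informations from the data, then take a maximum-weight spanning tree over them. The sketch below illustrates that step in isolation; function names and the plain-list data layout are illustrative assumptions, not the thesis's own code.

```python
# Illustrative Chow-Liu step: mutual-information edge weights plus
# Kruskal's maximum-weight spanning tree. Not the thesis's implementation.
import math
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information I(X_i; X_j) for discrete columns i, j."""
    n = len(data)
    pij, pi, pj = {}, {}, {}
    for row in data:
        a, b = row[i], row[j]
        pij[(a, b)] = pij.get((a, b), 0) + 1
        pi[a] = pi.get(a, 0) + 1
        pj[b] = pj.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in pij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts over n samples
        mi += (c / n) * math.log(c * n / (pi[a] * pj[b]))
    return mi

def chow_liu_tree(data, n_vars):
    """Kruskal's algorithm over all pairs, weighted by mutual information."""
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))          # union-find forest
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Computing all pairwise mutual informations is what makes the structure step quadratic in the number of variables; in the mixture setting, this step runs once per tree component inside each EM iteration, with the data weighted by the posterior component responsibilities.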
This algorithm can serve as a tool for discovering hidden variables in
a special but important class of models where, conditioned on the
hidden variable, the dependencies between the observed variables
become sparse.
Finally, it is shown that in the case of sparse discrete data, the
original learning algorithm can be transformed into an algorithm that
is jointly subquadratic and that in simulations achieves speedup
factors of up to a thousand.
Marina Meila-Predoviciu
Abstract
One of the challenges of density estimation as it is used in machine
learning is that usually the data are multivariate and often the
dimensionality is large. Operating with joint distributions over
multidimensional domains raises specific problems that are not
encountered in the univariate case. Graphical models are
representations of joint densities that are specifically tailored to
address these problems. They take advantage of the (conditional)
independencies between subsets of variables in the domain,
representing them by means of a graph. When the graph is sparse, graphical
models provide an excellent support for human intuition and allow for
efficient inference algorithms. However, learning the underlying
dependence graph from data is generally NP-hard.
Learning with Mixtures of Trees
Marina Meila