Grammars are a powerful technique for modeling and extracting the structure of documents. A major challenge, however, is computational complexity. The computational cost of grammatical parsing depends on both the complexity of the input and the ambiguity of the grammar. For programming languages, where the terminals appear in a linear sequence and the grammar is unambiguous, parsing is O(N). For natural languages, which are linear yet have an ambiguous grammar, parsing is O(N^3). For documents, where the terminals are arranged in two dimensions and the grammar is ambiguous, parsing time can be exponential in the number of terminals. In this paper we introduce (and unify) several types of geometrical data structures that can be used to significantly reduce parsing time. Each data structure embodies a different geometrical constraint on the set of possible valid parses. These data structures are very general: they can be used with any type of grammatical model, and for a wide variety of document understanding tasks, to limit the set of hypotheses examined and tested. Assuming a clean design for the parsing software, the same parsing framework can be tested with various geometric constraints to determine the most effective combination.
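To make the O(N^3) case concrete, the following is a minimal CYK recognizer for a toy grammar in Chomsky normal form; the grammar, lexicon, and sentence are hypothetical illustrations, not the document grammars discussed in the paper. The three nested loops, over span length, start position, and split point, are the source of the cubic cost.

```python
from itertools import product

# Toy grammar in Chomsky normal form: maps a pair of child
# nonterminals to the set of parents that can derive them.
grammar = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexicon = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}, "cat": {"N"}}

def cyk_recognize(words):
    n = len(words)
    # chart[(i, j)] = nonterminals deriving the span words[i:j]
    chart = {(i, i + 1): set(lexicon.get(w, set()))
             for i, w in enumerate(words)}
    for length in range(2, n + 1):           # O(N) span lengths
        for i in range(n - length + 1):      # O(N) start positions
            j = i + length
            cell = set()
            for k in range(i + 1, j):        # O(N) split points
                for pair in product(chart[(i, k)], chart[(k, j)]):
                    cell |= grammar.get(pair, set())
            chart[(i, j)] = cell
    return "S" in chart[(0, n)]

print(cyk_recognize("the dog saw the cat".split()))  # True
```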
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available ``for free'' in large quantities. Unlabeled data has shown promise in improving performance on a number of tasks, e.g., word sense disambiguation, information extraction, and natural language parsing.
In this thesis, we focus on two segmentation tasks, named-entity recognition and Chinese word segmentation. The goal of named-entity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces.
Our approach is as follows: In a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using the Perceptron algorithm. We also compare Markov and semi-Markov models on the two segmentation tasks. Our results show that features derived from unlabeled data substantially improve performance, both by reducing the amount of labeled data needed to reach a given performance level and by reducing the error achieved with a fixed amount of labeled data. We also find that semi-Markov models can sometimes improve performance over Markov models.
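The training loop itself is simple; the following is a minimal sketch of the structured Perceptron for a global linear model, under stated assumptions: `decode(x, w)` and `features(x, y)` are hypothetical stand-ins for, respectively, the highest-scoring labeling under weights w (Viterbi decoding for a Markov model, its segment-level analogue for a semi-Markov model) and the sparse feature map, which is where the cluster and mutual-information features derived from raw text would be plugged in.

```python
from collections import defaultdict

def perceptron_train(data, decode, features, epochs=5):
    """data: list of (x, y_gold) pairs; returns a sparse weight vector."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = decode(x, w)      # best labeling under current weights
            if y_hat != y_gold:
                # Additive update: promote features of the gold labeling,
                # demote features of the wrong prediction.
                for f, v in features(x, y_gold).items():
                    w[f] += v
                for f, v in features(x, y_hat).items():
                    w[f] -= v
    return w
```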
Acyclicity is an important property of hypergraphs with applications in many areas, such as graphical models and relational databases. Our contributions in this paper are two-fold: First, we present two new characterizations of a hyperforest (equivalently, an acyclic hypergraph or triangulated graph), one through a hierarchical decomposition of the hyperforest and one through the absence of hypercycles, a concept defined in this paper. Second, we present the first efficient dynamic data structure for maintaining acyclicity in a hypergraph. The data structure uses Tarjan's Union-Find data structure (which can itself be used to maintain acyclicity in graphs) as a building block, achieving an amortized expected query time with an inverse Ackermann dependence on the number of vertices. To demonstrate the practicality of this data structure, we conduct experiments using it to construct high-weight hyperforests.
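For reference, here is the building block in question: a minimal Union-Find (disjoint-set) implementation with union by rank and path compression (the path-halving variant), whose operations run in amortized inverse-Ackermann time. On ordinary graphs it maintains acyclicity directly: an edge (u, v) closes a cycle exactly when find(u) == find(v).

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # endpoints already connected: edge closes a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # union by rank: attach shorter tree to taller
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

Adding edges one at a time and rejecting any edge whose endpoints are already connected maintains a forest; this is the primitive on which the hypergraph data structure builds.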
Markov trees generalize naturally to bounded tree-width Markov networks, on which exact computations can still be done efficiently. However, learning the maximum likelihood Markov network with tree-width greater than 1 is NP-hard, so we discuss several algorithms for approximating the optimal Markov network. We present a set of methods for training a density estimator, where each method is specified by three arguments: tree-width, model scoring metric (maximum likelihood or minimum description length), and model representation (one joint distribution or several class-conditional distributions). We give empirical results for these methods on density estimation and classification tasks and explore the implications of each argument.
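The tree-width-1 case that these methods generalize is the classical Chow-Liu algorithm: the maximum likelihood Markov tree is the maximum spanning tree under pairwise empirical mutual information. A minimal sketch follows; the function names and the assumption of a discrete (n_samples, n_vars) data array are illustrative choices, not the implementation used in this work.

```python
from itertools import combinations
import numpy as np

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete columns."""
    mi = 0.0
    for x in np.unique(xs):
        for y in np.unique(ys):
            pxy = np.mean((xs == x) & (ys == y))
            px, py = np.mean(xs == x), np.mean(ys == y)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_edges(data):
    """Edges of the maximum likelihood Markov tree (Chow-Liu)."""
    n_vars = data.shape[1]
    weight = {(i, j): mutual_information(data[:, i], data[:, j])
              for i, j in combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:  # Prim's maximum spanning tree
        e = max((e for e in weight if (e[0] in in_tree) != (e[1] in in_tree)),
                key=lambda e: weight[e])
        edges.append(e)
        in_tree |= set(e)
    return edges
```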
Current approximation algorithms for maximum weight hypertrees find heavy windmill farms, and are based on the fact that a constant fraction (for constant width k) of the weight of a k-hypertree can be captured by a k-windmill farm. However, the exact worst-case ratio is not known; it is only bounded to lie between 1/(k+1)! and 1/(k+1). We investigate this worst-case ratio by searching for weighted hypertrees that minimize the fraction of their weight that a windmill farm can capture. To do so, we use a novel approach in which a linear program is used to find ``bad'' inputs to a dynamic program.
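The interplay between the two can be sketched as a constraint-generation loop: the dynamic program, given candidate weights, returns the structure capturing the most weight, and the linear program then reweights the input to make every structure found so far as light as possible. The toy below substitutes deliberately simple stand-ins for the real setting (weighted items instead of hypertrees, and the heaviest pair of items instead of the windmill-farm dynamic program); all names are hypothetical, and scipy is assumed to be available.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def best_capture(w):
    # Stand-in "dynamic program": the heaviest pair of items.
    return max(combinations(range(len(w)), 2),
               key=lambda s: w[s[0]] + w[s[1]])

def minimize_capture_ratio(n, tol=1e-9, max_iters=200):
    """Weights on n items minimizing the best pair's captured fraction."""
    # LP variables: item weights w_0..w_{n-1}, plus the ratio r.
    c = np.zeros(n + 1)
    c[-1] = 1.0                                  # objective: minimize r
    A_eq = [np.append(np.ones(n), 0.0)]          # total weight fixed at 1
    structures = []
    for _ in range(max_iters):
        A_ub = []
        for s in structures:
            row = np.zeros(n + 1)
            row[list(s)] = 1.0
            row[-1] = -1.0                       # captured weight <= r
            A_ub.append(row)
        res = linprog(c, A_ub=A_ub or None, b_ub=[0.0] * len(A_ub) or None,
                      A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (n + 1))
        w, r = res.x[:n], res.x[-1]
        s = best_capture(w)                      # DP probes the LP's input
        if w[s[0]] + w[s[1]] <= r + tol:
            return w, r                          # LP's bound is tight: done
        structures.append(s)                     # LP must now beat this one
    return w, r

print(minimize_capture_ratio(4)[1])  # uniform weights; ratio 2/n = 0.5
```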