This file describes the table used to identify head-words in the papers:

Three Generative, Lexicalised Models for Statistical Parsing  (ACL/EACL97)
A New Statistical Parser Based on Bigram Lexical Dependencies (ACL96)

There are two parts to this file:

[1] an email from David Magerman describing the head-table used in
    D. Magerman. 1995. Statistical Decision-Tree Models for Parsing.
    Proceedings of the 33rd Annual Meeting of
    the Association for Computational Linguistics, pages 276-283.

[2] A modified version of David's head-table which I used in my experiments.

Many thanks to David Magerman for allowing me to distribute his table.


[1]

From magerman@bbn.com Thu May 25 13:48 EDT 1995
Posted-Date: Thu, 25 May 1995 13:48:07 -0400
Received-Date: Thu, 25 May 1995 13:48:43 +0500
Message-Id: <199505251748.NAA02892@thane.bbn.com>
To: mcollins@gradient.cis.upenn.edu, robertm@unagi.cis.upenn.edu,
        mitch@linc.cis.upenn.edu
Cc: magerman@bbn.com
Subject: Re: Head words table 
In-Reply-To: Your message of "Thu, 25 May 1995 13:17:14 EDT."
             <9505251717.AA17874@gradient.cis.upenn.edu> 
Date: Thu, 25 May 1995 13:48:07 -0400
From: David Magerman <magerman@bbn.com>
Content-Type: text
Content-Length: 2972


Hi all.  Mike and Robert asked me for the Tree Head Table, so I
thought I'd pass it along to everyone in one shot.  Feel free to
distribute it to whomever at Penn wants it.

Note that it's not complete, and that I've invented a tag (% for the
symbol %) and a label (NP$ for NP's that end in POS).  I also have
some optional mapping mechanisms that: (a) convert to_TO -> to_IN when
in a prepositional phrase and (b) translate (PRT x_RP) -> (ADVP x_RB),
thus mapping away the distinction between particles and adverbs.  I
currently use transformation (b) in my parser, but don't use (a).
These facts may or may not be relevant, depending on how you want to
use this table.

Cheers,
-- David

Tree Head Table
---------------

Instructions:

1. The first column is the non-terminal.  The second column indicates
where you start when you are looking for a head (left is for
head-initial categories, right is for head-final categories).  The
rest of the line is a list of non-terminal and pre-terminal categories
which represent the head rule.

2. ** is a wildcard value.  Any non-terminal with ** in its rule means
that anything can be its head.  So, for a head-initial category, **
means the first word is always the head, and for a head-final
category, ** means the last word is always the head.  In most cases,
** means I didn't investigate good head rules for that category, so it
might be worthwhile to do so yourself.

3. The Tree Head Table is used as follows:

        a. Use tree head rule based on NT category of constituent
        b. For each category X in tree head rule, scan the children of
           the constituent for the first (or last, for head-final)
           occurrence of category X.  If X occurs, that child is the
           head.
        c. If no child matches any category in the list, use the first
           (or last, for head-final) child as the head.

4. I treat the NP category as a special case.  Before consulting the
head rule for NP, I look for the rightmost child with a label
beginning with the letter N.  If one exists, I use that child as the
head.  If no child's tag begins with N, I use the tree head rule.

ADJP	right	% QP JJ VBN VBG ADJP $ JJR JJS DT FW **** RBR RBS RB
ADVP	left	RBR RB RBS FW ADVP CD **** JJR JJS JJ
CONJP	left	CC RB IN
FRAG	left	**
INTJ	right	**
LST	left	LS :
NAC	right	NN NNS NNP NNPS NP NAC EX $ CD QP PRP VBG JJ JJS JJR ADJP FW
NP	right	EX $ CD QP PRP VBG JJ JJS JJR ADJP DT FW RB SYM PRP$
NP$	right	NN NNS NNP NNPS NP NAC EX $ CD QP PRP VBG JJ JJS JJR ADJP FW SYM
PNP	right	**
PP	left	IN TO FW
PRN	left	**
PRT	left	RP
QP	right	CD NCD % QP JJ JJR JJS DT
RRC	left	VP NP ADVP ADJP PP
S	right	VP SBAR ADJP UCP NP
SBAR	right	S SQ SINV SBAR FRAG X
SBARQ	right	SQ S SINV SBARQ FRAG X
SINV	right	S VP VBZ VBD VBP VB SINV ADJP NP
SQ	right	VP VBZ VBD VBP VB MD SQ
UCP	left	**
VP	left	VBD VBN MD VBZ TO VB VP VBG VBP ADJP NP
WHADJP	right	JJ ADJP
WHADVP	left	WRB
WHNP	right	WDT WP WP$ WHADJP WHPP WHNP
WHPP	left	IN TO FW
X	left	**
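
As an illustration only, instructions 3 and 4 above can be sketched in
Python roughly as follows. The names find_head and HEAD_RULES are
hypothetical and do not come from David's parser; only a few rows of the
table are copied in, and the "****" entries in the ADJP and ADVP rows are
not interpreted here. A constituent is assumed to be given as its parent
label plus the list of its children's labels.

```python
# Illustrative sketch; find_head and HEAD_RULES are hypothetical names.
# Each rule is (direction, categories), copied from a few rows of the table.
HEAD_RULES = {
    "PP": ("left", ["IN", "TO", "FW"]),
    "VP": ("left", ["VBD", "VBN", "MD", "VBZ", "TO", "VB", "VP", "VBG",
                    "VBP", "ADJP", "NP"]),
    "S": ("right", ["VP", "SBAR", "ADJP", "UCP", "NP"]),
    "NP": ("right", ["EX", "$", "CD", "QP", "PRP", "VBG", "JJ", "JJS", "JJR",
                     "ADJP", "DT", "FW", "RB", "SYM", "PRP$"]),
    "FRAG": ("left", ["**"]),
}


def find_head(parent, children):
    """Return the index of the head child, given parent and child labels."""
    if not children:
        raise ValueError("constituent has no children")
    direction, categories = HEAD_RULES[parent]

    # Scan order: left-to-right for head-initial, right-to-left for head-final.
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()

    # Instruction 4: for NP, first try the rightmost child whose label
    # begins with the letter N.
    if parent == "NP":
        for i in reversed(range(len(children))):
            if children[i].startswith("N"):
                return i

    # Instruction 3b: try each category of the rule in turn.
    for category in categories:
        if category == "**":  # wildcard: anything can be the head
            return order[0]
        for i in order:
            if children[i] == category:
                return i

    # Instruction 3c: nothing matched, so fall back to the first/last child.
    return order[0]
```

For example, find_head("S", ["NP", "VP", "."]) picks the VP child (index 1),
and find_head("NP", ["DT", "JJ", "NN"]) picks the NN child (index 2) through
the NP special case.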


[2]

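Below is the head table which I used in my experiments. The first column
is just the number of fields that follow it on that line. The direction
column is numeric (judging by the rows shared with David's table, 1 is left
and 0 is right). Otherwise, the format is the same as David's.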
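
A minimal sketch of reading one row in this format, assuming fields are
separated by whitespace and that the leading count covers every field after
it (this holds for the rows in this file). The name parse_row is
hypothetical.

```python
def parse_row(line):
    """Split one table row into (label, direction, categories).

    The first field is taken to be the number of fields that follow it;
    a mismatch raises ValueError so that damaged rows are noticed.
    """
    fields = line.split()
    count = int(fields[0])
    rest = fields[1:]
    if count != len(rest):
        raise ValueError(
            "expected %d fields after the count, found %d" % (count, len(rest))
        )
    label = rest[0]
    direction = int(rest[1])  # numeric direction code, kept as-is
    categories = rest[2:]     # may be empty, e.g. for FRAG or UCP
    return label, direction, categories
```

For example, parse_row("5 CONJP 1 CC RB IN") gives
("CONJP", 1, ["CC", "RB", "IN"]).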

Ignore the row for NPs -- I use a special set of rules for these. I
initially remove ADJPs, QPs, and also NPs which dominate a possessive
(tagged POS), e.g. (NP (NP the man 's) telescope) becomes
(NP the man 's telescope). These are recovered in a post-processing stage
after parsing. The following rules are then used to recover the NP head:

If the last word is tagged POS, return (last-word)

Else search from right to left for the first child which is an NN, NNP,
NNPS, NNS, NX, POS, or JJR

Else search from left to right for the first child which is an NP

Else search from right to left for the first child which is a $, ADJP or PRN

Else search from right to left for the first child which is a CD

Else search from right to left for the first child which is a JJ, JJS, RB or QP

Else return the last word
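
The NP rules above can be sketched in Python as below. This is an
illustration, not the code used in the experiments: the names NP_RULES and
np_head are hypothetical, the NP is assumed to be given as the list of its
children's labels after the flattening step, and each rule is read as
"take the first child, in the given direction, whose label is any of the
listed categories".

```python
# Illustrative sketch; NP_RULES and np_head are hypothetical names.
# Each entry is (search direction, set of acceptable labels), in rule order.
NP_RULES = [
    ("right-to-left", {"NN", "NNP", "NNPS", "NNS", "NX", "POS", "JJR"}),
    ("left-to-right", {"NP"}),
    ("right-to-left", {"$", "ADJP", "PRN"}),
    ("right-to-left", {"CD"}),
    ("right-to-left", {"JJ", "JJS", "RB", "QP"}),
]


def np_head(children):
    """Return the index of the head child of an NP, given its child labels."""
    if not children:
        raise ValueError("NP has no children")
    last = len(children) - 1

    # First rule: a final POS is the head.
    if children[last] == "POS":
        return last

    # Then try each search in turn, stopping at the first match.
    for direction, labels in NP_RULES:
        if direction == "left-to-right":
            indices = range(len(children))
        else:
            indices = range(last, -1, -1)
        for i in indices:
            if children[i] in labels:
                return i

    # Final rule: nothing matched, so return the last word.
    return last
```

For example, np_head(["DT", "JJ", "NN"]) returns 2 (the NN), and
np_head(["NP", ",", "NP"]) returns 0 (the leftmost NP).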


20 ADJP	0	NNS QP NN $ ADVP JJ VBN VBG ADJP JJR NP JJS DT FW RBR RBS SBAR RB
15 ADVP	1	RB RBR RBS FW ADVP TO CD JJR JJ IN NP JJS NN
5 CONJP	1	CC RB IN
2 FRAG	1	
2 INTJ	0	
4 LST	1	LS :
19 NAC	0	NN NNS NNP NNPS NP NAC EX $ CD QP PRP VBG JJ JJS JJR ADJP FW
8 PP	1	IN TO VBG VBN RP FW
2 PRN	0	
3 PRT	1	RP
14 QP	0	$ IN NNS NN JJ RB DT CD NCD QP JJR JJS
7 RRC	1	VP NP ADVP ADJP PP
10 S	0	TO IN VP S SBAR ADJP UCP NP
13 SBAR	0	WHNP WHPP WHADVP WHADJP IN DT S SQ SINV SBAR FRAG
7 SBARQ	0	SQ S SINV SBARQ FRAG
12 SINV	0	VBZ VBD VBP VB MD VP S SINV ADJP NP
9 SQ	0	VBZ VBD VBP VB MD VP SQ
2 UCP	1	
15 VP	0	TO VBD VBN MD VBZ VB VBG VBP VP ADJP NN NNS NP
6 WHADJP	0	CC WRB JJ ADJP
4 WHADVP	1	CC WRB
8 WHNP	0	WDT WP WP$ WHADJP WHPP WHNP
5 WHPP	1	IN TO FW