The *parsed.gz files contain N-best parse trees, one tree per line.
Each preterminal and its terminal appear together as a single token:
the word and its POS tag joined by a "/" character, e.g. "dog/NN".
Nesting relations are indicated by
parentheses, with nonterminal labels attached to the open-paren that
begins each constituent.  Each label is a 4-tuple delimited by "~"
characters: the first element is the actual label of the constituent,
the second element is the headword of the constituent, the third
element is the number of children in the constituent, and the fourth
element is the (1-origin) index of the head child among the children.
For instance, "(NP~dog~2~2" indicates an NP headed by "dog" whose head
child is the second of its two children.
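The bracketing and label format described above can be read with a
small stack-based reader.  The following Python sketch is a
hypothetical illustration, not code distributed with the data: the
names Leaf, Node, parse_label and parse_tree are invented here.  It
assumes that tokens are separated by whitespace, that words and tags
contain no parentheses, that headwords contain no "~", and that any
non-tree fields on the line either precede the first open-paren or have
been stripped beforehand.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Leaf:
    word: str
    tag: str  # POS tag, e.g. "NN" or "PUNC,"


@dataclass
class Node:
    label: str          # actual constituent label, e.g. "NP"
    headword: str       # headword of the constituent
    num_children: int   # child count, excluding PUNC children
    head_index: int     # 1-origin head-child index, excluding PUNC children
    children: List[Union["Node", Leaf]] = field(default_factory=list)


# Assumption: tokens are "(LABEL", ")" or "word/TAG", separated by
# whitespace, and no word or tag contains a parenthesis.
TOKEN_RE = re.compile(r"\([^\s()]+|\)|[^\s()]+")


def parse_label(label: str) -> Tuple[str, str, int, int]:
    """Split a 4-tuple label such as "NP~dog~2~2" into its fields."""
    name, headword, n, head = label.split("~")
    return name, headword, int(n), int(head)


def parse_tree(text: str) -> Node:
    """Parse the first complete bracketed tree found in `text`."""
    stack: List[Node] = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("("):
            node = Node(*parse_label(tok[1:]))
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            if not stack:
                return node  # outermost constituent is complete
        elif stack:
            # rsplit keeps any "/" inside the word itself (e.g. "1/2/CD")
            word, tag = tok.rsplit("/", 1)
            stack[-1].children.append(Leaf(word, tag))
        # tokens outside any constituent are ignored
    raise ValueError("no complete tree found")
```

For a made-up line such as
"(S~barks~2~2 (NP~dog~2~2 the/DT dog/NN ) barks/VBZ ./PUNC. )",
parse_tree returns an S node whose children list has three entries (the
NP, barks/VBZ and ./PUNC.) even though its child count is 2, because the
PUNC leaf is not counted.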

NB: The child counts and head child indices do not include
punctuation, where punctuation is defined as any element whose POS tag
begins with "PUNC" (e.g., ",/PUNC,").  To locate the head child, skip
over any PUNC nodes while counting through the sequence of children.
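A minimal sketch of head-child selection under this rule, in Python.
The function names are hypothetical, and the representation is an
assumption made to keep the example self-contained: leaves are
"word/TAG" strings and constituents are any other objects.  It also
assumes a tag never contains "/".

```python
from typing import Any, List, Sequence


def is_punctuation(child: Any) -> bool:
    """True for a leaf such as ",/PUNC," whose POS tag begins with "PUNC"."""
    # Only leaves carry POS tags; the tag is taken as the text after the last "/".
    return isinstance(child, str) and child.rsplit("/", 1)[-1].startswith("PUNC")


def counted_children(children: Sequence[Any]) -> List[Any]:
    """The children that count toward the child count and head index."""
    return [c for c in children if not is_punctuation(c)]


def head_child(children: Sequence[Any], num_children: int, head_index: int) -> Any:
    """Return the head child, skipping PUNC nodes; head_index is 1-origin."""
    counted = counted_children(children)
    if len(counted) != num_children:
        raise ValueError(
            f"expected {num_children} non-PUNC children, found {len(counted)}"
        )
    return counted[head_index - 1]
```

Checking the count against the label's second field is optional, but it
catches tokenization mistakes early.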


The *scored.gz files contain entries that are matched line by line
with the contents of the *parsed.gz files.  Each line contains several
numerical values.  The first column is the example index, the second
column is the candidate index, the third column gives an objective
score of the parse, and the fifth column is the log-probability of
the parse.  The remaining columns are not needed for reranking and are
not present in every *scored.gz file.  Note that the sec00scored.gz and
sec2224.scored.gz files have their third and fourth columns zeroed,
because they are used as test data.
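A Python sketch of reading a scored file alongside its parsed file.
It assumes the columns are whitespace-separated and that the example
and candidate indices are written as integers; the names Score,
parse_score_line and read_pairs are hypothetical.  Only columns 1, 2, 3
and 5 (1-origin) are used, so any further columns are ignored.

```python
import gzip
from typing import Iterator, NamedTuple, Tuple


class Score(NamedTuple):
    example: int      # column 1: example index
    candidate: int    # column 2: candidate index
    objective: float  # column 3: objective score (zeroed in test files)
    logprob: float    # column 5: log-probability of the parse


def parse_score_line(line: str) -> Score:
    cols = line.split()
    # 0-based list positions 0, 1, 2, 4 are columns 1, 2, 3, 5
    return Score(int(cols[0]), int(cols[1]), float(cols[2]), float(cols[4]))


def read_pairs(parsed_path: str, scored_path: str) -> Iterator[Tuple[Score, str]]:
    """Yield (score, tree line) pairs, relying on line-by-line alignment."""
    with gzip.open(parsed_path, "rt") as pf, gzip.open(scored_path, "rt") as sf:
        for tree_line, score_line in zip(pf, sf):
            yield parse_score_line(score_line), tree_line.rstrip("\n")
```

Note that zip stops silently at the end of the shorter file; on Python
3.10 or later, zip(pf, sf, strict=True) raises an error instead if the
two files have different lengths.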

NB: The *parsed.gz files contain an "ID" entry that is temptingly
similar to an example index.  However, the ID periodically cycles,
probably due to batching of the parses, so it cannot be relied on to
identify examples.  Use the example and candidate indices given in the
*scored.gz files instead.
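A small Python sketch of building N-best lists from the scored-file
indices rather than the ID entry.  The function name group_nbest is
hypothetical; it takes (example index, candidate index, item) triples,
where item can be a tree line, a parsed tree, or a score record.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def group_nbest(rows: Iterable[Tuple[int, int, T]]) -> Dict[int, List[T]]:
    """Group items by example index, each list ordered by candidate index."""
    groups: Dict[int, List[Tuple[int, T]]] = defaultdict(list)
    for example, candidate, item in rows:
        groups[example].append((candidate, item))
    # Sort on the candidate index only, so items need not be comparable.
    return {
        ex: [item for _, item in sorted(cands, key=lambda p: p[0])]
        for ex, cands in groups.items()
    }
```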
