Nuggeteer: Nugget-Based IR Evaluation
Nuggeteer is a tool for evaluating TREC definition and relationship
questions, the AQUAINT opinion questions, and complex interactive
question answering (ciQA), all of which can be described as
Gregory Marton. Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements. MIT CSAIL Work Product 1721.1/30604. January 2006.
Gregory Marton, Alexey Radul. Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements. In Proceedings of HLT-NAACL. July 2006 (forthcoming).
To download Nuggeteer, please fill out this form:
|0.8.1||Jan 31, 2008||
- New settings included for O2007 and ciqa
- Added per-judgement documentation in the output
- Work on the (unsupported) training system:
  - supporting Condor (ruby Condor library not included),
  - recognizing RUN-\d+-\d+,
  - added automatic plotting,
  - misc bugfixes
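
As a small illustration of the `RUN-\d+-\d+` pattern quoted above, the following Python sketch checks whether a string has that shape. It is not Nuggeteer code: the function name `is_run_id` and the choice to require a whole-string match are assumptions made for the example.

```python
import re

# Pattern as quoted in the release note; requiring a full-string match is an assumption.
RUN_ID = re.compile(r"RUN-\d+-\d+")

def is_run_id(name: str) -> bool:
    """Return True when the whole string has the form RUN-<digits>-<digits>."""
    return RUN_ID.fullmatch(name) is not None
```

Under these assumptions, `is_run_id("RUN-12-3")` is true, while `is_run_id("RUN-12")` and `is_run_id("run-1-2")` are false.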
|0.8||Feb 6, 2007||
- Tested working with Debian GNU/Linux, Mac OS X, and Win/Cygwin (not Vista)
- User interface: data files are now encapsulated in settings/*.pl files,
so you no longer have to specify them on the command line
- Added art/arj methods in src/bayes-threshold/ to fix overtraining problem
presented in ACL 2006 paper. NOTE: this is not used by default,
because it is a ruby package. If you have ruby, please try it.
It will be distribution-tested in an upcoming release.
- Full results of testing now in settings/*.taus
- No longer distributing .eval files because interpretation was unclear.
- Added datasets: O2006 and ciqa2006 -- de-anonymized with data from NIST
- Retrained datasets:
  - O2005 and R2005 -- due to bugfixes and de-anonymized as above
  - O2004 and D2003 -- due to bugfixes in nuggeteer
- Jdbs now encapsulate everything they need, so they are no longer prone to
errors caused by running with settings different from those used when the jdb was built.
- Using Data::Dumper instead of recalculating each time, when BerkeleyDB
is not available.
- Allows users to specify a --runtag
- Bugs fixed:
  - We used to assume that if a response set contained a nugget,
    then all responses not marked with that nugget did not
    contain it. This is false: they are ambiguous,
    because assessors only mark the "best" response.
    The fix leads to much larger jdb files.
  - We misinterpreted lines in ill-formed input files without warning.
  - We did not warn about inconsistencies in existing judgements.
  - We used the gamma variant of Kendall's tau in a non-standard way,
    whereas Kendall's tau beta makes more sense for this task.
    Thanks to Jimmy Lin and Dina Demner-Fushman for help on this.
- Better support for training: calculates multiple decision methods at once
- Allows alphanumeric, rather than just numeric, question ids.
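
The difference between the two rank-correlation statistics mentioned in the bug list above shows up when rankings contain ties. The Python sketch below is a generic illustration, not Nuggeteer's implementation. It assumes that the "gamma variant" means the statistic that ignores tied pairs, (C - D) / (C + D), and that "tau beta" means the usual tau-b, (C - D) / sqrt((C + D + Tx)(C + D + Ty)), where C and D count concordant and discordant pairs and Tx, Ty count pairs tied in only one of the two rankings. All function names and the sample scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pair_counts(x, y):
    """Count concordant, discordant, x-only-tied and y-only-tied pairs."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: counted by neither statistic
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, tied_x, tied_y

def gamma(x, y):
    """Tie-ignoring statistic (assumed meaning of the 'gamma variant')."""
    c, d, _, _ = pair_counts(x, y)
    return (c - d) / (c + d) if c + d else float("nan")

def tau_b(x, y):
    """Kendall's tau-b, which penalises pairs tied in only one ranking."""
    c, d, tx, ty = pair_counts(x, y)
    denom = sqrt((c + d + tx) * (c + d + ty))
    return (c - d) / denom if denom else float("nan")

# Hypothetical scores for four systems under two evaluation methods.
official = [1, 2, 3, 4]
automatic = [1, 2, 2, 4]
```

With these sample lists, five pairs are concordant, none are discordant, and one pair is tied only in `automatic`. `gamma(official, automatic)` is therefore 1.0, while `tau_b(official, automatic)` is 5 / sqrt(30), roughly 0.913, because tau-b does not let the tie pass as perfect agreement.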
|0.6||Feb 21, 2005||
- Tested working out of the box on Debian GNU/Linux, Mac OS X, and Windows with Cygwin.
- BerkeleyDB and MLDBM are now optional.
- Command line options set best performing values for each data set.
- Added results and settings for AQUAINT Opinion pilot 2005 data
- Added support for Lin and Demner-Fushman's nugget pyramids. (autodetected)
- Enclosed cross-validation results for every combination of settings in doc/.
- Many interface fixes. Thanks to Christine Moran and Becky Passonneau
for their very helpful feedback and beta testing!
- Some substantive bugs fixed in how duplicate n-grams are treated, and
silly failures in stemming fixed. Thanks to Alexey Radul for
noticing score > 1! Impact on published values was minimal.
- Added an interface for per-nugget and per-question thresholds from an
external source. Thanks, in part, to Alexey Radul.
- Improved the automatic tests.
- Updated doc/nuggeteer.pdf upon learning of MITRE's Qaviar system.
- Fixed cdft compile error on Windows with Cygwin. Thanks to Yuan Shen.
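
The duplicate n-gram fix noted above mentions scores greater than 1. The release notes do not describe the actual scoring code, so the Python sketch below only shows one generic way such a symptom can arise and be avoided: counting every repeated response n-gram as a hit versus clipping each n-gram's hits to the number of times it occurs in the nugget description. The function names and sample strings are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def naive_overlap(nugget, response, n=1):
    """Every response n-gram found in the nugget counts, so repeats inflate the score."""
    nugget_grams = ngrams(nugget, n)
    if not nugget_grams:
        return 0.0
    nugget_set = set(nugget_grams)
    hits = sum(1 for g in ngrams(response, n) if g in nugget_set)
    return hits / len(nugget_grams)

def clipped_overlap(nugget, response, n=1):
    """Each nugget n-gram is matched at most as often as it occurs in the nugget."""
    nugget_counts = Counter(ngrams(nugget, n))
    total = sum(nugget_counts.values())
    if not total:
        return 0.0
    response_counts = Counter(ngrams(response, n))
    hits = sum(min(count, response_counts[g]) for g, count in nugget_counts.items())
    return hits / total

# Hypothetical nugget description and response, already tokenised and lowercased.
nugget = "born in paris".split()
response = "paris paris paris born in paris".split()
```

For this sample, `naive_overlap(nugget, response)` gives 2.0, while `clipped_overlap(nugget, response)` stays at 1.0.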
|0.5||Jan 10, 2005|
- Christine Moran and Becky Passonneau, patient and enthusiastic users and testers
- Jimmy Lin and Dina Demner-Fushman for Pourpre
- Alexey Radul's work is supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions, or recommendations expressed here are the authors' and do not necessarily reflect the views of the National Science Foundation.
- Gregory Marton's work is supported under a Defense Advanced Research Projects Agency grant to Dr. Boris Katz, the Infolab.