Nuggeteer: Nugget-Based IR Evaluation

Nuggeteer is a tool for evaluating TREC definition and relationship questions, the AQUAINT opinion questions, and complex interactive question answering (ciQA), all of which can be described as nugget-based tasks.


Submit your TREC-formatted answer file for automated evaluation. Results will be emailed back to you.

Thanks to Yuan K. Shen for the web+email interface



Read more:
Gregory Marton. Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements. MIT CSAIL Work Product 1721.1/30604. January 2006.
Gregory Marton, Alexey Radul. Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements. In Proceedings of HLT-NAACL. July 2006 (forthcoming).

To download Nuggeteer, please fill out this form:

Name: (optional)
Organization: (optional)
Email: (optional) will only be used for announcements like major bugfixes and new releases.
Developer? You will be added to the developer mailing list, and get Subversion access.

Changes

0.8.1 (Jan 31, 2008)
  • New settings included for O2007 and ciqa
  • Added per-judgement documentation in the output
  • Work on the (unsupported) training system:
    • supporting Condor (ruby Condor library not included),
    • recognizing RUN-\d+-\d+,
    • added automatic plotting,
    • misc bugfixes
0.8 (Feb 6, 2007)
  • Tested working with Debian GNU/Linux, Mac OS X, and Win/Cygwin (not Vista)
  • User interface: data files are now encapsulated in settings/*.pl files, so you no longer have to specify them on the command line
  • Added art/arj methods in src/bayes-threshold/ to fix overtraining problem presented in ACL 2006 paper. NOTE: this is not used by default, because it is a ruby package. If you have ruby, please try it. It will be distribution-tested in an upcoming release.
  • Full results of testing now in settings/*.taus
  • No longer distributing .eval files because interpretation was unclear.
  • Added datasets: O2006 and ciqa2006 -- de-anonymized with data from NIST
  • Retrained datasets:
    • O2005 and R2005: due to bugfixes, and de-anonymized as above
    • O2004 and D2003: due to bugfixes in Nuggeteer
  • Jdbs now encapsulate everything they need, so they are no longer prone to errors caused by running with different settings than those used when the jdb was built.
  • Using Data::Dumper instead of recalculating each time, when BerkeleyDB is not available.
  • Allows users to specify a --runtag
  • Bugs fixed:
    • we used to assume that if a response set contained a nugget, then all responses not marked with that nugget did not contain it. This is false: such responses are ambiguous, because assessors only mark the "best" response. The fix leads to much larger jdb files.
    • we misinterpreted lines in ill-formed input files without warning.
    • we did not warn about inconsistencies in existing judgements.
    • we used to use the gamma variant of Kendall's tau in a non-standard way, whereas Kendall's tau beta makes more sense for this task. Thanks to Jimmy Lin and Dina Demner-Fushman for help on this.
  • Better support for training: calculates multiple decision methods at once
  • Allows alphanumeric, rather than just numeric, question ids.
0.6 (Feb 21, 2005)
  • Tested working out of the box on Debian GNU/Linux, Mac OS X, and Windows with Cygwin.
  • BerkeleyDB and MLDBM are now optional.
  • Command line options set best performing values for each data set.
  • Added results and settings for AQUAINT Opinion pilot 2005 data
  • Added support for Lin and Demner-Fushman's nugget pyramids. (autodetected)
  • Enclosed cross-validation results for every combination of settings in doc/.
  • Many interface fixes. Thanks to Christine Moran and Becky Passonneau for their very helpful feedback and beta testing!
  • Some substantive bugs fixed in how duplicate n-grams are treated, and silly failures in stemming fixed. Thanks to Alexey Radul for noticing score > 1! Impact on published values was minimal.
  • Added an interface for per-nugget and per-question thresholds from an external source. Thanks, in part, to Alexey Radul.
  • Improved the automatic tests.
  • Updated doc/nuggeteer.pdf upon learning of MITRE's Qaviar system.
  • Fixed cdft compile error on Windows with Cygwin. Thanks to Yuan Shen.
0.5 (Jan 10, 2005)
  • Initial release
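
A note on the 0.8 rank-correlation fix listed above: the difference between the two statistics only shows up when rankings contain ties. The following is a minimal Python sketch of the textbook definitions, not Nuggeteer's own code. It assumes that the "gamma variant" refers to the Goodman-Kruskal gamma statistic and that "tau beta" is the statistic usually written tau-b. The function names and the example scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counts(xs, ys):
    """Count concordant, discordant, x-only-tied and y-only-tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: counted by neither statistic
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, ties_x, ties_y


def goodman_kruskal_gamma(xs, ys):
    """Gamma ignores tied pairs entirely: (C - D) / (C + D)."""
    c, d, _, _ = pair_counts(xs, ys)
    if c + d == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (c - d) / (c + d)


def kendall_tau_b(xs, ys):
    """Tau-b keeps ties in the denominator, so ties lower the score."""
    c, d, tx, ty = pair_counts(xs, ys)
    denom = sqrt((c + d + tx) * (c + d + ty))
    if denom == 0:
        raise ValueError("tau-b is undefined when one ranking is constant")
    return (c - d) / denom


# Hypothetical scores for four runs (integers to keep tie detection exact).
official = [30, 30, 25, 10]
predicted = [28, 31, 20, 20]

print(goodman_kruskal_gamma(official, predicted))  # 1.0
print(kendall_tau_b(official, predicted))          # 0.8
```

In this example every untied pair agrees, so gamma reports perfect agreement, while tau-b is reduced by the two tied pairs.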

Acknowledgements