I am a postdoctoral researcher at MIT CSAIL, working with Michael Collins.
My research is in machine learning and natural language processing.
I am interested in learning computational models
for syntactic-semantic analysis of natural languages.
I have worked on syntactic parsing, semantic role labeling, and named entity extraction,
among other tasks.
egstra:
Exponentiated-Gradient algorithms for STRuctured prediction. This C++
code implements EG algorithms to train max-margin and log-linear models
for structured prediction tasks. The package includes other
popular learning algorithms as well, namely Perceptron and SGD.
The current package includes algorithms for
first-order dependency parsing. More will follow soon.
boostree:
A C++ implementation of Schapire and Singer's AdaBoost.MH, with
decision trees as weak learners. The algorithm is suited to multi-label
classification problems and to sparse example representations based
on binary-valued features (typical for NLP problems). This code was
used to develop many CoNLL shared task systems that
obtained state-of-the-art results.
See the package documentation for details.
Publications
PhD Thesis
Learning and Inference in Phrase Recognition: A Filtering-Ranking Architecture using Perceptron,
Xavier Carreras,
Ph.D. Dissertation, Technical University of Catalonia, October 2005.
[pdf][ps.gz][slides pdf][bib][code][README file of the code]
Journal Articles
Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks,
Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras and Peter Bartlett,
JMLR, Volume 9, pages 1775-1822, 2008.
[article]
Semantic Role Labeling: An Introduction to the Special Issue,
Lluís Màrquez, Xavier Carreras, Ken Litkowski and Suzanne Stevenson,
Computational Linguistics 34(2), pages 145-159, 2008.
[article]
Combination Strategies for Semantic Role Labeling,
Mihai Surdeanu, Lluís Màrquez, Xavier Carreras and Pere Comas,
JAIR, Volume 29, pages 105-151, 2007.
[article][bib]
Filtering-Ranking Perceptron Learning for Partial Parsing,
Xavier Carreras, Lluís Màrquez and Jorge Castro,
Machine Learning Journal, Special Issue on Learning in Speech and Language Technologies, Volume 60, Issue 1-3, pages 41-71, Sept. 2005.
[ps.gz][pdf][bib][code][README file of the code]
Margin maximization with feed-forward neural networks: a comparative study with SVM and AdaBoost,
Enrique Romero, Lluís Màrquez and Xavier Carreras,
Neurocomputing, 57(C):313-344, 2004.
[pdf][ps][bib]
A Proposal for Wide-Coverage Spanish Named Entity Recognition,
Montse Arévalo, Xavier Carreras, Lluís Màrquez, Toni Martí, Lluís Padró and Maria José Simon,
SEPLN Journal "Procesamiento del Lenguaje Natural", Vol. 28. May 2002.
WordMed and Scriptum: Development of Terminological Resources for the Medical Practitioner,
Victoria Arranz, Jordi Turmo, Xavier Carreras and Montserrat Arévalo,
Terminology 7(1), John Benjamins Publishing Co., Amsterdam. 2001.
Papers in Conferences
2008
TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-rich Parsing,
Xavier Carreras, Michael Collins and Terry Koo,
In Proceedings of CoNLL-2008. Best paper award.
[pdf][slides]
Simple Semi-supervised Dependency Parsing,
Terry Koo, Xavier Carreras and Michael Collins,
In Proceedings of ACL-2008.
[pdf]
2007
Exponentiated Gradient Algorithms for Log-Linear Structured Prediction,
Amir Globerson, Terry Koo, Xavier Carreras and Michael Collins,
In Proceedings of ICML 2007.
[pdf][bib]
Structured Prediction Models via the Matrix-Tree Theorem,
Terry Koo, Amir Globerson, Xavier Carreras and Michael Collins,
In Proceedings of EMNLP-CoNLL 2007.
[pdf][bib]
Experiments with a Higher-Order Projective Dependency Parser,
Xavier Carreras,
In Proceedings of the EMNLP-CoNLL 2007 Shared Task.
[pdf][ps][bib]
2006
Projective Dependency Parsing with Perceptron,
Xavier Carreras, Mihai Surdeanu and Lluís Màrquez,
Proceedings of the CoNLL-X Shared Task, New York 2006.
[pdf][ps][slides in pdf]
2004
Exploiting Diversity of Margin-based Classifiers,
Enrique Romero, Xavier Carreras and Lluís Màrquez,
In Proceedings of the 2004 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, July 2004.
[pdf]
Introduction to the CoNLL-2004 Shared Task: Semantic Role Labeling,
Xavier Carreras and Lluís Màrquez,
In Proceedings of the CoNLL-2004 Shared Task, Boston, MA USA. May 2004.
[ps][ps.gz][pdf]
Hierarchical Recognition of Propositional Arguments with Perceptrons,
Xavier Carreras, Lluís Màrquez and Grzegorz Chrupała,
In Proceedings of the CoNLL-2004 Shared Task, Boston, MA USA. May 2004.
[pdf]
FreeLing: An Open-Source Suite of Language Analyzers,
Xavier Carreras, Isaac Chao, Lluís Padró and Muntsa Padró,
In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal. 2004.
[pdf][FreeLing homepage]
2003
Online Learning via Global Feedback for Phrase Recognition,
Xavier Carreras and Lluís Màrquez,
NIPS-2003, Vancouver, Canada. December 2003.
[ps]
Phrase Recognition by Filtering and Ranking with Perceptrons,
Xavier Carreras and Lluís Màrquez,
RANLP-2003, Borovets, Bulgaria. September 2003.
[ps.gz][pdf][slides pdf]
A Simple Named Entity Extractor Using AdaBoost,
Xavier Carreras, Lluís Màrquez and Lluís Padró,
Proceedings of the CoNLL-2003 Shared Task. Edmonton, Canada. June 2003.
[ps]
Learning a Perceptron-Based Named Entity Chunker via Online Recognition Feedback,
Xavier Carreras, Lluís Màrquez and Lluís Padró,
Proceedings of the CoNLL-2003 Shared Task. Edmonton, Canada. June 2003.
[ps]
Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data,
Lluís Màrquez, Adrià de Gispert, Xavier Carreras and Lluís Padró,
1st ACL Workshop on Multilingual and Mixed-language Named Entity Recognition:
Combining Statistical and Symbolic Models. Sapporo, Japan. July 2003.
[ps][pdf]
Named Entity Recognition For Catalan Using Spanish Resources,
Xavier Carreras, Lluís Màrquez and Lluís Padró,
10th Conference of the European Chapter of the Association for Computational
Linguistics (EACL'03). Budapest, Hungary. April 2003.
[ps][pdf]
2002
Named Entity Extraction Using AdaBoost,
Xavier Carreras, Lluís Màrquez and Lluís Padró,
In Proceedings of the CoNLL-2002 Shared Task, Taipei, Taiwan, September 2002.
[ps.gz][pdf][slides ps]
Learning and Inference for Clause Identification,
Xavier Carreras, Lluís Màrquez, Vasin Punyakanok and Dan Roth,
13th European Conference on Machine Learning (ECML'02). Helsinki, Finland. August 2002.
[ps.gz][pdf][pdf slides]
Wide-Coverage Spanish Named Entity Extraction,
Xavier Carreras, Lluís Màrquez and Lluís Padró,
VIII Conferencia Iberoamericana de Inteligencia Artificial, IBERAMIA'02. Sevilla, Spain. November 2002.
[ps.gz]
A Flexible Distributed Architecture for Natural Language Analyzers,
Xavier Carreras and Lluís Padró,
Conference on Language Resources and Evaluation (LREC'02). Las Palmas de Gran Canaria, Spain. 2002.
[ps.gz]
2001
Boosting Trees for Clause Splitting,
Xavier Carreras and Lluís Màrquez,
In Proceedings of the CoNLL-2001 Shared Task. Toulouse, France. 2001.
[ps][bib][talk slides]
Boosting Trees for Anti-Spam Email Filtering,
Xavier Carreras and Lluís Màrquez,
Conference on Recent Advances in NLP (RANLP'01). Tzigov Chark, Bulgaria. 2001.
[ps][bib][ps slides]