Statistical Machine Translation Reading Group
Organizer: Philipp Koehn
Statistical Machine Translation
The dream of translation of documents from foreign languages into English (or between any two languages) by computer is one of the oldest pursuits of artificial intelligence research. Now, armed with vast amounts of digitally available translations and powerful computers, we can witness significant progress toward achieving that dream. Statistical methods allow the analysis of these so-called parallel text corpora and the automatic construction of machine translation systems. Already, for some language pairs such as Chinese-English or Arabic-English, the best automatic translation systems available today are such statistical systems.

Reading Group
Since there is increasing interest in statistical machine translation (SMT) around the world and at MIT, our reading group meets to review research and discuss new ideas.
If you have any questions, please send me an email.
Sign up on the mailing list!

Please volunteer!
It would be great if you could volunteer to present a paper or even your own research.

Schedule
Thursday, November 18, 2pm, Room 32-G451 Presenter: David Kauchak
Paper: "Robust Sub-Sentential Alignment of Phrase-Structure Trees", Declan Groves, Mary Hearne and Andy Way (Dublin City University)
pdf
Thursday, November 4, 2pm, Room 32-G451 Presenter: Philipp Koehn
Papers: "Example-based Machine Translation Based on Syntactic Transfer with Statistical Models", Kenji Imamura, Hideo Okuma, Taro Watanabe, and Eiichiro Sumita (ATR);
"Hierarchical Phrase Alignment Harmonized with Parsing", Kenji Imamura (ATR)
pdf
pdf
Thursday, October 28, 2pm, Room 32-G451 Presenter: Luke Zettlemoyer
Paper: "Syntax-Based Alignment: Supervised or Unsupervised", Hao Zhang and Daniel Gildea (Univ. Rochester)
pdf
Thursday, October 21, 2pm, Room 32-G451 Presenter: Philipp Koehn
Paper: "A Path-Based Transfer Model for Machine Translation", Dekang Lin (Univ. Alberta)
pdf
Thursday, July 15, 2pm, Room 32-261 Presenter: Philipp Koehn
Paper: "Improving a Statistical MT System with Automatically Learned Rewrite Patterns", Fei Xia and Michael McCord (IBM)
pdf
Thursday, July 8, 2pm, Room 32-261 Presenter: Brooke Cowan
Paper: "Greedy Decoding for Statistical Machine Translation in Almost Linear Time", Ulrich Germann (ISI)
pdf
Thursday, July 1, 2pm, Room 32-261 Presenter: Philipp Koehn
I will share some impressions from the DARPA MT Eval Workshop, which took place last week in Washington, DC.
-
Thursday, June 10, 2pm, Room 32-G451 Presenter: Philipp Koehn
Paper: "Discriminative Reranking for Statistical Machine Translation", Shen, Sarkar, and Och
-
Thursday, May 14, 2pm, Room 32-G451 Presenter: Philipp Koehn
Let's meet tomorrow and chat about the new papers in statistical MT presented at HLT-NAACL. I will also share some lessons from this year's DARPA MT Eval, which is going on right now.
-
Thursday, April 29, 2pm, Room 32-261 Presenter: Philipp Koehn
Paper: "Minimum Error Rate Training in Statistical Machine Translation", Franz Och (ISI)
pdf
Thursday, April 22, 2pm, Room 32-261 Presenter: Michael Collins
Paper: "Head Automata and Bilingual Tiling: Translation with Minimal Representations", Hiyan Alshawi (AT&T)
pdf
Thursday, April 15, 2pm, Room 32-261 Presenter: Philipp Koehn
Tutorial: "My statistical machine translation system: A look under the hood"
Everybody should have a rough understanding of the phrase translation model that I discussed in the tutorial. Check my NAACL paper for some background. Tomorrow, I will open the hood and offer a look into the inner workings, the data structure files, etc. At the end of the day, you will be able to train and run your own statistical machine translation system.
paper
handout
manual
Thursday, April 8, 3pm, Room 32-261 Presenter: Luke Zettlemoyer
Paper: "What's in a translation rule?", Michael Galley (Columbia), Mark Hopkins (UCLA), Kevin Knight and Daniel Marcu (USC)
Abstract: We propose a theory that gives formal semantics to word-level alignments defined over parallel corpora. We use our theory to introduce a linear algorithm that can be used to derive from word-aligned, parallel corpora the minimal set of syntactically motivated transformation rules that explain human translation data.
pdf
Thursday, April 1, 3pm, Room 32-261 Presenter: David D. Palmer, Virage Advanced Technology Group
Talk: "Statistical Machine Translation in Real-time Multilingual Video Processing"
Recent advances in statistical machine translation approaches have significantly improved the speed of MT systems and the readability of their output. These advances have enabled the integration of Statistical MT into large-scale language processing environments. I will discuss and demonstrate the use of MT in a fully-automated real-time broadcast news video and audio processing system. The system combines speech recognition, statistical machine translation, and cross-lingual information retrieval components to enable real-time search and alerting from live English, Arabic, and Mandarin news sources.
-
Thursday, March 18, 3pm, 8th floor playroom Presenter: Philipp Koehn
Tutorial: "Introduction to Statistical Machine Translation", part 3
This Thursday we will finish up the tutorial with the latest developments and ideas in statistical machine translation: phrase-based MT, currently the best-performing method; efforts to make use of syntax; and discriminative training.
-
Wednesday, March 10, 4pm, 8th floor playroom Presenter: Philipp Koehn
Tutorial: "Introduction to Statistical Machine Translation", part 2
I will cover in more detail the EM algorithm and generative models such as IBM Model 4.
ps
Thursday, February 26, 4pm, 8th floor playroom Presenter: Philipp Koehn
Tutorial: "Introduction to Statistical Machine Translation", part 1
As an introduction, Philipp Koehn will go over a tutorial on SMT that he presented last year together with Kevin Knight at the HLT/NAACL and MT Summit conferences. This will provide a gentle introduction to the state of the art.
ps