Recent Publications

Google Scholar's list

M. Korpusik and J. Glass, "Deep Learning for Database Mapping and Asking Clarification Questions in Dialogue Systems,'' Trans. ASLP, 27(8), 2019.

P. Atanasova, P. Nakov, L. Marquez, A. Barron-Cedeno, G. Karadzhov, T. Mihaylova, M. Mohtarami, and J. Glass, "Automatic Fact-Checking Using Context and Discourse Information,'' ACM J. Data and Info. Quality, 11(3), 2019.

M. Nadeem, W. Fang, B. Xu, M. Mohtarami and J. Glass, "FAKTA: An Automatic End-to-End Fact Checking System,'' Proc. NAACL-HLT, Minneapolis, 2019.

R. Baly, G. Karadzhov, A. Saleh, J. Glass, and P. Nakov, "Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media,'' Proc. NAACL-HLT, Minneapolis, 2019.

J. Drexler and J. Glass, "Subword Regularization and Beam Search Decoding for End-to-End Automatic Speech Recognition,'' Proc. ICASSP, Brighton, 2019.

F. Grondin and J. Glass, "SVD-PHAT: A Fast Sound Source Localization Method,'' Proc. ICASSP, Brighton, 2019.

D. Harwath and J. Glass, "Towards Visually Grounded Sub-Word Speech Unit Discovery,'' Proc. ICASSP, Brighton, 2019.

W. Hsu, Y. Zhang, R. J. Weiss, Y. Chung, Y. Wang, Y. Wu, and J. Glass, "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization,'' Proc. ICASSP, Brighton, 2019.

S. Khurana, S. R. Joty, A. Ali, and J. Glass, "A Factorial Deep Markov Model for Unsupervised Disentangled Representation Learning from Speech,'' Proc. ICASSP, Brighton, 2019.

M. Korpusik and J. Glass, "Dialogue State Tracking with Convolutional Semantic Taggers,'' Proc. ICASSP, Brighton, 2019.

S. Shon, A. Ali, and J. Glass, "Domain Attentive Fusion for End-to-End Dialect Identification with Unknown Target Domain,'' Proc. ICASSP, Brighton, 2019.

S. Shon, T. Oh, and J. Glass, "Noise-Tolerant Audio-Visual Online Person Verification Using an Attention-based Neural Network Fusion,'' Proc. ICASSP, Brighton, 2019.

Y. Chung, W. Weng, S. Tong, and J. Glass, "Towards Unsupervised Speech-to-Text Translation,'' Proc. ICASSP, 7170-7174, Brighton, 2019.

A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass, "Identifying and Controlling Important Neurons in Neural Machine Translation,'' Proc. ICLR, New Orleans, 2019.

T. He and J. Glass, "Detecting Egregious Responses in Neural Sequence-to-Sequence Models,'' Proc. ICLR, New Orleans, 2019.

S. Romeo, G. Da San Martino, Y. Belinkov, A. Barron-Cedeno, M. Eldesouki, K. Darwish, H. Mubarak, J. Glass, and A. Moschitti, "Language Processing and Learning Models for Community Question Answering in Arabic,'' Info. Proc. and Management, 56, 2019.

F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, A. Bau, and J. Glass, "What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models,'' Proc. AAAI, Honolulu, 2019.

F. Dalvi, A. Nortonsmith, A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, and J. Glass, "NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks,'' Proc. AAAI, Honolulu, 2019.

M. Korpusik and J. Glass, "Convolutional Neural Encoder for the 7th Dialogue System Technology Challenge,'' Proc. AAAI Dialog System Technology Challenges Workshop, Honolulu, 2019.

Y. Belinkov and J. Glass, "Analysis Methods in Neural Language Processing: A Survey,'' Trans. ACL, 7, 2019.


J. Drexler and J. Glass, "Combining End-to-End and Adversarial Training for Low-Resource Speech Recognition,'' Proc. SLT, Athens, 2018.

M. Korpusik and J. Glass, "Convolutional Neural Networks for Dialogue State Tracking without Pre-Trained Word Vectors or Semantic Dictionaries,'' Proc. SLT, Athens, 2018.

S. Shon, W. Hsu, and J. Glass, "Unsupervised Representation Learning of Speech for Dialect Identification,'' Proc. SLT, Athens, 2018.

H. Tang and J. Glass, "On Training Recurrent Neural Networks with Truncated Backpropagation Through Time in Speech Recognition,'' Proc. SLT, 48-55, Athens, 2018.

B. Xu, M. Mohtarami, and J. Glass, "Adversarial Domain Adaptation for Stance Detection,'' Proc. NeurIPS Workshop, Montreal, 2018.

Y. Chung, W. Weng, S. Tong and J. Glass, "Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces,'' Proc. NeurIPS, Montreal, 2018.

W. Hsu, Y. Zhang, R. Weiss, Y. Chung, Y. Wang, Y. Wu and J. Glass, "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization,'' Proc. NeurIPS Workshop, Montreal, 2018.

H. Luo and J. Glass, "Learning Word Representations with Cross-Sentence Dependency for End-to-End Co-reference Resolution,'' Proc. EMNLP, 4829-4833, Brussels, 2018.

R. Baly, G. Karadzhov, D. Alexandrov, J. Glass, and P. Nakov, "Predicting Factuality of Reporting and Bias of News Media Sources,'' Proc. EMNLP, Brussels, 2018.

D. Harwath, G. Chuang, A. Torralba, and J. Glass, "Matchmap Networks: Discovering Words and Objects from Speech and Images,'' Proc. ECCV, Munich, 2018.

T. Alhanai, M. Ghassemi and J. Glass, "Detecting Depression with Audio/Text Sequence Modeling of Interviews,'' Proc. Interspeech, Hyderabad, 2018.

Y. Chung and J. Glass, "Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech,'' Proc. Interspeech, Hyderabad, 2018.

H. Tang, W. Hsu, F. Grondin, and J. Glass, "A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition,'' Proc. Interspeech, Hyderabad, 2018.

W. Hsu, H. Tang and J. Glass, "Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition,'' Proc. Interspeech, Hyderabad, 2018.

W. Hsu and J. Glass, "Scalable Factorized Hierarchical Variational Autoencoder Training,'' Proc. Interspeech, Hyderabad, 2018.

S. Shon, A. Ali, and J. Glass, "Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition,'' Proc. Odyssey, Les Sables D'Olonne, 2018.

M. Mohtarami, R. Baly, J. Glass, P. Nakov, L. Marquez, and A. Moschitti, "Automatic Stance Detection Using End-to-End Memory Networks,'' Proc. NAACL, New Orleans, 2018.

R. Baly, M. Mohtarami, J. Glass, L. Marquez, A. Moschitti, and P. Nakov, "Integrating Stance Detection and Fact Checking in a Unified Corpus,'' Proc. NAACL, New Orleans, 2018.

T. Alhanai, R. Au, and J. Glass, "Role-specific Language Models for Processing Neuropsychological Exams,'' Proc. NAACL, 746-752, New Orleans, 2018.

Y. Chung, H. Li, and J. Glass, "Supervised and Unsupervised Transfer Learning for Question Answering,'' Proc. NAACL, New Orleans, 2018.

D. Harwath, G. Chuang, and J. Glass, "Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech,'' Proc. ICASSP, Calgary, 2018.

M. Korpusik and J. Glass, "Convolutional Neural Networks and Multitask Strategies for Semantic Mapping of Natural Language Input to a Structured Database,'' Proc. ICASSP, Calgary, 2018.

W. Hsu and J. Glass, "Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition,'' Proc. ICASSP, Calgary, 2018.

M. Najafian, S. Khurana, S. Shon, A. Ali, and J. Glass, "Exploiting Convolutional Neural Networks for Phonotactic-based Dialect Identification,'' Proc. ICASSP, Calgary, 2018.

T. Mihaylova, P. Nakov, L. Marquez, A. Barron-Cedeno, M. Mohtarami, G. Karadzhov, and J. Glass, "Fact Checking in Community Forums,'' Proc. AAAI, New Orleans, 2018.

M. Price, J. Glass, and A. Chandrakasan, "A Low-power Speech Recognizer and Voice Activity Detector using Deep Neural Networks,'' IEEE J. Solid State Circuits, 53(1), 2018. (PDF)


K. Leidal, D. Harwath, and J. Glass, "Learning Modality-Invariant Representations for Speech and Images,'' Proc. ASRU, Okinawa, 2017. (PDF)

T. Alhanai, R. Au, and J. Glass, "Spoken Language Biomarkers for Detecting Cognitive Impairment,'' Proc. ASRU, Okinawa, 2017. (PDF)

S. Shon, A. Ali, and J. Glass, "MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge,'' Proc. ASRU, Okinawa, 2017. (PDF)

M. Najafian, W. Hsu, A. Ali, and J. Glass, "Automatic Speech Recognition of Arabic Multi-Genre Broadcast Media,'' Proc. ASRU, Okinawa, 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Unsupervised Domain Adaptation for Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation,'' Proc. ASRU, Okinawa, 2017. (PDF)

Y. Belinkov, L. Marquez, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass, "Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks,'' Proc. IJCNLP, Taipei, 2017. (PDF)

Y. Chung and J. Glass, "Learning Word Embeddings from Speech,'' NIPS Workshop on Machine Learning for Audio Signal Proc., Long Beach, 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Unsupervised Learning of Disentangled Latent Representations from Sequential Data,'' Proc. NIPS, Long Beach, 2017. (PDF)

Y. Belinkov and J. Glass, "Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems,'' Proc. NIPS, Long Beach, 2017. (PDF)

S. Khurana, M. Najafian, A. Ali, T. Alhanai, Y. Belinkov, and J. Glass, "QMDIS: QCRI-MIT Advanced Dialect Identification System,'' Proc. Interspeech, Stockholm, 2017. (PDF)

M. Korpusik, Z. Collins, and J. Glass, "Character-based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions,'' Proc. Interspeech, Stockholm, 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Learning Latent Representations for Speech Generation and Transformation,'' Proc. Interspeech, Stockholm, 2017. (PDF)

X. Feng, B. Richardson, S. Amman, and J. Glass, "An Environmental Feature Representation for Robust Speech Recognition and for Environmental Identification,'' Proc. Interspeech, Stockholm, 2017. (PDF)

J. Drexler and J. Glass, "Analysis of Audio-Visual Features for Unsupervised Speech Recognition,'' Proc. Grounded Language Understanding Workshop, Stockholm, 2017. (PDF)

Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, and J. Glass, "What do Neural Machine Translation Models Learn about Morphology?," Proc. ACL, Vancouver, 2017. (PDF)

D. Harwath and J. Glass, "Learning Word-like Units from Joint Audio-Video Analysis," Proc. ACL, Vancouver, 2017. (PDF)

M. Korpusik and J. Glass, "Spoken Language Understanding for a Nutrition Dialogue System," Trans. ASLP, 25(7), 2017. (PDF)

M. Korpusik, Z. Collins, and J. Glass, "Semantic Mapping of Natural Language Input to Database Entries via Convolutional Neural Networks," Proc. ICASSP, New Orleans, 2017. (PDF)

M. Price, J. Glass, and A. Chandrakasan, "A Scalable Speech Recognizer with Deep-Neural-Network Acoustic Models and Voice-Activated Power Gating," Proc. ISSCC, San Francisco, 2017. (PDF)


T. AlHanai, W. Hsu, and J. Glass, "Development of the MIT ASR System for the 2016 Arabic Multi-Genre Broadcast Challenge," Proc. SLT, San Diego, 2016. (PDF)

F. Sun, D. Harwath, and J. Glass, "Look, Listen, and Decode: Multimodal Speech Recognition with Images," Proc. SLT, San Diego, 2016. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "A Prioritized Grid Long Short-Term Memory RNN for Speech Recognition," Proc. SLT, San Diego, 2016. (PDF)

A. Ali, P. Bell, J. Glass, Y. Messaoui, H. Mubarak, S. Renals, and Y. Zhang, "The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition," Proc. SLT, San Diego, 2016. (PDF)

Y. Belinkov and J. Glass, "A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects," Proc. Coling VarDial Workshop, Osaka, 2016. (PDF)

S. Romeo, G. Da San Martino, A. Barron-Cedeno, A. Moschitti, Y. Belinkov, W. Hsu, Y. Zhang, M. Mohtarami, and J. Glass, "Neural Attention for Learning to Rank Questions in Community Question Answering," Proc. COLING, Osaka, 2016. (PDF)

D. Harwath, A. Torralba, and J. Glass, "Unsupervised Learning of Spoken Language with Visual Context," Proc. NIPS, Barcelona, 2016. (PDF)

Y. Belinkov and J. Glass, "Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results," Proc. Workshop on Semitic MT, Austin, 2016. (PDF)

W. Hsu, Y. Zhang, A. Lee, and J. Glass, "Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition," Proc. Interspeech, San Francisco, 2016. (PDF)

M. Price, A. Chandrakasan, and J. Glass, "Memory-efficient Modeling and Search Techniques for Hardware ASR Decoders," Proc. Interspeech, San Francisco, 2016. (PDF)

A. Ali, N. Dehak, P. Cardinal, S. Khurana, S. Yella, J. Glass, P. Bell, and S. Renals, "Automatic Dialect Detection in Arabic Broadcast Speech," Proc. Interspeech, San Francisco, 2016. (PDF)

S. Shum, D. Harwath, N. Dehak, and J. Glass, "On the Use of Acoustic Unit Discovery for Language Recognition," Trans. ASLP, 24(9), 2016. (PDF)

H. Nassif, M. Mohtarami, and J. Glass, "Learning Semantic Relatedness in Community Question Answering Using Neural Models," Proc. ACL Workshop on Representation Learning for NLP, Berlin, 2016. (PDF)

M. Mohtarami, Y. Belinkov, W. Hsu, Y. Zhang, T. Lei, K. Bar, S. Cyphers, and J. Glass, "Neural-based Approaches for Ranking in Community Question Answering," Proc. SemEval, 2016. (PDF)

P. Nakov, L. Marquez, A. Moschitti, W. Magdy, H. Mubarak, A. Freihat, J. Glass and B. Randeree, "SemEval-2016 Task 3: Community Question Answering," Proc. SemEval, 2016. (PDF)

M. Korpusik, C. Huang, M. Price, and J. Glass, "Distributional Semantics for Understanding Spoken Meal Descriptions," Proc. ICASSP, Shanghai, 2016. (PDF)

E. Chuangsuwanich, Y. Zhang, and J. Glass, "Multilingual Data Selection for Training Stacked Bottleneck Features," Proc. ICASSP, Shanghai, 2016. (PDF)

A. Lee, N. Chen, and J. Glass, "Personalized Mispronunciation Detection and Diagnosis Based on Unsupervised Error Pattern Discovery," Proc. ICASSP, Shanghai, 2016. (PDF)

Y. Zhang, E. Chuangsuwanich, J. Glass, and D. Yu, "Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition," Proc. ICASSP, Shanghai, 2016. (PDF)

Y. Zhang, G. Chen, D. Yu, K. Yao, S. Khudanpur, and J. Glass, "Highway Long Short-Term Memory RNNs for Distant Speech Recognition," Proc. ICASSP, Shanghai, 2016. (PDF)


D. Harwath and J. Glass, "Deep Multimodal Semantic Embeddings for Speech and Images," Proc. ASRU, Scottsdale, 2015. (PDF)

Y. Belinkov and J. Glass, "Arabic Diacritization with Recurrent Neural Networks," Proc. EMNLP, Lisbon, 2015. (PDF)

P. Cardinal, N. Dehak, Y. Zhang, and J. Glass, "Speaker Adaptation Using the I-Vector Technique for Bottleneck Features," Proc. Interspeech, Dresden, 2015. (PDF)

A. Lee and J. Glass, "Mispronunciation Detection without Nonnative Training Data," Proc. Interspeech, Dresden, 2015. (PDF)

C. Lee, T. O'Donnell, and J. Glass, "Unsupervised Lexicon Discovery from Acoustic Input," Trans. ACL, 3, 2015. (PDF)

L. Lee, J. Glass, H. Lee, and C. Chan, "Spoken Content Retrieval: Beyond Cascading Speech Recognition with Text Retrieval," Trans. ASLP, 23(9), 2015. (PDF)

M. Walter, M. Antone, E. Chuangsuwanich, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, Y. Friedman, J. Glass, J. How, J. Jeon, S. Karaman, B. Luders, N. Roy, S. Tellex, and S. Teller, "A Situationally Aware Voice-commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments," J. Field Robotics, 32(4), 2015. (PDF)

Y. Belinkov, M. Mohtarami, S. Cyphers, and J. Glass, "VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems," Proc. Int. Workshop on Semantic Evaluation, Denver, 2015. (PDF)

P. Nakov, L. Marquez, W. Magdy, A. Moschitti, J. Glass, and B. Randeree, "SemEval-2015 Task 3: Answer Selection in Community Question Answering," Proc. Int. Workshop on Semantic Evaluation, Denver, 2015. (PDF)

A. Alhunaim, M. Mohtarami, and J. Glass, "A Vector Space Approach for Aspect Based Sentiment Analysis," Proc. NAACL-HLT, Denver, 2015. (PDF)

C. Cai, P. Guo, J. Glass, and R. Miller, "Wait-Learning: Leveraging Wait Time for Second Language Education," Proc. CHI, Seoul, 2015. (PDF)

X. Feng, B. Richardson, S. Amman, and J. Glass, "On Using Heterogeneous Data for Vehicle-Based Speech Recognition: A DNN-Based Approach," Proc. ICASSP, Brisbane, 2015. (PDF)

M. Price, J. Glass, and A. Chandrakasan, "A 6mW, 5000-word Real-Time Speech Recognizer using WFST Models," IEEE J. Solid State Circuits, 50(1), 2015. (PDF)


M. Korpusik, N. Schmidt, J. Drexler, S. Cyphers, and J. Glass, "Data Collection and Language Understanding of Food Descriptions," IEEE SLT Workshop, South Lake Tahoe, 2014. (PDF)

A. Ali, Y. Zhang, P. Cardinal, N. Dehak, S. Vogel, and J. Glass, "A Complete Kaldi Recipe For Building Arabic Speech Recognition Systems," IEEE SLT Workshop, South Lake Tahoe, 2014. (PDF)

S. Shum, N. Dehak, and J. Glass, "Limited Labels for Unlimited Data: Active Learning for Speaker Recognition," Proc. Interspeech, Singapore, 2014. (PDF)

H. Lee, Y. Zhang, E. Chuangsuwanich, and J. Glass, "Graph-based Re-ranking using Acoustic Feature Similarity between Search Results for Spoken Term Detection on Low-resource Languages," Proc. Interspeech, Singapore, 2014. (PDF)

Y. Zhang, E. Chuangsuwanich, and J. Glass, "Language ID-based training of multilingual stacked bottleneck features," Proc. Interspeech, Singapore, 2014. (PDF)

T. Al Hanai and J. Glass, "Lexical Modeling for Arabic ASR: A Systematic Approach," Proc. Interspeech, Singapore, 2014. (PDF)

A. Lee and J. Glass, "Context-dependent Pronunciation Error Pattern Discovery with Limited Annotations," Proc. Interspeech, Singapore, 2014. (PDF)

D. Harwath and J. Glass, "Speech Recognition without a Lexicon - Bridging the Gap between Graphemic and Phonetic Systems," Proc. Interspeech, Singapore, 2014. (PDF)

P. Cardinal, A. Ali, N. Dehak, Y. Zhang, T. Al Hanai, Y. Zhang, J. Glass, and S. Vogel, "Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera," Proc. Interspeech, Singapore, 2014. (PDF)

I. Saleh, S. Joty, L. Marquez, S. Cyphers, J. Glass, A. Moschitti, and P. Nakov, "A Study of using Syntactic and Semantic Structures for Concept Segmentation and Labeling," Proc. Coling, 193-202, Dublin, 2014. (PDF)

B. Lake, C. Lee, J. Glass, and J. Tenenbaum, "One-Shot Learning of Generative Speech Concepts," Proc. CogSci, Quebec City, 2014. (PDF)

H. Bahari, N. Dehak, H. Van hamme, L. Burget, A. Ali, and J. Glass, "Non-Negative Factor Analysis for Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition," Trans. ASLP, 22(7), 1117-1129, 2014. (PDF)

Y. Zhang, E. Chuangsuwanich, and J. Glass, "Extracting Deep Neural Network Bottleneck Features using Low-Rank Matrix Factorization," Proc. ICASSP, Florence, 2014. (PDF)

X. Feng, Y. Zhang, and J. Glass, "Speech Feature Denoising and Dereverberation via Deep Autoencoders for Noisy Reverberant Speech Recognition," Proc. ICASSP, Florence, 2014. (PDF)

C. Cai, P. Guo, J. Glass, and R. Miller, "Wait Learning: Leveraging Conversational Dead Time for Second Language Education," Proc. CHI, Toronto, 2014. (PDF)

M. Price, J. Glass, and A. Chandrakasan, "A 6mW, 5000-word Real-Time Speech Recognizer using WFST Models," Proc. ISSCC, San Francisco, 2014. (PDF)


J. Liu, P. Pasupat, Y. Wang, S. Cyphers, and J. Glass, "Query Understanding Enhanced by Hierarchical Parsing Structures," Proc. ASRU, Olomouc, 2013. (PDF)

C. Lee, Y. Zhang, and J. Glass, "Joint Learning of Phonetic Units and Word Pronunciations for ASR," Proc. EMNLP, Seattle, 2013. (PDF)

W. Li, J. Glass, N. Roy, and S. Teller, "Probabilistic Dialogue Modeling for Speech-Enabled Assistive Technology," Proc. Speech and Language Processing for Assistive Technologies Workshop, Grenoble, 2013. (PDF)

X. Fang, N. Dehak, and J. Glass, "Bayesian Distance Metric Learning on i-vector for Speaker Verification," Proc. Interspeech, Lyon, 2013. (PDF)

A. Lee and J. Glass, "Pronunciation Assessment via a Comparison-based System," Proc. SLaTE, Grenoble, 2013. (PDF)

S. Shum, N. Dehak, and J. Glass, "Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach," Trans. ASLP, 21(10), 2013. (PDF)

A. Lee, Y. Zhang, and J. Glass, "Mispronunciation Detection via Dynamic Time Warping on Deep Belief Network-Based Posteriorgrams," Proc. ICASSP, Vancouver, 2013. (student paper award) (PDF)

J. Liu, P. Pasupat, S. Cyphers, and J. Glass, "Asgard: A Portable Architecture for Multilingual Dialogue Systems," Proc. ICASSP, Vancouver, 2013. (PDF)

D. Harwath, T. Hazen, and J. Glass, "Zero Resource Spoken Audio Corpus Analysis," Proc. ICASSP, Vancouver, 2013. (PDF)

I. McGraw, I. Badr, and J. Glass, "Learning Lexicons from Speech using a Pronunciation Mixture Model," Trans. ASLP, 21(2), 357-366, 2013. (PDF)


A. Lee and J. Glass, "A Comparison-Based Approach to Mispronunciation Detection," Proc. IEEE SLT Workshop, Miami, 2012. (PDF)

I. McGraw, S. Cyphers, P. Pasupat, J. Liu, and J. Glass, "Automating Crowd-supervised Learning for Spoken Language Systems," Proc. Interspeech, Portland, 2012. (PDF)

A. Lee and J. Glass, "Sentence Detection Using Multiple Annotations," Proc. Interspeech, Portland, 2012. (PDF)

J. Liu, S. Cyphers, P. Pasupat, I. McGraw, and J. Glass, "A Conversational Movie Search System Based on Conditional Random Fields," Proc. Interspeech, Portland, 2012. (PDF)

S. Shum, N. Dehak, and J. Glass, "On the Use of Spectral and Iterative Methods for Speaker Diarization," Proc. Interspeech, Portland, 2012. (PDF)

C. Lee and J. Glass, "A Nonparametric Bayesian Approach to Acoustic Model Discovery," Proc. ACL, Jeju, 2012. (PDF)

J. Glass, "Towards Unsupervised Speech Processing," Keynote, Proc. ISSPA, Montreal, 2012. (PDF)

H. Chang and J. Glass, "Evaluation of Multi-level Context-Dependent Acoustic Model for Large Vocabulary Speaker Adaptation Tasks," Proc. ICASSP, Kyoto, 2012. (PDF)

E. Chuangsuwanich, S. Watanabe, T. Hori, T. Iwata, and J. Glass, "Handling Uncertain Observations in Unsupervised Topic-Mixture Language Modeling Adaptation," Proc. ICASSP, Kyoto, 2012. (PDF)

Y. Zhang, K. Adl, and J. Glass, "Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units," Proc. ICASSP, Kyoto, 2012. (PDF)

Y. Zhang, R. Salakhutdinov, H. Chang, and J. Glass, "Resource Configurable Spoken Query Detection Using Deep Boltzmann Machines," Proc. ICASSP, Kyoto, 2012. (PDF)


H. Chang and J. Glass, "Multi-level Context-Dependent Acoustic Modeling for Automatic Speech Recognition," Proc. ASRU, Waikoloa, 2011. (PDF)

I. Badr, I. McGraw, and J. Glass, "Pronunciation Learning from Continuous Speech," Proc. Interspeech, Florence, 2011. (PDF)

E. Chuangsuwanich and J. Glass, "Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency," Proc. Interspeech, Florence, 2011. (PDF)

C. Lee and J. Glass, "A Transcription Task for Crowdsourcing with Automatic Quality Control," Proc. Interspeech, Florence, 2011. (PDF)

C. Lee, J. Glass, and O. Ghitza, "An Efferent-Inspired Auditory Model Front-End for Speech Recognition," Proc. Interspeech, Florence, 2011. (PDF)

I. McGraw, J. Glass, and S. Seneff, "Growing a Spoken Language Interface on Amazon Mechanical Turk," Proc. Interspeech, Florence, 2011. (PDF)

S. Shum, N. Dehak, E. Chuangsuwanich, D. Reynolds, and J. Glass, "Exploiting Intra-Conversation Variability for Speaker Diarization," Proc. Interspeech, Florence, 2011. (PDF)

Y. Zhang and J. Glass, "A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-based Dynamic Time Warping," Proc. Interspeech, Florence, 2011. (PDF)

S. Roberts, B. Mehler, J. Orszulak, B. Reimer, J. Glass, and J. Coughlin, "An Evaluation of Age, Gender, and Technology Experience in User Performance and Impressions of a Multimodal Human-Machine Interface," Ind. Eng. Research Conf., Reno, 2011.

N. Dehak, Z. Karam, D. Reynolds, R. Dehak, W. Campbell, and J. Glass, "A Channel-Blind System for Speaker Verification," Proc. ICASSP, Prague, 2011. (PDF)

Y. Zhang and J. Glass, "An Inner-Product Lower-Bound Estimate for Dynamic Time Warping," Proc. ICASSP, Prague, 2011. (PDF)


S. Liu, S. Seneff, and J. Glass, "A Collective Data Generation Method for Speech Language Models," Proc. IEEE Workshop on Spoken Language Technology, 211-216, Berkeley, 2010. (PDF)

E. Chuangsuwanich, S. Cyphers, J. Glass, and S. Teller, "Spoken Command of Large Mobile Robots in Outdoor Environments," Proc. IEEE Workshop on Spoken Language Technology, Berkeley, 2010. (PDF)

I. Badr, I. McGraw, and J. Glass, "Learning New Word Pronunciations from Spoken Examples," Proc. Interspeech, 2294-2297, Makuhari, 2010. (PDF)

J. Ming, T. Hazen, and J. Glass, "Combining Missing-Feature Theory, Speech Enhancement, and Speaker-Dependent/-Independent Modeling for Speech Separation," Computer Speech and Language, 24(1), 67-76, 2010. (PDF)

N. Dehak, R. Dehak, J. Glass, D. Reynolds, and P. Kenny, "Cosine Similarity Scoring without Score Normalization Techniques," Proc. Odyssey Speaker and Language Recognition Workshop, Brno, 2010. (PDF)

S. Shum, N. Dehak, R. Dehak, and J. Glass, "Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification," Proc. Odyssey Speaker and Language Recognition Workshop, Brno, 2010. (PDF)

I. McGraw, C.Y. Lee, L. Hetherington, S. Seneff and J. Glass, "Collecting Voices from the Cloud," Proc. International Conference on Language Resources and Evaluation, 1576-1583, Malta, 2010. (PDF)

S. Teller, M. Walter, M. Antone, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, J. Glass, J. How, A. Huang, J. Jeon, S. Karaman, B. Luders, N. Roy, and T. Sainath, "A Voice-Commandable Robotic Forklift Working Alongside Humans in Minimally-Prepared Outdoor Environments," Proc. ICRA, Anchorage, 2010. (PDF)

Y. Zhang and J. Glass, "Towards Speaker-Independent Unsupervised Speech Pattern Discovery," Proc. ICASSP, 4366-4369, Dallas, 2010. (PDF)

A. Correa, M. Walter, L. Fletcher, J. Glass, S. Teller, and R. Davis, "Multimodal Interaction with an Autonomous Forklift," Proc. HRI, 243-250, Osaka, 2010. (PDF)


Y. Zhang and J. Glass, "Unsupervised Spoken Keyword Spotting via Segmental DTW on Gaussian Posteriorgrams," Proc. ASRU, 398-403, Merano, Dec. 2009. (PDF)

K. Saenko, K. Livescu, J. Glass, and T. Darrell, "Multistream Articulatory Feature-Based Models for Visual Speech Recognition," IEEE Trans. Pattern Anal. and Machine Int., 31(9), 1700-1707, 2009. (PDF)

H. Chang and J. Glass, "A Back-off Discriminative Acoustic Model for Automatic Speech Recognition," Proc. Interspeech, 232-235, Brighton, Sept. 2009. (PDF)

J. Baker, L. Deng, S. Khudanpur, C. Lee, J. Glass, N. Morgan, and D. O'Shaughnessy, "Updated MINDS Report on Speech Recognition and Understanding, Part 2," IEEE Signal Processing Magazine, 78-85, July 2009. (PDF)

J. Baker, L. Deng, J. Glass, S. Khudanpur, C. Lee, N. Morgan, and D. O'Shaughnessy, "Research Developments and Directions in Speech Recognition and Understanding, Part 1," IEEE Signal Processing Magazine, 75-80, May 2009. (PDF)

H. Chang and J. Glass, "Discriminative Training of Hierarchical Acoustic Models for Large Vocabulary Continuous Speech Recognition," Proc. ICASSP, 4481-4484, Taipei, April 2009. (PDF)

B. Hsu and J. Glass, "Language Model Parameter Estimation Using User Transcriptions," Proc. ICASSP, 4805-4808, Taipei, April 2009. (PDF)

K. Livescu, B. Zhu, and J. Glass, "On the Phonetic Information in Ultrasonic Microphone Signals," Proc. ICASSP, 4621-4624, Taipei, April 2009. (PDF)

Y. Zhang and J. Glass, "Speech Rhythm Guided Syllable Nuclei Detection," Proc. ICASSP, 3797-3800, Taipei, April 2009. (PDF)

A. Gruenstein, J. Orszulak, S. Liu, S. Roberts, J. Zabel, B. Reimer, B. Mehler, S. Seneff, J. Glass, and J. Coughlin, "City Browser: Developing a Conversational Automotive HMI," Proc. CHI, 4291-4296, Boston, April 2009. (PDF)

I. Badr, R. Zbib, and J. Glass, "Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation," Proc. EACL, 86-93, Athens, April 2009. (PDF)


B. Hsu and J. Glass, "N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation," Proc. EMNLP, 828-837, Honolulu, 2008. (PDF)

B. Hsu and J. Glass, "Iterative Language Model Estimation: Efficient Data Structure & Algorithms," Proc. Interspeech, 841-844, Brisbane, 2008. (PDF)

I. Badr, R. Zbib, and J. Glass, "Segmentation for English-to-Arabic Statistical Machine Translation," Proc. ACL, 153-156, Columbus, 2008. (PDF)

A. Gruenstein, B. Hsu, J. Glass, S. Seneff, I. Hetherington, S. Cyphers, I. Badr, C. Wang, and S. Liu, "A Multimodal Home Entertainment Interface via a Mobile Device," Proc. ACL Workshop on Mobile Language Processing, 1-9, Columbus, 2008. (PDF)

G. Choueiter, M. Ohannessian, S. Seneff, and J. Glass, "A Turbo-Style Algorithm For Lexical Baseforms Estimation," Proc. ICASSP, 4313-4316, Las Vegas, April 2008. (PDF)

A. Park and J. Glass, "Unsupervised Pattern Discovery in Speech," Trans. ASLP, 16(1), 186-197, 2008. (PDF)


J. Ming, T. Hazen, and J. Glass, "Combining Missing-Feature Theory, Speech Enhancement, and Speaker-Dependent/-Independent Modeling for Speech Separation," Computer Speech and Language, (DOI), 2007. (PDF)

H. Chang and J. Glass, "Hierarchical Large-Margin Gaussian Mixture Models For Phonetic Classification," Proc. ASRU, 272-275, Kyoto, December 2007. (PDF)

G. Choueiter, S. Seneff, and J. Glass, "Automatic Lexical Pronunciations Generation and Update," Proc. ASRU, 225-228, Kyoto, December 2007. (PDF)

K. Schutte and J. Glass, "Speech Recognition with Localized Time-Frequency Pattern Detectors," Proc. ASRU, 341-344, Kyoto, December 2007. (PDF)

G. Choueiter, S. Seneff, and J. Glass, "New Word Acquisition Using SubWord Modeling," Proc. Interspeech, 1765-1768, Antwerp, August 2007. (PDF)

J. Glass, T. Hazen, S. Cyphers, I. Malioutov, D. Huynh, and R. Barzilay, "Recent Progress in the MIT Spoken Lecture Processing Project," Proc. Interspeech, 2553-2556, Antwerp, August 2007. (PDF)

B. Zhu, T. Hazen, and J. Glass, "Multimodal Speech Recognition with Ultrasonic Sensors," Proc. Interspeech, 662-665, Antwerp, August 2007. (PDF)

J. Ming, T. Hazen, J. Glass, and D. Reynolds, "Robust Speaker Recognition in Unknown Noisy Conditions," Trans. ASLP, 15(5), 1711-1723, 2007. (PDF)

E. Weinstein, K. Steele, A. Agarwal and J. Glass, "Loud: A 1020-Node Microphone Array and Acoustic Beamformer," Proc. ICSV, 571-578, Cairns, July 2007. (PDF)

I. Malioutov, A. Park, R. Barzilay, and J. Glass, "Making Sense of Sound: Unsupervised Topic Segmentation Over Acoustic Input," Proc. ACL, 504-511, Prague, June 2007. (PDF)

S. Seneff, M. Adler, J. Glass, B. Sherry, T. Hazen, C. Wang, and T. Wu, "Exploiting Context Information in Spoken Dialogue Interaction with Mobile Devices," Proc. IMUX, Toronto, May 2007. (PDF)

T. Hori, L. Hetherington, T. Hazen, and J. Glass, "Open-Vocabulary Spoken Utterance Retrieval Using Confusion Networks," Proc. ICASSP, 73-76, Honolulu, April 2007. (PDF)

R. Rifkin, K. Schutte, M. Saad, J. Bouvrie, and J. Glass, "Noise Robust Phonetic Classification with Linear Regularized Least Squares and Second-Order Features," Proc. ICASSP, 881-884, Honolulu, April 2007. (PDF)

G. Choueiter and J. Glass, "An Implementation of Rational Wavelets and Filter Design for Phonetic Classification," Trans. ASLP, 15(3), 939-948, 2007. (PDF)


A. Park and J. Glass, "A Novel DTW-Based Distance Measure for Speaker Segmentation," Proc. SLT, 22-25, Aruba, December 2006. (PDF)

P. Hsu and J. Glass, "Spoken Correction for Chinese Text Entry," Proc. ISCSLP, 648-659, Singapore, December 2006. (PDF)

B. J. Hsu and J. Glass, "Style and Topic Language Model Adaptation Using HMM-LDA," Proc. EMNLP, Sydney, July 2006. (PDF)

I. L. Hetherington, H. Shu, and J. Glass, "Flexible Multi-Stream Framework for Speech Recognition Using Multi-Tape Finite-State Transducers," Proc. ICASSP, Toulouse, May 2006. (PDF)

J. Ming, T. J. Hazen, and J. Glass, "Speaker Verification Over Handheld Devices with Realistic Noisy Speech Data," Proc. ICASSP, Toulouse, May 2006. (PDF)

A. Park and J. Glass, "Unsupervised Word Acquisition from Speech Using Pattern Discovery," Proc. ICASSP, Toulouse, May 2006. (PDF)


A. Park and J. Glass, "Towards Unsupervised Pattern Discovery in Speech," Proc. ASRU, 53-58, San Juan, December 2005. (PDF)

K. Schutte and J. Glass, "Robust Detection of Sonorant Landmarks," Proc. Interspeech, 1005-1008, Lisbon, September 2005. (PDF)

A. Park, T. Hazen, and J. Glass, "Automatic processing of audio lectures for information retrieval: Vocabulary selection and language modeling," Proc. ICASSP, Philadelphia, March 2005. (PDF)

G. Choueiter and J. Glass, "A wavelet and filter bank framework for phonetic classification," Proc. ICASSP, Philadelphia, March 2005. (PDF)

K. Saenko, K. Livescu, J. Glass, and T. Darrell, "Production domain modeling of pronunciation for visual speech recognition," Proc. ICASSP, Philadelphia, March 2005. (PDF)

K. Saenko, T. Darrell, and J. Glass, "Articulatory features for robust visual speech recognition," Proc. ICMI, State College, October 2004. (PDF)

T. Hazen, K. Saenko, C. La and J. Glass, "A segment-based audio-visual speech recognizer: Data collection, development and initial experiments," Proc. ICMI, State College, October 2004. (PDF)

K. Livescu and J. Glass, "Feature-based pronunciation modeling with trainable asynchrony probabilities," Proc. ICSLP, Jeju, October 2004. (PDF)

K. Livescu and J. Glass, "Feature-based pronunciation modeling for speech recognition," Proc. HLT/NAACL, Boston, May 2004. (PDF)

J. Glass, T. Hazen, L. Hetherington and C. Wang, "Analysis and processing of lecture audio data: Preliminary investigations," Proc. HLT-NAACL Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, 9-12, Boston, May 2004. (PDF)

J. Glass, E. Weinstein, S. Cyphers, J. Polifroni, G. Chung, and M. Nakano, "A Framework for Developing Conversational User Interfaces," Proc. CADUI, 354-365, Madeira, Portugal, January 2004. (PDF)


J. Glass, "A Probabilistic Framework for Segment-Based Speech Recognition," Computer Speech and Language, 17, 137-152, 2003. (PDF)

S. Sakai and J. Glass, "Fundamental frequency modeling for corpus-based speech synthesis based on a statistical learning technique," Proc. ASRU, St. Thomas, 712-717, December 2003. (PDF)

H. Shu, I. Lee Hetherington, and J. Glass, "Baum-Welch training for segment-based speech recognition," Proc. ASRU, St. Thomas, 43-48, December 2003. (PDF)

K. Livescu, J. Glass, and J. Bilmes, "Hidden feature models for speech recognition using dynamic Bayesian networks," Proc. Eurospeech, Geneva, 2529-2532, September 2003. (PDF)

J. Glass and S. Seneff, "Flexible and personalizable mixed-initiative dialogue systems," Proc. HLT-NAACL Workshop on Research Directions in Dialogue Processing, Edmonton, May 2003.

I. Bazzi and J. Glass, "A multi-class approach for modelling out-of-vocabulary words," Proc. ICSLP, Denver, 1613-1616, September 2002.

J. Yi and J. Glass, "Information-theoretic criteria for unit selection synthesis," Proc. ICSLP, Denver, 2617-2620, September 2002. (PDF)

I. Bazzi and J. Glass, "Learning units for domain-independent out-of-vocabulary word modelling," Proc. Eurospeech, Aalborg, September 2001. (PDF)

J. Glass and E. Weinstein, "Speechbuilder: Facilitating spoken dialogue systems development," Proc. Eurospeech, Aalborg, September 2001. (PDF)

M. Nakano, Y. Minami, S. Seneff, T. J. Hazen, D. Scott Cyphers, J. Glass, J. Polifroni, and V. Zue, "Mokusei: A telephone-based Japanese conversational system in the weather domain," Proc. Eurospeech, Aalborg, September 2001. (PDF)

K. Livescu and J. Glass, "Segment-based recognition on the Phonebook task: Initial results and observations on duration modeling," Proc. Eurospeech, Aalborg, September 2001. (PDF)

I. Bazzi and J. Glass, "Modeling out-of-vocabulary words for robust speech recognition," Proc. ICSLP, Beijing, October 2000. (PDF)

J. Glass, J. Polifroni, S. Seneff and V. Zue, "Data collection and performance evaluation of spoken dialogue systems: The MIT experience," Proc. ICSLP, Beijing, October 2000. (PDF)

J. Yi, J. Glass and L. Hetherington, "A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis," Proc. ICSLP, Beijing, October 2000. (PDF)

V. Zue and J. Glass, "Conversational Interfaces: Advances and Challenges," Proc. IEEE, Special Issue on Spoken Language Processing, 88, August 2000. (PDF)

I. Bazzi and J. Glass, "Heterogeneous lexical units for automatic speech recognition: Preliminary investigations," Proc. ICASSP, Istanbul, June 2000.

K. Livescu and J. Glass, "Lexical modeling of non-native speech for automatic speech recognition," Proc. ICASSP, Istanbul, June 2000.

S. Seneff, J. Glass, T.J. Hazen, Y. Minami, J. Polifroni, and V. Zue, "Mokusei: A Japanese spoken dialogue system in the weather domain," NTT R&D Vol. 49, No. 7, 2000.

V. Zue, S. Seneff, J. Glass, J. Polifroni, C. Pao, T. Hazen, and L. Hetherington, "Jupiter: A Telephone-Based Conversational Interface for Weather Information," Trans. SAP, 8(1), 85-96, 2000.

J. Glass, "Challenges for spoken dialogue systems," Proc. ASRU, Keystone, December 1999. (PDF)

N. Ström, L. Hetherington, T.J. Hazen, E. Sandness and J. Glass, "Acoustic modeling improvements in a segment-based speech recognizer," Proc. ASRU, Keystone, December 1999. (PDF)

J. Glass, T.J. Hazen and L. Hetherington, "Real-time telephone-based speech recognition in the Jupiter domain," Proc. ICASSP, Phoenix, March 1999. (PDF)

J. Glass and T.J. Hazen, "Telephone-based conversational speech recognition in the Jupiter domain," Proc. ICSLP, Sydney, November 1998. (PDF)

A. Halberstadt and J. Glass, "Heterogeneous measurements and multiple classifiers for speech recognition," Proc. ICSLP, Sydney, November 1998. (PDF)

S. Lee and J. Glass, "Real-time probabilistic segmentation for segment-based speech recognition," Proc. ICSLP, Sydney, November 1998. (PDF)

C. Pao, P. Schmid, and J. Glass, "Confidence scoring for speech understanding systems," Proc. ICSLP, Sydney, November 1998. (PDF)

J. Yi and J. Glass, "Natural-sounding speech synthesis using variable-length units," Proc. ICSLP, Sydney, November 1998. (PDF)

J. Polifroni, S. Seneff, J. Glass, and T.J. Hazen, "Evaluation methodology for a telephone-based conversational system," Proc. LREC, 42-50, Granada, May 1998.

J. Chang and J. Glass, "Segmentation and Modeling in Segment-Based Recognition," Proc. Eurospeech, 1199-1202, Rhodes, Sept. 1997. (PDF)

A. Halberstadt and J. Glass, "Heterogeneous Acoustic Measurements for Phonetic Classification," Proc. Eurospeech, 401-404, Rhodes, Sept. 1997. (PDF)

T. Hazen and J. Glass, "A Comparison of Novel Techniques for Instantaneous Speaker Adaptation," Proc. Eurospeech, 2047-2050, Rhodes, Sept. 1997. (PDF)

Some Older Publications

J. Glass, G. Flammia, D. Goodine, M. Phillips, J. Polifroni, S. Sakai, S. Seneff, and V. Zue, "Multilingual Spoken-language Understanding in the MIT Voyager System," Speech Communication, 17(1-2), 1-18, 1995.

V. Zue, S. Seneff, J. Polifroni, M. Phillips, C. Pao, D. Goodine, D. Goddeau, and J. Glass, "Pegasus: A Spoken Dialogue Interface for Online Travel Planning," Speech Communication, 15, 331-340, 1994.

V. Zue, S. Seneff, and J. Glass, "Speech Database Development at MIT: TIMIT and Beyond," Speech Communication, 9, 351-356, 1990.