Cheng-I Jeff Lai

clai24 at mit dot edu

Hello! I am a 4th-year Ph.D. student in computer science at the MIT Computer Science and Artificial Intelligence Laboratory, advised by James Glass in the Spoken Language Systems Group. My long-term professional goal is to democratize advanced speech technologies for under-explored domains, languages, and users. My current research interests are self-supervised learning, audio-visual learning, and their applications in speech. Specifically, I think a lot about:

  • Grounded Language Acquisition: discovering grammar, words, and phones from speech with distant supervision.
  • Speech Generation via Self-Supervision: leveraging self-supervised representations for speech translation or modeling.

In the past, I worked on the following topics:
  • Sparse Speech Processing: reduced architectural complexity via pruning for large speech models.
  • Self-Supervised Learning in Speech: designed self-supervised representations for different speech/audio tasks.
  • Speaker-Adaptive Speech Synthesis: improved speaker similarity in zero-shot/few-shot speech synthesis.

Previously, I completed my B.S. in electrical engineering at Johns Hopkins University, advised by Najim Dehak and Jesús Villalba. I also have long-term collaborations with Yang Zhang and David Cox at the MIT-IBM Watson AI Lab on low-resource language learning, and with Erica Cooper and Junichi Yamagishi at the National Institute of Informatics on speech synthesis. Outside of school, I have spent several summers interning at research labs in academia and industry: the University of Edinburgh, the National Institute of Informatics, Amazon AWS, the MIT-IBM Watson AI Lab, and Meta Fundamental AI Research (FAIR).

If you have questions or are interested in my work, please reach me at my email (clai24 at mit dot edu). I am always open to collaborations!


[ Google Scholar  |  CV  |  GitHub  |  Medium Blog  |  Videos  |  LinkedIn  |  Twitter ]

Recent News

  • (Summer 2022) I spent the summer at Meta AI (FAIR accel), working on multi-modal word discovery for textless direct speech-to-speech translation (poster).
  • (Summer 2022) ContentVec (speaker disentanglement of HuBERT representations) was accepted at ICML 2022, and S3-Router (an improved version of PARP) was accepted at NeurIPS 2022.
  • (May 2022) An MIT News article describes our recent work on cross-modal discrete representation learning.
  • (April 2022) I gave a guest lecture for MIT's speech processing class (6.345) on the SUPERB benchmark and sparsity in speech.
  • (March 2022) Our recent work SSAST was presented at AAAI 2022, TTS-Pruning was accepted at ICASSP 2022, and Cross-Modal VQ and SUPERB-SG were accepted at ACL 2022.
  • (Fall 2021) PARP will appear at NeurIPS as a Spotlight presentation! Code and pretrained models are coming soon. A short presentation is available here, and a short article in MIT News is available here. Give it a try with our Colab demo.
  • (November 2020) Motivated by Nelson Liu's blog post, I also put my PhD Statement of Purpose online for those interested!

Publications (* indicates equal contribution)
Instruction-Following Speech Recognition
Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang
preprint, 2023
arxiv
Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences
Yuan Tseng, Cheng-I Jeff Lai, Hung-Yi Lee
ICASSP, 2023
arxiv
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Jeff Lai, Yingyan Lin
NeurIPS, 2022
arxiv
Simple and Effective Unsupervised Speech Synthesis
Alexander H. Liu*, Cheng-I Jeff Lai*, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
Interspeech, 2022
NAACL Student Research Workshop, 2022
arxiv / listening samples / poster
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian*, Yang Zhang*, Heting Gao, Junrui Ni, Cheng-I Jeff Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang
ICML, 2022
arxiv / code / 5 min presentation
Cross-Modal Discrete Representation Learning
Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
ACL, 2022 (Oral)
arxiv / MIT News
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
ACL, 2022
arxiv / code
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai, Erica Cooper*, Yang Zhang*, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
ICASSP, 2022
arxiv / project page / listening samples / 15 min presentation
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
AAAI, 2022
arxiv / code
SUPERB: Speech processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y. Lin*, Andy T. Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
Interspeech, 2021
arxiv / code / website
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass
ICASSP, 2021
arxiv / code / 15 min presentation / 1 hr presentation
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li
NeurIPS workshop on Self-Supervised Learning for Speech and Audio Processing, 2020
arxiv / media
Conditioned Natural Language Generation using only Unconditioned Language Model: An Exploration
Fan-Keng Sun*, Cheng-I Lai*
Technical Report, 2020
arxiv
Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
Erica Cooper*, Cheng-I Lai*, Yusuke Yasuda, Junichi Yamagishi
Interspeech, 2020
arxiv / 15 min presentation / listening samples
Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction
Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi
Interspeech, 2020
arxiv / code / 15 min presentation / listening samples
Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
ICASSP, 2020 (Oral)
arxiv / code / 15 min presentation / slides / listening samples / dataset
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks
Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
Interspeech, 2019
arxiv / code / ASVspoof 2019 website / dataset
Controlling the Reading Level of Machine Translation Output
Kelly Marchisio, Jialiang Guo, Cheng-I Lai, Philipp Koehn
MT Summit, 2019
arxiv / slides / dataset
Attentive Filtering Networks for Audio Replay Attack Detection
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
ICASSP, 2019
arxiv / code / ASVspoof 2017 website / dataset
Investigation on Bandwidth Extension for Speaker Recognition
Phani Nidadavolu, Cheng-I Lai, Jesús Villalba, Najim Dehak
Interspeech, 2018
arxiv / dataset
Thesis
Finding Sparse Subnetworks in Self-Supervised Speech Recognition and Speech Synthesis
Cheng-I Jeff Lai
SM Thesis, 2022
dspace
Contrastive Predictive Coding Based Feature for Automatic Speaker Verification
Cheng-I Lai
Bachelor's Thesis, 2018
arxiv / code / dataset
Talks

  • The SUPERB benchmark [Slides]
    MIT 6.345 guest lecture (April 2022).
  • Making Machines Understand Uncommon Spoken Languages [Event Page, Slides, Recordings]
    MIT ROCSA 5x5 talk (April 2022).
    MIT Horizon (January 2022).
  • Finding Sparse Subnetworks for Self-Supervised Speech Recognition and Speech Synthesis [Slides]
    MIT 6.345 guest lecture (April 2022).
    Georgia Institute of Technology (December 2021).
    A*STAR, Singapore (November 2021).
    ASAPP, New York (November 2021).
    National Institute of Informatics, Japan (November 2021).
    MIT Embodied Intelligence student seminar (October 2021).
    MIT-IBM 5k language learning seminar (September 2021).
  • Semi-Supervised Trainings for Semantics Understanding from Speech [Slides, Recordings]
    National Institute of Informatics, Japan (February 2021).
    Biometrics Research Laboratory, NEC Corporation, Japan (January 2021).
    JHU speech reading group (November 2020).
    MIT Spoken Language Systems group (October 2020).
    Amazon Web Services, Lex (July & August 2020).
  • Deep Learning Frameworks for Spoofing Detection and Speaker Representation [Slides]
    Biometrics Research Laboratory, NEC Corporation, Japan (July 2019).
    National Institute of Informatics, Japan (July 2019).
  • Deep Learning in Artificial Intelligence
    College of Science and Technology, Nanhua University, Taiwan (June 2019).
    College of Chinese Medicine, China Medical University, Taiwan (May 2019).
  • Attentive Filtering Networks for Audio Replay Attack Detection [Slides]
    Gulf Coast Undergraduate Research Symposium, Rice University (October 2018).
    Center for Language and Speech Processing, Johns Hopkins University (October 2018).
    Centre for Speech Technology Research, University of Edinburgh (August 2018).

Selected Awards

  • Merrill Lynch Fellowship, Department of EECS, MIT (2019-2020)
  • Departmental and General Honors, JHU (2019)
  • Vredenburg Scholarship, JHU (2018)


website template