Cheng-I Jeff Lai

clai24 at mit dot edu

Hello! I am a 4th-year Ph.D. student in computer science at the MIT Computer Science and Artificial Intelligence Laboratory, advised by James Glass in the Spoken Language Systems Group. My long-term professional goal is to democratize advanced speech technologies for under-explored domains, languages, and users. My current research interests are self-supervised learning, audio-visual learning, and their applications in speech. Specifically, I think a lot about:

  • Grounded Language Acquisition: discovering grammar, words, and phones from speech with distant supervision.
  • Speech Generation via Self-Supervision: leveraging self-supervised representations for speech translation or modeling.

In the past, I worked on the following topics:
  • Sparse Speech Processing: reduced architectural complexity via pruning for large speech models.
  • Self-Supervised Learning in Speech: designed self-supervised representations for different speech/audio tasks.
  • Speaker-Adaptive Speech Synthesis: improved speaker similarity in zero-shot/few-shot speech synthesis.

Previously, I completed my B.S. in electrical engineering at Johns Hopkins University, advised by Najim Dehak and Jesús Villalba. I also have long-term collaborations with Yang Zhang and David Cox at the MIT-IBM Watson AI Lab on low-resource language learning, and with Erica Cooper and Junichi Yamagishi at the National Institute of Informatics on speech synthesis. Outside of school, I have spent several summers interning at research labs in academia and industry: the University of Edinburgh, the National Institute of Informatics, Amazon AWS, the MIT-IBM Watson AI Lab, and Meta Fundamental AI Research (FAIR).

If you have questions or are interested in my work, please reach me at my email (clai24 at mit dot edu). I am always open to collaborations!


[ Google Scholar  |  CV  |  GitHub  |  Medium Blog  |  Videos  |  LinkedIn  |  Twitter ]

Recent News

  • (Summer 2022) I spent the summer at Meta AI (FAIR accel), working on multi-modal word discovery for textless direct speech-to-speech translation (poster).
  • (Summer 2022) ContentVec (speaker disentanglement of HuBERT representations) was accepted at ICML 2022, and S3-Router (an improved version of PARP) was accepted at NeurIPS 2022.
  • (May 2022) An MIT News article describes our recent work on cross-modal discrete representation learning.
  • (April 2022) I gave a guest lecture for MIT's speech processing class (6.345) on the SUPERB benchmark and sparsity in speech.
  • (March 2022) Our recent work SSAST was presented at AAAI 2022, TTS-Pruning was accepted at ICASSP 2022, and Cross-Modal VQ and SUPERB-SG were accepted at ACL 2022.
  • (Fall 2021) PARP will appear at NeurIPS as a Spotlight presentation! Code and pretrained models are coming soon. A short presentation is available here, and a short article in MIT News is available here. Give it a try with our Colab demo.
  • (November 2020) Motivated by Nelson Liu's blog post, I also put my PhD Statement of Purpose online for those interested!

Publications (* indicates equal contribution)
Instruction-Following Speech Recognition
Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang
preprint, 2023
arxiv
Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences
Yuan Tseng, Cheng-I Jeff Lai, Hung-Yi Lee
ICASSP, 2023
arxiv
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Jeff Lai, Yingyan Lin
NeurIPS, 2022
arxiv
Simple and Effective Unsupervised Speech Synthesis
Alexander H. Liu*, Cheng-I Jeff Lai*, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
Interspeech, 2022
NAACL Student Research Workshop, 2022
arxiv / listening samples / poster
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian*, Yang Zhang*, Heting Gao, Junrui Ni, Cheng-I Jeff Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang
ICML, 2022
arxiv / code / 5 min presentation
Cross-Modal Discrete Representation Learning
Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
ACL, 2022 (Oral)
arxiv / MIT News
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
ACL, 2022
arxiv / code
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai, Erica Cooper*, Yang Zhang*, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
ICASSP, 2022
arxiv / project page / listening samples / 15 min presentation
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
AAAI, 2022
arxiv / code
SUPERB: Speech processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y. Lin*, Andy T. Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
Interspeech, 2021
arxiv / code / website
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass
ICASSP, 2021
arxiv / code / 15 min presentation / 1 hr presentation
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li
NeurIPS workshop on Self-Supervised Learning for Speech and Audio Processing, 2020
arxiv / media
Conditioned Natural Language Generation using only Unconditioned Language Model: An Exploration
Fan-Keng Sun*, Cheng-I Lai*
Technical Report, 2020
arxiv
Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
Erica Cooper*, Cheng-I Lai*, Yusuke Yasuda, Junichi Yamagishi
Interspeech, 2020
arxiv / 15 min presentation / listening samples
Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction
Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi
Interspeech, 2020
arxiv / code / 15 min presentation / listening samples
Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
ICASSP, 2020 (Oral)
arxiv / code / 15 min presentation / slides / listening samples / dataset
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks
Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
Interspeech, 2019
arxiv / code / ASVspoof 2019 website / dataset
Controlling the Reading Level of Machine Translation Output
Kelly Marchisio, Jialiang Guo, Cheng-I Lai, Philipp Koehn
MT Summit, 2019
arxiv / slides / dataset
Attentive Filtering Networks for Audio Replay Attack Detection
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
ICASSP, 2019
arxiv / code / ASVspoof 2017 website / dataset
Investigation on Bandwidth Extension for Speaker Recognition
Phani Nidadavolu, Cheng-I Lai, Jesús Villalba, Najim Dehak
Interspeech, 2018
arxiv / dataset
Thesis
Finding Sparse Subnetworks in Self-Supervised Speech Recognition and Speech Synthesis
Cheng-I Jeff Lai
SM Thesis, 2022
dspace
Contrastive Predictive Coding Based Feature for Automatic Speaker Verification
Cheng-I Lai
Bachelor's Thesis, 2018
arxiv / code / dataset
Talks

  • The SUPERB benchmark [Slides]
    MIT 6.345 guest lecture (April 2022).
  • Making Machines Understand Uncommon Spoken Languages [Event Page, Slides, Recordings]
    MIT ROCSA 5x5 talk (April 2022).
    MIT Horizon (January 2022).
  • Finding Sparse Subnetworks for Self-Supervised Speech Recognition and Speech Synthesis [Slides]
    MIT 6.345 guest lecture (April 2022).
    Georgia Institute of Technology (December 2021).
    A*STAR, Singapore (November 2021).
    ASAPP, New York (November 2021).
    National Institute of Informatics, Japan (November 2021).
    MIT Embodied Intelligence student seminar (October 2021).
    MIT-IBM 5k language learning seminar (September 2021).
  • Semi-Supervised Trainings for Semantics Understanding from Speech [Slides, Recordings]
    National Institute of Informatics, Japan (February 2021).
    Biometrics Research Laboratory, NEC Corporation, Japan (January 2021).
    JHU speech reading group (November 2020).
    MIT Spoken Language Systems group (October 2020).
    Amazon Web Services, Lex (July & August 2020).
  • Deep Learning Frameworks for Spoofing Detection and Speaker Representation [Slides]
    Biometrics Research Laboratory, NEC Corporation, Japan (July 2019).
    National Institute of Informatics, Japan (July 2019).
  • Deep Learning in Artificial Intelligence
    College of Science and Technology, Nanhua University, Taiwan (June 2019).
    College of Chinese Medicine, China Medical University, Taiwan (May 2019).
  • Attentive Filtering Networks for Audio Replay Attack Detection [Slides]
    Gulf Coast Undergraduate Research Symposium, Rice University (October 2018).
    Center for Language and Speech Processing, Johns Hopkins University (October 2018).
    Centre for Speech Technology Research, University of Edinburgh (August 2018).

Selected Awards

  • Merrill Lynch Fellowship, Department of EECS, MIT (2019-2020)
  • Departmental and General Honors, JHU (2019)
  • Vredenburg Scholarship, JHU (2018)


website template