Heng-Jui Chang


I'm a Ph.D. candidate at MIT in the Spoken Language Systems Group, advised by Dr. James Glass. I received a Master's in Computer Science from MIT and a Bachelor's in Electrical Engineering from National Taiwan University, where I worked with Prof. Lin-shan Lee and Prof. Hung-yi Lee. My research focuses on audio representation learning, multimodal LLMs, and model efficiency.
Email: hengjui [at] mit.edu


2022–Present	Summers 2023–2025	2017–2021

News

(Feb 2026) My paper (PE-AV), done during an internship at Meta, was accepted to CVPR 2026.
(Aug 2025) My paper (USAD) was accepted to ASRU 2025.
(May 2025) My paper (DC-Spin), done during an internship at Meta, was accepted to Interspeech 2025.
(Apr 2024) I received the IEEE Ganesh N. Ramaswamy Memorial Student Grant for my ICASSP paper.
(Mar 2024) My paper (R-Spin) was accepted to NAACL 2024.
(Feb 2024) I received my Master of Science degree in EECS from MIT.
(Dec 2023) My paper (CoLLD), done during an internship at Meta, was accepted to ICASSP 2024.

Selected Publications (full list)

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Apoorv Vyas, Heng-Jui Chang, Cheng-Fu Yang, Po-Yao Huang, Luya Gao, Julius Richter, Sanyuan Chen, Matt Le, Piotr Dollár, Christoph Feichtenhofer, Ann Lee, Wei-Ning Hsu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

arxiv / huggingface / github / blog

USAD: Universal Speech and Audio Representation via Distillation

Heng-Jui Chang, Saurabhchand Bhati, James Glass, Alexander H. Liu

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025

arxiv / huggingface

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

Heng-Jui Chang, Hongyu Gong, Changhan Wang, James Glass, Yu-An Chung

Interspeech 2025

arxiv / isca

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

Heng-Jui Chang, James Glass

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2024

arxiv / acl anthology / code

CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

IEEE Ganesh N. Ramaswamy Memorial Student Grant

arxiv / ieee xplore / blog

A Large-Scale Evaluation of Speech Foundation Models

Shu-wen Yang, Heng-Jui Chang*, Zili Huang*, Andy T. Liu*, Cheng-I Lai*, Haibin Wu*, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Abdelrahman Mohamed, Shang-Wen Li, Shinji Watanabe, Hung-yi Lee

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 2024

arxiv / ieee xplore

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James Glass

Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS) 2023

arxiv / openreview / neurips proceedings / code

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Heng-Jui Chang, Alexander H. Liu, James Glass

Interspeech 2023

arxiv / isca / code

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath

IEEE Spoken Language Technology Workshop (SLT) 2022

arxiv / ieee xplore / code

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Hsiang-Sheng Tsai*, Heng-Jui Chang*, Wen-Chin Huang*, Zili Huang*, Kushal Lakhotia*, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

Annual Meeting of the Association for Computational Linguistics (ACL) 2022

arxiv / acl anthology / code / website

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

arxiv / ieee xplore / code / poster / huggingface / video