Andrew Rouditchenko

email: roudi AT mit.edu

Google Scholar | CV available upon request

I am a PhD student at MIT CSAIL in the Spoken Language Systems Group, advised by Dr. Jim Glass. I received my M.Eng. in EECS from MIT in 2021, advised by Dr. Jim Glass and Professor David Harwath, and my S.B. in EECS from MIT in 2019, during which I worked with Professor Antonio Torralba and Professor Josh McDermott.


PREPRINTS

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass
Paper
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong, Andrew Rouditchenko, Alexander Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
Paper

CONFERENCE PUBLICATIONS

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne
Conference on Computer Vision and Pattern Recognition (CVPR) 2022
Paper
Cross-Modal Discrete Representation Learning
Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Annual Meeting of the Association for Computational Linguistics (ACL) 2022
Paper
Multimodal Clustering Networks for Self-Supervised Learning from Unlabeled Videos
Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
International Conference on Computer Vision (ICCV) 2021
Paper
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko*, Angie Boggust*, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass
Interspeech 2021
Project website with code, data, models, and demo
Paper
Video Presentation (YouTube)
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass
Interspeech 2021
Project website with code, data, and models
Paper
Video Presentation (YouTube)
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass
Interspeech 2021
Project website with code and data
Paper (the ArXiv version contains additional experiments on the test set)
Video Presentation (YouTube)
Self-Supervised Audio-Visual Co-Segmentation
Andrew Rouditchenko*, Hang Zhao*, Chuang Gan, Josh McDermott, Antonio Torralba
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
ICASSP Paper
CVPR Multi-Modal Learning from Videos Workshop Paper
The Sound of Pixels
Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba
European Conference on Computer Vision (ECCV) 2018
Project website with code and dataset
Paper
MIT News article

CONFERENCE WORKSHOP PAPERS

Label-Efficient Audio Classification Through Multitask Learning and Self-Supervision
Tyler Lee, Ting Gong, Suchismita Padhy, Andrew Rouditchenko, Anthony Ndirango
ICLR Learning from Limited Labeled Data Workshop 2019
Paper

Website source code credit: Jon Barron.