Chuang Gan


I am a principal research staff member at the MIT-IBM Watson AI Lab. I am also a visiting research scientist at MIT, working closely with Prof. Antonio Torralba and Prof. Josh Tenenbaum. Before that, I completed my PhD with the highest honor at Tsinghua University, where I was supervised by Prof. Andrew Chi-Chih Yao. I primarily focus on video understanding, including representation learning, neural-symbolic visual reasoning, audio-visual scene analysis, and skill learning. My work has been recognized by a Microsoft Fellowship, a Baidu Fellowship, and media coverage from CNN, BBC, The New York Times, WIRED, Forbes, and MIT Tech Review.


During my PhD, I was fortunate enough to work with:

 


 


Email: ganchuang [at] csail (dot) mit (dot) edu


News

  • I am looking for research interns with a strong background in RL, embodied AI, robotics, visual reasoning, or video understanding.
  • The code, dataset, and evaluation server for Video CLEVRER have been released.

    Publications (by date / by topic)

    2020

    Foley Music: Learning to Generate Music from Videos

    Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

    ECCV 2020

    Music Gesture for Visual Sound Separation

    Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

    CVPR 2020

    Dense Regression Network For Video Grounding

    Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

    CVPR 2020

    CLEVRER: CoLlision Events for Video REpresentation and Reasoning

    Kexin Yi*, Chuang Gan*, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

    ICLR 2020 (Oral Spotlight)

    Deep Audio Priors Emerge From Harmonic Convolutional Networks

    Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

    ICLR 2020

    Once for All: Train One Network and Specialize it for Efficient Deployment

    Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

    ICLR 2020

    Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

    Chuang Gan*, Yiwei Zhang*, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

    ICRA 2020

    2019

    Self-supervised Moving Vehicle Tracking with Stereo Sound

    Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

    ICCV 2019

    The Sound of Motions

    Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

    ICCV 2019

    TSM: Temporal Shift Module for Efficient Video Understanding

    Ji Lin, Chuang Gan, Song Han

    ICCV 2019

    Graph Convolutional Networks for Temporal Action Localization

    Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

    ICCV 2019

    Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

    Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan

    NeurIPS 2019 (Spotlight)

    Visual Concept-Metaconcept Learning

    Chi Han, Jiayuan Mao, Chuang Gan, Josh Tenenbaum, Jiajun Wu

    NeurIPS 2019

    Cross-channel Communication Networks

    Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

    NeurIPS 2019

    The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

    Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

    ICLR 2019 (Oral)

    Defensive Quantization: When Efficiency Meets Robustness

    Ji Lin, Chuang Gan, Song Han

    ICLR 2019

    2018

    Weakly Supervised Dense Event Captioning in Videos

    Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

    NeurIPS 2018

    Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

    Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

    NeurIPS 2018 (Spotlight)

    The Sound of Pixels

    Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

    ECCV 2018

    Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

    Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang

    ECCV 2018

    Geometry-Guided CNNs for Self-supervised Video Representation Learning

    Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas Guibas

    CVPR 2018

    Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

    Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

    CVPR 2018

    End-to-End Learning of Motion Representation for Video Understanding

    Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

    CVPR 2018 (Spotlight)

    Sparse, Smart Contours to Represent and Edit Images

    Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman

    CVPR 2018

    Video Captioning with Multi-Faceted Attention

    Xiang Long, Chuang Gan, Gerard de Melo

    TACL 2018


    2017

    StyleNet: Generating Attractive Visual Captions with Styles

    Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng

    CVPR 2017

    Semantic Compositional Networks for Visual Captioning

    Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

    CVPR 2017 (Spotlight)

    VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

    Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

    ICCV 2017

    Recurrent Topic-Transition GAN for Visual Paragraph Generation

    Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

    ICCV 2017


    2016

    Learning Attributes Equals Multi-Source Domain Generalization

    Chuang Gan, Tianbao Yang, Boqing Gong

    CVPR 2016 (Spotlight)

    You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

    Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

    CVPR 2016 (Spotlight)

    Recognizing an Action Using Its Name: A Knowledge-Based Approach

    Chuang Gan, Yi Yang, Linchao Zhu, Deli Zhao, Yueting Zhuang

    IJCV 2016

    Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames

    Chuang Gan, Chen Sun, Lixin Duan, Boqing Gong

    ECCV 2016


    2015

    DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

    Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, Alexander G. Hauptmann

    CVPR 2015

    Automatic Concept Discovery from Parallel Text and Visual Corpora

    Chen Sun, Chuang Gan, Ram Nevatia

    ICCV 2015

    Audio-Visual Scene Analysis

    Foley Music: Learning to Generate Music from Videos

    Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

    ECCV 2020

    Music Gesture for Visual Sound Separation

    Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

    CVPR 2020

    The Sound of Pixels

    Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

    ECCV 2018

    Self-supervised Moving Vehicle Tracking with Stereo Sound

    Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

    ICCV 2019

    The Sound of Motions

    Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

    ICCV 2019

    Deep Audio Priors Emerge From Harmonic Convolutional Networks

    Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

    ICLR 2020

    Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

    Chuang Gan*, Yiwei Zhang*, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

    ICRA 2020


    Visual Reasoning

    CLEVRER: CoLlision Events for Video REpresentation and Reasoning

    Kexin Yi*, Chuang Gan*, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum

    ICLR 2020 (Oral Spotlight)

    Dense Regression Network For Video Grounding

    Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

    CVPR 2020

    The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

    Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

    ICLR 2019 (Oral)

    Visual Concept-Metaconcept Learning

    Chi Han, Jiayuan Mao, Chuang Gan, Josh Tenenbaum, Jiajun Wu

    NeurIPS 2019

    Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

    Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum

    NeurIPS 2018 (Spotlight)

    VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

    Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

    ICCV 2017


    Visual Representation Learning

    Once for All: Train One Network and Specialize it for Efficient Deployment

    Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

    ICLR 2020

    Cross-channel Communication Networks

    Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

    NeurIPS 2019

    TSM: Temporal Shift Module for Efficient Video Understanding

    Ji Lin, Chuang Gan, Song Han

    ICCV 2019

    Graph Convolutional Networks for Temporal Action Localization

    Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

    ICCV 2019

    Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

    Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen

    CVPR 2018

    End-to-End Learning of Motion Representation for Video Understanding

    Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

    CVPR 2018 (Spotlight)

    DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

    Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, Alexander G. Hauptmann

    CVPR 2015


    Learning from Unlabeled Videos

    Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

    Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan

    NeurIPS 2019 (Spotlight)

    Geometry-Guided CNNs for Self-supervised Video Representation Learning

    Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas Guibas

    CVPR 2018

    You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

    Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

    CVPR 2016 (Spotlight)

    Recognizing an Action Using Its Name: A Knowledge-Based Approach

    Chuang Gan, Yi Yang, Linchao Zhu, Deli Zhao, Yueting Zhuang

    IJCV 2016

    Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames

    Chuang Gan, Chen Sun, Lixin Duan, Boqing Gong

    ECCV 2016


    Generative Models for Vision and Language

    Weakly Supervised Dense Event Captioning in Videos

    Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

    NeurIPS 2018

    Video Captioning with Multi-Faceted Attention

    Xiang Long, Chuang Gan, Gerard de Melo

    TACL 2018

    StyleNet: Generating Attractive Visual Captions with Styles

    Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng

    CVPR 2017

    Semantic Compositional Networks for Visual Captioning

    Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

    CVPR 2017 (Spotlight)

    Recurrent Topic-Transition GAN for Visual Paragraph Generation

    Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

    ICCV 2017

    Automatic Concept Discovery from Parallel Text and Visual Corpora

    Chen Sun, Chuang Gan, Ram Nevatia

    ICCV 2015

    Sparse, Smart Contours to Represent and Edit Images

    Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman

    CVPR 2018


    Domain Adaptation

    Learning Attributes Equals Multi-Source Domain Generalization

    Chuang Gan, Tianbao Yang, Boqing Gong

    CVPR 2016 (Spotlight)

    Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

    Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang

    ECCV 2018


    Competitions

    • Ranked 1st in the ActivityNet AVA Challenge 2018

    • Ranked 1st in the ActivityNet Kinetics Challenge 2017

    • Ranked 1st in NIST TRECVID MED and MER 2014

    • Ranked 2nd in Moments in Time 2018

    • Ranked 3rd in the YouTube-8M Challenge 2017

    • Ranked 3rd in the ActivityNet Classification Challenge 2016


    Data & Software

    NS-VQA. Neural-Symbolic Visual Reasoning.

    WSDEC. Weakly-supervised Dense Event Captioning.

    The Sound of Pixels. Listen to the sound of pixels.

    Smart Contours. Edit images using contours.

    Attention Clusters. Multiple and diverse attention for video classification.

    SCN. Semantic compositional network for image and video captioning.

    VQS. Visual question segmentation.

    TVNet. End-to-end video motion learning.

    YouTube-8M. Temporal modeling for video classification.


    Talks

    Video Understanding: From Tags to Language.

    Stanford University, MSR, AI2, NEC, NVIDIA, Baidu, MERL, IBM, UCF (2017)


    Honors

    • Outstanding Doctoral Thesis Award at Tsinghua University (2018)

    • Excellent Graduate Student at Tsinghua University (2018)

    • Top Talented Graduate Student at Tsinghua University (2017)

    • Academic Rising Star Finalist at Tsinghua University (2016, 2017)

    • Microsoft Fellowship (2016)

    • Baidu Fellowship (2016)

    • National Scholarship, by Ministry of Education of China (2015)

