Contact
Email: ganchuang [at] csail (dot) mit (dot) edu
News
Research Highlight
A Multi-Modal Interactive Physical Simulation Platform for
Computer Vision, Robotics and Cognitive Science
Publications (by date / by topic)
2022
![]() |
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
CVPR 2022 (PDF Coming Soon) |
![]() |
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
CVPR 2022 |
![]() |
ICRA 2022 |
![]() |
ICLR 2022 (Oral) |
![]() |
ICLR 2022 |
![]() |
Linking Emergent and Natural Languages via Corpus Transfer
ICLR 2022 (Spotlight) |
![]() |
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022 |
![]() |
Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics
ICLR 2022 (Spotlight) |
![]() |
ICLR 2022 |
2021
![]() |
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
NeurIPS Dataset 2021 (Oral) |
![]() |
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NeurIPS 2021 |
![]() |
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
NeurIPS 2021 |
![]() |
When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?
NeurIPS 2021 |
![]() |
STAR: A Benchmark for Situated Reasoning in Real-World Videos
NeurIPS Dataset 2021 |
![]() |
Curious Representation Learning for Embodied Intelligence
ICCV 2021 |
![]() |
OPEn: An Open-ended Physics Environment for Learning Without a Task
IROS 2021 |
![]() |
AGENT: A Benchmark for Core Psychological Reasoning
ICML 2021 |
![]() |
Temporal and Object Quantification Networks
IJCAI 2021 |
![]() |
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
ICLR 2021 (Spotlight) |
![]() |
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
ICLR 2021 |
![]() |
Learning Task Decomposition with Order-Memory Policy Network
ICLR 2021 |
2020
![]() |
Foley Music: Learning to Generate Music from Videos
ECCV 2020 |
![]() |
Music Gesture for Visual Sound Separation
CVPR 2020 |
![]() |
Dense Regression Network For Video Grounding
CVPR 2020 |
![]() |
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
NeurIPS 2020 |
![]() |
MCUNet: Tiny Deep Learning on IoT Devices
NeurIPS 2020 (Spotlight) |
![]() |
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
ICLR 2020 (Spotlight) |
![]() |
Deep Audio Priors Emerge From Harmonic Convolutional Networks
ICLR 2020 |
![]() |
Once for All: Train One Network and Specialize it for Efficient Deployment
ICLR 2020 |
![]() |
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
ICRA 2020 |
2019
![]() |
Self-supervised Moving Vehicle Tracking with Stereo Sound
ICCV 2019 |
![]() |
ICCV 2019 |
![]() |
TSM: Temporal Shift Module for Efficient Video Understanding
ICCV 2019 |
![]() |
Graph Convolutional Networks for Temporal Action Localization
ICCV 2019 |
![]() |
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
NeurIPS 2019 (Spotlight) |
![]() |
Visual Concept-Metaconcept Learning
NeurIPS 2019 |
![]() |
Cross-channel Communication Networks
NeurIPS 2019 |
![]() |
ICLR 2019 (Oral) |
![]() |
Defensive Quantization: When Efficiency Meets Robustness
ICLR 2019 |
2018
![]() |
Weakly Supervised Dense Event Captioning in Videos
NeurIPS 2018 |
![]() |
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NeurIPS 2018 (Spotlight) |
![]() |
ECCV 2018 |
![]() |
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
ECCV 2018 |
![]() |
Geometry-Guided CNNs for Self-supervised Video Representation Learning
CVPR 2018 |
![]() |
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
CVPR 2018 |
![]() |
End-to-End Learning of Motion Representation for Video Understanding
CVPR 2018 (Spotlight) |
![]() |
Sparse, Smart Contours to Represent and Edit Images
CVPR 2018 |
![]() |
Video Captioning with Multi-Faceted Attention
TACL 2018 |
2017
![]() |
StyleNet: Generating Attractive Visual Captions with Styles
CVPR 2017 |
![]() |
Semantic Compositional Networks for Visual Captioning
CVPR 2017 (Spotlight) |
![]() |
ICCV 2017 |
![]() |
Recurrent Topic-Transition GAN for Visual Paragraph Generation
ICCV 2017 |
2016
![]() |
Automatic Concept Discovery from Parallel Text and Visual Corpora
ICCV 2015 |
Embodied Intelligence
![]() |
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
NeurIPS Dataset 2021 (Oral) |
![]() |
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
CVPR 2022 (PDF Coming Soon) |
![]() |
ICRA 2022 |
![]() |
ICLR 2022 (Oral) |
![]() |
ICLR 2022 |
![]() |
Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics
ICLR 2022 (Spotlight) |
![]() |
OPEn: An Open-ended Physics Environment for Learning Without a Task
IROS 2021 |
![]() |
Curious Representation Learning for Embodied Intelligence
ICCV 2021 |
![]() |
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
ICLR 2021 (Spotlight) |
![]() |
Learning Task Decomposition with Order-Memory Policy Network
ICLR 2021 |
![]() |
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
ICRA 2020 |
![]() |
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
NeurIPS 2019 (Spotlight) |
Audio-Visual Scene Analysis
![]() |
Foley Music: Learning to Generate Music from Videos
ECCV 2020 |
![]() |
Music Gesture for Visual Sound Separation
CVPR 2020 |
![]() |
ECCV 2018 |
![]() |
Self-supervised Moving Vehicle Tracking with Stereo Sound
ICCV 2019 |
![]() |
ICCV 2019 |
Visual Commonsense Reasoning
![]() |
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
CVPR 2022 |
![]() |
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022 |
![]() |
Linking Emergent and Natural Languages via Corpus Transfer
ICLR 2022 (Spotlight) |
![]() |
ICLR 2022 |
![]() |
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NeurIPS 2021 |
![]() |
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
NeurIPS 2021 |
![]() |
STAR: A Benchmark for Situated Reasoning in Real-World Videos
NeurIPS Dataset 2021 |
![]() |
AGENT: A Benchmark for Core Psychological Reasoning
ICML 2021 |
![]() |
Temporal and Object Quantification Networks
IJCAI 2021 |
![]() |
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
ICLR 2021 |
![]() |
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
ICLR 2020 (Spotlight) |
![]() |
Dense Regression Network For Video Grounding
CVPR 2020 |
![]() |
ICLR 2019 (Oral) |
![]() |
Visual Concept-Metaconcept Learning
NeurIPS 2019 |
![]() |
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NeurIPS 2018 (Spotlight) |
![]() |
ICCV 2017 |
Visual Representation Learning
![]() |
When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?
NeurIPS 2021 |
![]() |
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
NeurIPS 2020 |
![]() |
MCUNet: Tiny Deep Learning on IoT Devices
NeurIPS 2020 (Spotlight) |
![]() |
Once for All: Train One Network and Specialize it for Efficient Deployment
ICLR 2020 |
![]() |
Cross-channel Communication Networks
NeurIPS 2019 |
![]() |
TSM: Temporal Shift Module for Efficient Video Understanding
ICCV 2019 |
![]() |
Graph Convolutional Networks for Temporal Action Localization
ICCV 2019 |
![]() |
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
CVPR 2018 |
![]() |
End-to-End Learning of Motion Representation for Video Understanding
CVPR 2018 (Spotlight) |
![]() |
DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting
CVPR 2015 |
Learning from Unlabeled Videos
![]() |
Geometry-Guided CNNs for Self-supervised Video Representation Learning
CVPR 2018 |
![]() |
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images
CVPR 2016 (Spotlight) |
![]() |
Recognizing an Action Using Its Name: A Knowledge-Based Approach
IJCV 2016 |
![]() |
Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames
ECCV 2016 |
Generative Models for Vision and Language
![]() |
Weakly Supervised Dense Event Captioning in Videos
NeurIPS 2018 |
![]() |
Video Captioning with Multi-Faceted Attention
TACL 2018 |
![]() |
StyleNet: Generating Attractive Visual Captions with Styles
CVPR 2017 |
![]() |
Semantic Compositional Networks for Visual Captioning
CVPR 2017 (Spotlight) |
![]() |
Recurrent Topic-Transition GAN for Visual Paragraph Generation
ICCV 2017 |
![]() |
Automatic Concept Discovery from Parallel Text and Visual Corpora
ICCV 2015 |
![]() |
Sparse, Smart Contours to Represent and Edit Images
CVPR 2018 |
Domain Adaptation
![]() |
Learning Attributes Equals Multi-Source Domain Generalization
CVPR 2016 (Spotlight) |
![]() |
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
ECCV 2018 |
Competitions
• Rank 1st in ActivityNet AVA Challenge 2018
• Rank 1st in ActivityNet Kinetics Challenge 2017
• Rank 1st in NIST TRECVID MED and MER 2014
• Rank 2nd in Moments in Time 2018
• Rank 3rd in Youtube8M Challenge 2017
• Rank 3rd in ActivityNet classification Challenge 2016
Data & Software
• NS-VQA. Neural-Symbolic Visual Reasoning.
• WSDEC. Weakly-supervised Dense Event Captioning.
• The Sound of Pixels. Listen to the sound of pixels.
• Smart Contours. Edit images using contours.
• Attention Clusters. Multiple and diverse attention for video classification.
• SCN. Semantic composition network for image and video captioning.
• VQS. Visual question segmentation.
• TVNET. End to end video motion learning.
• Youtube8M. Temporal modeling for video classification.
Honors
• Outstanding Doctoral Thesis Award at Tsinghua University (2018)
• Excellent Graduate Student at Tsinghua University (2018)
• Top Talented Graduate Student at Tsinghua University (2017)
• Academic Rising Star Finalist at Tsinghua University (2016, 2017)
• Microsoft Fellowship (2016)
• Baidu Fellowship (2016)
• National Scholarship, by Ministry of Education of China (2015)