Yale Song

FAIR, Meta AI
<first_name><last_name> AT fb DOT com

Scholar | Github | Curriculum Vitae

Who am I?

I am a research scientist in Facebook Fundamental AI Research (FAIR) at Meta AI. I work in computer vision and machine learning. Most recently, my research has focused on learning from unlabeled and noisy data. I am particularly interested in unsupervised/self-supervised learning from video data, leveraging spatio-temporal and multimodal structures. Before joining Meta, I was a founding member of the Computer Vision Group at Microsoft Research in Redmond, and prior to that I spent 4 years at Yahoo Research in NYC. I obtained Master's and PhD degrees in Computer Science from MIT in 2010 and 2014, respectively, where I was a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). My dissertation investigated learning from structured data and its applications to video understanding. I was lucky to have Randall Davis, Bill Freeman, John Fisher, and Louis-Philippe Morency on my committee.

Professional Service

    • Organizing Committee: CVPR 2024/2023 (Socials Chair), ICLR 2023/2021 (Paper Award Committee), NeurIPS 2021/2020 (Expo Chair), ICMI 2021 (Sponsorship Chair), ICMI 2019 (Program Chair)
    • Area Chair: NeurIPS 2023/2022/2021/2020, ICLR 2024/2023/2021, ICML 2023, CVPR 2024/2020, ECCV 2022, ICCV 2023/2021, WACV 2023/2020/2018, FG 2018, ICMI 2018/2017/2016, ACL 2019
    • Journal Editor: Transactions on Machine Learning Research
    • Outstanding Area Chair: ICLR 2023
    • Outstanding Reviewer: ICML 2021/2020, NeurIPS 2019, CVPR 2017

Publications (see also Google Scholar, DBLP)

    2023

  1. Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
    Yale Song, Gene Byrne, Tushar Nagarajan, Huiyu Wang, Miguel Martin, Lorenzo Torresani
    NeurIPS 2023 Datasets & Benchmarks (Spotlight)

  2. EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
    Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang
    ICCV 2023 [Preprint]

  3. Egocentric Video Task Translation
    Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani
    CVPR 2023 (Highlight) [Preprint]

  4. Scaling Novel Object Detection with Weakly Supervised Detection Transformers
    Tyler LaBonte, Yale Song, Xin Wang, Vibhav Vineet, Neel Joshi
    WACV 2023 [Preprint][Poster]

    2022

  6. Neural-Sim: Learning to Generate Training Data with NeRF
    Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet
    ECCV 2022

  7. COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems
    Shuang Ma, Sai Vemprala, Wenshan Wang, Jayesh K. Gupta, Yale Song, Daniel McDuff, Ashish Kapoor
    IROS 2022 [Preprint] [Blog] [Code]

  8. Visual Attention Emerges from Recurrent Sparse Reconstruction
    Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
    ICML 2022 [Preprint] [Slides] [Code]

  9. Robust Contrastive Learning against Noisy Views
    Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song
    CVPR 2022 [Preprint] [Code]

  10. CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
    Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Alexander Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
    CLeaR 2022 [Preprint] [Project and code] [Blog]

  11. DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
    Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song
    AAAI 2022 [Preprint] [Project and dataset]

  12. Anomaly Detection in Time Series with Robust Variational Quasi-Recurrent Autoencoders
    Tung Kieu, Razvan Cirstea, Yan Zhao, Bin Yang, Chenjuan Guo, Yale Song, Christian Jensen
    ICDE 2022

    2021

  14. Contrastive Learning of Global and Local Video Representations
    Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
    NeurIPS 2021 [Paper] [Code]

  15. ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
    Sangho Lee, Jiwan Chung, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song
    ICCV 2021 [Preprint] [ACAV100M Dataset] [MSR Blog]

  16. Parameter Efficient Multimodal Transformers for Video Representation Learning
    Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song
    ICLR 2021 [Preprint] [Code] [MSR Blog]

  17. Self-Supervised Learning of Compressed Video Representations
    Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song
    ICLR 2021 [Preprint]

  18. Active Contrastive Learning of Audio-Visual Video Representations
    Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
    ICLR 2021 [Preprint] [Code]

    2020

  20. Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval
    Kuan-Hsun Wang, Chia Chun Cheng, Yi-Ling Chen, Yale Song, Shang-Hong Lai
    ICPR 2020 (Oral)

  21. Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
    Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song
    INTERSPEECH 2020 [Preprint]

  22. Phans, Stans and Cishets: Self-Presentation Effects on Content Propagation in Tumblr
    Michael M. Yoder, Qinlan Shen, Alex Coda, Yunseok Jang, Yale Song, Kapil Thadani, Carolyn P. Rose
    WebSci 2020

  23. Image to Video Domain Adaptation Using Web Supervision
    Andrew Kae and Yale Song
    WACV 2020 [Preprint]

    2019

  25. Characterizing Bias in Classifiers using Generative Models
    Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor
    NeurIPS 2019 [Preprint]

  26. Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
    Shuang Ma, Daniel McDuff, Yale Song
    ICCV 2019 [Paper] [Project]

  27. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
    Yale Song and Mohammad Soleymani
    CVPR 2019 [Paper] [Project] [Code and Dataset]

  28. Neural TTS Stylization with Adversarial and Collaborative Games
    Shuang Ma, Daniel McDuff, Yale Song
    ICLR 2019 [Paper] [Code] [Press]

  29. Visual Question Answering with Spatio-Temporal Reasoning
    Yunseok Jang, Yale Song, Chris Dongjoo Kim, YoungJae Yu, Youngjin Kim, Gunhee Kim
    IJCV 2019 [Paper] [Code and dataset]

    2018

  31. Video Prediction with Appearance and Motion Conditions
    Yunseok Jang, Gunhee Kim, Yale Song
    ICML 2018 [PDF] [Project] [Code]

  32. Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks
    Yipin Zhou, Yale Song, Tamara L. Berg
    WACV 2018 [arxiv] [project]

    2017

  34. Learning from Noisy Labels with Distillation
    Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Jia Li
    ICCV 2017 [arxiv] [slides] [YFCC100M Entity Dataset]

  35. ElasticPlay: Interactive Video Summarization with Dynamic Time Budget
    Haojian Jin, Yale Song, Koji Yatani
    ACM Multimedia 2017 (Oral) [arxiv] [demo] [video]

  36. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
    Yunseok Jang, Yale Song, YoungJae Yu, Youngjin Kim, Gunhee Kim
    CVPR 2017 (Spotlight) [arxiv] [Code and dataset]

  37. Improving Pairwise Ranking for Multi-label Image Classification
    Yuncheng Li, Yale Song, Jiebo Luo
    CVPR 2017 [arxiv]

    2016

  39. Real-Time Video Highlights for Yahoo Esports
    Yale Song
    NIPS Workshop, LSCVS 2016, [arxiv]
    In production at Yahoo eSports (Match Highlights)

  40. To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos
    Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes
    CIKM 2016, [arxiv] [Slides] [Code] [Dataset]
    In production at Tumblr and Flickr (Thumbnails from user-generated videos)

  41. TGIF: A New Dataset and Benchmark on Animated GIF Description
    Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
    CVPR 2016 (Spotlight), [arxiv] [Dataset] [Project]

  42. Video2GIF: Automatic Generation of Animated GIFs from Video
    Michael Gygli, Yale Song, Liangliang Cao
    CVPR 2016, [arxiv] [Demo] [Code] [Dataset]
    Press coverage: Yahoo, Motherboard, Le Monde (French)

  43. Fast, Cheap, and Good: Why Animated GIFs Engage Us
    Saeideh Bakhshi, David A. Shamma, Lyndon Kennedy, Yale Song, Paloma de Juan, Joseph 'Jofish' Kaye
    CHI 2016, [PDF] [Dataset] [Video]

  44. Balancing Appearance and Context in Sketch Interpretation
    Yale Song, Randall Davis, Kaichen Ma, Dana L. Penney
    IJCAI 2016, [arxiv]

  45. Mouse Activity as an Indicator of Interestingness in Video
    Gloria Zen, Paloma de Juan, Yale Song, Alejandro Jaimes
    ICMR 2016 (Long paper), [PDF] [Dataset]

    2015

  47. TVSum: Summarizing Web Videos using Titles
    Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [TVSum50 Dataset]

  48. Video Co-summarization: Video Summarization by Visual Co-occurrence
    Wen-Sheng Chu, Yale Song, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [Project]

  49. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, Randall Davis
    IJCAI 2015 Journal Track, [PDF]

  50. Exploiting Sparsity and Co-occurrence Structure for Action Unit Recognition
    Yale Song*, Daniel McDuff*, Deepak Vasisht, Ashish Kapoor (* equal contribution)
    FG 2015, [PDF] [Project] [Code]

    2014

  52. #FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media
    Jian Zhao, Nan Cao, Zhen Wen, Yale Song, Yu-Ru Lin, Christopher Collins
    IEEE Trans. Visual. Comput. Graphics (VAST 2014), [PDF] [Video]
    Honorable Mention Award (3 out of 146 submissions)

    2013

  54. Action Recognition by Hierarchical Sequence Summarization
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2013, [PDF] [Code]

  55. One-Class Conditional Random Fields for Sequential Anomaly Detection
    Yale Song, Zhen Wen, Ching-Yung Lin, Randall Davis
    IJCAI 2013, [PDF]

  56. Distribution-Sensitive Learning for Imbalanced Datasets
    Yale Song, Louis-Philippe Morency, Randall Davis
    FG 2013, [PDF]

  57. Learning a Sparse Codebook of Facial and Body Microexpressions for Emotion Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2013, [PDF] [Slides]

    2012

  59. Multi-View Latent Variable Discriminative Models for Action Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2012, [PDF] [Project] [Code]

  60. Multimodal Human Behavior Analysis: Learning Correlation and Interaction Across Modalities
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2012, [PDF] [Slides]

  61. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, David Demirdjian, Randall Davis
    ACM Trans. Interact. Intell. Syst. 2(1), 2012, [PDF]
    Press coverage: MIT News, Economist, The Verge, CNET, Gizmodo, DailyBRINK

    2011

  63. Tracking Body and Hands For Gesture Recognition: NATOPS Aircraft Handling Signals Database
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF] [Dataset]

  64. Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF]

Theses

  1. Structured Video Content Analysis: Learning Spatio-Temporal and Multimodal Structures
    Yale Song
    PhD Thesis, Massachusetts Institute of Technology, 2014 [DSpace@MIT]

  2. Multi-Signal Gesture Recognition using Body and Hand Poses
    Yale Song
    SM Thesis, Massachusetts Institute of Technology, 2010 [DSpace@MIT]

Talks (selected)

  1. Life in Industrial Research Labs
    NeurIPS 2021 Workshop on New in ML, Dec 2021 [Video]

  2. Towards Self-Supervised Holistic Video Representations
    ICCV Tutorial on Holistic Video Understanding, Oct 2021 [Video] [Slides]

  3. An Introduction to Learning from Unlabeled Video
    ICCV Tutorial on Efficient Video Understanding, Oct 2021 [Video] [Slides]

  4. Learning from Unlabeled Video
    UBC Topics in Artificial Intelligence (guest lecture). Apr 2021
    SNU Data Science Seminar. Dec 2020 [Video] [Slides]
