Yale Song

Senior Researcher
Microsoft Research
One Microsoft Way, Redmond, WA 98052
<first_name><last_name> AT microsoft DOT com

Bio | Scholar | GitHub | Curriculum Vitae

Who am I?

My research centers on computer vision and machine learning, especially visual perception (classification, detection, segmentation) and visual generation (summarization and synthesis). Most of my recent work focuses on self-supervised learning from unlabeled videos. Besides conducting basic research, I am also interested in making real-world impact with computer vision: some of my work has been deployed to production at Yahoo, including video thumbnail detection at Flickr and Tumblr, video summary generation at Video Guide, and live stream video highlighting at Yahoo eSports.

I obtained Master's and PhD degrees in Computer Science from the Massachusetts Institute of Technology in 2010 and 2014, respectively. I was a member of the Computer Science and Artificial Intelligence Laboratory, and my advisor was Randall Davis. My dissertation investigated learning from structured data and its applications to video understanding. I was lucky to have Randall Davis (chair), Bill Freeman, John Fisher, and Louis-Philippe Morency on my committee.

Call for Papers


  • One paper on characterizing bias in visual classifiers accepted at NeurIPS 2019 (Sep 3, 2019)
  • Serving as Area Chair for CVPR 2020 (July 28, 2019)
  • One paper on multimodal representation learning accepted at ICCV 2019 (July 22, 2019)
  • Serving as Area Chair for WACV 2020 (June 24, 2019)
  • Co-organizing a workshop on Comprehensive Video Understanding in the Wild at ICCV 2019 (May 15, 2019)
  • Serving as Area Chair (Computer Vision Track) for ACL 2019 (March 10, 2019)
  • One paper on visual-semantic embedding accepted at CVPR 2019 (Feb 24, 2019)
  • One paper on neural TTS stylization accepted at ICLR 2019 (Dec 20, 2018)
  • Co-organizing a workshop on Learning from Unlabeled Videos at CVPR 2019 (Dec 17, 2018)
  • One paper on video prediction accepted at ICML 2018 (May 11, 2018)
  • Serving as Program Chair for ICMI 2019 in Suzhou, China (April 15, 2018)
  • New chapter at Microsoft AI & Research! (March 19, 2018)
  • One paper on video prediction accepted at WACV 2018 (Jan 20, 2018)
  • Serving as Senior Program Committee for ICMI 2018, for three years in a row! (Jan 16, 2018)
  • Receiving CVPR 2017 Outstanding Reviewer Award (July 21, 2017)
  • Serving as Area Chair for WACV 2018 (July 21, 2017)
  • One paper on learning from noisy labels accepted at ICCV 2017 (July 16, 2017)
  • One paper on interactive video summarization accepted at ACM Multimedia 2017 (July 2, 2017)
  • NVIDIA AI Podcast on How Yahoo Uses AI to Create Instant eSports Highlight Reels (May 31, 2017)
  • Serving as Senior Program Committee for ICMI 2017 (May 30, 2017)
  • Serving as Area Chair for FG 2018 (May 9, 2017)
  • Serving as Judge for LDV Summit 2017 Entrepreneur Computer Vision Competition (April 13, 2017)
  • Giving a talk on live video highlighting at NVIDIA GTC 2017 [talk][slides]. (March 17, 2017)
  • Two papers accepted at CVPR 2017 (Feb 27, 2017)
  • Giving a talk on Video Highlight Detection at Yahoo! at FCV 2017 and SNU [slides]. (Feb 2, 2017)

Publications (see also at Google Scholar, DBLP)


  1. Image to Video Domain Adaptation Using Web Supervision
    Andrew Kae and Yale Song
    WACV 2020 [Preprint]

  2. 2019

  3. Characterizing Bias in Classifiers using Generative Models
    Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor
    NeurIPS 2019 [Preprint]

  4. Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
    Shuang Ma, Daniel McDuff, Yale Song
    ICCV 2019 [Paper] [Project]

  5. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
    Yale Song and Mohammad Soleymani
    CVPR 2019 [Paper] [Project] [Code and Dataset]

  6. Neural TTS Stylization with Adversarial and Collaborative Games
    Shuang Ma, Daniel McDuff, Yale Song
    ICLR 2019 [Paper] [Code] [Press]

  7. Visual Question Answering with Spatio-Temporal Reasoning
    Yunseok Jang, Yale Song, Chris Dongjoo Kim, YoungJae Yu, Youngjin Kim, Gunhee Kim
    IJCV 2019 [Paper] [Code and dataset]

  8. 2018

  9. Video Prediction with Appearance and Motion Conditions
    Yunseok Jang, Gunhee Kim, Yale Song
    ICML 2018 [PDF] [Project] [Code]

  10. Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks
    Yipin Zhou, Yale Song, Tamara L. Berg
    WACV 2018 [arxiv] [project]

  11. 2017

  12. Learning from Noisy Labels with Distillation
    Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Jia Li
    ICCV 2017 [arxiv] [slides] [YFCC100M Entity Dataset]

  13. ElasticPlay: Interactive Video Summarization with Dynamic Time Budget
    Haojian Jin, Yale Song, Koji Yatani
    ACM Multimedia 2017 (Oral) [arxiv] [demo] [video]

  14. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
    Yunseok Jang, Yale Song, YoungJae Yu, Youngjin Kim, Gunhee Kim
    CVPR 2017 (Spotlight) [arxiv] [Code and dataset]

  15. Improving Pairwise Ranking for Multi-label Image Classification
    Yuncheng Li, Yale Song, Jiebo Luo
    CVPR 2017 [arxiv]

  16. 2016

  17. Real-Time Video Highlights for Yahoo Esports
    Yale Song
    NIPS Workshop, LSCVS 2016, [arxiv]
    In production at Yahoo eSports (Match Highlights)

  18. To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos
    Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes
    CIKM 2016, [arxiv] [Slides] [Code] [Dataset]
    In production at Tumblr and Flickr (Thumbnails from user-generated videos)

  19. TGIF: A New Dataset and Benchmark on Animated GIF Description
    Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
    CVPR 2016 (Spotlight), [arxiv] [Dataset] [Project]

  20. Video2GIF: Automatic Generation of Animated GIFs from Video
    Michael Gygli, Yale Song, Liangliang Cao
    CVPR 2016, [arxiv] [Demo] [Code] [Dataset]
    Press coverage: Yahoo, Motherboard, Le Monde (French)

  21. Fast, Cheap, and Good: Why Animated GIFs Engage Us
    Saeideh Bakhshi, David A. Shamma, Lyndon Kennedy, Yale Song, Paloma de Juan, Joseph 'Jofish' Kaye
    CHI 2016, [PDF] [Dataset] [Video]

  22. Balancing Appearance and Context in Sketch Interpretation
    Yale Song, Randall Davis, Kaichen Ma, Dana L. Penney
    IJCAI 2016, [arxiv]

  23. Mouse Activity as an Indicator of Interestingness in Video
    Gloria Zen, Paloma de Juan, Yale Song, Alejandro Jaimes
    ICMR 2016 (Long paper), [PDF] [Dataset]

  24. 2015

  25. TVSum: Summarizing Web Videos using Titles
    Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [TVSum50 Dataset]

  26. Video Co-summarization: Video Summarization by Visual Co-occurrence
    Wen-Sheng Chu, Yale Song, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [Project]

  27. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, Randall Davis
    IJCAI 2015 Journal Track, [PDF]

  28. Exploiting Sparsity and Co-occurrence Structure for Action Unit Recognition
    Yale Song*, Daniel McDuff*, Deepak Vasisht, Ashish Kapoor (* equal contribution)
    FG 2015, [PDF] [Project] [Code]

  29. 2014

  30. #FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media
    Jian Zhao, Nan Cao, Zhen Wen, Yale Song, Yu-Ru Lin, Christopher Collins
    IEEE Trans. Visual. Comput. Graphics (VAST 2014), [PDF] [Video]
    Honorable Mention Award (3 out of 146 submissions)

  31. 2013

  32. Action Recognition by Hierarchical Sequence Summarization
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2013, [PDF] [Code]

  33. One-Class Conditional Random Fields for Sequential Anomaly Detection
    Yale Song, Zhen Wen, Ching-Yung Lin, Randall Davis
    IJCAI 2013, [PDF]

  34. Distribution-Sensitive Learning for Imbalanced Datasets
    Yale Song, Louis-Philippe Morency, Randall Davis
    FG 2013, [PDF]

  35. Learning a Sparse Codebook of Facial and Body Microexpressions for Emotion Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2013, [PDF] [Slides]

  36. 2012

  37. Multi-View Latent Variable Discriminative Models for Action Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2012, [PDF] [Project] [Code]

  38. Multimodal Human Behavior Analysis: Learning Correlation and Interaction Across Modalities
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2012, [PDF] [Slides]

  39. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, David Demirdjian, Randall Davis
    ACM Trans. Interact. Intell. Syst. 2(1), 2012, [PDF]
    Press coverage: MIT News, Economist, The Verge, CNET, Gizmodo, DailyBRINK

  40. 2011

  41. Tracking Body and Hands For Gesture Recognition: NATOPS Aircraft Handling Signals Database
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF] [Dataset]

  42. Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF]


  1. Structured Video Content Analysis: Learning Spatio-Temporal and Multimodal Structures
    Yale Song
    PhD Thesis, Massachusetts Institute of Technology, 2014 [DSpace@MIT]

  2. Multi-Signal Gesture Recognition using Body and Hand Poses
    Yale Song
    SM Thesis, Massachusetts Institute of Technology, 2010 [DSpace@MIT]

Professional Service

    • Program Chair: ICMI 2019
    • Area Chair / Senior Program Committee: CVPR 2020, ACL 2019, WACV 2018/2020, FG 2018, ICMI 2016/2017/2018
    • Reviewer / Program Committee: CVPR, ECCV, ICCV, NeurIPS, WACV, FG, ICMI
    • Journal Reviewer: TPAMI, TIP, TAFF, TKDE, TiiS, CVIU

Interns / Students