Yale Song

Researcher, Microsoft AI & Research

One Microsoft Way, Redmond, WA 98052.
Scholar | GitHub | Curriculum Vitae (Last Update: July 2017)

Who am I?

My research centers on computer vision and machine learning, especially visual perception (classification, detection, segmentation) and visual content generation (summarization and synthesis). Most of my recent work focuses on self-supervised learning from videos, particularly learning to predict and synthesize future frames. Besides conducting basic research, I am also interested in making real-world impact with computer vision: some of my work has been deployed to production at Yahoo, including video thumbnail detection at Flickr and Tumblr, video summary generation at Video Guide, and live stream video highlighting at Yahoo eSports.

I obtained Master's and PhD degrees in Computer Science from the Massachusetts Institute of Technology in 2010 and 2014, respectively. I was a member of the Computer Science and Artificial Intelligence Laboratory, and my advisor was Randall Davis. My dissertation investigated learning from structured data and its applications to video understanding. I was lucky to have Randall Davis (chair), Bill Freeman, John Fisher, and Louis-Philippe Morency on my committee.


Publications (see also Google Scholar, DBLP)


  1. Video Prediction with Appearance and Motion Conditions
    Yunseok Jang, Gunhee Kim, Yale Song
    ICML 2018, [PDF], [Project], [Code]

  2. Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks
    Yipin Zhou, Yale Song, Tamara L. Berg
    WACV 2018, [arxiv], [project]

  3. Learning from Noisy Labels with Distillation
    Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Jia Li
    ICCV 2017, [arxiv], [slides], [YFCC100M Entity Dataset]

  4. ElasticPlay: Interactive Video Summarization with Dynamic Time Budget
    Haojian Jin, Yale Song, Koji Yatani
    ACM Multimedia 2017 (Oral), [arxiv] [demo] [video]

  5. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
    Yunseok Jang, Yale Song, YoungJae Yu, Youngjin Kim, Gunhee Kim
    CVPR 2017 (Spotlight), [arxiv] [Code and dataset]

  6. Improving Pairwise Ranking for Multi-label Image Classification
    Yuncheng Li, Yale Song, Jiebo Luo
    CVPR 2017, [arxiv]

  2016

  8. Real-Time Video Highlights for Yahoo Esports
    Yale Song
    NIPS Workshop, LSCVS 2016, [arxiv]
    In production at Yahoo eSports (Match Highlights)

  9. To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos
    Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes
    CIKM 2016, [arxiv] [Slides] [Code] [Dataset]
    In production at Tumblr and Flickr (Thumbnails from user-generated videos)

  10. TGIF: A New Dataset and Benchmark on Animated GIF Description
    Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
    CVPR 2016 (Spotlight), [arxiv] [Dataset] [Project]

  11. Video2GIF: Automatic Generation of Animated GIFs from Video
    Michael Gygli, Yale Song, Liangliang Cao
    CVPR 2016, [arxiv] [Demo] [Code] [Dataset]
    Press coverage: Yahoo, Motherboard, Le Monde (French)

  12. Fast, Cheap, and Good: Why Animated GIFs Engage Us
    Saeideh Bakhshi, David A. Shamma, Lyndon Kennedy, Yale Song, Paloma de Juan, Joseph 'Jofish' Kaye
    CHI 2016, [PDF] [Dataset] [Video]

  13. Balancing Appearance and Context in Sketch Interpretation
    Yale Song, Randall Davis, Kaichen Ma, Dana L. Penney
    IJCAI 2016, [arxiv]

  14. Mouse Activity as an Indicator of Interestingness in Video
    Gloria Zen, Paloma de Juan, Yale Song, Alejandro Jaimes
    ICMR 2016 (Long paper), [PDF] [Dataset]

  2015

  16. TVSum: Summarizing Web Videos using Titles
    Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [TVSum50 Dataset]

  17. Video Co-summarization: Video Summarization by Visual Co-occurrence
    Wen-Sheng Chu, Yale Song, Alejandro Jaimes
    CVPR 2015, [PDF] [Poster] [Project]

  18. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, Randall Davis
    IJCAI 2015 Journal Track, [PDF]

  19. Exploiting Sparsity and Co-occurrence Structure for Action Unit Recognition
    Yale Song*, Daniel McDuff*, Deepak Vasisht, Ashish Kapoor (* equal contribution)
    FG 2015, [PDF] [Project] [Code]

  2014

  21. #FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media
    Jian Zhao, Nan Cao, Zhen Wen, Yale Song, Yu-Ru Lin, Christopher Collins
    IEEE Trans. Visual. Comput. Graphics (VAST 2014), [PDF] [Video]
    Honorable Mention Award (3 out of 146 submissions)

  2013

  23. Action Recognition by Hierarchical Sequence Summarization
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2013, [PDF] [Code]

  24. One-Class Conditional Random Fields for Sequential Anomaly Detection
    Yale Song, Zhen Wen, Ching-Yung Lin, Randall Davis
    IJCAI 2013, [PDF]

  25. Distribution-Sensitive Learning for Imbalanced Datasets
    Yale Song, Louis-Philippe Morency, Randall Davis
    FG 2013, [PDF]

  26. Learning a Sparse Codebook of Facial and Body Microexpressions for Emotion Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2013, [PDF] [Slides]

  2012

  28. Multi-View Latent Variable Discriminative Models for Action Recognition
    Yale Song, Louis-Philippe Morency, Randall Davis
    CVPR 2012, [PDF] [Project] [Code]

  29. Multimodal Human Behavior Analysis: Learning Correlation and Interaction Across Modalities
    Yale Song, Louis-Philippe Morency, Randall Davis
    ICMI 2012, [PDF] [Slides]

  30. Continuous Body and Hand Gesture Recognition for Natural Human-Computer Interaction
    Yale Song, David Demirdjian, Randall Davis
    ACM Trans. Interact. Intell. Syst. 2(1), 2012, [PDF]
    Press coverage: MIT News, Economist, The Verge, CNET, Gizmodo, DailyBRINK

  2011

  32. Tracking Body and Hands For Gesture Recognition: NATOPS Aircraft Handling Signals Database
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF] [Dataset]

  33. Multi-Signal Gesture Recognition Using Temporal Smoothing Hidden Conditional Random Fields
    Yale Song, David Demirdjian, Randall Davis
    FG 2011, [PDF]


  1. Structured Video Content Analysis: Learning Spatio-Temporal and Multimodal Structures
    Yale Song
    PhD Thesis, Massachusetts Institute of Technology, 2014 [DSpace@MIT]

  2. Multi-Signal Gesture Recognition using Body and Hand Poses
    Yale Song
    SM Thesis, Massachusetts Institute of Technology, 2010 [DSpace@MIT]

Professional Service

    • Program Chair: ICMI 2019
    • Area Chair / Senior Program Committee: WACV 2018, FG 2018, ICMI 2016-2018
    • Reviewer / Program Committee: CVPR, ECCV, ICCV, WACV, FG, ICMI, CHI, UIST
    • Journal Reviewer: TPAMI, TIP, TAFF, TKDE, TiiS, CVIU

Interns / Students