GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin Qinghong Lin, Jae Won Cho, Yale Song, Juho Kim
CVPR 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
Ce Zhang, Yale Song, Ruta Desai, Michael Louis Iuzzolino, Joseph Tighe, Gedas Bertasius, Satwik Kottur
WACV 2026
2025
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Jang Hyun Cho†, Andrea Madotto†, Effrosyni Mavroudi†, Triantafyllos Afouras†, Tushar Nagarajan†, Muhammad Maaz†, Yale Song†, Tengyu Ma†, Shuming Hu†, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Suyog Jain, Miguel Martin, Huiyu Wang, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Piotr Dollar, Lorenzo Torresani, Kristen Grauman, Christoph Feichtenhofer (†: joint first author)
NeurIPS 2025 (Spotlight)
[Preprint] [Model] [Code] [Data]
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song, Rama Chellappa, Lorenzo Torresani, Triantafyllos Afouras
ICCV 2025 (Highlight)
Anomaly Detection in Time Series with Robust Variational Quasi-Recurrent Autoencoders
Tung Kieu, Razvan Cirstea, Yan Zhao, Bin Yang, Chenjuan Guo, Yale Song, Christian Jensen
ICDE 2022
Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval
Kuan-Hsun Wang, Chia Chun Cheng, Yi-Ling Chen, Yale Song, Shang-Hong Lai
ICPR 2020 (Oral)
Structured Video Content Analysis: Learning Spatio-Temporal and Multimodal Structures
Yale Song
PhD Thesis, Massachusetts Institute of Technology, 2014 [DSpace@MIT]
Multi-Signal Gesture Recognition using Body and Hand Poses
Yale Song
SM Thesis, Massachusetts Institute of Technology, 2010 [DSpace@MIT]
Talks (selected)
Life in Industrial Research Labs
NeurIPS 2021 Workshop on New in ML, Dec 2021 [Video]
Towards Self-Supervised Holistic Video Representations
ICCV Tutorial on Holistic Video Understanding, Oct 2021 [Video] [Slides]
An Introduction to Learning from Unlabeled Video
ICCV Tutorial on Efficient Video Understanding, Oct 2021 [Video] [Slides]
Learning from Unlabeled Video
UBC Topics in Artificial Intelligence (guest lecture), Apr 2021
SNU Data Science Seminar, Dec 2020 [Video] [Slides]
Interns / Students
Ching-Yao Chuang (MIT/Stefanie Jegelka and Antonio Torralba), Microsoft Research, 2021
Sharath Girish (UMD/Abhinav Shrivastava), Microsoft Research, 2021
Chandler Squires (MIT/Caroline Uhler and David Sontag), Microsoft Research, 2021
Xiaolong Li (VT/Lynn Abbott), Microsoft Research, 2021
Tsu-Jui Ray Fu (UCSB/William Yang Wang), Microsoft Research, 2020
Yuan-Ting Hu (UIUC/Alexander Schwing), Microsoft Research, 2020
Julia Gong (Stanford/Serena Yeung), Microsoft Research, 2020
Sangho Lee (SNU/Gunhee Kim), Microsoft Research, 2020
Shuang Ma (SUNY Buffalo/Chang Wen Chen), Microsoft Research, 2018
Youngjae Yu (SNU/Gunhee Kim), Microsoft Research, 2018
Chris Thomas (Univ. Pittsburgh/Adriana Kovashka), Yahoo Research, 2017
Keith Maki (CMU/Carolyn Penstein Rose), Yahoo Research, 2017
Yunseok Jang (SNU/Gunhee Kim), Yahoo Research, 2015-2017