My research centers on computer vision and machine learning, especially visual perception (knowing what is where, how things interact with each other, how things change over time, and why things are the way they are) and visual manipulation (performing operations on images and videos to generate new visual content). Most of my recent work focuses on video understanding, with applications such as video tagging, highlighting, and summarization. Some of my work has been deployed to production at Yahoo, including video thumbnail detection at Flickr and Tumblr, video summary generation at Video Guide, and live stream video highlighting at Yahoo eSports. Currently, I am interested in deep generative models and deep reinforcement learning for video prediction, i.e., generating videos from a single static image.
I obtained Master's and PhD degrees in Computer Science from the Massachusetts Institute of Technology in 2010 and 2014, respectively. I was a member of the Computer Science and Artificial Intelligence Laboratory, and my advisor was Randall Davis. My dissertation investigated learning from structured data and its applications to video understanding. I was lucky to have Randall Davis (chair), Bill Freeman, John Fisher, and Louis-Philippe Morency on my committee.