Michael (Miki) Rubinstein
Postdoctoral Researcher at Microsoft Research

Microsoft
One Memorial Drive

Cambridge, MA 02142

Phone: +1 857-453-6453


I am a Postdoctoral Researcher at Microsoft Research New England. I received my PhD from MIT, where I was advised by Bill Freeman.
I work in various areas of image and video analysis that are at the intersection of computer vision and graphics. In particular, I am interested in low-level image/video processing, motion analysis, and computational photography and video. Read more about my research here.

CV [PDF] [LinkedIn]

News
Aug 01 2014 New paper in ECCV 2014: "Refraction Wiggles for Measuring Fluid Depth and Velocity from Video"
Jun 27 2014 Our demo "Real-time Video Magnification" won the Best Demo Award at CVPR 2014!
May 20 2014 New paper in SIGGRAPH 2014: "The Visual Microphone: Passive Recovery of Sound from Video". More details coming soon!
Mar 31 2014 New paper in ICCP 2014: "Riesz Pyramids for Fast Phase-Based Video Magnification"
Nov 29 2013 Giving several talks in Israel: TAU Dec 11, IDC Dec 12, HUJI Dec 16, Weizmann Dec 19
Sep 13 2013 Co-organizing the tutorial "Dense Image Correspondences for Computer Vision" at ICCV 2013
Apr 20 2013 Invited talk at ICCP 2013
Apr 04 2013 Two new papers: "Unsupervised Joint Object Discovery and Segmentation in Internet Images" accepted to CVPR 2013, and "Phase-based Video Motion Processing" conditionally accepted to SIGGRAPH 2013
Feb 27 2013 Our video magnification work is featured in The New York Times
Feb 01 2013 Our video "Revealing Invisible Changes In The World" won an honorable mention in the NSF International Science & Engineering Visualization Challenge 2012 and is featured in Science
Jul 12 2012 "Towards Longer Long-Range Motion Trajectories" accepted to BMVC 2012
Jun 28 2012 "Annotation Propagation in Large Image Databases via Dense Image Correspondence" accepted to ECCV 2012
Jun 10 2012 Working this summer in the IVM group at Microsoft Research Redmond
May 20 2012 "Eulerian Video Magnification for Revealing Subtle Changes in the World" accepted to SIGGRAPH 2012
Mar 05 2012 I am supported by the Microsoft Research PhD Fellowship (2012-2013)
May 23 2011 Spending the summer at Microsoft Research New England
May 03 2011 "Motion Denoising with Application to Time-lapse Photography" accepted to CVPR 2011
May 03 2011 I am a recipient of the 2011 NVIDIA Graduate Fellowship
Sep 12 2010 RetargetMe dataset is now online
Aug 15 2010 "A Comparative Study of Image Retargeting" conditionally accepted to SIGGRAPH Asia 2010

 


Research Highlights
Video Magnification
In my PhD I developed new methods to extract subtle motion and color signals from videos. These methods can be used to visualize blood perfusion, measure heart rate, and magnify tiny motions and changes we cannot normally see, all using regular cameras and videos.
[My PhD thesis (MIT Feb'14)] [Story in NYTimes (Feb'13)] [Revealing Invisible Changes in the World (NSF SciVis'12)] [Phase-based Motion Processing (SIGGRAPH'13)] [Eulerian Video Magnification (SIGGRAPH'12)] [Motion Denoising (CVPR'11)]
 
Joint Inference in Image Datasets
Dense image correspondences are used to propagate information across weakly-annotated image datasets, to infer pixel labels jointly in all the images.
[Object Discovery and Segmentation (CVPR'13)] [Annotation Propagation (ECCV'12)]
Image and Video Retargeting
In my Masters I worked on content-aware algorithms for resizing images and videos to fit different display sizes and aspect ratios. "Content-aware" means the image/video is resized based on its actual content: parts that are visually more important are preserved at the expense of less important ones.
This technology was licensed by Adobe and added to Photoshop as "Content-Aware Scaling".
[RetargetMe (SIGGRAPH Asia'10)] [My Masters thesis (May'09)] [Multi-operator Retargeting (SIGGRAPH'09)] [Improved Seam-Carving (SIGGRAPH'08)]
 

 

Publications

My publications and patents on Google Scholar

  Tianfan Xue, Michael Rubinstein, Neal Wadhwa, Anat Levin, Fredo Durand, William T. Freeman
Refraction Wiggles for Measuring Fluid Depth and Velocity from Video
Proc. of the European Conference on Computer Vision (ECCV), 2014
[Abstract] [Paper] [Webpage] [BibTeX]
Patent pending
We present principled algorithms for measuring the velocity and 3D location of refractive fluids, such as hot air or gas, from natural videos with textured backgrounds. Our main observation is that intensity variations related to movements of refractive fluid elements, as observed by one or more video cameras, are consistent over small space-time volumes. We call these intensity variations “refraction wiggles”, and use them as features for tracking and stereo fusion to recover the fluid motion and depth from video sequences. We give algorithms for 1) measuring the (2D, projected) motion of refractive fluids in monocular videos, and 2) recovering the 3D position of points on the fluid from stereo cameras. Unlike pixel intensities, wiggles can be extremely subtle and cannot be known with the same level of confidence for all pixels, depending on factors such as background texture and physical properties of the fluid. We thus carefully model uncertainty in our algorithms for robust estimation of fluid motion and depth. We show results on controlled sequences, synthetic simulations, and natural videos. Different from previous approaches for measuring refractive flow, our methods operate directly on videos captured with ordinary cameras, do not require auxiliary sensors, light sources or designed backgrounds, and can correctly detect the motion and location of refractive fluids even when they are invisible to the naked eye.
  Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Fredo Durand, William T. Freeman
The Visual Microphone: Passive Recovery of Sound from Video
ACM Transactions on Graphics, Volume 33, Number 4 (Proc. SIGGRAPH), 2014
[Abstract] [Paper] [Webpage] [BibTeX]
Patent pending
When sound hits an object, it causes small vibrations of the object’s surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects—a glass of water, a potted plant, a box of tissues, or a bag of chips—into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object’s surface, which we can use to recover the vibration modes of an object.
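For intuition only, here is a toy, single-band sketch of the idea (not the paper's method, which uses a full complex steerable pyramid over many scales and orientations): per-pixel phase variations of a complex sub-band, averaged with amplitude weighting, yield one motion sample per frame, which at high frame rates becomes an audio-rate signal. The array shape, filter, and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def recover_signal(frames, wavelength=4.0, sigma=2.0, size=9):
    """frames: (T, H, W) float array of grayscale high-speed video frames."""
    # A small horizontal complex Gabor filter stands in for one pyramid sub-band.
    x = np.arange(size) - size // 2
    X, Y = np.meshgrid(x, x)
    k = np.exp(2j * np.pi * X / wavelength) * np.exp(-(X**2 + Y**2) / (2 * sigma**2))

    # Complex sub-band response per frame; its phase tracks sub-pixel motion.
    band = np.stack([convolve(f, k.real) + 1j * convolve(f, k.imag) for f in frames])
    phase = np.unwrap(np.angle(band), axis=0)   # per-pixel phase over time
    weight = np.abs(band) ** 2                  # trust pixels with strong texture
    dphase = phase - phase[0]                   # phase change relative to frame 0

    # One scalar per frame: amplitude-weighted average of local phase changes.
    signal = (weight * dphase).sum(axis=(1, 2)) / weight.sum(axis=(1, 2))
    return signal - signal.mean()               # zero-mean estimate of the sound
```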
  Neal Wadhwa, Michael Rubinstein, Fredo Durand, William T. Freeman
Riesz Pyramids for Fast Phase-Based Video Magnification
IEEE International Conference on Computational Photography (ICCP), 2014
[Abstract] [Paper] [Tech report] [Webpage] [BibTeX]
Patent pending
CVPR 2014 Best Demo Award
We present a new compact image pyramid representation, the Riesz pyramid, that can be used for real-time, high-quality, phase-based video magnification. Our new representation is less overcomplete than even the smallest two-orientation, octave-bandwidth complex steerable pyramid, and can be implemented using compact and efficient linear filters in the spatial domain. Motion-magnified videos produced using this new representation are of comparable quality to those produced using the complex steerable pyramid. When used with phase-based video magnification, the Riesz pyramid phase-shifts image features only along their dominant orientation, rather than along every orientation as the complex steerable pyramid does.
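As a rough illustration (not the authors' implementation), the sketch below computes the per-pixel amplitude, local phase and dominant orientation of a single Laplacian-pyramid band using small 3-tap filters that approximate the Riesz transform; these are the quantities the method temporally filters and amplifies. The band array and filter taps are assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import convolve

def riesz_phase(band):
    """band: one Laplacian-pyramid level as a 2D float array."""
    # Approximate Riesz transform with small 3-tap spatial filters.
    r1 = convolve(band, np.array([[0.5, 0.0, -0.5]]))      # horizontal component
    r2 = convolve(band, np.array([[0.5], [0.0], [-0.5]]))  # vertical component

    amplitude = np.sqrt(band**2 + r1**2 + r2**2)
    phase = np.arctan2(np.sqrt(r1**2 + r2**2), band)        # local phase
    orientation = np.arctan2(r2, r1)                        # dominant orientation
    return amplitude, phase, orientation
```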
  Michael Rubinstein
Analysis and Visualization of Temporal Variations in Video
PhD Thesis, Massachusetts Institute of Technology, Feb 2014
[Abstract] [Thesis] [Webpage] [BibTeX]
Our world is constantly changing, and it is important for us to understand how our environment changes and evolves over time. A common method for capturing and communicating such changes is imagery -- whether captured by consumer cameras, microscopes or satellites, images and videos provide an invaluable source of information about the time-varying nature of our world. Due to the great progress in digital photography, such images and videos are now widespread and easy to capture, yet computational models and tools for understanding and analyzing time-varying processes and trends in visual data are scarce and undeveloped.

In this dissertation, we propose new computational techniques to efficiently represent, analyze and visualize both short-term and long-term temporal variation in videos and image sequences. Small-amplitude changes that are difficult or impossible to see with the naked eye, such as variation in human skin color due to blood circulation and small mechanical movements, can be extracted for further analysis, or exaggerated to become visible to an observer. Our techniques can also attenuate motions and changes to remove variation that distracts from the main temporal events of interest.

The main contribution of this thesis is in advancing our knowledge on how to process spatiotemporal imagery and extract information that may not be immediately seen, so as to better understand our dynamic world through images and videos.
  Neal Wadhwa, Michael Rubinstein, Fredo Durand, William T. Freeman
Phase-based Video Motion Processing
ACM Transactions on Graphics, Volume 32, Number 4 (Proc. SIGGRAPH), 2013
[Abstract] [Paper] [Webpage] [BibTeX]
Patent pending
We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.
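A toy, single-orientation version of this idea (an illustration under simplifying assumptions, not the paper's implementation) can be written with one complex Gabor band standing in for a complex steerable pyramid sub-band; the frame array, kernel, frequency band and amplification factor below are all illustrative.

```python
import numpy as np
from scipy.ndimage import convolve
from scipy.signal import butter, filtfilt

def gabor_kernel(size=9, wavelength=4.0, sigma=2.0):
    """One horizontal complex Gabor filter (single scale and orientation)."""
    x = np.arange(size) - size // 2
    X, Y = np.meshgrid(x, x)
    return np.exp(2j * np.pi * X / wavelength) * np.exp(-(X**2 + Y**2) / (2 * sigma**2))

def magnify_motion(frames, fps, low=1.0, high=3.0, alpha=20.0):
    """frames: (T, H, W) float array; amplifies motions in the [low, high] Hz band."""
    k = gabor_kernel()
    # Complex sub-band per frame; its phase encodes local motion.
    band = np.stack([convolve(f, k.real) + 1j * convolve(f, k.imag) for f in frames])
    phase = np.unwrap(np.angle(band), axis=0)

    # Temporally bandpass the phase at every pixel, then amplify the phase changes.
    b, a = butter(1, [low / (fps / 2), high / (fps / 2)], btype="band")
    dphase = filtfilt(b, a, phase, axis=0)
    shifted = band * np.exp(1j * alpha * dphase)

    # Add the (real) change of the band back to the input frames.
    return frames + (shifted.real - band.real)
```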
  Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
Unsupervised Joint Object Discovery and Segmentation in Internet Images
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013
[Abstract] [Paper] [Webpage] [BibTeX]
We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical for datasets collected from Internet search. The key insight behind our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

  Michael Rubinstein, Neal Wadhwa, Fredo Durand, William T. Freeman
Revealing Invisible Changes In The World
Science, Vol. 339, No. 6119, Feb 1 2013
NSF International Science and Engineering Visualization Challenge (SciVis), 2012
Honorable mention
[Article in Science] [Video] [NSF SciVis 2012] [BibTeX]
  Michael Rubinstein, Ce Liu, William T. Freeman
Annotation Propagation in Large Image Databases via Dense Image Correspondence
Proc. of the European Conference on Computer Vision (ECCV), 2012
[Abstract] [Paper] [Webpage] [BibTeX]
Patent pending
Our goal is to automatically annotate many images with a set of word tags and a pixel-wise map showing where each word tag occurs. Most previous approaches rely on a corpus of training images where each pixel is labeled. However, for large image databases, pixel labels are expensive to obtain and are often unavailable. Furthermore, when classifying multiple images, each image is typically solved for independently, which often results in inconsistent annotations across similar images. In this work, we incorporate dense image correspondence into the annotation model, allowing us to make do with significantly less labeled data and to resolve ambiguities by propagating inferred annotations from images with strong local visual evidence to images with weaker local evidence. We establish a large graphical model spanning all labeled and unlabeled images, then solve it to infer annotations, enforcing consistent annotations over similar visual patterns. Our model is optimized by efficient belief propagation algorithms embedded in an expectation-maximization (EM) scheme. Extensive experiments are conducted to evaluate the performance on several standard large-scale image datasets, showing that the proposed framework outperforms state-of-the-art methods.
  Michael Rubinstein, Ce Liu, William T. Freeman
Towards Longer Long-Range Motion Trajectories
Proc. of the British Machine Vision Conference (BMVC), 2012
[Abstract] [Paper] [Supplemental (.zip)] [BMVC'12 poster] [BibTeX]
Although dense, long-range motion trajectories are a prominent representation of motion in videos, there is still no good solution for constructing dense motion tracks in a truly long-range fashion. Ideally, we would want every scene feature that appears in multiple, not necessarily contiguous, parts of the sequence to be associated with the same motion track. Despite this reasonable and clearly stated objective, there has been surprisingly little work on general-purpose algorithms that can accomplish that task. State-of-the-art dense motion trackers process the sequence incrementally in a frame-by-frame manner, and by design associate features that disappear and reappear in the video with different tracks, thereby losing important information of the long-term motion signal. In this paper, we strive towards an algorithm for producing generic long-range motion trajectories that are robust to occlusion, deformation and camera motion. We leverage accurate local (short-range) trajectories produced by current motion tracking methods and use them as an initial estimate for a global (long-range) solution. Our algorithm re-correlates the short trajectory estimates and links them to form a long-range motion representation by formulating a combinatorial assignment problem that is defined and optimized globally over the entire sequence. This allows us to correlate tracks in arbitrarily distinct parts of the sequence, as well as handle track ambiguities by spatiotemporal regularization. We report results of the algorithm on synthetic examples, natural and challenging videos, and evaluate the representation for action recognition.
  Hao-Yu Wu, Michael Rubinstein, Eugene Shih, John Guttag, Fredo Durand, William T. Freeman
Eulerian Video Magnification for Revealing Subtle Changes in the World
ACM Transactions on Graphics, Volume 31, Number 4 (Proc. SIGGRAPH), 2012
[Abstract] [Paper] [Webpage] [BibTeX]
Patent pending
Our goal is to reveal temporal variations in videos that are difficult or impossible to see with the naked eye and display them in an indicative manner. Our method, which we call Eulerian Video Magnification, takes a standard video sequence as input, and applies spatial decomposition, followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information. Using our method, we are able to visualize the flow of blood as it fills the face and to amplify and reveal small motions. Our technique can be run in real time to instantly show phenomena occurring at the temporal frequencies selected by the user.
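To make the pipeline concrete, here is a minimal sketch of the idea (not the released implementation listed below under Software and Data): it assumes grayscale frames loaded into a (T, H, W) float array with values in [0, 1], and uses a single heavily blurred band in place of the full spatial pyramid; the frequency band, blur and amplification values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import butter, filtfilt

def magnify(frames, fps, low=0.8, high=1.0, alpha=50.0, sigma=10.0):
    """Amplify temporal variations of `frames` in the [low, high] Hz band."""
    # Spatial decomposition: a single blurred (low-pass) band stands in for the
    # Laplacian/Gaussian pyramid used in the paper.
    lowpass = np.stack([gaussian_filter(f, sigma) for f in frames])

    # Temporal bandpass filtering of every pixel, e.g. around the resting heart rate.
    b, a = butter(1, [low / (fps / 2), high / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, lowpass, axis=0)

    # Amplify the filtered signal and add it back to the input video.
    return np.clip(frames + alpha * filtered, 0.0, 1.0)
```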
  Michael Rubinstein, Ce Liu, Peter Sand, Fredo Durand, William T. Freeman
Motion Denoising with Application to Time-lapse Photography

IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011
[Abstract] [Paper] [Webpage] [BibTeX]
Motions can occur over both short and long time scales. We introduce motion denoising, which treats short-term changes as noise, long-term changes as signal, and rerenders a video to reveal the underlying long-term events. We demonstrate motion denoising for time-lapse videos. One of the characteristics of traditional time-lapse imagery is stylized jerkiness, where short-term changes in the scene appear as small and annoying jitters in the video, often obfuscating the underlying temporal events of interest. We apply motion denoising for resynthesizing time-lapse videos showing the long-term evolution of a scene with jerky short-term changes removed. We show that existing filtering approaches are often incapable of achieving this task, and present a novel computational approach to denoise motion without explicit motion analysis. We demonstrate promising experimental results on a set of challenging time-lapse sequences.
  Michael Rubinstein, Diego Gutierrez, Olga Sorkine, Ariel Shamir
A Comparative Study of Image Retargeting
ACM Transactions on Graphics, Volume 29, Number 5 (Proc. SIGGRAPH Asia), 2010
[Abstract] [Paper] [Webpage] [BibTeX]
The numerous works on media retargeting call for a methodological approach for evaluating retargeting results. We present the first comprehensive perceptual study and analysis of image retargeting. First, we create a benchmark of images and conduct a large scale user study to compare a representative number of state-of-the-art retargeting methods. Second, we present analysis of the users’ responses, where we find that humans in general agree on the evaluation of the results and show that some retargeting methods are consistently more favorable than others. Third, we examine whether computational image distance metrics can predict human retargeting perception. We show that current measures used in this context are not necessarily consistent with human rankings, and demonstrate that better results can be achieved using image features that were not previously considered for this task. We also reveal specific qualities in retargeted media that are more important for viewers. The importance of our work lies in promoting better measures to assess and guide retargeting algorithms in the future. The full benchmark we collected, including all images, retargeted results, and the collected user data, are available to the research community for further investigation.
  Michael Rubinstein
Discrete Approaches to Content-aware Image and Video Retargeting

MSc Thesis, The Interdisciplinary Center, May 2009
[PDF] [High-resolution PDF (70MB)] [BibTeX]
  Michael Rubinstein, Ariel Shamir, Shai Avidan
Multi-operator Media Retargeting
ACM Transactions on Graphics, Volume 28, Number 3 (Proc. SIGGRAPH), 2009
[Abstract] [Paper] [Webpage] [BibTeX]
Patented
Content-aware resizing has gained popularity lately, and users can now choose from a battery of methods to retarget their media. However, no single retargeting operator performs well on all images and all target sizes. In a user study we conducted, we found that users prefer to combine seam carving with cropping and scaling to produce results they are satisfied with. This inspires us to propose an algorithm that combines different operators in an optimal manner. We define a resizing space as a conceptual multi-dimensional space combining several resizing operators, and show how a path in this space defines a sequence of operations to retarget media. We define a new image similarity measure, which we term Bi-Directional Warping (BDW), and use it with a dynamic programming algorithm to find an optimal path in the resizing space. In addition, we show a simple and intuitive user interface allowing users to explore the resizing space of various image sizes interactively. Using key-frames and interpolation we also extend our technique to retarget video, providing the flexibility to use the best combination of operators at different times in the sequence.
  Michael Rubinstein, Ariel Shamir, Shai Avidan
Improved Seam Carving for Video Retargeting
ACM Transactions on Graphics, Volume 27, Number 3 (Proc. SIGGRAPH), 2008
[Abstract] [Paper] [Webpage] [Code] [BibTex]
Patented
Implemented in Adobe Photoshop as Content-Aware Scaling
Video, like images, should support content-aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph and we show how to construct a graph such that the resulting cut is a valid seam. That is, the cut is monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator is focused on removing seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion looks forward in time, removing seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.
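For reference, the dynamic-programming seam the paper builds on can be sketched in a few lines (backward energy only, without the forward-energy criterion or the graph-cut extension to video described above); `img` and all names below are illustrative.

```python
import numpy as np

def remove_vertical_seam(img):
    """img: 2D grayscale float array; returns the image with one seam removed."""
    # Backward energy: gradient magnitude of the current image.
    gy, gx = np.gradient(img)
    energy = np.abs(gx) + np.abs(gy)

    # Cumulative minimum energy: M[i, j] = E[i, j] + min of the three pixels above.
    M = energy.copy()
    for i in range(1, img.shape[0]):
        left = np.roll(M[i - 1], 1);   left[0] = np.inf
        right = np.roll(M[i - 1], -1); right[-1] = np.inf
        M[i] += np.minimum(np.minimum(left, M[i - 1]), right)

    # Backtrack the minimal (monotonic, connected) seam and delete it row by row.
    seam = np.zeros(img.shape[0], dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(img.shape[0] - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, img.shape[1])
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return np.array([np.delete(row, j) for row, j in zip(img, seam)])
```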
  Ariel Shamir, Michael Rubinstein, Tomer Levinboim
Inverse Computer Graphics: Parametric Comics Creation from 3D Interaction
IEEE Computer Graphics & Applications, Volume 26, Number 3, 30-38, 2006
[Abstract] [Paper] [Webpage] [BibTeX]
There are times when computer graphics is required to be succinct and simple. Carefully chosen, simplified and static images can portray the narration of a story as effectively as 3D photo-realistic continuous graphics. In this paper we present an automatic system which transforms continuous graphics originating from real 3D virtual-world interactions into a sequence of comics images. The system traces events during the interaction and then analyzes and breaks them into scenes. Based on user-defined parameters of point-of-view and story granularity, it chooses specific time-frames to create static images, renders them, and applies post-processing to reduce their clutter. The system utilizes the same principle of intelligent reduction of details in both the temporal and spatial domains for choosing important events and depicting them visually. The end result is a sequence of comics images which summarize the main happenings and present them in a coherent, concise and visually pleasing manner.

 

Software and Data
  Object Discovery and Segmentation Internet Datasets
The Internet image collections we used for the evaluation in our CVPR'13 paper, with human foreground-background masks, and the segmentation results by our method and by other co-segmentation techniques.
  Eulerian Video Magnification
MATLAB/C++ implementation of our method, with code that reproduces all the results in our SIGGRAPH'12 paper. This technology is patented by MIT, and the code is provided for non-commercial research purposes only.
  RetargetMe
A dataset of 80 images and retargeted results, ranked by human viewers. The project website contains all the data we collected and also provides a nice synopsis of the current state of image retargeting research.
  Image Retargeting Survey
The system I've developed for collecting user feedback on image retargeting results. It is based on a linked paired comparison design for collecting and analyzing data when the number of stimuli is very large.
The code is written in HTML, PHP and JavaScript. It supports multiple experiment designs, and can be easily used with Amazon Mechanical Turk. See my paper and the project website for further details.
A live demo is available here.
  Seam Carving (v1.0, 2009-04-10)
A MATLAB re-implementation of the seam carving method I worked on at MERL. It is provided for research/educational purposes only. This algorithm is patented and owned by Mitsubishi Electric Research Labs, Cambridge MA.
The code supports backward and forward energy using both the dynamic programming and graph cut formulations. See demo.m for usage example.
Please cite my Masters thesis if you use this code.
  • maxflow (v1.1, 2008-09-15)
    A MATLAB wrapper for the Boykov-Kolmogorov max-flow algorithm. Also available on MATLAB Central.

 

Teaching and Talks
For more conference talks, see the project web pages under Publications above.

  • Tutorial: "Dense Image Correspondences for Computer Vision"
    CVPR 2014, Columbus Ohio, Jun 23 2014 [Webpage]
    ICCV 2013, Sydney Australia, Dec 2 2013 [Webpage]
  • "Seam Carving and Content-driven Retargeting of Images and Video"
    MIT 6.865 Computational Photography, 2010-11 (guest lecturer) [PPT (210MB)] [PDF]




Other non-conference talks I gave here and there:
  • "Introduction to Recursive Bayesian Filtering"
    Seminar on Advanced Topics in Computer Graphics, Tel Aviv University 2009
    [PPT] [PDF]
  • "Tracking with focus on Particle Filters"
    Seminar on Vision-based Security, IDC 2009
    Part I: [PPT] [PDF], Part II: [PPT] [PDF]
  • "Dimensionality Reduction by Random Mapping"
    Seminar on Advanced Topics in Computer Graphics, Tel Aviv University 2006
    [PPT] [PDF]

 

Press
Yedioth Ahronoth, Mar 17 2013: "The Hidden Secrets of Video" (Hebrew)
Discovery Channel, Feb 28 2013: "Daily Planet" (video)
Daily Mail, Feb 28 2013: "How to spot a liar (and cheat at poker)"
NYTimes, Feb 27 2013: "Scientists Uncover Invisible Motion in Video" | Video
Txchnologist, Feb 1 2013: "New Video Process Reveals Heart Rate, Invisible Movement"
MIT News, Feb 1 2013: "MIT researchers honored for 'Revealing Invisible Changes'"
Wired UK, Jul 25 2012: "MIT algorithm measures your pulse by looking at your face"
Technology Review, Jul 24 2012: "Software Detects Motion that the Human Eye Can't See"
BBC Radio, Jul 3 2012: "MIT Video colour amplification"
Der Spiegel, Jun 27 2012: "Video software can make pulse visible" (German)
MIT News, Jun 21 2012: "Researchers amplify variations in video, making the invisible visible" | Video | Spotlight (our work on the MIT front page)
PetaPixel, Jun 13 2012: "Magnifying the Subtle Changes in Video to Reveal the Invisible"
Huffington Post, Jun 4 2012: "MIT's New Video Technology Could Give You Superhuman Sight"
Gizmodo, Jun 4 2012: "New X-Ray Vision-Style Video Can Show a Pulse Beating Through Skin"
PhotoshopDaily, Jul 22 2009: "Insider Info: Content-Aware Scaling"
NYTimes, Nov 20 2008: "What's in Photoshop CS4 for Photographers?"
ZDNet, Oct 28 2008: "Best new feature for photographers in Adobe Photoshop CS4"
CNET, Nov 19 2007: "Seam carving photo resizing now for video"

 

Miscellaneous