Michael (Miki) Rubinstein

Staff Research Scientist, Google

Google Inc.
355 Main Street
Cambridge, MA 02142

Email: mrubxkxkxk@google.com | mrubqwqwqw@csail.mit.edu

I am a Research Scientist at Google. I received my PhD from MIT, under the supervision of Bill Freeman. Before joining Google, I spent a year as a postdoc at Microsoft Research New England.
I work at the intersection of computer vision and computer graphics. In particular, I am interested in low-level image/video processing and computational photography. You can read more about my research here.

CV: pdf (old) | LinkedIn


News

  Sep 16 2020 New paper in SIGGRAPH Asia 2020: "Layered Neural Rendering for Retiming People in Video"
  Apr 21 2020 New paper in CVPR 2020: "SpeedNet: Learning the Speediness in Videos"
  May 06 2018 Upcoming talk: May 23 - LDV Vision Summit, NY
  May 06 2018 New paper in SIGGRAPH 2018: "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation"
  Sep 24 2017 Upcoming talks: Oct 23 - MIT Media Lab, Nov 16 - UC Berkeley, Dec 8 - U. Washington
  Oct 23 2015 Upcoming talks: Nov 8 - IBM Research Israel, Nov 10 - Technion, Dec 9 - TTI/Vanguard [next] SF, Feb 19 2016 - UIUC
  May 17 2015 New paper in SIGGRAPH 2015: "A Computational Approach for Obstruction-Free Photography".
  Mar 20 2015 Two new papers in CVPR 2015: "Visual Vibrometry: Estimating Material Properties from Small Motion in Video", and "Best-Buddies Similarity for Robust Template Matching".
  Dec 10 2014 A recent talk I gave at TEDxBeaconStreet is now online
  Nov 17 2014 I've moved to Google! I'm starting a new computer vision research group in Cambridge (MA), together with Bill Freeman, Ce Liu and Dilip Krishnan
  Nov 17 2014 My PhD dissertation won the George M. Sprowls Award for outstanding doctoral thesis in Computer Science at MIT
  Aug 01 2014 New paper in ECCV 2014: "Refraction Wiggles for Measuring Fluid Depth and Velocity from Video"
  Jun 27 2014 Our demo "Real-time Video Magnification" won the Best Demo Award at CVPR 2014!
  May 20 2014 New paper in SIGGRAPH 2014: "The Visual Microphone: Passive Recovery of Sound from Video"
  Mar 31 2014 New paper in ICCP 2014: "Riesz Pyramids for Fast Phase-Based Video Magnification"
  Nov 29 2013 Giving several talks in Israel: TAU Dec 11, IDC Dec 12, HUJI Dec 16, Weizmann Dec 19
  Sep 13 2013 Co-organizing the tutorial "Dense Image Correspondences for Computer Vision" at ICCV 2013
  Apr 20 2013 Invited talk at ICCP 2013
  Apr 04 2013 Two new papers: "Unsupervised Joint Object Discovery and Segmentation in Internet Images" accepted to CVPR 2013, and "Phase-based Video Motion Processing" conditionally accepted to SIGGRAPH 2013
  Feb 27 2013 Our video magnification work is on the New York Times
  Feb 01 2013 Our video "Revealing Invisible Changes In The World" won the honorable mention in the NSF International Science & Engineering Visualization Challenge 2012 and is featured in Science
  Jul 12 2012 "Towards Longer Long-Range Motion Trajectories" accepted to BMVC 2012
  Jun 28 2012 "Annotation Propagation in Large Image Databases via Dense Image Correspondence" accepted to ECCV 2012
  Jun 10 2012 Working this summer in the IVM group at Microsoft Research Redmond
  May 20 2012 "Eulerian Video Magnification for Revealing Subtle Changes in the World" accepted to SIGGRAPH 2012
  Mar 05 2012 I am supported by the Microsoft Research PhD Fellowship (2012-2013)
  May 23 2011 Spending the summer at Microsoft Research New England
  May 03 2011 "Motion Denoising with Application to Time-lapse Photography" accepted to CVPR 2011
  May 03 2011 I am a recipient of the 2011 NVIDIA Graduate Fellowship
  Sep 12 2010 RetargetMe dataset is now online
  Aug 15 2010 "A Comparative Study of Image Retargeting" conditionally accepted to SIGGRAPH Asia 2010


Research Highlights


Video Magnification, Analysis of Small Motions In my PhD I developed new methods to extract subtle motion and color signals from videos. These methods can be used to visualize blood perfusion, measure heart rate, and magnify tiny motions and changes we cannot normally see, all using regular cameras and videos. TEDx talk (Nov'14) | My PhD thesis (MIT Feb'14) | Story in NYTimes (Feb'13) | Revealing Invisible Changes in the World (NSF SciVis'12) | Phase-based Motion Processing (SIGGRAPH'13) | Eulerian Video Magnification (SIGGRAPH'12) | Motion Denoising (CVPR'11)
Pattern Discovery and Joint Inference in Image Collections Dense image correspondences are used to propagate information across weakly-annotated image datasets, to infer pixel labels jointly in all the images. Tutorial talk (ICCV'13) | Object Discovery and Segmentation (CVPR'13) | Annotation Propagation (ECCV'12)
Image and Video Retargeting In my Master's I worked on content-aware algorithms for resizing images and videos to fit different display sizes and aspect ratios. "Content-aware" means the image/video is resized based on its actual content: parts that are visually more important are preserved at the expense of less important ones.
This technology was licensed by Adobe and added to Photoshop as "Content Aware Scaling".
RetargetMe dataset (SIGGRAPH Asia'10) | My Master's thesis (May'09) | Multi-operator Retargeting (SIGGRAPH'09) | Improved Seam-Carving (SIGGRAPH'08)



My publications and patents on Google Scholar

  Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein
Layered Neural Rendering for Retiming People in Video
ACM Transactions on Graphics, Volume 39, Number 6 (Proc. SIGGRAPH Asia), 2020
Abstract | Paper | Webpage | BibTex
We present a method for retiming people in an ordinary, natural video — manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate — e.g., shadows, reflections, and motion of loose clothing. The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.
  Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel
SpeedNet: Learning the Speediness in Videos
IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2020 (Oral presentation)
Abstract | Paper | Webpage | BibTex
We wish to automatically predict the "speediness" of moving objects in videos---whether they move faster, at, or slower than their "natural" speed. The core component in our approach is SpeedNet---a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring any manual annotations. We show how this single, binary classification network can be used to detect arbitrary rates of speediness of objects. We demonstrate prediction results by SpeedNet on a wide range of videos containing complex natural motions, and examine the visual cues it utilizes for making those predictions. Importantly, we show that through predicting the speed of videos, the model learns a powerful and meaningful space-time representation that goes beyond simple motion cues. We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval. Furthermore, we also apply SpeedNet for generating time-varying, adaptive video speedups, which can allow viewers to watch videos faster, but with less of the jittery, unnatural motions typical of videos that are sped up uniformly.
  Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik
Speech2Face: Learning the Face Behind a Voice
IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2019
Abstract | Paper | Webpage | BibTex
How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how--and in what manner--our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers.
  Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein
Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
ACM Transactions on Graphics, Volume 37, Number 4 (Proc. SIGGRAPH), 2018
Abstract | Paper | Webpage | BibTex
We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality. To train our joint audio-visual model, we introduce AVSpeech, a new dataset comprised of thousands of hours of video segments from the Web. We demonstrate the applicability of our method to classic speech separation tasks, as well as real-world scenarios involving heated interviews, noisy bars, and screaming children, only requiring the user to specify the face of the person in the video whose speech they want to isolate. Our method shows clear advantage over state-of-the-art audio-only speech separation in cases of mixed speech. In addition, our model, which is speaker-independent (trained once, applicable to any speaker), produces better results than recent audio-visual speech separation methods that are speaker-dependent (require training a separate model for each speaker of interest).
  Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman
On the Effectiveness of Visible Watermarks
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017
Abstract | Paper | Webpage | BibTex
Visible watermarking is a widely-used technique for marking and protecting copyrights of many millions of images on the web, yet it suffers from an inherent security flaw--watermarks are typically added in a consistent manner to many images. We show that this consistency allows us to automatically estimate the watermark and recover the original images with high accuracy. Specifically, we present a generalized multi-image matting algorithm that takes a watermarked image collection as input and automatically estimates the "foreground" (watermark), its alpha matte, and the "background" (original) images. Since such an attack relies on the consistency of watermarks across an image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure. We demonstrate the algorithm on stock imagery available on the web, and provide extensive quantitative analysis on synthetic watermarked data. A key takeaway message of this paper is that visible watermarks should be designed to not only be robust against removal from a single image, but to be more resistant to mass-scale removal from image collections as well.
  Neal Wadhwa, Hao-Yu Wu, Abe Davis, Michael Rubinstein, Eugene Shih, Gautham J. Mysore, Justin G. Chen, Oral Buyukozturk, John V. Guttag, William T. Freeman, Frédo Durand
Eulerian Video Magnification and Analysis
Communications of the ACM, January 2017
Abstract | PDF | CACM article online | BibTex
The world is filled with important, but visually subtle signals. A person's pulse, the breathing of an infant, the sag and sway of a bridge—these all create visual patterns, which are too difficult to see with the naked eye. We present Eulerian Video Magnification, a computational technique for visualizing subtle color and motion variations in ordinary videos by making the variations larger. It is a microscope for small changes that are hard or impossible for us to see by ourselves. In addition, these small changes can be quantitatively analyzed and used to recover sounds from vibrations in distant objects, characterize material properties, and remotely measure a person's pulse.

  Tianfan Xue, Michael Rubinstein, Ce Liu, William T. Freeman
A Computational Approach for Obstruction-Free Photography
ACM Transactions on Graphics, Volume 34, Number 4 (Proc. SIGGRAPH), 2015
Abstract | Paper | Webpage | BibTex
We present a unified computational approach for taking photos through reflecting or occluding elements such as windows and fences. Rather than capturing a single image, we instruct the user to take a short image sequence while slightly moving the camera. Differences that often exist in the relative position of the background and the obstructing elements from the camera allow us to separate them based on their motions, and to recover the desired background scene as if the visual obstructions were not there. We show results on controlled experiments and many real and practical scenarios, including shooting through reflections, fences, and raindrop-covered windows.
  Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Fredo Durand, William T. Freeman
Visual Vibrometry: Estimating Material Properties from Small Motions in Video
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015 (Oral presentation)
Abstract | Paper | Webpage | BibTex
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motion in video. Objects tend to vibrate in a set of preferred modes. The shapes and frequencies of these modes depend on the structure and material properties of an object. Focusing on the case where geometry is known or fixed, we show how information about an object's modes of vibration can be extracted from video and used to make inferences about that object's material properties. We demonstrate our approach by estimating material properties for a variety of rods and fabrics by passively observing their motion in high-speed and regular-framerate video.
  Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman
Best-Buddies Similarity for Robust Template Matching
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015
Abstract | Paper | Webpage | BibTeX
We propose a novel method for template matching in unconstrained environments. Its essence is the Best-Buddies Similarity (BBS), a useful, robust, and parameter-free similarity measure between two sets of points. BBS is based on counting the number of Best-Buddies Pairs (BBPs)—pairs of points in source and target sets, where each point is the nearest neighbor of the other. BBS has several key features that make it robust against complex geometric deformations and high levels of outliers, such as those arising from background clutter and occlusions. We study these properties, provide a statistical analysis that justifies them, and demonstrate the consistent success of BBS on a challenging real-world dataset.
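As a rough illustration of the mutual-nearest-neighbor idea (not the paper's implementation, which matches patch descriptors inside sliding template windows), a best-buddies count between two 1-D point sets can be sketched as:

```python
# Toy Best-Buddies Similarity sketch. The 1-D points, function names, and
# normalization below are illustrative assumptions for this sketch only.
def nearest(point, points):
    """Index of the element of `points` closest to `point` (1-D distance)."""
    return min(range(len(points)), key=lambda j: abs(points[j] - point))

def bbs(P, Q):
    """Fraction of Best-Buddies Pairs: p's nearest neighbor is q AND vice versa."""
    pairs = 0
    for i, p in enumerate(P):
        j = nearest(p, Q)          # p's candidate buddy in Q
        if nearest(Q[j], P) == i:  # is p also Q[j]'s nearest neighbor?
            pairs += 1
    return pairs / min(len(P), len(Q))

# Two similar sets score high; outliers only lose their own pairs:
clean = bbs([0.0, 1.0, 2.0], [0.1, 1.1, 2.1])    # 1.0 (all mutual)
noisy = bbs([0.0, 1.0, 2.0], [0.1, 50.0, 80.0])  # 1/3 (one mutual pair)
```

The robustness to outliers comes from mutuality: an outlier can be someone's nearest neighbor, but it is rarely a *mutual* nearest neighbor, so it simply contributes nothing to the count.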
  Fredo Durand, William T. Freeman, Michael Rubinstein
A World of Movement
Scientific American, Volume 312, Number 1, January 2015
Article in SciAm | Videos | BibTeX
  Tianfan Xue, Michael Rubinstein, Neal Wadhwa, Anat Levin, Fredo Durand, William T. Freeman
Refraction Wiggles for Measuring Fluid Depth and Velocity from Video
Proc. of the European Conference on Computer Vision (ECCV), 2014 (Oral presentation)
Abstract | Paper | Webpage | BibTeX
Patent pending
We present principled algorithms for measuring the velocity and 3D location of refractive fluids, such as hot air or gas, from natural videos with textured backgrounds. Our main observation is that intensity variations related to movements of refractive fluid elements, as observed by one or more video cameras, are consistent over small space-time volumes. We call these intensity variations “refraction wiggles”, and use them as features for tracking and stereo fusion to recover the fluid motion and depth from video sequences. We give algorithms for 1) measuring the (2D, projected) motion of refractive fluids in monocular videos, and 2) recovering the 3D position of points on the fluid from stereo cameras. Unlike pixel intensities, wiggles can be extremely subtle and cannot be known with the same level of confidence for all pixels, depending on factors such as background texture and physical properties of the fluid. We thus carefully model uncertainty in our algorithms for robust estimation of fluid motion and depth. We show results on controlled sequences, synthetic simulations, and natural videos. Different from previous approaches for measuring refractive flow, our methods operate directly on videos captured with ordinary cameras, do not require auxiliary sensors, light sources or designed backgrounds, and can correctly detect the motion and location of refractive fluids even when they are invisible to the naked eye.
  Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Fredo Durand, William T. Freeman
The Visual Microphone: Passive Recovery of Sound from Video
ACM Transactions on Graphics, Volume 33, Number 4 (Proc. SIGGRAPH), 2014
Abstract | Paper | Webpage | BibTeX
Patent pending
When sound hits an object, it causes small vibrations of the object’s surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects—a glass of water, a potted plant, a box of tissues, or a bag of chips—into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object’s surface, which we can use to recover the vibration modes of an object.
  Neal Wadhwa, Michael Rubinstein, Fredo Durand, William T. Freeman
Riesz Pyramids for Fast Phase-Based Video Magnification
IEEE International Conference on Computational Photography (ICCP), 2014
Abstract | Paper | Tech report | Webpage | BibTeX
Patent pending
CVPR 2014 Best Demo Award
We present a new compact image pyramid representation, the Riesz pyramid, that can be used for real-time, high quality, phase-based video magnification. Our new representation is less overcomplete than even the smallest two orientation, octave-bandwidth complex steerable pyramid, and can be implemented using compact and efficient linear filters in the spatial domain. Motion-magnified videos produced using this new representation are of comparable quality to those produced using the complex steerable pyramid. When used with phase-based video magnification, the Riesz pyramid phase-shifts image features along only their dominant orientation rather than every orientation like the complex steerable pyramid.
  Michael Rubinstein
Analysis and Visualization of Temporal Variations in Video
PhD Thesis, Massachusetts Institute of Technology, Feb 2014
Abstract | Thesis | Webpage | BibTeX
George M. Sprowls Award for outstanding doctoral thesis in Computer Science at MIT
Our world is constantly changing, and it is important for us to understand how our environment changes and evolves over time. A common method for capturing and communicating such changes is imagery -- whether captured by consumer cameras, microscopes or satellites, images and videos provide an invaluable source of information about the time-varying nature of our world. Due to the great progress in digital photography, such images and videos are now widespread and easy to capture, yet computational models and tools for understanding and analyzing time-varying processes and trends in visual data are scarce and undeveloped.

In this dissertation, we propose new computational techniques to efficiently represent, analyze and visualize both short-term and long-term temporal variation in videos and image sequences. Small-amplitude changes that are difficult or impossible to see with the naked eye, such as variation in human skin color due to blood circulation and small mechanical movements, can be extracted for further analysis, or exaggerated to become visible to an observer. Our techniques can also attenuate motions and changes to remove variation that distracts from the main temporal events of interest.

The main contribution of this thesis is in advancing our knowledge on how to process spatiotemporal imagery and extract information that may not be immediately seen, so as to better understand our dynamic world through images and videos.
  Neal Wadhwa, Michael Rubinstein, Fredo Durand, William T. Freeman
Phase-based Video Motion Processing
ACM Transactions on Graphics, Volume 32, Number 4 (Proc. SIGGRAPH), 2013
Abstract | Paper | Webpage | BibTeX
Patent pending
We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.
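A toy 1-D analogue of the phase-motion relationship (my sketch, not the paper's steerable-pyramid implementation): for a pure translation, each Fourier coefficient's phase shifts in proportion to the displacement, so scaling the per-frequency phase change between two frames scales the motion itself.

```python
# 1-D phase-based motion amplification sketch (illustrative assumptions:
# global DFT instead of a localized complex steerable pyramid, a band-limited
# test signal so phase differences do not wrap).
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def amplify_shift(frame0, frame1, alpha):
    """Scale the phase difference between two frames by (1 + alpha)."""
    out = []
    for a, b in zip(dft(frame0), dft(frame1)):
        dphi = cmath.phase(b) - cmath.phase(a) if abs(a) > 1e-9 else 0.0
        out.append(a * cmath.exp(1j * (1 + alpha) * dphi))
    return idft(out)
```

For a cosine shifted by one sample, `amplify_shift` with `alpha=4` reconstructs the cosine shifted by five samples: the motion, not the intensity, has been magnified.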
  Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
Unsupervised Joint Object Discovery and Segmentation in Internet Images
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013
Abstract | Paper | Webpage | BibTeX
We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

  Michael Rubinstein, Neal Wadhwa, Fredo Durand, William T. Freeman
Revealing Invisible Changes In The World
Science, Vol. 339, No. 6119, Feb 1, 2013
NSF International Science and Engineering Visualization Challenge (SciVis), 2012
Honorable mention
Article in Science | Video | NSF SciVis 2012 | BibTeX
  Michael Rubinstein, Ce Liu, William T. Freeman
Annotation Propagation in Large Image Databases via Dense Image Correspondence
Proc. of the European Conference on Computer Vision (ECCV), 2012
Abstract | Paper | Webpage | BibTeX
Patent pending
Our goal is to automatically annotate many images with a set of word tags and a pixel-wise map showing where each word tag occurs. Most previous approaches rely on a corpus of training images where each pixel is labeled. However, for large image databases, pixel labels are expensive to obtain and are often unavailable. Furthermore, when classifying multiple images, each image is typically solved for independently, which often results in inconsistent annotations across similar images. In this work, we incorporate dense image correspondence into the annotation model, allowing us to make do with significantly less labeled data and to resolve ambiguities by propagating inferred annotations from images with strong local visual evidence to images with weaker local evidence. We establish a large graphical model spanning all labeled and unlabeled images, then solve it to infer annotations, enforcing consistent annotations over similar visual patterns. Our model is optimized by efficient belief propagation algorithms embedded in an expectation-maximization (EM) scheme. Extensive experiments are conducted to evaluate the performance on several standard large-scale image datasets, showing that the proposed framework outperforms state-of-the-art methods.
  Michael Rubinstein, Ce Liu, William T. Freeman
Towards Longer Long-Range Motion Trajectories
Proc. of the British Machine Vision Conference (BMVC), 2012
Abstract | Paper | Supplemental (.zip) | BMVC'12 poster | BibTeX
Although dense, long-range motion trajectories are a prominent representation of motion in videos, there is still no good solution for constructing dense motion tracks in a truly long-range fashion. Ideally, we would want every scene feature that appears in multiple, not necessarily contiguous, parts of the sequence to be associated with the same motion track. Despite this reasonable and clearly stated objective, there has been surprisingly little work on general-purpose algorithms that can accomplish that task. State-of-the-art dense motion trackers process the sequence incrementally in a frame-by-frame manner, and by design associate features that disappear and reappear in the video with different tracks, thereby losing important information of the long-term motion signal. In this paper, we strive towards an algorithm for producing generic long-range motion trajectories that are robust to occlusion, deformation and camera motion. We leverage accurate local (short-range) trajectories produced by current motion tracking methods and use them as an initial estimate for a global (long-range) solution. Our algorithm re-correlates the short trajectory estimates and links them to form a long-range motion representation by formulating a combinatorial assignment problem that is defined and optimized globally over the entire sequence. This allows us to correlate tracks in arbitrarily distinct parts of the sequence, as well as handle track ambiguities by spatiotemporal regularization. We report results of the algorithm on synthetic examples, natural and challenging videos, and evaluate the representation for action recognition.
  Hao-Yu Wu, Michael Rubinstein, Eugene Shih, John Guttag, Fredo Durand, William T. Freeman
Eulerian Video Magnification for Revealing Subtle Changes in the World
ACM Transactions on Graphics, Volume 31, Number 4 (Proc. SIGGRAPH), 2012
Abstract | Paper | Webpage | BibTeX
Patent pending
Our goal is to reveal temporal variations in videos that are difficult or impossible to see with the naked eye and display them in an indicative manner. Our method, which we call Eulerian Video Magnification, takes a standard video sequence as input, and applies spatial decomposition, followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information. Using our method, we are able to visualize the flow of blood as it fills the face and to amplify and reveal small motions. Our technique can be run in real time to instantly show phenomena occurring at the temporal frequencies selected by the user.
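The pipeline described above (spatial decomposition, temporal filtering, amplification) can be sketched for a single pixel's intensity over time; the band-pass filter and toy signal below are my illustrative assumptions, not the paper's implementation, which operates on spatial-pyramid levels of full frames.

```python
# Eulerian magnification sketch on one pixel's time series: band-pass the
# temporal signal, amplify it, and add it back. The boxcar-based band-pass
# and the synthetic "pulse" signal are assumptions for this sketch.
import math

def moving_average(signal, window):
    """Simple boxcar low-pass filter (edges handled by clamping)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def magnify(signal, alpha, slow_win=15, fast_win=3):
    """Band-pass = difference of two low-passes; amplify and add back."""
    slow = moving_average(signal, slow_win)
    fast = moving_average(signal, fast_win)
    band = [f - s for f, s in zip(fast, slow)]       # temporal band-pass
    return [x + alpha * b for x, b in zip(signal, band)]

# A pixel with a subtle periodic "pulse" riding on a constant baseline:
t = [100.0 + 0.5 * math.sin(2 * math.pi * k / 20) for k in range(100)]
out = magnify(t, alpha=20.0)
# The oscillation in `out` is many times larger than in the input.
```

Because the amplification is purely Eulerian (per pixel over time), no motion estimation is needed, which is what makes the real method fast enough to run in real time.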
  Michael Rubinstein, Ce Liu, Peter Sand, Fredo Durand, William T. Freeman
Motion Denoising with Application to Time-lapse Photography

IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011
Abstract | Paper | Webpage | BibTeX
Motions can occur over both short and long time scales. We introduce motion denoising, which treats short-term changes as noise, long-term changes as signal, and rerenders a video to reveal the underlying long-term events. We demonstrate motion denoising for time-lapse videos. One of the characteristics of traditional time-lapse imagery is stylized jerkiness, where short-term changes in the scene appear as small and annoying jitters in the video, often obfuscating the underlying temporal events of interest. We apply motion denoising for resynthesizing time-lapse videos showing the long-term evolution of a scene with jerky short-term changes removed. We show that existing filtering approaches are often incapable of achieving this task, and present a novel computational approach to denoise motion without explicit motion analysis. We demonstrate promising experimental results on a set of challenging time-lapse sequences.
  Michael Rubinstein, Diego Gutierrez, Olga Sorkine, Ariel Shamir
A Comparative Study of Image Retargeting
ACM Transactions on Graphics, Volume 29, Number 5 (Proc. SIGGRAPH Asia), 2010
Abstract | Paper | Webpage | BibTeX
The numerous works on media retargeting call for a methodological approach for evaluating retargeting results. We present the first comprehensive perceptual study and analysis of image retargeting. First, we create a benchmark of images and conduct a large scale user study to compare a representative number of state-of-the-art retargeting methods. Second, we present analysis of the users’ responses, where we find that humans in general agree on the evaluation of the results and show that some retargeting methods are consistently more favorable than others. Third, we examine whether computational image distance metrics can predict human retargeting perception. We show that current measures used in this context are not necessarily consistent with human rankings, and demonstrate that better results can be achieved using image features that were not previously considered for this task. We also reveal specific qualities in retargeted media that are more important for viewers. The importance of our work lies in promoting better measures to assess and guide retargeting algorithms in the future. The full benchmark we collected, including all images, retargeted results, and the collected user data, are available to the research community for further investigation.
  Michael Rubinstein
Discrete Approaches to Content-aware Image and Video Retargeting
MSc Thesis, The Interdisciplinary Center, May 2009
pdf | High-resolution pdf (70MB) | BibTeX
  Michael Rubinstein, Ariel Shamir, Shai Avidan
Multi-operator Media Retargeting
ACM Transactions on Graphics, Volume 28, Number 3 (Proc. SIGGRAPH), 2009
Abstract | Paper | Webpage | BibTeX
Content-aware resizing has gained popularity lately, and users can now choose from a battery of methods to retarget their media. However, no single retargeting operator performs well on all images and all target sizes. In a user study we conducted, we found that users prefer to combine seam carving with cropping and scaling to produce results they are satisfied with. This inspired us to propose an algorithm that combines different operators in an optimal manner. We define a resizing space as a conceptual multi-dimensional space combining several resizing operators, and show how a path in this space defines a sequence of operations to retarget media. We define a new image similarity measure, which we term Bi-Directional Warping (BDW), and use it with a dynamic programming algorithm to find an optimal path in the resizing space. In addition, we show a simple and intuitive user interface allowing users to explore the resizing space of various image sizes interactively. Using key-frames and interpolation, we also extend our technique to retarget video, providing the flexibility to use the best combination of operators at different times in the sequence.
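The path search in the resizing space can be illustrated with a toy dynamic program. Everything here is hypothetical scaffolding for illustration: the function name, the hand-made per-step operator costs, and the one-pixel-at-a-time shrinking are all made up; the actual method evaluates candidate operator sequences with the Bi-Directional Warping (BDW) distance rather than fixed costs.

```python
def cheapest_operator_sequence(width, target, step_cost):
    """Toy dynamic program over a 1D 'resizing space'.

    step_cost maps an operator name to a function giving the (hypothetical)
    cost of shrinking from width w to w - 1 with that operator. Because every
    transition goes from width w to w - 1 and costs are additive, a single
    sweep over widths finds the globally cheapest operator sequence.
    """
    # best[w] = (total cost to reach width w, operator sequence used)
    best = {width: (0.0, [])}
    for w in range(width, target, -1):
        cost, seq = best[w]
        for op, fn in step_cost.items():
            c = cost + fn(w)
            if w - 1 not in best or c < best[w - 1][0]:
                best[w - 1] = (c, seq + [op])
    return best[target]
```

With costs that make seam removal cheap at large widths but expensive near the target, the sketch mixes operators along the path, mirroring the user preference for combined operators reported in the study.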
  Michael Rubinstein, Ariel Shamir, Shai Avidan
Improved Seam Carving for Video Retargeting
ACM Transactions on Graphics, Volume 27, Number 3 (Proc. SIGGRAPH), 2008
Abstract | Paper | Webpage | Code | BibTeX
Implemented in Adobe Photoshop as Content-Aware Scale
Video, like images, should support content-aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images, we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph, and we show how to construct a graph such that the resulting cut is a valid seam, that is, monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator removes the seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion looks forward in time, removing the seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.
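For readers who want to experiment, here is a minimal sketch of the dynamic-programming seam carving operator (backward energy) that this paper builds on, not the paper's graph-cut formulation or forward-energy criterion. The function name and the simple gradient-magnitude energy are illustrative choices, not the exact released implementation; for that, see the MATLAB code linked above.

```python
import numpy as np

def remove_vertical_seam(gray):
    """Remove one minimal-energy vertical seam from a 2D grayscale image.

    Backward-energy seam carving: each pixel's energy is its gradient
    magnitude, and the cumulative cost M[i, j] adds the cheapest of the
    three upper neighbors, so the minimal seam is monotonic and connected.
    """
    h, w = gray.shape
    # Simple gradient-magnitude energy (an illustrative choice).
    gy, gx = np.gradient(gray.astype(float))
    energy = np.abs(gx) + np.abs(gy)

    # Dynamic program: M[i, j] = energy[i, j] + min of the three parents.
    M = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, M[i - 1, :-1]]
        up = M[i - 1]
        right = np.r_[M[i - 1, 1:], np.inf]
        M[i] += np.minimum(np.minimum(left, up), right)

    # Backtrack the minimal seam from the bottom row up.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))

    # Remove the seam: each row loses its chosen pixel.
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return gray[mask].reshape(h, w - 1), seam
```

The graph-cut formulation in the paper replaces this per-row recursion with a min-cut on a 3D space-time graph, which is what makes video seams tractable.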
  Ariel Shamir, Michael Rubinstein, Tomer Levinboim
Inverse Computer Graphics: Parametric Comics Creation from 3D Interaction
IEEE Computer Graphics & Applications, Volume 26, Number 3, 30-38, 2006
Abstract | Paper | Webpage | BibTeX
There are times when computer graphics is required to be succinct and simple. Carefully chosen, simplified, static images can portray the narration of a story as effectively as 3D photo-realistic continuous graphics. In this paper we present an automatic system which transforms continuous graphics originating from real 3D virtual-world interactions into a sequence of comics images. The system traces events during the interaction and then analyzes and breaks them into scenes. Based on user-defined parameters of point-of-view and story granularity, it chooses specific time-frames to create static images, renders them, and applies post-processing to reduce their clutter. The system utilizes the same principle of intelligent reduction of details in both the temporal and spatial domains for choosing important events and depicting them visually. The end result is a sequence of comics images which summarize the main happenings and present them in a coherent, concise and visually pleasing manner.


Code and Data
  Object Discovery and Segmentation Internet Datasets
The Internet image collections we used for the evaluation in our CVPR'13 paper, with human foreground-background masks, and the segmentation results by our method and by other co-segmentation techniques.
  Eulerian Video Magnification
MATLAB/C++ implementation of our method, with code that reproduces all the results in our SIGGRAPH'12 paper. This technology is patented by MIT, and the code is provided for non-commercial research purposes only.

  Image Retargeting Benchmark (RetargetMe)
A dataset of 80 images and retargeted results, ranked by human viewers. The project website contains all the data we collected and also provides a nice synopsis of the current state of image retargeting research.
  Image Retargeting Survey
The system I've developed for collecting user feedback on image retargeting results. It is based on the linked-paired comparison design to collect and analyze data when the number of stimuli is very large.
The code is written in HTML, PHP and JavaScript. It supports multiple experiment designs, and can be easily used with Amazon Mechanical Turk. See my paper and the project website for further details.
A live demo is available here.
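As a rough illustration of what the collected responses look like once aggregated, here is a simple win-rate tally over pairwise votes. It is a stand-in for the example only, not the linked-paired comparison analysis the system actually performs; the function name and the vote format are made up here.

```python
from collections import defaultdict

def rank_by_win_rate(votes):
    """Rank stimuli from pairwise-comparison votes.

    votes is a list of (winner, loser) pairs collected from viewers.
    Each stimulus is ranked by the fraction of its comparisons it won.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return sorted(seen, key=lambda s: wins[s] / seen[s], reverse=True)
```

A linked design goes further: it chooses which pairs each viewer sees so that a very large set of stimuli can still be ranked from a manageable number of comparisons.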
  Seam Carving (v1.0, 2009-04-10)
A MATLAB re-implementation of the seam carving method I worked on at MERL. It is provided for research/educational purposes only. This algorithm is patented and owned by Mitsubishi Electric Research Labs, Cambridge MA.
The code supports backward and forward energy using both the dynamic programming and graph cut formulations. See demo.m for usage example.
  • maxflow (v1.1, 2008-09-15)
    A MATLAB wrapper for the Boykov-Kolmogorov max-flow algorithm. Also available on MATLAB Central.


Talks

  • Tutorial: "Dense Image Correspondences for Computer Vision"
    CVPR 2014, Columbus Ohio, Jun 23 2014  Webpage
    ICCV 2013, Sydney Australia, Dec 2 2013  Webpage

For more conference talks, see the project web pages under Publications above.


Teaching
  • MIT 6.869 Advances in Computer Vision, Spring 2011 (TA)  Webpage
  • "Seam Carving and Content-driven Retargeting of Images and Video"  pdf | pptx (55mb)
    MIT 6.865 Computational Photography, 2010-11 (guest lecturer)

Other non-conference talks I gave here and there:
  • "Introduction to Recursive Bayesian Filtering"  pdf | ppt
    Seminar on Advanced Topics in Computer Graphics, Tel Aviv University 2009
  • "Tracking with focus on Particle Filters"  Part I: pdf | ppt, Part II: pdf | ppt
    Seminar on Vision-based Security, IDC 2009
  • "Dimensionality Reduction by Random Mapping"  pdf | ppt
    Seminar on Advanced Topics in Computer Graphics, Tel Aviv University 2006

Audience reactions at my talks :-)
(Photo credit: Gulnara Gross, TEDxBeaconStreet)


Press Coverage
MathWorks, Oct 10 2015: "MIT CSAIL Researchers Develop Video Processing Algorithms to Magnify Minute Movements and Changes in Color"
TechCrunch, Aug 4 2015: "Google And MIT Researchers Demo An Algorithm That Lets You Take Clear Photos Through Reflections"
MIT Technology Review, Aug 4 2015: "Erase Obstructions from Photos with a Click"
MIT News, May 21 2015: "Gauging materials’ physical properties from video"
The Washington Post, Jan 28 2015: "‘Motion microscope’ reveals movements too small for the human eye" | Reuters video
NPR, Aug 30 2014: Wait Wait ... Don't Tell Me! (an amusing segment about our "Visual Microphone" starts at 3:30)
CNN, Aug 7 2014: "Eavesdropping with a camera and potted plants"
IEEE Spectrum, Aug 6 2014: "Your Candy Wrappers are Listening"
Wired, Aug 5 2014: "How to reconstruct speech from a silent video of a crisp packet"
ABC News, Aug 5 2014: "Your Bag of Chips Is Spying on You"
engadget, Aug 4 2014: "Visual microphone can pick up speech from a bag of potato chips"
New Scientist, Aug 4 2014: "Caught on tape: cameras turn video into sound"
Time, Aug 4 2014: "MIT Researchers Can Spy on Your Conversations With a Potato-Chip Bag"
Washington Post, Aug 4 2014: "MIT researchers can listen to your conversation by watching your potato chip bag"
MIT News, Aug 4 2014: "Extracting audio from visual information"
Yedioth Ahronoth, Mar 17 2013: "The Hidden Secrets of Video" (Hebrew)
Discovery Channel, Feb 28 2013: "Daily Planet" (video)
Daily Mail, Feb 28 2013: "How to spot a liar (and cheat at poker)"
New York Times, Feb 27 2013: "Scientists Uncover Invisible Motion in Video" | Video
Txchnologist, Feb 1 2013: "New Video Process Reveals Heart Rate, Invisible Movement"
MIT News, Feb 1 2013: "MIT researchers honored for 'Revealing Invisible Changes'"
Wired, Jul 25 2012: "MIT algorithm measures your pulse by looking at your face"
MIT Technology Review, Jul 24 2012: "Software Detects Motion that the Human Eye Can't See"
BBC Radio, Jul 3 2012: "MIT Video colour amplification"
Der Spiegel, Jun 27 2012: "Video software can make pulse visible" (German)
MIT News, Jun 21 2012: "Researchers amplify variations in video, making the invisible visible" | Video | Spotlight (our work on the MIT front page)
PetaPixel, Jun 13 2012: "Magnifying the Subtle Changes in Video to Reveal the Invisible"
Huffington Post, Jun 4 2012: "MIT's New Video Technology Could Give You Superhuman Sight"
Gizmodo, Jun 4 2012: "New X-Ray Vision-Style Video Can Show a Pulse Beating Through Skin"

More press...