SIGGRAPH 2014
The Visual Microphone: Passive Recovery of Sound from Video
Abe Davis1 Michael Rubinstein2,1 Neal Wadhwa1 Gautham Mysore3 Frédo Durand1 William T. Freeman1
1MIT CSAIL 2Microsoft Research 3Adobe Research

Abstract

When sound hits an object, it causes small vibrations of the object’s surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects—a glass of water, a potted plant, a box of tissues, or a bag of chips—into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object’s surface, which we can use to recover the vibration modes of an object.
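The core idea can be illustrated with a toy example: treat each frame's tiny global sub-pixel shift as one audio sample. Below is a minimal sketch using a simplified global brightness-constancy shift estimate in place of the paper's complex-steerable-pyramid phase analysis; the frame rate, vibration frequency, and texture below are all illustrative, not from the paper:

```python
import numpy as np

def recover_motion_signal(frames):
    """Estimate a global sub-pixel horizontal shift for each frame relative
    to the first frame, via a first-order brightness-constancy fit:
    frame(x) ~ ref(x - s) ~ ref(x) - s * d(ref)/dx.  This is a toy
    stand-in for the paper's phase-based pyramid analysis."""
    ref = frames[0].astype(float)
    gx = np.gradient(ref, axis=1)              # horizontal spatial gradient
    denom = np.sum(gx * gx)
    # gx * (frame - ref) ~ -s * gx**2, so negate to recover the shift s.
    return np.array([-np.sum(gx * (f - ref)) / denom for f in frames])

# Synthetic test: a smooth texture vibrating horizontally by a tiny
# sinusoid (the "sound"), filmed at a hypothetical 2200 fps.
fps, f_audio, n = 2200.0, 110.0, 256           # illustrative rates
t = np.arange(n) / fps
true_shift = 0.05 * np.sin(2 * np.pi * f_audio * t)   # ~1/20-pixel motion
x = np.arange(64)
frames = [np.tile(np.sin(2 * np.pi * (x - s) / 16.0), (16, 1))
          for s in true_shift]

recovered = recover_motion_signal(frames)      # one "audio" sample per frame
```

Because the motion is sampled once per frame, the recoverable audio bandwidth is bounded by half the frame rate (1100 Hz in this sketch), which is why high-speed cameras, or the rolling shutter trick mentioned above, are needed for higher-frequency sound.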


@article{Davis2014VisualMic,
  author = {Abe Davis and Michael Rubinstein and Neal Wadhwa and Gautham Mysore and Fredo Durand and William T. Freeman},
  title = {The Visual Microphone: Passive Recovery of Sound from Video},
  journal = {ACM Transactions on Graphics (Proc. SIGGRAPH)},
  year = {2014},
  volume = {33},
  number = {4},
  pages = {79:1--79:10}
}


Paper: PDF

** Patent pending

Supplementary material: link

SIGGRAPH presentation: zip (ppt + videos; 340MB)

 

 

Contact

For more information, please contact Abe Davis and Michael Rubinstein.

 

Press

 

Sounds Recovered from Videos

Below are some examples of sounds we recovered just from high-speed videos of objects. We will add more results soon. In the meantime, you can check out our video and supplementary material above.
Our audio results are best experienced using good speakers, preferably headphones.

(The reason we used "Mary Had a Little Lamb" in several of our demos: The Internet Archive)

 

MIDI music recovered from a video of a bag of chips, and from a video of a plant:

[Bag of chips — video (representative frame), source sound, and sound recovered from video; 700 x 400, 2200 Hz]
[Plant — video (representative frame), source sound, and sound recovered from video; 700 x 400, 2200 Hz]

 

Speech recovered from a video of a small patch on a bag of chips lying on a table in a room:

[Video (representative frame), source sound, and sound recovered from video; 192 x 192, 20000 Hz]

 

A child's singing recovered from a video of a foil wrapper of a candy bar:

[Video (representative frame), source sound (0-8kHz), and sound recovered from video; 480 x 480, 6000 Hz]

 

Code and Data

We are working to release our videos, results and code. Stay tuned...

 

Related Publications

We have been working for a couple of years now on techniques to analyze subtle color and motion signals in videos. Check out our previous work:

Riesz Pyramids for Fast Phase-Based Video Magnification, ICCP 2014

Analysis and Visualization of Temporal Variations in Video, Michael Rubinstein, PhD Thesis, MIT Feb 2014

Phase-Based Video Motion Processing, SIGGRAPH 2013

Eulerian Video Magnification for Revealing Subtle Changes in the World, SIGGRAPH 2012

 

Acknowledgements

We thank Justin Chen for his helpful feedback, Dr. Michael Feng and Draper Laboratory for lending us their Laser Doppler Vibrometer, and the SIGGRAPH reviewers for their comments. We acknowledge funding support from QCRI and NSF CGV-1111415. Abe Davis and Neal Wadhwa were supported by the NSF Graduate Research Fellowship Program under Grant No. 1122374. Abe Davis was also supported by QCRI, and Neal Wadhwa was also supported by the MIT Department of Mathematics. Part of this work was done when Michael Rubinstein was a student at MIT, supported by the Microsoft Research PhD Fellowship.

 

Last updated: Jul 2014