Scene Reconstruction from High Spatio-Angular Resolution Light Fields

ACM SIGGRAPH 2013

Changil Kim¹,², Henning Zimmer¹,², Yael Pritch¹, Alexander Sorkine-Hornung¹, Markus Gross¹,²

¹Disney Research Zurich, ²ETH Zurich

[Teaser image]

The images on the left show a 2D slice of a 3D input light field, a so-called epipolar-plane image (EPI), and two out of one hundred 21 megapixel images that were used to construct the light field. Our method computes 3D depth information for all visible scene points, illustrated by the depth EPI on the right. From this representation, individual depth maps or segmentation masks for any of the input views can be extracted, as well as other representations like 3D point clouds. The horizontal red lines connect corresponding scanlines in the images with their respective positions in the EPI.
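To make the EPI representation concrete for readers working with the datasets below, here is a minimal sketch of how an EPI can be assembled from a set of rectified views: the same scanline is taken from every image and the scanlines are stacked in camera order. The directory layout and file naming are hypothetical, and NumPy and Pillow are assumed to be available.

import glob

import numpy as np
from PIL import Image

def extract_epi(image_dir, scanline_y):
    # Stack one scanline from every rectified view into an EPI of shape (views, width, 3).
    paths = sorted(glob.glob(f"{image_dir}/*.jpg"))  # assumes zero-padded, order-preserving file names
    rows = [np.asarray(Image.open(p))[scanline_y] for p in paths]  # one (width, 3) scanline per view
    return np.stack(rows)

# Hypothetical usage: epi = extract_epi("mansion/images", 2000)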

Abstract

This paper describes a method for scene reconstruction of complex, detailed environments from 3D light fields. Densely sampled light fields in the order of 10⁹ light rays allow us to capture the real world in unparalleled detail, but efficiently processing this amount of data to generate an equally detailed reconstruction represents a significant challenge to existing algorithms. We propose an algorithm that leverages coherence in massive light fields by breaking with a number of established practices in image-based reconstruction. Our algorithm first computes reliable depth estimates specifically around object boundaries instead of interior regions, by operating on individual light rays instead of image patches. More homogeneous interior regions are then processed in a fine-to-coarse procedure rather than the standard coarse-to-fine approaches. At no point in our method is any form of global optimization performed. This allows our algorithm to retain precise object contours while still ensuring smooth reconstructions in less detailed areas. While the core reconstruction method handles general unstructured input, we also introduce a sparse representation and a propagation scheme for reliable depth estimates which make our algorithm particularly effective for 3D input, enabling fast and memory efficient processing of “Gigaray light fields” on a standard GPU. We show dense 3D reconstructions of highly detailed scenes, enabling applications such as automatic segmentation and image-based rendering, and provide an extensive evaluation and comparison to existing image-based reconstruction techniques.
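To illustrate the kind of per-ray processing the abstract refers to, the sketch below estimates a disparity for every pixel of a reference scanline by testing a set of EPI line slopes and keeping the one with the smallest color variance along the line. This is a generic EPI slope test for illustration only, not the paper's algorithm, which uses a dedicated confidence measure around edges and a fine-to-coarse propagation scheme; the disparity range and all parameter names are assumptions.

import numpy as np

def estimate_disparity(epi, ref_view, d_min=0.0, d_max=3.0, steps=64):
    # epi: float array of shape (num_views, width, 3); returns per-pixel disparities (width,).
    num_views, width, _ = epi.shape
    view_offsets = np.arange(num_views) - ref_view
    best_d = np.zeros(width)
    best_score = np.full(width, np.inf)
    for d in np.linspace(d_min, d_max, steps):
        # Sample the EPI along the line of slope d passing through each reference pixel.
        xs = np.clip(np.round(np.arange(width) + view_offsets[:, None] * d).astype(int), 0, width - 1)
        samples = epi[np.arange(num_views)[:, None], xs]   # (num_views, width, 3)
        score = samples.var(axis=0).sum(axis=-1)           # color variance along the line
        better = score < best_score
        best_score[better] = score[better]
        best_d[better] = d
    return best_d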

Downloads

Datasets

This website provides the datasets, including all the images and results used in the paper. Please cite the above paper if you use any part of the images or results provided on this website or in the paper. For questions or feedback, please contact me.

We provide the following five datasets:

Mansion


Church


Couch


Bikes


Statue


All five datasets, including the images and depth maps, can be downloaded here as a single zipped archive (ZIP, 7.6 GB).

Acquisition

All images provided here were captured using a Canon EOS 5D Mark II DSLR camera and a Canon EF 50mm f/1.4 USM lens, with the exception of the Couch dataset, which was captured with a Canon EF 50mm f/1.2 L USM lens. A Zaber T-LST1500D motorized linear stage was used to move the camera to the shooting positions. The focus and aperture settings vary between datasets but were kept identical within each dataset. The camera focal length is 50mm and the sensor size is 36×24mm for all datasets. PTLens was used to radially undistort the captured images, and Voodoo Camera Tracker was used to estimate the camera poses for rectification. See the paper for details. Additionally, the exposure variation was compensated additively for the Bikes and Statue datasets.

File Formats

All images are provided in JPEG format. Depth maps are provided in two formats. The .dmap files contain the dense disparity values d as defined in Equation 1 of the paper. The first and second 32-bit unsigned integers indicate the width and the height of the depth map, respectively. The rest of the file is an uncompressed sequence of width × height 32-bit floats in row-major order. Little-endian byte order is assumed for all 4-byte words. The depth values z can be obtained by applying Equation 1, given the camera focal length f in pixels and the camera separation b for each dataset. Additionally, the depth maps are stored as grayscale images in PNG format for easier visual inspection.
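A minimal sketch of a .dmap reader following the description above: two little-endian 32-bit unsigned integers for width and height, followed by width × height little-endian 32-bit floats in row-major order. The disparity-to-depth conversion z = f·b/d is our reading of Equation 1 and should be verified against the paper; f must be given in pixels (for these datasets roughly f_px = 50 / 36 × image width in pixels, from the 50mm focal length and 36mm sensor width), and b is the camera separation.

import struct

import numpy as np

def read_dmap(path):
    # Returns the disparity map as a (height, width) float32 array.
    with open(path, "rb") as fh:
        width, height = struct.unpack("<II", fh.read(8))    # little-endian uint32 header
        d = np.fromfile(fh, dtype="<f4", count=width * height)
    return d.reshape(height, width)                          # row-major

def disparity_to_depth(d, f_px, b):
    # Assumed form of Equation 1: z = f * b / d (verify against the paper).
    return f_px * b / d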

BibTeX

@article{Kim2013scene,
  author    = {Changil Kim and Henning Zimmer and Yael Pritch and Alexander Sorkine-Hornung and Markus Gross},
  title     = {Scene Reconstruction from High Spatio-Angular Resolution Light Fields},
  journal   = {ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH)},
  volume    = {32},
  number    = {4},
  year      = {2013},
  pages     = {73:1--73:12},
}