Multiple-Aperture Photography

Samuel W. Hasinoff and Kiriakos N. Kutulakos


Publications

Samuel W. Hasinoff and Kiriakos N. Kutulakos, Multiple-Aperture Photography for High Dynamic Range and Post-Capture Refocusing. IEEE Trans. Pattern Analysis and Machine Intelligence, submitted. [pdf]

Samuel W. Hasinoff and Kiriakos N. Kutulakos, A Layer-Based Restoration Framework for Variable-Aperture Photography. Proc. 11th IEEE International Conference on Computer Vision, ICCV 2007, 8 pp. (DVD proceedings). [pdf]

Samuel W. Hasinoff, Variable-Aperture Photography. PhD Thesis, University of Toronto, Dept. of Computer Science, 2008. [pdf]
Alain Fournier Ph.D. Thesis Award

Journal abstract

In this article we present multiple-aperture photography, a new method for analyzing sets of images captured with different aperture settings, with all other camera parameters fixed. Using an image restoration framework, we show that we can simultaneously account for defocus, high dynamic range (HDR) exposure, and noise, all of which are confounded with aperture. Our formulation is based on a layered decomposition of the scene that models occlusion effects in detail. Recovering such a scene representation allows us to adjust the camera parameters post-capture, to achieve changes in focus setting or depth of field -- with all results available in HDR. Our method is designed to work with very few input images: we demonstrate results from real sequences obtained using the three-image "aperture bracketing" mode found on consumer digital SLR cameras.
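As a rough illustration of why defocus is confounded with aperture, the thin-lens model predicts a blur circle that grows linearly with aperture diameter. This is a minimal sketch using the textbook thin-lens relation, not the layered forward model from the paper, and the distances below are hypothetical:

```python
def blur_diameter(f, N, d_focus, d):
    """Thin-lens blur-circle diameter on the sensor (same units as f),
    for a scene point at depth d with the lens focused at d_focus."""
    A = f / N                                   # aperture diameter
    return A * (f / (d_focus - f)) * abs(d - d_focus) / d

f = 0.085                                       # 85mm lens, in metres
for N in (8.0, 4.0, 2.0):                       # the aperture-bracketed f-stops
    print(f"f/{N:g}: blur = {blur_diameter(f, N, d_focus=2.0, d=3.0):.6f} m")
```

Opening up from f/8 to f/2 quadruples the blur diameter while multiplying exposure by 16, which is exactly the coupling the restoration framework must untangle.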

Supplementary material

general experimental setup

To test our approach on real data, we captured sequences using a Canon EOS-1Ds Mark II, secured on a tripod, with an 85mm f/1.2L lens set to manual focus. In all our experiments we use the three-image "aperture bracketing" mode set to ±2 stops, and select the shutter speed so that the images are captured at f/8, f/4, and f/2 (yielding relative exposure levels of roughly 1, 4, and 16, respectively). We captured RAW images for increased dynamic range, and demonstrate our results on images downsampled to 500×333 pixels.
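The quoted exposure ratios follow from the fact that, at fixed shutter speed, exposure scales with aperture area, i.e. with 1/N² for f-number N. A quick check (not part of the paper's code):

```python
f_numbers = [8.0, 4.0, 2.0]              # the bracketed apertures f/8, f/4, f/2
N_ref = f_numbers[0]                     # take f/8 as the reference exposure
relative_exposure = [(N_ref / N) ** 2 for N in f_numbers]
print(relative_exposure)                 # [1.0, 4.0, 16.0]
```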

All the result videos (MPEG-2) below include a side panel with three sliders to help visualize the camera settings used to synthesize new images. The red zones on the sliders indicate extrapolation beyond the input settings:
  1. aperture
    • from narrow to wide
    • ticks indicate the f-stops of the input images (f8, f4, f2)
  2. focus
    • from near to far
    • ticks indicate the estimated relative depths of the scene layers, on a logarithmic scale
  3. exposure
    • from dark to bright
    • ticks indicate exposures corresponding to the input images
    • when tonemapping is applied, the slider is shown spanning the full range

"dumpster" dataset

Outdoor sequence, composed of three distinct and roughly fronto-parallel layers—a background building, a pebbled wall, and a rusty dumpster. The foreground dumpster is darker than the rest of the scene and is almost in focus. Although the layering recovered by the restoration is not pixel-accurate at the boundaries, resynthesis with new camera settings yields visually realistic results.

"portrait" dataset

This portrait was captured indoors in a dark room, using only available light from the background window. The subject is nearly in focus and very dark compared to the background buildings outside, and an even darker chair sits defocused in the foreground. Note that while the final layer assignment is only roughly accurate (e.g., near the subject's right shoulder), the discrepancies are restricted mainly to low-texture regions near layer boundaries, where layer membership is ambiguous and has little influence on resynthesis. In this sense, our method is similar to image-based rendering from stereo, where reconstruction results that deviate from ground truth in "unimportant" ways can still lead to visually realistic new images. Because the chair is under-exposed even in the widest-aperture image, slight artifacts can be observed at its boundary, due to over-sharpening and posterization.

[new] "pillars" dataset

Outdoor sequence, composed of two differently exposed structures—a dark wall is occluded by several bright stone pillars. Note how the method assigns slightly different depths to the two segments containing the gradually sloping background wall. Although not as noticeable in the synthesized results, the initial segmentation misassigns the lower-rightmost portion of the foreground ledge to the background layer.

[new] "doors" dataset

This architectural scene was captured outdoors at twilight, and consists of a sloping wall containing a row of rusty doors, with a more brightly illuminated background. The sloping, hallway-like geometry constitutes a test of our method's ability to handle scenes that violate our piecewise fronto-parallel scene model. As the results show, despite the fact that our method decomposes the scene into six fronto-parallel layers, the recovered layer ordering is almost correct, and our restoration allows us to resynthesize visually realistic new images. Note that the reduced detail for the tree in the background is due to scene motion caused by wind over the 1-second total capture time.

[new] "macro" dataset - failure case

Our final sequence was a macro still-life scene, captured using a 10mm extension tube to reduce the minimum focusing distance of the lens, and to increase the magnification to approximately life-size (1:1). The scene is composed of a miniature glass bottle whose inner surface is painted, and a dried bundle of green tea leaves. This is a challenging dataset for several reasons: the level of defocus is severe outside the very narrow depth of field, the scene consists of both smooth and intricate geometry (bottle and tea leaves, respectively), and the reflections on the glass surface only become focused at incorrect virtual depths. The initial segmentation leads to a very coarse decomposition into layers, which is not improved by our optimization. While the resynthesis results for this scene suffer from strong artifacts, the gross structure, blur levels, and ordering of the scene layers are still recovered correctly. The worst artifacts are the bright "cracks" occurring at layer boundaries, due to a combination of incorrect layer segmentation and our diffusion-based inpainting method.

"lena" dataset - synthetic

This synthetic dataset consists of an HDR version of the 512x512 pixel Lena image, where we simulate HDR by dividing the image into three vertical bands and artificially exposing each band. We decompose the image into layers by assigning different depths to each of three horizontal bands, and generate the input images by applying the forward image formation model. Finally, we add Gaussian noise to the input with a standard deviation of 1% of the intensity range.
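The construction of this synthetic input can be sketched as follows. This is a simplified stand-in (a random image in place of Lena, per-band gains in place of true radiometric exposure, and no defocus rendering, which in the paper is handled by the full layered forward model); all band gains and depths are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 512
scene = rng.random((H, W))                      # stand-in for the HDR Lena image

# Simulate HDR: three vertical bands, each "exposed" by a different factor.
gains = [1.0, 4.0, 16.0]                        # hypothetical per-band exposures
hdr = scene.copy()
for i, g in enumerate(gains):
    hdr[:, i * W // 3:(i + 1) * W // 3] *= g

# Layered scene: three horizontal bands assigned different relative depths.
depth = np.empty((H, W))
for i, d in enumerate([1.0, 2.0, 4.0]):         # hypothetical relative depths
    depth[i * H // 3:(i + 1) * H // 3, :] = d

# Sensor noise: Gaussian, with standard deviation 1% of the intensity range.
sigma = 0.01 * hdr.max()
noisy = hdr + rng.normal(0.0, sigma, size=hdr.shape)
```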

Acknowledgements

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under the RGPIN and CGS-D programs, by a fellowship from the Alfred P. Sloan Foundation, and by an Ontario Premier's Research Excellence Award.