Photo-Consistent Reconstruction of Semitransparent Scenes

Samuel W. Hasinoff and Kiriakos N. Kutulakos

Publications

Samuel W. Hasinoff and Kiriakos N. Kutulakos, Photo-Consistent Reconstruction of Semitransparent Scenes by Density-Sheet Decomposition. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(5), pp. 870-885, 2007. [pdf]

Samuel W. Hasinoff and Kiriakos N. Kutulakos, Photo-Consistent 3D Fire by Flame-Sheet Decomposition. In Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), pp. 1184-1191. [pdf]

Samuel W. Hasinoff, Three-Dimensional Reconstruction of Fire from Images, MSc Thesis, University of Toronto, Department of Computer Science, 2002. [pdf]

Journal abstract

This paper considers the problem of reconstructing visually realistic 3D models of dynamic semitransparent scenes, such as fire, from a very small set of simultaneous views (even two). We show that this problem is equivalent to a severely underconstrained computerized tomography problem, for which traditional methods break down. Our approach is based on the observation that every pair of photographs of a semitransparent scene defines a unique density field, called a Density Sheet, that 1) concentrates all its density on one connected, semitransparent surface, 2) reproduces the two photos exactly, and 3) is the most spatially compact density field that does so. From this observation, we reduce reconstruction to the convex combination of sheet-like density fields, each of which is derived from the Density Sheet of two input views. We have applied this method specifically to the problem of reconstructing 3D models of fire. Experimental results suggest that this method enables high-quality view synthesis without overfitting artifacts.
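For reference, the linear image formation model and the convex-combination parameterization described above can be written compactly as follows (the notation is ours, a sketch rather than the paper's exact formulation):

    % Emission-only (linear) image formation: each pixel integrates density along its ray r
    I(r) = \int_r \sigma(\mathbf{x}) \, ds

    % Reconstruction as a convex combination of sheet-like basis density fields \sigma_b
    \sigma = \sum_{b=1}^{B} w_b \, \sigma_b ,
        \qquad w_b \ge 0 , \qquad \sum_{b=1}^{B} w_b = 1

Photo-consistency then requires the line integrals of the combined density field to reproduce the input photographs.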

Supplementary material

"torch" dataset

The first scene ("torch") was a citronella patio torch, burning with a flickering flame about 10cm tall. Two synchronized progressive-scan Sony DXC-9000 cameras, roughly 90 degrees apart, were used to acquire videos of the flame at 640x480 resolution. While ground truth for this scene was not available, the structure of the flame is simple, and it appears that epipolar slices actually contain a single elongated blob of density.

The cameras were calibrated using Bouguet's Camera Calibration Toolbox for Matlab to an accuracy of about 0.5 pixels. The input views were then rectified so that corresponding horizontal scanlines define the epipolar slices. For each of the 25 frames in the video sequence, each epipolar slice was reconstructed independently.
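The original pipeline used Bouguet's Matlab toolbox; a rough Python/OpenCV equivalent of the rectification step (illustrative only, with calibration variables assumed rather than taken from the paper) looks like this:

    # Sketch: rectify a calibrated stereo pair so that corresponding horizontal
    # scanlines become epipolar lines (OpenCV stand-in for the Matlab pipeline).
    import cv2
    import numpy as np

    # K1, K2: 3x3 intrinsics; d1, d2: distortion coefficients;
    # R, T: rotation and translation from camera 1 to camera 2 (from calibration).
    def rectify_pair(img1, img2, K1, d1, K2, d2, R, T):
        size = (img1.shape[1], img1.shape[0])
        R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
        m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
        m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
        rect1 = cv2.remap(img1, m1x, m1y, cv2.INTER_LINEAR)
        rect2 = cv2.remap(img2, m2x, m2y, cv2.INTER_LINEAR)
        # Row y of rect1 and row y of rect2 now form one epipolar slice,
        # which can be reconstructed independently of all other rows.
        return rect1, rect2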

We compared three different reconstruction methods with respect to the quality of synthesized views interpolating between the two cameras. First, the multiplication solution shows typical blurring and doubling artifacts. Interpolated views of this solution do not contain the same high-frequency content as the input images, giving the impression that the viewpoints were "accidentally" aligned so as to hide significant structures in the scene. Second, an algebraic method based on fitting Gaussian blobs to the density field (Hasinoff, 2002) overfits the two input images and produces a badly mottled appearance in synthesized views. This confirms that sparse-view tomography methods are not suitable when the number of viewpoints is extremely limited. Third, the Density Sheet reconstruction produces very realistic views that appear indistinguishable in quality from the input views.
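For concreteness, the multiplication baseline can be read as a rank-1 (outer-product) reconstruction of each epipolar slice; the following minimal sketch reflects that reading, which is our assumption about the baseline rather than the paper's exact definition:

    # Sketch: rank-1 ("multiplication") reconstruction of one epipolar slice.
    # p and q are the two corresponding scanlines (1D intensity profiles).
    import numpy as np

    def multiplication_slice(p, q, eps=1e-8):
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        # Scanline sums should agree for the projections to be reproduced exactly.
        total = 0.5 * (p.sum() + q.sum())
        density = np.outer(p, q) / max(total, eps)
        # Row sums of density approximate p and column sums approximate q, so the
        # two inputs are (re)projected almost exactly.  If each scanline has two
        # peaks, the outer product contains four blobs, which is the doubling
        # artifact described above.
        return density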

If the viewpoint is varied while time is fixed, the true nature of the Density Sheet reconstruction as a transparent surface can be perceived; however, adding temporal dynamics enhances the photo-realism. For simple flames like the "torch" scene, a two-view reconstruction consisting of a single Density Sheet serves as a good impostor for the true scene.

"burner" dataset

The second scene ("burner"), consists of a colorful turbulent flame emanating from a gas burner.
Ivo Ihrke and Marcus Magnor, GrOVis-Fire: A multi-video sequence for volumetric reconstruction and rendering research. http://www.mpi-inf.mpg.de/~ihrke/Projects/Fire/GrOVis_Fire/
This scene was captured for 348 frames using eight synchronized video cameras with 320x240 resolution. The cameras are roughly equidistant from the flame and distributed over the viewing hemisphere. Ground truth is not available for this scene either, however for this dataset an algebraic tomography method restricted by the visual hull produces a reasonable approximation to the true density field (Ihrke and Magnor, 2004).

For the "burner" dataset we applied our algorithms to just two of the input cameras in the dataset, spaced about 60 degrees apart. We used the available calibration information to rectify the images from this pair of cameras and then computed the Density Sheet from corresponding scanlines. Finally, we used a homography as in view morphing (Seitz and Dyer, 1996) to warp the images synthesized in the rectified space back to the interpolated camera.

Although the "burner" scene is significantly more complex than the previous scene, two-view reconstruction using a single Density Sheet can still produce realistic intermediate views. As with the "torch" dataset, the view interpolation appears even more plausible when the dynamics of the flame are added.

A significant artifact occurs when the true flame consists of multiple structures and the assumed imaging model is not satisfied exactly. In this example, the true scene and both input images consist of two major flame structures, but the images disagree on the total intensity (i.e., total density) of each structure. For the Density Sheet solution, this discrepancy leads to a "tearing" effect where a spurious thin vertical structure is used to explain the lack of correspondence. Note that for this particular dataset this problem applies more generally to any technique assuming a linear image formation model.

"jet" dataset

The third scene ("jet") consists of a complex flame emerging from a gaseous jet. The dataset consists of 47 synchronized views, roughly corresponding to inward-facing views arranged around a quarter-circle, captured from a promotional video for a commercial 3D freeze-frame system. Since no explicit calibration was available for this sequence, we assumed that the views were evenly spaced.

To test the view synthesis capabilities of our approach, we used a subset of the 47 views as input and reserved the rest to evaluate agreement with the synthesized views. Rendering results from the multiplication solution and the Density Sheet solution (Algorithm 1) suggest that these solutions cannot accurately capture the flame's appearance and structure, and are not well suited to more complex flames with large appearance variation across viewpoints. To incorporate more views, we applied the blob-based algebraic tomography method (Hasinoff, 2002) and the Density Sheet Decomposition Algorithm (Algorithm 3) with the 45-degree view as a third image. In the latter algorithm, we generated B = 150 Decomposed Density Sheets for each of the P = 3 pairs of input views in Step 1, giving a total of 450 basis density fields.

To further explore the benefit of using multiple views, we optimized the convex combination of these fields in Steps 2-4 in two ways: (1) by maximizing photo-consistency with the three input images alone, and (2) by maximizing photo-consistency with four additional images from the sequence as well. The results suggest that the Density Sheet Decomposition Algorithm produces rendered images that are superior to those of the other methods. They also show that increasing the number of images used during optimization has a clear benefit.
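Assuming photo-consistency is measured as a sum-of-squares reprojection error, Steps 2-4 reduce to a small constrained least-squares (QP) problem over the mixing weights; a minimal sketch, with A and b as hypothetical stand-ins for the rendered basis projections and the observed pixels, is:

    # Sketch: choose convex-combination weights w for the basis density fields by
    # least-squares photo-consistency, subject to w >= 0 and sum(w) = 1.
    # A (pixels x bases) holds the projection of each basis field into every image
    # used for the optimization; b holds the corresponding observed pixel values.
    import numpy as np
    from scipy.optimize import minimize

    def fit_convex_weights(A, b):
        n = A.shape[1]
        w0 = np.full(n, 1.0 / n)                  # start from the uniform mixture

        def objective(w):
            r = A @ w - b
            return 0.5 * r @ r

        def gradient(w):
            return A.T @ (A @ w - b)

        res = minimize(objective, w0, jac=gradient, method="SLSQP",
                       bounds=[(0.0, None)] * n,
                       constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
        return res.x

With 450 basis fields (or 50 per pair in the seven-view experiment below), a dense solver of this kind is sufficient; the paper's actual QP formulation may differ in its data term and solver.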

As a final experiment, we ran the Density Sheet Decomposition Algorithm using seven input images, but with the number of Decomposed Density Sheets per view-pair reduced to B = 50, so that the resulting QP remains over-determined. Visually, the results for M = 7 appear to give a 3D reconstruction that is more coherent and accurate, but also smoother, than the previous results for this dataset.

Acknowledgements

The authors gratefully acknowledge the support of the US National Science Foundation under Grant No. IRI-9875628, of the Natural Sciences and Engineering Research Council of Canada under the RGPIN, PGS-A, and CGS-D programs, and of the Alfred P. Sloan Foundation.