Supplementary material: Multi-frame stereo matching with edges, planes, and superpixels

This webpage requires JavaScript to run. In some browsers, you may need to enable it explicitly. If the results do not fit width-wise across your screen, you can zoom out in your web browser.

This document includes the following results:
1. View interpolation
2. Results on Midd-F
3. Results on Disney
4. Comparison with Kim et al. [3] on Midd-Q
5. Varying number of frames as input


1. View interpolation         Back to top
First, we show view interpolation results. For each sequence, we recover the depth maps of both the left and right frames using 5 (Teddy) or 9 (Disney-Mansion) frames around those two frames. We then render viewpoints between the left and right frames by warping both color images with the recovered depth maps and combining the results using z-buffering and linear blending. To evaluate the quality of interpolation, we compare the interpolated view with the actual image captured at that viewpoint.
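The rendering step above (forward warping with z-buffering, then linear blending) can be sketched as follows. This is a minimal grayscale illustration, not the paper's implementation: the function names, the use of disparity as inverse depth for the z-buffer, and the hole-filling fallback are our own simplifying assumptions.

```python
import numpy as np

def forward_warp(image, disparity, shift):
    """Forward-warp a grayscale image along the x axis.

    Each pixel moves by shift * disparity. Disparity doubles as inverse
    depth, so the z-buffer keeps the closest (largest-disparity) surface
    when several pixels land on the same target pixel.
    """
    h, w = disparity.shape
    warped = np.zeros_like(image)
    zbuf = np.full((h, w), -np.inf)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xt = int(round(x + shift * d))
            if 0 <= xt < w and d > zbuf[y, xt]:  # z-buffering
                zbuf[y, xt] = d
                warped[y, xt] = image[y, x]
    return warped, zbuf

def interpolate_view(left, right, disp_left, disp_right, alpha):
    """Render the viewpoint a fraction alpha of the way from left to right."""
    warp_l, z_l = forward_warp(left, disp_left, -alpha)         # left pixels shift left
    warp_r, z_r = forward_warp(right, disp_right, 1.0 - alpha)  # right pixels shift right
    valid_l, valid_r = z_l > -np.inf, z_r > -np.inf
    out = np.where(valid_l, warp_l, warp_r)  # fall back to whichever view is valid
    both = valid_l & valid_r
    # linear blending where both views contribute
    out[both] = (1.0 - alpha) * warp_l[both] + alpha * warp_r[both]
    return out
```

A real renderer would additionally splat to sub-pixel positions and inpaint pixels visible in neither view; the sketch leaves such holes at zero.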

Legend:

Teddy

In the interpolated view using SGM depth, some background pixels near the boundary are missing. For example, both the digit 2 in the top patch and the digit 4 in the bottom patch are missing (compare SGM with the ground truth). This is because the depth in those regions is incorrect due to foreground fattening. Such errors do not exist in our result.


Disney-mansion

In the interpolated view using depth calculated by SGM, there are haloes around the leaves (top patch) and the spike (bottom patch) due to foreground fattening, while the boundaries of these objects are much cleaner in our result.





The following results show videos synthesized using our depth maps. The first and last frames of each sequence are used as inputs, and the remaining frames are generated by view interpolation.

Teddy

Disney-mansion


Back to top





2. Results on Midd-F         Back to top
We provide the depth maps recovered by our technique on Midd-F, along with intermediate results (depth of edges and depth of patches) and comparisons with SGM. Please mouse over or click on the labels beneath each image to switch between them (it might take 1-2 seconds to load an image since the images are very large). Two corresponding close-up views are shown beside each sequence.

Legend:

Aloe

Our depth, error rate=2.38

Art

Our depth, error rate=3.32

In the SGM depth, the leaf shown in the top patch is thicker than it should be (compared with ground truth depth), and there are also errors between two leaves shown in the bottom patches. Those errors do not exist in our depth map.

Much of the error in our depth occurs at the bulge of the aloe and one of the bottom leaves (see the error map). This is primarily because these regions are less textured, and therefore fewer edges were detected there (see depth of edges).



The boundaries of the three pens shown in the top patch are mostly accurate in our depth map, but are thicker than the ground truth in the SGM depth map. Our algorithm also produces fewer errors on the background (see both patches).



Cloth3

Our depth, error rate=1.01

Cones

Our depth, error rate=2.59

The depth maps recovered by our algorithm and SGM are mostly accurate on this sequence, since the input images are highly textured.



Cones in this sequence are occluded by other cones, making it challenging. Our edge matching algorithm correctly estimates the depth of each cone (see depth of edges), so the estimated depth has clear boundaries between the cones, even when two cones have similar colors (see the two red cones in the bottom patch). In contrast, the boundaries in the SGM depth map are less smooth.



Dolls

Our depth, error rate=4.84

Rocks2

Our depth, error rate=1.06

The boundaries of the dolls are mostly accurate in our depth map, while in the SGM depth map they are slightly larger than in the ground truth (see the boundaries of the two dolls shown in the two zoomed patches).



The depth maps recovered by both algorithms are mostly accurate, except for a few textureless holes between rocks.



Teddy

Our depth, error rate=5.46

This is a challenging sequence, as there are highly foreshortened regions (the newspaper at the bottom), self-occlusion (leaves at the bottom), and untextured regions (the background to the right of Teddy). Our technique correctly recovers the depth map in most of these regions, while SGM creates large errors in textureless regions, such as the region to the right of Teddy (top patch) and the leaves at the bottom (bottom patch).




Back to top





3. Results on Disney         Back to top

Since there is no ground truth depth map for Disney, we only provide a qualitative comparison of our results with those of SGM and [3]. Note that [3] uses 101 frames as input, while SGM and our algorithm only use 9 frames.

Legend:

Church

The depth map recovered by our algorithm is mostly the same as the one recovered by Kim et al. [3], except for some tiny structures. The boundaries of the powerlines in our depth map are clearer than in the SGM result (see the top patch). Also, as in Kim et al. [3], we remove the depth of the sky using the mask provided in [3].

Mansion

This is a very challenging sequence, since there are many thin structures, such as the fence shown in the top patch and the leaves shown in the bottom patch. Our algorithm accurately recovers the depth of most of these objects, while their boundaries become thicker in the SGM depth map (see the two zoomed patches). Compared with Kim et al. [3], although our algorithm produces less accurate depth on thin structures, it also creates fewer errors on flat regions, like the blue window behind the spikes (see the top patch). Also, we use only 9 frames while Kim et al. [3] uses 101 frames as input.

Statue

All three algorithms work well on this sequence, and our algorithm recovers more accurate boundaries of the antenna of the car (top patch) and leaves below the statue (bottom patch).

Back to top





4. Comparison with Kim et al. [3] on Midd-Q         Back to top

To quantitatively compare our algorithm with Kim et al. [3], we test both algorithms on the following three sequences from the Middlebury 2001 dataset. Error rates (%) of the recovered depth maps are shown below (threshold = 1.0). Our algorithm performs much better than Kim et al. [3]. Note that the algorithm by Kim et al. [3] is not designed for a small number of input images.
Sequence    No. of frames    Kim et al. [3]    Ours
Tsukuba     5                8.42              5.83
Venus       9                10.59             0.64
Sawtooth    9                6.25              1.46
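For reference, the error rate used throughout this document is the percentage of pixels whose estimated disparity deviates from the ground truth by more than the threshold. A minimal sketch (the function name and the optional evaluation mask are our own; the actual benchmark evaluation may differ, e.g. in occlusion handling):

```python
import numpy as np

def error_rate(disp, gt, threshold=1.0, mask=None):
    """Percentage of pixels whose disparity error exceeds `threshold`.

    `mask` (optional) restricts evaluation, e.g. to non-occluded pixels.
    """
    if mask is None:
        mask = np.ones(gt.shape, dtype=bool)
    bad = np.abs(disp - gt) > threshold  # pixels counted as erroneous
    return 100.0 * np.count_nonzero(bad & mask) / np.count_nonzero(mask)
```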
Back to top






5. Varying number of frames as input         Back to top
This experiment shows how our algorithm performs with different numbers of input frames. The recovered depth map is shown on the left, and the corresponding error map is shown on the right. Please mouse over or click on the labels beneath each image to switch between visualizations.

Input/Estimated depth map Error map

Although we use 7 or 9 frames as input in most experiments, our algorithm still works reasonably well when fewer frames are provided, as shown in Fig. 9. Even with just 3 frames, clean depth maps can still be recovered, and errors decrease as more frames are used.
Back to top