CSC 2530 - Visual Modeling

Project #1: Video Mosaics

Implementation Details and Results

October 19, 2001

The Team Members

George ElKoura (gelkoura@dgp.toronto.edu)
Sam Hasinoff (hasinoff@cs.toronto.edu)

Image Registration

At the heart of the implementation is the Lucas-Kanade image registration algorithm. We employ Lucas-Kanade in conjunction with Laplacian pyramids to register images quickly. Estimates from the registration of coarser pyramid levels are propagated down the pyramid and used to initialize the estimates at the finer levels. The initial estimate of the displacement at the coarsest level is always 0, since camera motion is relatively slow.
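The coarse-to-fine propagation can be sketched roughly as follows. This is an illustration only, not the project's code: refine is a hypothetical stand-in for the Lucas-Kanade iterations at a single level, and the factor of two assumes that each pyramid level has half the resolution of the level below it.

```python
def coarse_to_fine(num_levels, refine):
    """Run a per-level registration step from the coarsest to the finest level.

    refine(level, dx, dy) is a hypothetical callable standing in for the
    Lucas-Kanade iterations at one pyramid level. It receives the current
    estimate (in that level's pixels) and returns an improved estimate.
    Level 0 is the finest level, num_levels - 1 the coarsest.
    """
    dx, dy = 0.0, 0.0  # initial estimate is zero (slow camera motion)
    for level in range(num_levels - 1, -1, -1):
        dx, dy = refine(level, dx, dy)
        if level > 0:
            # One pixel at this level spans two pixels at the next finer
            # level, so the estimate is doubled before it is passed down.
            dx, dy = 2.0 * dx, 2.0 * dy
    return dx, dy
```

Because each level only has to correct the residual left by the level above it, a small number of iterations per level can cover a large displacement at full resolution.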

The registration is done on a grayscale version of each image, both for speed and to avoid the issue of measuring distance between points in RGB space. The following conversion formula is used:

    gray = 0.299 red + 0.587 green + 0.114 blue

This is the standard (linear) method of extracting intensity from an RGB image. Vista implements this in the VRGBImageToGray function.
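For illustration, the formula can be written as a small function. This is a sketch of the formula only, not Vista's VRGBImageToGray; the function names and the list-of-rows image layout are assumptions.

```python
def rgb_to_gray(red, green, blue):
    """Weighted sum of the colour channels, using the formula given above."""
    return 0.299 * red + 0.587 * green + 0.114 * blue


def image_to_gray(rgb_image):
    """Convert an image stored as rows of (red, green, blue) tuples."""
    return [[rgb_to_gray(r, g, b) for (r, g, b) in row] for row in rgb_image]
```

The three weights sum to one, so a pure white pixel keeps its full intensity.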

For one of our sequences (Lounge 3), we didn't keep the camera on long enough to capture a full 360 degree panorama. As a result, there was a very large gap between the first and last images. The gap was approximately two-thirds the width of an image. To accommodate this, we implemented a brute-force search algorithm which was applied at the coarsest pyramid level. The algorithm exhaustively searches a discrete grid of displacements to find the one with the smallest error.
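A minimal sketch of such an exhaustive search is given here. The error measure (mean squared difference over the overlapping region), the function names, and the max_shift and step parameters are all assumptions made for illustration; the text above only states that the displacement with the smallest error is chosen.

```python
def mean_squared_error(a, b, dx, dy):
    """Mean squared difference between a[y][x] and b[y + dy][x + dx],
    taken over the region where the two images overlap."""
    rows, cols = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            u, v = x + dx, y + dy
            if 0 <= u < cols and 0 <= v < rows:
                diff = a[y][x] - b[v][u]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def brute_force_search(a, b, max_shift, step=1):
    """Try every displacement on a regular grid and keep the best one."""
    best = None
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            err = mean_squared_error(a, b, dx, dy)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2]
```

Dividing by the number of overlapping pixels keeps the search from favouring large displacements simply because they leave fewer pixels to compare.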

We did not apply the brute-force search over the entire mosaic, because we found it was sensitive to grid-size and the number of pyramid levels. With the wrong settings, it was prone to getting stuck in local minima. Close matches on the coarse image could direct it to an area of displacements that was not globally optimal, where it would get stuck even at the finer levels. Thus we only apply the brute-force method to the registration between the first and last images, where it seems to work remarkably well. See the results below.

Laplacian Pyramids

We decomposed our images into different levels of detail using the standard Laplacian pyramid approach. The filtering and image sampling made use of built-in Vista functionality.

The pyramid scheme was not implemented in its full generality for images of arbitrary dimensions. Instead, we imposed the restriction that the image dimensions must be divisible by some appropriate power of two corresponding to the number of pyramid levels. For example, a pyramid with five levels can only be generated from images whose dimensions are divisible by 2⁴=16.
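The restriction can be expressed as a short check. The function names are hypothetical; the rule itself (dimensions divisible by two raised to the number of levels minus one) follows from the five-level example above.

```python
def required_divisor(num_levels):
    """Each level above the first halves the image, so a pyramid with
    num_levels levels needs dimensions divisible by 2**(num_levels - 1)."""
    return 2 ** (num_levels - 1)


def dimensions_ok(width, height, num_levels):
    """True if an image of this size can be decomposed into num_levels levels."""
    divisor = required_divisor(num_levels)
    return width % divisor == 0 and height % divisor == 0
```

For example, a 320 by 240 image supports five levels, whereas a 360 by 240 image does not, since 360 is not divisible by 16.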

The original Burt and Adelson paper did not describe appropriate behaviour at the image borders when applying the filter. We simply used the default Vista technique of replicating the nearest border pixels outside of the actual image.

Pseudo-Global Alignment

As expected, the first frame didn't naturally line up with the last frame in a manner suitable for cylindrical projection for any of the sequences that we tested. Our approach to fixing this problem consisted of three steps. First we replicate the first frame and register it with the last frame. Then, after generating the mosaic image, we compute half of the width of the first image and crop that amount from either side of the mosaic, thereby distributing the first frame evenly between the left and right sides. Finally, if the first frame is vertically displaced from the last frame, we apply a vertical shear to bring them into alignment, evenly distributing the error over the entire mosaic.

This technique and its results are shown in the Lounge sequences below.
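The cropping step can be sketched as follows, assuming the mosaic is stored as a list of pixel rows and that the first frame has been replicated at the right-hand end. The function name is hypothetical.

```python
def crop_seam(mosaic, first_width):
    """Remove half of the first frame's width from each side of the mosaic.

    The left edge loses the left half of the first frame and the right edge
    loses the right half of its replica, so the two remaining halves meet at
    the seam when the mosaic is wrapped around a cylinder.
    """
    half = first_width // 2
    return [row[half:len(row) - half] for row in mosaic]
```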

When shearing is applied to images, we post-process the sheared mosaic to remove any completely black rows from the top and bottom of the mosaic.
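One possible form of this post-processing step is sketched here, under the assumption that the mosaic is a list of rows of grayscale values in which 0 denotes black. The function name is hypothetical.

```python
def trim_black_rows(image):
    """Drop completely black rows from the top and bottom of an image.

    Black rows in the interior of the image are left untouched.
    """
    top, bottom = 0, len(image)
    while top < bottom and not any(image[top]):
        top += 1
    while bottom > top and not any(image[bottom - 1]):
        bottom -= 1
    return image[top:bottom]
```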

Pasting

The pasting technique we used is Voronoi tessellation, as described by Peleg. The Voronoi intervals are based on displacement between images in the horizontal direction only -- sufficient for cylindrical panoramas.

The pasting is done from left-to-right to simplify the algorithm. If the sequence is reversed, images are read in the opposite direction.

The pasting is done column by column. Each column is grabbed from a single image corresponding to its Voronoi interval, so that every column is guaranteed to be from the centre of some image. This has the added benefit of eliminating radial distortions at the edges of the image.
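The column assignment can be illustrated with a one-dimensional sketch. It assumes that the registration has already produced the horizontal offset of each image's left edge in mosaic coordinates, that the offsets increase from left to right, and that neighbouring images overlap. The names are hypothetical, and the handling of ties and of the outermost strips is a simplification.

```python
def voronoi_columns(offsets, width):
    """Assign each mosaic column to the image whose centre is nearest.

    offsets[i] is the mosaic x-coordinate of the left edge of image i.
    Returns one (image_index, source_column) pair per mosaic column,
    covering the span from the first image's left edge to the last
    image's right edge.
    """
    centres = [off + width / 2.0 for off in offsets]
    result = []
    i = 0
    for col in range(int(offsets[0]), int(offsets[-1]) + width):
        # Centres are sorted, so the nearest image index never decreases.
        while (i + 1 < len(centres)
               and abs(centres[i + 1] - col) < abs(centres[i] - col)):
            i += 1
        result.append((i, col - offsets[i]))
    return result
```

With fractional offsets the source column is itself fractional, which is where interpolation during pasting comes in.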

We did not investigate other pasting techniques that Peleg mentions as alternatives. For example, we would have liked to try setting each mosaic column to the average of all corresponding columns in the overlapping aligned images.

Bilinear Interpolation

Our initial tests with bilinear interpolation suggested that the increase in quality was not worth the performance sacrifice. These initial tests were flawed, however, because the test images were too small and were all cut-outs of a single image. Testing on real video data clearly shows that bilinear interpolation significantly improves the quality of the mosaics.

Bilinear interpolation was used for both image registration and pasting. The alternative was "closest-pixel" rounding, which accentuates off-by-one errors during pasting, whereas bilinear interpolation gives a weighted average of the neighbouring pixel colours at the desired location and smooths out the error.
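For reference, a minimal sketch of bilinear sampling on a grayscale image stored as a list of rows is given here. Clamping out-of-range coordinates to the nearest border pixel is an assumption, and the function name is hypothetical.

```python
import math


def bilinear_sample(image, x, y):
    """Sample image at the fractional position (x, y).

    The result is a weighted average of the four surrounding pixels,
    with weights given by the fractional parts of x and y.
    """
    rows, cols = len(image), len(image[0])
    # Assumption: positions outside the image are clamped to the border.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1.0 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1.0 - fy) * top + fy * bottom
```

At integer positions the function returns the pixel value itself, so it agrees with closest-pixel lookup wherever no rounding is needed.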

Compare the following two mosaics. The first is the mosaic of University Campus Quad that was generated without bilinear interpolation.

The following image was computed with exactly the same parameters, except that bilinear interpolation was used during image registration and pasting.

Notice the green tiles to the far left and far right of the image. In the un-interpolated image, you can plainly see jagged edges, whereas in the interpolated image, the edges are smooth.

A disadvantage of using bilinear interpolation is the aliasing visible at the top and bottom borders of certain images. We did not have time to repair this artefact.

The Program

The program we wrote to implement the image-mosaicing scheme described in Peleg is called mkmosaic. It is a command-line application that takes a set of numbered images and outputs a single mosaic image in ppm format.

    Usage: mkmosaic [options] -in basefilename -out mosaic -n numimages
    Where options are:
        -r          - Reverses the image sequence
        -s          - Applies a shear to the images to line up first and
                      last frames.
        -l num      - Number of Laplacian pyramid levels (defaults to 5)
        -v          - Variable size images (slightly slower)
        -c          - Repeat first frame as last

The images are expected to be numbered basefilename001.v, basefilename002.v, and so on, for a total of numimages files.

The mosaic is generated from left-to-right. If the sequence of images is instead from right-to-left, using the "-r" option will correct the problem by reversing the way the sequence of images is read in. Not using this option on a right-to-left sequence will produce undefined results.
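The naming convention and the effect of "-r" can be illustrated with a short sketch. The three-digit zero padding is inferred from the examples basefilename001.v and basefilename002.v; the function name is hypothetical and this is not the program's own code.

```python
def input_filenames(basefilename, numimages, reverse=False):
    """List the input files in the order in which they would be read."""
    names = ["%s%03d.v" % (basefilename, i) for i in range(1, numimages + 1)]
    if reverse:
        # Reading a right-to-left sequence backwards turns it into a
        # left-to-right sequence.
        names.reverse()
    return names
```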

If you are generating a full 360 degree panorama, then most of the time you want to use the "-s" and the "-c" options. Combined, these options tell the program to attempt to line up the left and right portions of the final mosaic image, making the final image appropriate for viewing using a cylindrical mapping.

The program assumes that all the images in a sequence are the same size. This is usually the case. However, if the images are of variable size, then using the "-v" option will force the program to query each image individually for its size. Turning this option on will make the program a little slower.

The early implementations of the program loaded the entire sequence in memory before performing any calculations. This proved to be too taxing on some machines, especially for the high-resolution sequences. A new memory management scheme is now employed whereby images are loaded on demand and freed after they are no longer needed.
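One way such a scheme could look for the registration pass is sketched here. It assumes that only two adjacent images need to be resident at a time; load and register are hypothetical callables, and the actual program may manage memory differently.

```python
def register_sequence(filenames, load, register):
    """Register each image against its predecessor, holding at most two
    images in memory at any time.

    load(name) returns an image and register(previous, current) returns
    the displacement between two images. Both are hypothetical stand-ins.
    """
    displacements = []
    previous = None
    for name in filenames:
        current = load(name)  # loaded only when it is needed
        if previous is not None:
            displacements.append(register(previous, current))
        # The older image is no longer referenced after this assignment,
        # so its memory can be reclaimed.
        previous = current
    return displacements
```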

Not Implemented

Blending

We did implement blending between Laplacian pyramids, using multi-resolution splining. However, we didn't use this technique when generating the mosaic because we didn't have the time to extend it to more than two overlapping images. We also don't believe that the sequences we grabbed required any such blending because the gain control was fixed during the recording.

Rotation

We didn't implement support for camera rotation around the optical axis, simply because we didn't have the time. We see from our results that this drawback was not significant for our cylindrical panoramas. Even in the hand held sequence, where significant rotation did occur, some rotation was accounted for and the images generally registered correctly.

The Results

The Lounge

We took three sequences of the Computer Science Graduate Students lounge in the Sandford Fleming building. All of these sequences had a significant down-drift, which we believe occurred because the camera was tilted somewhat sideways (rotated about its optical axis) on the tripod. The unadjusted result is shown in the image below.

The down-drift is even more severe in the case where no bilinear interpolation is performed. This is because fractional displacements get rounded up, leading to even larger vertical displacements as shown in the image below.

To adjust for this down-drift, we used the option of our program that allows the user to apply a corrective shear to the image ("-s"). The shear is performed in the vertical (y) direction in the amount ydisp/moscols, where ydisp is the amount of vertical displacement and moscols is the width of the mosaic. The result after applying the shear is shown below.
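A minimal sketch of such a shear is shown here, for a grayscale mosaic stored as a list of rows. The sign convention, the rounding of each column's shift to whole pixels, and the black fill are assumptions; the function name is hypothetical.

```python
def vertical_shear(image, ydisp):
    """Shift column x upward by x * ydisp / moscols pixels.

    ydisp is the total vertical drift across the mosaic and moscols is
    the mosaic width, so the correction grows linearly from zero at the
    left edge to almost ydisp at the right edge.
    """
    rows, moscols = len(image), len(image[0])
    slope = ydisp / float(moscols)
    out = [[0] * moscols for _ in range(rows)]  # uncovered pixels stay black
    for x in range(moscols):
        shift = int(round(slope * x))
        for y in range(rows):
            src = y + shift
            if 0 <= src < rows:
                out[y][x] = image[src][x]
    return out
```

For example, a 4 by 4 image whose bright pixels lie on the main diagonal (a drift of one row per column) is straightened into a single bright top row when ydisp is 4.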

One of the sequences shot of the lounge did not capture a full 360 degrees, so there was a large gap between the first and last images. Our standard image registration technique failed. Notice the fridge in the image below at the far right.

Using a brute force search at the coarsest level, we managed to register the first and last image over the large gap. This technique worked very well and the result is shown in the image below.

Panoramas

Front Campus

This sequence was shot outside while many objects (people and cars) in the scene were moving in the distant background. The results are quite good. A person was walking by, and as a result ended up blurry in the mosaic, but no ghosting effects or mis-registrations are visible. The seam is in the middle of Convocation Hall and is relatively smooth.

The Medical Sciences Building

This shot was intended to be a test for objects moving in the foreground. We decided to photograph a high-traffic area for this test. A man walked right in front of the camera, but is absent from the mosaic because he stayed off-centre in the frame.

There was quite a bit of variation in the lighting conditions in this sequence as well, from the shadowy entrance to the brightly lit street. Since the gain on the camera was fixed, blending was not an issue.

Excessive blurriness and ghosting in this mosaic are due to the rapid motion of the camera for this sequence.

The University Campus Quad

This was a straightforward mosaic. The results are very nice -- perhaps our nicest mosaic of all. The only complication in this shot was the low lighting, because it was filmed in the evening. Using the post-processing features of Adobe Premiere, we were able to change the gamma-correction setting to brighten the images. To test this sequence further, we also generated a higher resolution version, which performed equally well.

Hand Held

This sequence was shot outside of the Sandford Fleming building without a tripod. The camera was moved very quickly along a plane. This sequence contains moving objects, camera twists, and fast motion. In short, this is the image registration algorithm's worst nightmare.

All things considered, we were surprised it turned out this well. Notice the mis-registration error caused by the moving vehicle: the tail light of the car appears in two spots. This hand held sequence is reminiscent of some of Peleg's VideoBrush results, and motivates the development of a system where camera frames can be captured and registered in real-time.

This image shows the result if no image registration was performed (only the middle of each image is grabbed).

Conclusions

The Peleg method turned out to be useful in generating panoramic mosaics, and was in fact a fairly simple algorithm to implement. However, we are not close to the real-time performance that was described in the paper. We are doing far more work than necessary for image registration: we register the entire image, not just a strip from the centre, and we use more iterations and pyramid levels than required for the sake of caution. Finally, our implementation has not been optimized.

Overall, we are very pleased with our results.

References

P. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31:532-540, 1983.
P. Burt and E. Adelson. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 2:217-236, 1983.
B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. International Joint Conference on Artificial Intelligence, pages 674-679, 1981.
R. Szeliski and H. Y. Shum. Creating full view panoramic image mosaics and environment maps. Proc. ACM SIGGRAPH, pages 251-258, 1997.
S. Peleg and J. Herman. Panoramic mosaics by manifold projection. CVPR'97, pages 338-343, June 1997.