Optical Flow Based Robot Navigation


    Our aim is to develop algorithms that will be used for robust visual navigation of mobile autonomous agents. The approach is to efficiently compute and use optical flow fields to extract the features of the environment that are important for our purpose and to use this information as our guide for motion. We are using image frames supplied by both a monocular vision system of a physical robot and a simulated environment as the input for testing our algorithms. The raw images from these continuous image sequences are first converted to SLOG images, and then a patch matching is applied to these preprocessed images to obtain the optical flow field. Currently, two programs that we have designed make use of the computed flow information to avoid obstacles by a method called the balance strategy, and to estimate the time to contact with a surface.


Optical Flow Algorithm

    In an image, each pixel corresponds to the intensity value obtained by the projection of an object in 3-D space onto the image plane. When the objects move, their corresponding projections also change position in the image plane. Optical flow is a vector field that shows the direction and magnitude of these intensity changes from one image to the other [1]. The following example, which is obtained by our program, clarifies this discussion:
    This image shows a scene from a simulated environment, where the agent has just moved one step forward from its initial starting position.
The Optical Flow Field

    This second image shows the displacement of some selected points (a dense matrix of evenly spaced pixels) in the image plane - superimposed on the first image - that is caused by the motion of the agent. We color coded the flow vectors (also called "needles") so that their special attributes can easily be visualized. The meaning that the colors carry will be described after the algorithm is explained.

    We are computing the optical flow between each image in a continuous input stream, and using these flow fields we aim to extract necessary information like the layout of the surfaces seen in the images, and the motion of the observation point [2].

    The method that we are using to obtain the optical flow field is basically as follows [3] [4]:

    First, a gaussian filter is applied to the raw input images. This is a low pass filter, and has a blurring (smoothing) effect on the image. Then, laplacian filter is applied to obtain the second derivative information from the images. Theoretically, both of these filtering operations are 2-D convolutions, but practically we implement them as two 1-D and one 2-D convolutions. The same effect of applying a 2-D, gaussian filter - an NxN square matrix - is obtained in two 1-D steps (which helps us reduce the number of necessary operations from N² to 2N+1), and then the laplacian is applied as usual. The combined effect of these two filters are referred to as a LoG filter (Laplacian of Gaussian). The result of this operation is the detection of the edges in the images. After the LoG filter is applied to an image, the zero crossings of the intensity values show the position of the edges. Therefore, it suffices to look at the sign changes to detect the edges. We then produce binary sign of laplacian of gaussian (SLOG) images by using the sign information. The following figure shows two successive frames from an image sequence, the effects of the filters, and the binary SLOG images obtained at the end of the described process:

    Once we have two successive binary SLOG images, in order to find the displacements of features, we apply a procedure called patch matching [4] [5]; For each needle of the flow field, a patch centered around the origin of that needle in the first image is taken. Then this patch is compared with all of the same sized patches that have their centers in a search area in the second image. The search area is a rectangle whose center has the same coordinates with the origin of the needle, and whose sizes can be adjusted on the fly. If the number of the matching pixels in two patches are above some (percentage) threshold, then the two patches are considered to match. The vector defined by the needle origin and the center of the best matching patch is the displacement that we were trying to find.

    There are also some special cases that may arise during the matching process. We are also keeping records of them (for possible future use). These cases, and the color codes that are used in the images for visualization are the followings:

    Below are two more examples of optical flow fields:

    Note that almost all of the needles on the ground are non-match cases. This is because the texture on the ground is too fine-grained and gets lost in the gaussian filtering.

Back to top

Balance Strategy

    The balance strategy is a control law that can be used by mobile agents to avoid obstacles, to trail moving targets, or to escape from approaching enemies [2]. In order to manage all of these goals, the mobile agent should try to balance the amount of image intensity change on the left and right sides of the center of its field of view. In order to do this, the agent can determine the direction and the magnitude of its rotation by looking at the difference of the sum of the magnitudes of the left and right side optical flow vectors.

    If the agent wants to avoid obstacles, then it should turn away from the side that shows more motion, since this indicates a possible approach to a stationary obstacle. An example of this case is shown in the figure below:

    In this case, the sum of the magnitudes of the flow vectors on the left side is greater than the right side sum. Therefore the agent should turn right to avoid the obstacle.
    Similarly, if the agent wants to trail a target, it should turn to the same side that shows more motion. In this way, it keeps the target in focus.

    A simple modification makes use of this strategy in escaping from enemies. The agent simply moves backward, while keeping the enemy in focus as in trailing a target.

Back to top

Time To Contact Estimation

    The estimation of the remaining time to contact with a surface is another useful tool for autonomous navigation, which can be computed using optical flow fields [2]. When the agent moves at a constant speed, the TTC can be estimated by the ratio of the distance of a point in the image plane from the focus of expansion to the rate of change in this distance (divergence from the focus of expansion) [1][6].This gives us the number of remaining frames before a contact to that point takes place. The incorporation of the agent speed and the frame rate converts this estimate from the dimensionless remaining number of frames to the remaining seconds.
    This is a snapshot of our program in action computing time to contact estimations. The image sequence used is prerecorded by our robot while it was approaching a wall covered with newspapers with a speed of 8 cm/sec (The newspapers provide a texture that is more adequate for patch matching than the white, textureless wall). There are 76 frames in this sequence.

    In order to reduce the effect of input noise, or the errors that may occur due to the integer nature of the needle lengths, our program averages over a large number of estimates to determine the overall single result.

    This graph shows each of the time to contact estimations made by the above run of our program for all of the successive frame pairs in the input stream.
Back to top

Our Test Domains

    We test our programs both in simulated environments, and in real environments by using a physical robot:
    We use Crystal Space, a free 3-D engine as the basis for our simulated environments. It is developed in C++. For more information about Crystal Space, click on the logo, or follow the link below:

Crystal Space

    Our robot, Erik, is an indoor robot (RWI - B21r) equipped with a digital 1/2" CCD color camera with digital signal processing capabilities. You can learn more about Erik by clicking on his images or by clicking the following link:

Erik (RWI - B21r)

For more information about our camera, click on the image or follow the link below:


Back to top

Current Test Results

    The following videos show the current performance of our algorithms in both simulated and real environments:
    Erik is finding its way between chairs in our laboratory using the balance strategy for obstacle avoidance.     Erik is running down the corridor, again using the balance strategy. Note that this video is produced with a frame rate of 3 frames per second, therefore please adjust your mpeg player program accordingly to see the real motion.
    Virtual Erik is wandering in a simulated environment created by Crystal Space engine. Another simulated environment. Virtual Erik is passing through a door, and following a corridor.

Back to top


[1] Ted A. Camus, ``Real-Time Optical Flow'', Ph.D. Thesis, May 1995.
[2] Andrew P. Duchon, William H. Warren, and Leslie Pack Kaelbling, ``Ecological Robotics'', Adaptive Behavior, Volume 6, Number 3/4, 1998.
[3] Marr, D., and Hildreth, E.C., ``Theory of Edge Detection'', RoyalP(B-207), 1980, pp. 187-217.
[4] Nishihara, H. K., ``Practical Real-Time Imaging Stereo Matcher'', OptEng(23), No. 5, September/October 1984, pp. 536-545.
[5] Nishihara, H. K., ``Real-Time Implementation of a Sign-Correlation Algorithm for Image-Matching'', Technical report, Teleos Research, February 1990.
[6] N. Ancona and T. Poggio, ``Optical Flow from 1D Correlation: Application to a Simple Time-to-Crash Detector'', In: Proceedings of the Fourth International Conference on Computer Vision, Berling, Germany, May 1993.

Back to top

Selim Temizer

Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
The Artificial Intelligence Laboratory
Learning and Intelligent Systems (LIS) Group