MIT Vision and Modeling TR 190 -- June 1992
Two paradigms for visual analysis are {\em top-down}, starting from high-level models or information about the image, and {\em bottom-up}, where little is assumed about the image or objects in it. We explore a local, bottom-up approach to image analysis. We develop operators to identify and classify image junctions, which contain important visual cues for identifying occlusion, transparency, and surface bends.
As in the human visual system, we begin by applying linear filters oriented in all possible directions. We develop an efficient way to create a filter of arbitrary orientation by describing it as a linear combination of {\em basis filters}. This approach to oriented filtering, which we call {\em steerable filters}, offers advantages for analysis as well as computation. We design a variety of steerable filters, including steerable quadrature pairs, which measure local energy. We show applications of these filters in orientation and texture analysis, and in image representation and enhancement.
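To make the idea of steering concrete, the following is a minimal sketch using the first derivative of a Gaussian, a standard textbook example of a steerable filter and not necessarily one of the designs in this thesis. The function names, kernel size, and `sigma` are illustrative assumptions. The sketch shows that a kernel oriented at angle `theta` can be obtained as a weighted sum of two fixed basis kernels.

```python
import numpy as np

def gaussian_derivative_basis(size=9, sigma=1.5):
    """Return the two basis kernels: x- and y-derivatives of a 2-D Gaussian.

    Hypothetical helper; size and sigma are illustrative choices.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    gx = -x / sigma**2 * g  # derivative along x (orientation 0)
    gy = -y / sigma**2 * g  # derivative along y (orientation pi/2)
    return gx, gy

def steer(gx, gy, theta):
    """Synthesize the kernel oriented at theta from the two basis kernels."""
    return np.cos(theta) * gx + np.sin(theta) * gy

def direct_oriented_kernel(theta, size=9, sigma=1.5):
    """Build the same oriented kernel directly, for comparison only."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along direction theta
    return -u / sigma**2 * g

gx, gy = gaussian_derivative_basis()
theta = np.deg2rad(30.0)
steered = steer(gx, gy, theta)
direct = direct_oriented_kernel(theta)
```

Because filtering is linear, the same weights can be applied to the two basis-filtered images instead of to the kernels, so an image needs to be convolved only with the basis set to obtain the response at any orientation.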
We develop methods based on steerable filters to study structures such as contours and junctions. We describe how to post-filter the energy measures in order to more efficiently analyze structures with multiple orientations. We introduce a new detector for contours, based on energy local maxima. We analyze contour phases at energy local maxima, and compare the results with the prediction of a simple model.
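As a rough one-dimensional illustration of locating a contour at a local maximum of energy, the sketch below squares and sums the outputs of an even and an odd filter. It assumes a Gabor-like pair, which is only approximately in quadrature and stands in for the steerable quadrature pairs designed in the thesis. All names, the filter parameters, and the threshold are hypothetical.

```python
import numpy as np

def gabor_pair(sigma=4.0, period=16.0, half=16):
    """Even/odd Gabor-like pair (an approximate quadrature pair, assumed here)."""
    k = np.arange(-half, half + 1, dtype=float)
    envelope = np.exp(-k**2 / (2 * sigma**2))
    even = envelope * np.cos(2 * np.pi * k / period)
    even -= even.mean()  # remove DC so flat regions give no response
    odd = envelope * np.sin(2 * np.pi * k / period)
    return even, odd

def local_energy(signal, even, odd):
    """Sum of squared even and odd filter responses at each sample."""
    half = len(even) // 2
    padded = np.pad(signal, half, mode="edge")  # avoid spurious border edges
    e = np.convolve(padded, even, mode="valid")
    o = np.convolve(padded, odd, mode="valid")
    return e**2 + o**2

def energy_local_maxima(energy, threshold):
    """Indices of samples that are local maxima of energy above threshold."""
    interior = energy[1:-1]
    is_peak = (
        (interior >= energy[:-2])
        & (interior > energy[2:])
        & (interior > threshold)
    )
    return np.flatnonzero(is_peak) + 1

# Synthetic step edge between samples 99 and 100.
signal = np.zeros(200)
signal[100:] = 1.0

even, odd = gabor_pair()
energy = local_energy(signal, even, odd)
peaks = energy_local_maxima(energy, threshold=0.5 * energy.max())
```

For this step edge the energy should peak at, or immediately next to, sample 100, while the flat regions on either side produce essentially no energy.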
Using these tools, we analyze junctions. Based on local oriented filters, we develop simple mechanisms which respond selectively to ``T'', ``L'', and ``X'' junctions. The T and X junctions may indicate occlusion and transparency, respectively. These mechanisms show that detectors for important, low-level visual cues can be built out of oriented filters and energy measures, which resemble responses found in the visual cortex.
We present a second approach to junction detection based on salient contours. We combine our contour detector with the structural saliency algorithm of Shashua and Ullman, which finds visually salient contours. To improve its descriptive power, we include a competitive mechanism in the algorithm. From the local configuration of saliencies, we form simple detectors which respond to cues for occlusion, transparency and surface bending. Using the saliency values and curve linking information, we can propagate this information along image contours.
For both algorithms, we show successful results on simple synthetic and natural images. We also show results for more complicated scenes and discuss where the methods do not work, and why. Each algorithm uses only local calculations applied in parallel throughout the image, and assumes little prior information about the objects it expects to see.
pdf file of full thesis (2.3 Mbytes).