Generating Novel Views of Objects for Class Recognition

Multi-view object class recognition can be achieved using existing approaches for single-view object class recognition, by treating different views as entirely independent classes. This strategy requires a large amount of training data for many viewpoints, which can be costly to obtain. In this project, we propose the Potemkin model, which can be constructed from as few as two views of an object of the target class, and which can be used to transform images of objects from one view to several other views, effectively multiplying their value for class recognition. These transformed objects, which can be easily combined with the background of the original image to generate a complete image in the new view, can then be used as virtual training data for any view-dependent 2D recognition system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class recognition.

The Potemkin model of an object class can be viewed as a collection of parts, which are oriented 3D primitive shapes. There are two versions of this model. In our CVPR 2007 paper, we propose the basic Potemkin model, which uses only a single oriented primitive to learn the transforms from view to view. In our CVIU 2009 paper, we extend the basic model to use a basis set of multiple oriented primitives and to select among them, based on a pair of initial training images of the class, to represent each part of the target class.
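
As a rough illustration of learning per-part transforms from a pair of views and reusing them on a new object, the sketch below fits one 2D affine map per part by least squares and applies it to the part outlines of another instance. This is a minimal sketch under stated assumptions, not the method from either paper: the affine form of the transform, the part names, the coordinates, and the helper names `fit_affine`, `apply_affine`, and `transform_object` are all hypothetical.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine map taking src points (N, 2) to dst points (N, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: each row is [x, y, 1].
    homog = np.hstack([src, np.ones((len(src), 1))])
    # Solve homog @ params ~= dst for the 3x2 parameter matrix.
    params, *_ = np.linalg.lstsq(homog, dst, rcond=None)
    return params


def apply_affine(params, pts):
    """Apply a 3x2 affine parameter matrix to points of shape (N, 2)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return homog @ params


def transform_object(parts, transforms):
    """Move every part outline of one object into the target view."""
    return {name: apply_affine(transforms[name], pts) for name, pts in parts.items()}


# Hypothetical part outlines labeled in two views of one training object.
train_view_a = {
    "body": [(0, 0), (4, 0), (4, 2), (0, 2)],
    "wheel": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
train_view_b = {
    "body": [(1, 0), (3, 0), (3, 2), (1, 2)],
    "wheel": [(1.5, 0), (1.75, 0), (1.75, 1), (1.5, 1)],
}

# One transform per part, assumed here to be affine (an illustrative choice).
transforms = {
    name: fit_affine(train_view_a[name], train_view_b[name])
    for name in train_view_a
}

# A different object of the same class, seen only in view A.
new_object_view_a = {
    "body": [(0, 0), (6, 0), (6, 2), (0, 2)],
    "wheel": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
virtual_view_b = transform_object(new_object_view_a, transforms)
```

Fitting a separate transform for each part, rather than one for the whole object, follows the description of the class as a collection of parts. In a fuller pipeline the transformed outlines would presumably guide warping of the image pixels themselves, which this sketch leaves out.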


Han-Pang Chiu, Leslie Pack Kaelbling, and Tomas Lozano-Perez, "Learning to Generate Novel Views of Objects for Class Recognition", Computer Vision and Image Understanding (CVIU), 2009.

Han-Pang Chiu, Leslie Pack Kaelbling, and Tomas Lozano-Perez, "Virtual Training for Multi-View Object Class Recognition", CVPR, 2007.