Orientation histograms for hand gesture recognition


We present a method to recognize hand gestures, based on a pattern recognition technique developed by McConnell \cite{McConnell86} employing histograms of local orientation. We use the orientation histogram as a feature vector for gesture classification and interpolation. This method is simple and fast to compute, and offers some robustness to scene illumination changes. We have implemented a real-time version, which can distinguish a small vocabulary of about 10 different hand gestures. All the computation occurs on a workstation; special hardware is used only to digitize the image. A user can operate a computer graphic crane under hand gesture control, or play a game. We discuss limitations of this method. For moving or ``dynamic'' gestures, the histogram of the spatio-temporal gradients of image intensity forms the analogous feature vector and may be useful for dynamic gesture recognition.
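The orientation-histogram feature described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the real-time system: the function name `orientation_histogram`, the number of bins, the finite-difference gradient, and the relative contrast threshold are all assumptions introduced here.

```python
import numpy as np

def orientation_histogram(image, n_bins=36, contrast_threshold=0.1):
    """Histogram of local gradient orientations, used as a feature vector.

    n_bins and contrast_threshold are hypothetical parameters chosen for
    illustration; they are not taken from the original work.
    """
    img = np.asarray(image, dtype=float)
    # Finite-difference gradients along rows (y) and columns (x).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)  # local orientation in [-pi, pi]

    peak = magnitude.max()
    if peak == 0:
        # A constant image has no orientation information.
        return np.zeros(n_bins)

    # Keep only pixels whose contrast is significant relative to the
    # strongest edge (assumed rule for suppressing flat regions).
    mask = magnitude > contrast_threshold * peak

    # Map each angle to a bin index; the modulo wraps +pi onto bin 0.
    bins = np.floor((angle[mask] + np.pi) / (2 * np.pi) * n_bins).astype(int)
    bins %= n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)

    # Normalize so the feature does not depend on the number of edge pixels.
    return hist / hist.sum()
```

Because the gradient direction is unchanged when the image intensities are multiplied by a gain or shifted by an offset, a feature of this kind is unaffected by such changes, which is one way to read the robustness to illumination mentioned above.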

Subset of the vocabulary of gestures used to control a computer graphic crane. (a) shows the training set of gestures for the commands up, down and right. (c) shows a test set of the same gestures, under the same lighting conditions. (e) is a test set, made under different lighting conditions. (b), (d), and (f) are the corresponding orientation histograms. Note that the histogram shapes for the same hand positions look approximately the same under different lighting conditions; compare (f) with (b). An extension of this vocabulary of commands can control a computer graphic crane in real time, (g).

Problem images for the orientation-histogram-based gesture classifier. Users typically feel that (a) and (b) represent the same gesture, yet their orientation histograms are very different, shown overlaid in (c). A remedy for this problem is to provide training images of the gesture at various orientations. (Mathematical rotation of the feature vector is not sufficient; the corresponding orientation histograms are typically not simple rotations of each other.) Sometimes small changes in the image can cause large semantic differences, while changing the orientation histograms little. Users classify (d) and (e) as different gestures, yet their orientation histograms are nearly identical, (f). One has to construct a gesture vocabulary which avoids such gestures with similar orientation histograms. Finally, for this simple statistical technique to work, the hand must dominate the image. If it does not, then even large changes in the hand pose can cause negligible changes to the orientation histogram, (g)--(i).
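The remedy of supplying training images at several orientations can be illustrated with a small classifier sketch. The text here does not specify the classification rule, so the nearest-neighbor rule with Euclidean distance below is an assumption, and the name `classify_gesture` and the `(label, histogram)` training format are hypothetical.

```python
import numpy as np

def classify_gesture(query_hist, training_set):
    """Return (label, distance) of the training histogram closest to the query.

    training_set is a list of (label, histogram) pairs. A label may appear
    several times, for example once per hand orientation seen in training.
    Nearest neighbor with Euclidean distance is an assumed rule.
    """
    query = np.asarray(query_hist, dtype=float)
    best_label, best_dist = None, np.inf
    for label, hist in training_set:
        dist = np.linalg.norm(query - np.asarray(hist, dtype=float))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Toy 4-bin histograms: "up" is stored at two orientations.
training = [
    ("up", [1.0, 0.0, 0.0, 0.0]),
    ("up", [0.0, 1.0, 0.0, 0.0]),
    ("down", [0.0, 0.0, 1.0, 0.0]),
]
label, dist = classify_gesture([0.0, 0.9, 0.1, 0.0], training)
```

Storing several histograms per label lets a rotated hand match one of the stored orientations directly, rather than relying on a rotation of the feature vector. The returned distance could also be compared against a threshold to reject images in which no trained gesture is present, though that is likewise an assumption.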


W. T. Freeman, D. B. Anderson, P. A. Beardsley, C. N. Dodge, M. Roth, C. D. Weissman, W. S. Yerazunis, H. Kage, K. Kyuma, Y. Miyake, and K. Tanaka, Computer Vision for Interactive Computer Graphics, IEEE Computer Graphics and Applications, Vol. 18, No. 3, May-June 1998. Also available as MERL-TR99-02.

W. T. Freeman and M. Roth, Orientation histograms for hand gesture recognition, Intl. Workshop on Automatic Face- and Gesture- Recognition, IEEE Computer Society, Zurich, Switzerland, June, 1995, pp. 296--301. MERL-TR94-03.