In collaboration with LP Morency and Trevor Darrell
Many visual trackers gradually lose track of their target over time. To prevent this, our trackers determine the pose of the object by matching each image of the object against an object model, trying various transformations of the model. The transformation that produces the best match with the image is reported as the pose of the object. The model is refined online as tracking proceeds. Our contribution is the use of object models that consist of a collection of pose-annotated keyframes.
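The matching principle can be illustrated with a deliberately tiny model. The sketch below is a toy, not our tracker: it assumes 1-D intensity profiles instead of stereo images and a single integer shift instead of a full 3-D rigid transformation. Every keyframe stores an annotated pose; each new frame is compared against every keyframe under every candidate shift, and the best match gives the pose. The names `Keyframe`, `estimate_pose`, `track`, `make_frame` and the rule for adding keyframes (`new_key_dist`) are hypothetical choices made for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Keyframe:
    pose: int           # annotated pose; in this toy just a 1-D horizontal offset
    image: list[float]  # 1-D intensity profile standing in for a real image


def shift_error(frame: list[float], key: list[float], shift: int) -> float:
    """Mean squared difference between frame[i] and key[i - shift] over their overlap."""
    total, count = 0.0, 0
    for i, value in enumerate(frame):
        j = i - shift
        if 0 <= j < len(key):
            total += (value - key[j]) ** 2
            count += 1
    # Reject shifts whose overlap is too small to be a meaningful match.
    if count < len(frame) // 2:
        return float("inf")
    return total / count


def estimate_pose(frame: list[float], keyframes: list[Keyframe],
                  max_shift: int = 5) -> tuple[int | None, float]:
    """Try every candidate shift against every keyframe and keep the best match."""
    best_err, best_pose = float("inf"), None
    for kf in keyframes:
        for shift in range(-max_shift, max_shift + 1):
            err = shift_error(frame, kf.image, shift)
            if err < best_err:
                # The frame's pose is the keyframe's pose plus the matching shift.
                best_err, best_pose = err, kf.pose + shift
    return best_pose, best_err


def track(frames: list[list[float]], keyframes: list[Keyframe],
          new_key_dist: int = 3, max_shift: int = 5) -> list[int | None]:
    """Estimate a pose for each frame, growing the keyframe model online."""
    poses = []
    for frame in frames:
        pose, _ = estimate_pose(frame, keyframes, max_shift)
        poses.append(pose)
        # Hypothetical update rule: keep this frame as a new keyframe when
        # no existing keyframe has a pose within new_key_dist of it.
        if min(abs(pose - kf.pose) for kf in keyframes) >= new_key_dist:
            keyframes.append(Keyframe(pose, list(frame)))
    return poses


def make_frame(pose: int, n: int = 30) -> list[float]:
    """Synthetic 'image': a triangular bump whose position encodes the pose."""
    return [float(max(0, 5 - abs(i - (10 + pose)))) for i in range(n)]


model = [Keyframe(0, make_frame(0))]
poses = track([make_frame(p) for p in range(1, 9)], model)
print(poses)                      # [1, 2, 3, 4, 5, 6, 7, 8]
print([kf.pose for kf in model])  # [0, 3, 6]
```

In this toy, no shift within `max_shift=5` of the initial keyframe matches frames beyond pose 5 exactly; the keyframes added online at poses 3 and 6 are what keep the later estimates exact. Unlike the trackers described here, the sketch never refines the poses of the keyframes themselves.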
We have built several trackers based on this principle: a head tracker and several egomotion estimators. Our head tracker relies on stereo cameras from Videre Design. It runs at about 14 frames per second and is accurate to within a few degrees of rotation and a few millimeters of translation when the subject is 1 to 2 meters from the camera.
An example video (7.5 MB). The number of dots under the box represents the number of keyframes used to estimate the pose in that frame. See the paper for more detail. See another video (11.4 MB). This illustrative video (46 MB) shows how the poses of the keyframes (squares on the grid) are refined over time.
Apart from segmentation, there is little difference between tracking an object with a stationary camera and tracking a camera moving through a stationary scene. We have used view-based appearance models to estimate camera motion as well.
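The equivalence can be made concrete: a rigid pose estimated as the object's motion relative to the camera becomes, once inverted, the camera's motion relative to the object (or to a static scene, segmentation aside). The sketch below represents a pose as a 3×3 rotation matrix `R` plus a translation vector `t`, mapping a point `x` to `R x + t`; this representation and the helper names `invert`, `compose` and `apply` are illustrative assumptions, not necessarily what our trackers use internally.

```python
import math


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(pose, x):
    """Map point x through pose (R, t): x -> R x + t."""
    R, t = pose
    return [a + b for a, b in zip(mat_vec(R, x), t)]


def invert(pose):
    """Inverse of (R, t) is (R^T, -R^T t), since R is orthonormal."""
    R, t = pose
    Rt = transpose(R)
    return Rt, [-c for c in mat_vec(Rt, t)]


def compose(p, q):
    """Pose that applies q first, then p."""
    Rp, tp = p
    Rq, tq = q
    return mat_mul(Rp, Rq), [a + b for a, b in zip(mat_vec(Rp, tq), tp)]


# Pose of the object in the camera frame (illustrative numbers only):
# maps object coordinates to camera coordinates.
object_in_camera = (rot_z(0.3), [0.1, -0.2, 1.5])
# The same relative motion read the other way: the camera's pose in the
# object (or static scene) frame, mapping camera coordinates to object ones.
camera_in_object = invert(object_in_camera)

x = [0.05, 0.02, 0.0]  # a point on the object, in object coordinates
round_trip = apply(camera_in_object, apply(object_in_camera, x))
print(round_trip)
```

Because the two poses are exact inverses, an estimator written for one case can report the other by a single inversion; the round trip above returns the original point up to floating-point error.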
The following papers describe these trackers in detail.
The head tracker is now maintained by LP Morency.
Louis-Philippe has enhanced the original registration algorithm by
recoding it in C++ (to make it real-time) and combining it with the
ICP 3D registration algorithm to improve accuracy. These are the
original registration algorithm and LP's enhancement:
We have used the tracker in various human-computer interaction applications.
See LP's page for examples of the tracker being used to obtain
feedback for interactive dialog systems.
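The ICP (iterative closest point) registration mentioned above alternates two steps: match each source point to its nearest destination point, then solve for the rigid motion that best aligns the matched pairs in the least-squares sense. The sketch below is a generic, textbook point-to-point ICP, reduced to 2-D so that the optimal rotation has a closed form (the angle is `atan2` of the summed cross and dot products of the centered point pairs). It is not LP's C++ implementation, which registers 3-D data; the brute-force nearest-neighbor search and the names `best_rigid_2d` and `icp` are illustrative choices.

```python
import math


def nearest(p, points):
    """Brute-force nearest neighbour of p among points."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation carries the rotated source centroid onto the destination centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(src, dst, iterations=30, tol=1e-12):
    """Return the accumulated (theta, tx, ty) aligning src to dst and the final mean squared error."""
    current = list(src)
    theta_tot, tx_tot, ty_tot = 0.0, 0.0, 0.0
    prev_err = err = float("inf")
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]     # step 1: correspondences
        theta, tx, ty = best_rigid_2d(current, matches)  # step 2: best rigid motion
        current = transform(current, theta, tx, ty)
        # Compose this incremental step with the running estimate.
        c, s = math.cos(theta), math.sin(theta)
        tx_tot, ty_tot = c * tx_tot - s * ty_tot + tx, s * tx_tot + c * ty_tot + ty
        theta_tot += theta
        err = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                  for p, q in zip(current, matches)) / len(current)
        if prev_err - err < tol:
            break
        prev_err = err
    return (theta_tot, tx_tot, ty_tot), err


src = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 3.0), (2.0, 5.0)]
dst = transform(src, 0.05, 0.1, -0.05)
(theta, tx, ty), err = icp(src, dst)
print(round(theta, 6), round(tx, 6), round(ty, 6))  # 0.05 0.1 -0.05
```

Like any ICP variant, this only converges to the correct alignment when the initial displacement is small relative to the spacing of the points, which is one reason to pair it with a coarser registration step that supplies a good starting pose.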