

We have developed a method to reconstruct the 3d positions
of a moving human figure from observing the figure's motions over
time, recorded from a single video camera. This may have applications
to human-computer interaction, computer graphics, or interactive
virtual environments.
We built an interactive tracking system to process real video
sequences, and it achieves reasonable 3d reconstructions of human
figure motion for a variety of sequences.
Reconstructing 3d from 2d (image) information is an underdetermined
problem, and we must rely on prior knowledge about how people tend to
move in order to resolve ambiguities. We learned a model of how people
move, using examples of 3d human motion data obtained from a
multiple-camera "motion capture" system. We used Bayesian methods to
combine this prior model with the image data.


Background and objectives:
As one watches a film or video of a person moving, one can
easily estimate the 3-dimensional motions of the moving person from
watching the 2d projected images over time. A dancer could repeat
the motions depicted in the film. Yet such 3d motion is hard for a
computer to estimate. Such estimation is the goal of this work.
Technical discussion:
Our approach is to use strong prior knowledge about how humans move.
We show that this prior knowledge dramatically improves the 3d
reconstructions. We learn our prior model from examples of 3d human
motion.
We first studied the 3d reconstruction in a simplified image rendering
domain where a Bayesian analysis provides analytic solutions to
fundamental questions about estimating figural motion from image data.
Using insights from the simplified domain, we applied our Bayesian
method to real images and reconstructed human figure motions from
archival video. Our system accommodates interactive correction of
automated 2d tracking errors, which allows reconstruction even from
difficult film sequences.
We represent human figure motion as a linear combination of short
snippets of the training examples. We use Singular Value
Decomposition (SVD) to obtain the 50-dimensional linear model that
best fits the training data in the least-squares sense.
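A minimal sketch of this kind of linear motion model, assuming the training motion is stored as a NumPy array with one row per frame and one column per joint coordinate. The helper names (`extract_snippets`, `fit_linear_motion_model`, `encode`, `decode`), the snippet length, and the stride are illustrative assumptions, not the system's actual code; the truncated SVD shown here is the standard construction of a best rank-k linear model in the least-squares sense.

```python
import numpy as np


def extract_snippets(motion, snippet_len, stride=1):
    """Cut a (frames, dims) motion array into overlapping snippets,
    each flattened into one row vector of length snippet_len * dims."""
    frames, _ = motion.shape
    starts = range(0, frames - snippet_len + 1, stride)
    return np.stack([motion[s:s + snippet_len].ravel() for s in starts])


def fit_linear_motion_model(snippets, num_components=50):
    """Return (mean, basis, singular_values) of the best rank-k linear
    model of the snippet rows, via a truncated SVD of the centered data."""
    mean = snippets.mean(axis=0)
    centered = snippets - mean
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(num_components, vt.shape[0])
    return mean, vt[:k], s[:k]


def encode(snippet, mean, basis):
    """Project one flattened snippet onto the basis (k coefficients)."""
    return basis @ (snippet - mean)


def decode(coeffs, mean, basis):
    """Rebuild a flattened snippet from its k coefficients."""
    return mean + basis.T @ coeffs
```

With `num_components=50` this yields a 50-dimensional model like the one described above, so each snippet is summarized by 50 coefficients; keeping fewer components trades reconstruction accuracy for a smaller search space.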
Bayes' rule provides two terms whose product is the posterior
probability of a candidate 3d reconstruction; the optimal
reconstruction is the one that maximizes this product. The
"likelihood" term constrains the 2d projection of the 3d model to
match the image data. We require that the image pixels under the
projection of a given limb part have approximately constant intensity
over time.
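The brightness-constancy likelihood can be sketched as follows, assuming grayscale frames stored as 2d NumPy arrays, a limb whose projection in each frame is a 2d line segment, Gaussian intensity noise with a hypothetical standard deviation `sigma`, and nearest-pixel sampling. All function names are hypothetical, and the actual system's limb model and image sampling may differ. The score is highest (zero) when the pixels under the limb keep the same intensity in every frame.

```python
import numpy as np


def sample_nearest(image, points):
    """Sample a grayscale image at (x, y) points, nearest pixel, clamped to bounds."""
    h, w = image.shape
    xs = np.clip(np.rint(points[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(points[:, 1]).astype(int), 0, h - 1)
    return image[ys, xs]


def limb_points(p0, p1, n=20):
    """n points evenly spaced along the projected 2d segment from p0 to p1."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) * np.asarray(p0, float) + t * np.asarray(p1, float)


def brightness_constancy_log_likelihood(frames, segments, sigma=10.0, n=20):
    """Gaussian log-likelihood (up to an additive constant) that one limb's
    image intensities stay constant over time.

    frames:   sequence of T grayscale images (2d arrays)
    segments: sequence of T (p0, p1) pairs, the limb's projected endpoints
    """
    # Row t holds the n intensities sampled along the limb in frame t.
    samples = np.stack([sample_nearest(img, limb_points(p0, p1, n))
                        for img, (p0, p1) in zip(frames, segments)])
    # Deviation of each sample point from its own mean over time.
    residual = samples - samples.mean(axis=0)
    return -0.5 * np.sum(residual ** 2) / sigma ** 2
```

A 2d placement that follows the limb scores higher than one that drifts off it, which is what lets the likelihood pull the projected 3d model onto the image data.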
The "prior" term constrains the reconstructed 3d model to be a
probable 3d human motion. We model human motions with a
high-dimensional Gaussian distribution.
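A sketch of a Gaussian prior over flattened motion snippets, assuming the same one-row-per-snippet layout as the training data. The names `fit_gaussian_prior` and `gaussian_log_prior`, and the small `ridge` term added to keep the covariance invertible, are illustrative assumptions rather than the system's own code.

```python
import numpy as np


def fit_gaussian_prior(snippets, ridge=1e-6):
    """Fit the mean and covariance of flattened motion snippets (one per row)."""
    mean = snippets.mean(axis=0)
    cov = np.cov(snippets, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])  # keep the covariance positive definite
    return mean, cov


def gaussian_log_prior(x, mean, cov):
    """log N(x; mean, cov), computed stably via a Cholesky factor."""
    d = x.shape[0]
    chol = np.linalg.cholesky(cov)           # cov = chol @ chol.T
    z = np.linalg.solve(chol, x - mean)      # whitened deviation
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (z @ z + log_det + d * np.log(2.0 * np.pi))
```

In an SVD basis like the 50-dimensional model above, the sample covariance of the coefficients is diagonal, with variances equal to the squared singular values divided by the number of snippets minus one, so the prior reduces to a sum of independent one-dimensional Gaussian terms.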
We use standard optimization techniques to find the 3d reconstruction
that makes the optimal tradeoff between fidelity to the image data
and high prior probability under the model learned from the training
data.
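In the special case where the 2d data are joint positions from a stick figure, the camera is orthographic, and both likelihood and prior are Gaussian, this tradeoff has a closed-form solution. The sketch below assumes that case: the hypothetical `orthographic_projection_matrix` simply drops depth, motion is written as `mean + basis.T @ c` as in a linear snippet model, and the coefficients `c` have independent Gaussian prior variances `prior_var`. These names and the linear-Gaussian setup are simplifying assumptions; the system described here instead applies standard optimization techniques to its full objective.

```python
import numpy as np


def orthographic_projection_matrix(num_points):
    """Matrix mapping stacked 3d points [x, y, z, ...] to stacked 2d
    points [x, y, ...] by dropping depth (a hypothetical camera model)."""
    drop_z = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
    return np.kron(np.eye(num_points), drop_z)


def map_reconstruction(y2d, mean, basis, prior_var, proj, sigma=1.0):
    """MAP coefficients c minimizing
        ||y2d - proj @ (mean + basis.T @ c)||^2 / sigma^2 + sum(c**2 / prior_var),
    i.e. image fidelity plus a Gaussian prior penalty.
    Returns the reconstructed 3d vector and the coefficients."""
    A = proj @ basis.T                      # coefficients -> predicted 2d data
    r = y2d - proj @ mean                   # 2d data not explained by the mean
    # Posterior precision: data term plus prior term.
    H = A.T @ A / sigma ** 2 + np.diag(1.0 / prior_var)
    c = np.linalg.solve(H, A.T @ r / sigma ** 2)
    return mean + basis.T @ c, c
```

The depth coordinates never appear in `y2d`; they are filled in through the linear model and the prior, which is exactly the ambiguity the prior is meant to resolve. A small `sigma` trusts the image data more, while a large one pulls the coefficients toward zero, the most probable motion under the prior.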
Those results show how to estimate the 3d figure motion if we can
place a 2d stick figure over the image of the moving person. We
developed such a tracker, allowing interactive correction of tracking
mistakes, to test our 3d recovery method. We show good recovery of
3d motion for a difficult dance sequence, viewed from a single
camera. These results show the power of adding prior knowledge about
human motions, in a Bayesian framework, to the problem of interpreting
images of people.
Publications:
N. R. Howe, M. E. Leventon, and W. T. Freeman, "Bayesian reconstruction
of 3D human motion from single-camera video," in Advances in Neural
Information Processing Systems 12 (NIPS), S. A. Solla, T. K. Leen, and
K.-R. Müller, Eds. Available as MERL TR99-37.
M. E. Leventon and W. T. Freeman, "Bayesian estimation of 3d human
motion." Available as MERL TR98-06.
