Mary101
Tony Ezzat, Gadi Geiger, and Tomaso Poggio.
MIT Center for Biological and Computational Learning

In memory of Christian Benoît

Multidimensional Morphable Models

At the heart of our visual speech synthesis approach is the multidimensional morphable model (MMM), a generative model of video capable of morphing between various lip images to synthesize new, previously unseen lip configurations.

The basic underlying assumption of the MMM is that the complete set of mouth images associated with human speech lies in a low-dimensional space whose axes represent mouth appearance variation and mouth shape variation. Mouth appearance is represented in the MMM as a set of prototype images extracted from the recorded corpus. Mouth shape is represented in the MMM as a set of optical flow vectors computed automatically from the recorded corpus. In the work presented here, 46 prototype images are extracted and 46 optical flow correspondences are computed. The low-dimensional MMM space is parameterized by shape parameters alpha and appearance parameters beta.
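As a rough illustration, the model state might be organized as in the minimal Python/NumPy sketch below. The class name, field names, and array layouts are our own assumptions for exposition, not the authors' implementation:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class MMM:
    """Hypothetical container for the morphable model state.

    prototypes: the N prototype mouth images (N = 46 here), shape (N, H, W).
    flows:      N dense optical flow fields, one per prototype, shape
                (N, H, W, 2) holding per-pixel (dx, dy) displacements
                (assumed here to be measured from a common reference image).
    """
    prototypes: np.ndarray  # appearance axes of the MMM space
    flows: np.ndarray       # shape axes of the MMM space

    @property
    def n(self) -> int:
        """Number of prototypes, i.e. the dimensionality of alpha and beta."""
        return self.prototypes.shape[0]
```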

The MMM may be viewed as a "black box" capable of performing two tasks. First, given as input a set of parameters alpha, beta, the MMM can synthesize an image of the subject's face with that shape-appearance configuration. Synthesis is performed by morphing the various prototype images to produce novel, previously unseen mouth images that correspond to the input parameters alpha, beta.
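A much-simplified sketch of this synthesis step follows, assuming the hypothetical MMM container above: the shape parameters alpha combine the prototype flows into a single warp field, and the appearance parameters beta blend the warped prototypes. The function name and the single backward warp are illustrative simplifications, not the paper's actual morphing algorithm:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def synthesize(mmm: "MMM", alpha: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Morph the prototypes into the mouth image at (alpha, beta).

    Simplification: every prototype is warped along one combined flow
    field and then blended; the authors' morphing procedure is more
    involved than this sketch.
    """
    n, h, w = mmm.prototypes.shape
    # Shape: alpha-weighted combination of the prototype flow fields.
    flow = np.tensordot(alpha, mmm.flows, axes=1)          # (H, W, 2)
    # Sampling grid displaced by the combined flow (backward warp).
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.array([ys + flow[..., 1], xs + flow[..., 0]])
    # Appearance: beta-weighted blend of the warped prototypes.
    out = np.zeros((h, w))
    for i in range(n):
        out += beta[i] * map_coordinates(mmm.prototypes[i], coords, order=1)
    return out
```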

Conversely, the MMM can also perform analysis: given an input lip image, the MMM computes shape and appearance parameters alpha, beta that represent the position of that input image in MMM space. In this manner, it is possible to project the entire recorded corpus onto the constructed MMM, producing a time series of alpha, beta parameters that represent trajectories of mouth motion in MMM space. We term this operation analyzing the recorded corpus.
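As a hedged sketch, analysis can be posed as analysis-by-synthesis: search for the alpha, beta whose synthesized image best matches the input frame. The generic least-squares optimizer below is a stand-in for the authors' actual matching procedure:

```python
import numpy as np
from scipy.optimize import minimize

def analyze(mmm: "MMM", image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate (alpha, beta) for an input mouth image by searching for
    the parameters whose synthesized image best matches it."""
    n = mmm.n
    x0 = np.full(2 * n, 1.0 / n)  # start from a uniform blend of prototypes

    def objective(x: np.ndarray) -> float:
        alpha, beta = x[:n], x[n:]
        diff = synthesize(mmm, alpha, beta) - image
        return float(np.sum(diff ** 2))  # pixel-wise squared error

    res = minimize(objective, x0, method="L-BFGS-B")
    return res.x[:n], res.x[n:]
```

Applying such a routine frame by frame to the recorded corpus would yield the time series of alpha, beta parameters described above.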


Send any comments or questions to Tony Ezzat