Tony Ezzat's MikeTalk Page

Tony Ezzat and Tomaso Poggio.
MIT Center for Biological and Computational Learning

CHECK OUT OUR LATEST ANIMATION RESULTS HERE!

Overview

The goal of this project is to create a videorealistic text-to-audiovisual speech synthesizer. The system should take any typed sentence as input and produce as output an audiovisual movie of a face enunciating that sentence. By videorealistic we mean that the final audiovisual output should look as if it were a video camera recording of a talking human subject.

Prior Work

Much of the prior work in text-to-audiovisual speech synthesis has focused on integrating facial models which are based on traditional 3D graphics modelling techniques. These 3D faces are often augmented with muscular models and physics simulations in order to enhance the visual realism of the mouth movement. Most of these methods, however, have not achieved a high degree of videorealism due to the complexity involved in modelling the mouth movement dynamics.

Our Approach

In order to improve videorealism, we have taken another approach which may best be summarized as an image-based, morphing method:

First, a visual corpus of a subject enunciating a set of key words is recorded. The corpus is designed to capture a large range of the mouth visemes associated with English speech.
Second, a single image for each viseme is identified and extracted from the corpus sequence.
Third, we define a morph transformation from each viseme image to every other viseme image.
Finally, we use a text-to-speech (TTS) system to convert unconstrained input text into a string of phonemes, along with duration information for each phoneme. Using this information, we determine the appropriate sequence of viseme transitions to make, as well as the rate of each transition. The final visual sequence is a concatenation of the viseme transitions, played in synchrony with the audio speech signal generated by the TTS system.
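The final step can be illustrated with a minimal sketch of the scheduling logic only. It is not the MikeTalk implementation: the phoneme-to-viseme table, the 30 frames-per-second rate, the linear morph parameter, and the names PHONEME_TO_VISEME, Frame, and schedule_transitions are all assumptions made for illustration. The morph itself is not shown; the sketch only decides, for each video frame, which pair of visemes is being morphed and how far along the transition is.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table (an assumption for this sketch,
# not the mapping used by MikeTalk).
PHONEME_TO_VISEME = {
    "sil": "rest",
    "m": "closed", "b": "closed", "p": "closed",
    "aa": "open", "ay": "open",
    "k": "back",
}


@dataclass
class Frame:
    source: str   # viseme the transition starts from
    target: str   # viseme the transition ends at
    alpha: float  # position along the transition, in (0, 1]


def schedule_transitions(phonemes, fps=30.0, start="rest"):
    """Turn (phoneme, duration_in_seconds) pairs into per-frame morph steps.

    Each phoneme gets a transition from the previous viseme to its own
    viseme, spread over the frames covered by the phoneme's duration.
    """
    frames = []
    current = start
    elapsed = 0.0   # audio time consumed so far, in seconds
    emitted = 0     # video frames produced so far
    for phoneme, duration in phonemes:
        target = PHONEME_TO_VISEME[phoneme]
        elapsed += duration
        # Round the cumulative time, not each duration, so rounding
        # errors do not accumulate and the video stays in sync.
        end_frame = round(elapsed * fps)
        count = end_frame - emitted
        for i in range(1, count + 1):
            frames.append(Frame(current, target, i / count))
        emitted = end_frame
        current = target
    return frames


if __name__ == "__main__":
    # Durations would come from the TTS system; these are made up.
    for frame in schedule_transitions([("m", 0.1), ("ay", 0.2), ("k", 0.1)]):
        print(frame)
```

A renderer would then use each frame's source, target, and alpha to select the corresponding viseme-to-viseme morph and sample it at that position. Shorter phoneme durations yield fewer frames per transition, which is how the duration information controls the rate of the transformations.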

Results

The following sequences are a sample of our results. They are sentences produced by MikeTalk which were never uttered by the original speaker. Please let us know if you have problems with the formats. Also, please contact the authors if you would like a short videotape depicting the results of this work.

Note: The Quicktime sequences have been compressed with the Cinepak compressor for best playback speed (although this might affect picture quality slightly).

"12345"	SGI Quicktime AVI	"678910"	SGI Quicktime AVI
"goodmorning sir..how are you feeling today?"	SGI Quicktime AVI	"you have received 10 email messages."	SGI Quicktime AVI
"your account balance is $2125."	SGI Quicktime AVI	"cat, dog, pig, cow, moose, horse, sheep."	SGI Quicktime AVI
"welcome to Bell Atlantic's home page."	SGI Quicktime AVI	"hello dad, i just wanted to wish you a very happy birthday."	SGI Quicktime AVI
"your hotel room has been reserved. thank you for staying at Sheraton."	SGI Quicktime AVI	"ask not what your country can do for you, ask what you can do for your country."	SGI Quicktime AVI
"hello kids, our lesson for today will be about how to add two fractions."	SGI Quicktime AVI	"my name is mike jones."	SGI Quicktime AVI
"please press the button on your left."	SGI Quicktime AVI	"i have to say that i think that OJ Simpson killed his wife."	SGI Quicktime AVI

Patents

MIT has filed for patents on the technologies involved in this work.

Publications

You can read the details behind this work in the following recent papers:

Visual Speech Synthesis by Morphing Visemes, Tony Ezzat and Tomaso Poggio, MIT AI Memo No 1658/CBCL Memo No 173, May 1999. (ps.gz) (pdf)

MikeTalk: A Talking Facial Display Based on Morphing Visemes, Tony Ezzat and Tomaso Poggio, Proceedings of the Computer Animation Conference, Philadelphia, PA, June 1998. (ps.gz) (pdf)

Videorealistic Talking Faces: A Morphing Approach, Tony Ezzat and Tomaso Poggio, Proceedings of the Audiovisual Speech Processing Workshop, Rhodes, Greece, September 1997. (ps.gz) (pdf)

Last updated May 19, 1999. Send any comments or questions to Tony Ezzat.