MikeTalk
Tony Ezzat and Tomaso Poggio.
MIT Center for Biological and Computational Learning


Overview

The goal of this project is to create a videorealistic text-to-audiovisual speech synthesizer. The system should take as input any typed sentence and produce as output an audiovisual movie of a face enunciating that sentence. By videorealistic we mean that the final audiovisual output should look as if it were a video camera recording of a talking human subject.

Prior Work

Much of the prior work in text-to-audiovisual speech synthesis has focused on integrating facial models which are based on traditional 3D graphics modelling techniques. These 3D faces are often augmented with muscular models and physics simulations in order to enhance the visual realism of the mouth movement. Most of these methods, however, have not achieved a high degree of videorealism due to the complexity involved in modelling the mouth movement dynamics.

Our Approach

In order to improve videorealism, we have taken another approach, which may best be summarized as an image-based morphing method.
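To make the idea of image-based morphing a little more concrete, here is a minimal sketch of one ingredient, a linear cross-dissolve between two mouth images. This is an illustration under stated assumptions, not the authors' implementation: the names `cross_dissolve` and `transition_frames` are hypothetical, images are assumed to be same-sized grayscale grids of floats, and a full morph would also warp pixels along correspondences between the two images before blending, which is omitted here.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def cross_dissolve(a: Image, b: Image, alpha: float) -> Image:
    """Blend two same-sized images as (1 - alpha) * a + alpha * b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [
        [(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def transition_frames(a: Image, b: Image, n_frames: int) -> List[Image]:
    """Return n_frames images going from a (first) to b (last), inclusive."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [cross_dissolve(a, b, i / (n_frames - 1)) for i in range(n_frames)]


# Toy 1x2 "images" standing in for two mouth shapes (assumed data).
closed_mouth = [[0.0, 0.0]]
open_mouth = [[1.0, 0.5]]
frames = transition_frames(closed_mouth, open_mouth, 3)
```

Blending alone produces ghosting when the two mouth shapes differ a lot, which is why morphing methods in general pair it with a warp; the sketch only shows how intermediate frames between two key images can be generated.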

Results

The following sequences are a sample of our results. They are sentences produced by MikeTalk that were never uttered by the original speaker. Please let us know if you are having problems with the formats. Also, please contact the authors if you would like a short videotape depicting the results of this work.

Note: The Quicktime sequences have been compressed with the Cinepak compressor for best playback speed (although this might affect picture quality slightly).
"12345" (SGI) (Quicktime) (AVI)
"678910" (SGI) (Quicktime) (AVI)
"goodmorning sir..how are you feeling today?" (SGI) (Quicktime) (AVI)
"you have received 10 email messages." (SGI) (Quicktime) (AVI)
"your account balance is $2125." (SGI) (Quicktime) (AVI)
"cat, dog, pig, cow, moose, horse, sheep." (SGI) (Quicktime) (AVI)
"welcome to Bell Atlantic's home page." (SGI) (Quicktime) (AVI)
"hello dad, i just wanted to wish you a very happy birthday." (SGI) (Quicktime) (AVI)
"your hotel room has been reserved. thank you for staying at Sheraton." (SGI) (Quicktime) (AVI)
"ask not what your country can do for you, ask what you can do for your country." (SGI) (Quicktime) (AVI)
"hello kids, our lesson for today will be about how to add two fractions." (SGI) (Quicktime) (AVI)
"my name is mike jones." (SGI) (Quicktime) (AVI)
"please press the button on your left." (SGI) (Quicktime) (AVI)
"i have to say that i think that OJ Simpson killed his wife." (SGI) (Quicktime) (AVI)

Patents

MIT has filed for patents on the technologies involved in this work.

Publications

You can read the details behind this work in the following recent papers:
Visual Speech Synthesis by Morphing Visemes, Tony Ezzat and Tomaso Poggio, MIT AI Memo No. 1658 / CBCL Memo No. 173, May 1999. (ps.gz) (pdf)

MikeTalk: A Talking Facial Display Based on Morphing Visemes, Tony Ezzat and Tomaso Poggio, Proceedings of the Computer Animation Conference, Philadelphia, PA, June 1998. (ps.gz) (pdf)

Videorealistic Talking Faces: A Morphing Approach, Tony Ezzat and Tomaso Poggio, Proceedings of the Audiovisual Speech Processing Workshop, Rhodes, Greece, September 1997. (ps.gz) (pdf)

Last updated May 19, 1999. Send any comments or questions to Tony Ezzat.