Corpus

We collected a small 20-second audio corpus consisting of 9 Harvard List sentences sampled at 16 kHz. All of the sentences were analyzed with a short-time Fourier transform (STFT) using a 48 ms Hamming window and a 2 ms frame step, producing about 9000 audio frames for the entire corpus. Each frame was transformed with an FFT using a zero-padding factor of 4, yielding a high-resolution 3072-dimensional spectral frame (4 × 768 window samples). For each frame, we extracted the smooth spectral magnitude envelope using a cepstral method.
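The analysis described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the original analysis code: the sample rate, window length, frame step, and zero-padding factor come from the text, while the lifter cutoff `N_CEPS` and the function names `frame_spectra` and `cepstral_envelope` are assumptions made here, since the exact cepstral method is not specified.

```python
import numpy as np

SAMPLE_RATE = 16000   # Hz, from the text
WINDOW_SEC = 0.048    # 48 ms Hamming window, from the text
HOP_SEC = 0.002       # 2 ms frame step, from the text
ZERO_PAD = 4          # zero-padding factor, from the text
N_CEPS = 40           # ASSUMED lifter cutoff; not given in the text

WIN_LEN = round(SAMPLE_RATE * WINDOW_SEC)   # 768 samples
HOP_LEN = round(SAMPLE_RATE * HOP_SEC)      # 32 samples
N_FFT = WIN_LEN * ZERO_PAD                  # 3072 bins


def frame_spectra(signal):
    """Return zero-padded magnitude spectra, one row per frame."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(WIN_LEN)
    n_frames = 1 + (len(signal) - WIN_LEN) // HOP_LEN
    frames = np.stack([
        signal[i * HOP_LEN : i * HOP_LEN + WIN_LEN] * window
        for i in range(n_frames)
    ])
    # n=N_FFT zero-pads each 768-sample frame to 3072 points.
    return np.abs(np.fft.fft(frames, n=N_FFT, axis=1))


def cepstral_envelope(magnitude, n_ceps=N_CEPS):
    """Smooth magnitude spectra by low-pass liftering the real cepstrum."""
    log_mag = np.log(magnitude + 1e-10)            # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag, axis=-1).real
    lifter = np.zeros(magnitude.shape[-1])
    lifter[:n_ceps] = 1.0                          # keep low quefrencies
    if n_ceps > 1:
        lifter[-(n_ceps - 1):] = 1.0               # and their mirror images
    smooth_log = np.fft.fft(cepstrum * lifter, axis=-1).real
    return np.exp(smooth_log)
```

Calling `cepstral_envelope(frame_spectra(signal))` on a 16 kHz signal yields one 3072-point smoothed envelope per 2 ms frame. A smaller `n_ceps` gives a smoother envelope; the value used for the corpus is not stated.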

The corpus of 9000 smoothed spectral envelopes includes the first and last frames of every sound shown below. However, none of the intermediate frames are included in the corpus!

Algorithm

The morphs are generated using the "graph flow" algorithm presented in Section 7 of the paper. The algorithm first constructs the corpus graph. Dijkstra's shortest-path algorithm is then used to compute the shortest path between any two chosen envelopes (such as the first and last frames of each sound below). Concatenated flow is then computed along the intermediate envelopes of the shortest path to produce the final cumulative audio flow. Finally, a morph is computed along this cumulative audio flow.
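The graph construction and shortest-path steps described above might be sketched as follows. This is a minimal illustration, not the paper's implementation: the k-nearest-neighbour construction, the Euclidean distance, and the names `build_knn_graph` and `shortest_path` are assumptions made here, and Section 7 of the paper defines the actual corpus graph. The flow computation along the path is omitted.

```python
import heapq
import math


def build_knn_graph(envelopes, k=2):
    """Link each envelope to its k nearest neighbours (ASSUMED construction)."""
    graph = {i: {} for i in range(len(envelopes))}
    for i, a in enumerate(envelopes):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(envelopes) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d          # keep the graph undirected
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns node indices from source to target."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue                 # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None                  # target unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

For example, with four toy envelopes at `[0, 0]`, `[1, 0]`, `[2, 0]`, `[3, 0]` and `k=1`, the graph links only adjacent points, so the shortest path from node 0 to node 3 passes through every intermediate node. In the setting described here, those intermediate nodes would be corpus envelopes lying between the first and last frames of a sound.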

Results

We generated morph transitions between the first and last frames of the various sounds shown below. For each sound transition, we present the real transition, a cross-faded transition, and the morphed transition using graph flow.

In all cases, we present the transition in movie format and in image (spectrogram) format. For the movies, the y-axis is absolute amplitude and the x-axis is frequency from 0 to 5 kHz. For the spectrograms, the y-axis is frequency from 0 to 8 kHz, the x-axis is frames, and the z-axis is log amplitude.
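For reference, a cross-fade between two spectral envelopes is commonly a frame-by-frame linear blend of their amplitudes. The sketch below assumes that definition; the exact cross-fade used for the baseline on this page is not stated, and the function name `cross_fade` is hypothetical.

```python
def cross_fade(first, last, n_frames):
    """Linearly blend two equal-length envelopes over n_frames frames."""
    if len(first) != len(last):
        raise ValueError("envelopes must have the same length")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for t in range(n_frames):
        alpha = t / (n_frames - 1)   # 0.0 at the first frame, 1.0 at the last
        frames.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(first, last)]
        )
    return frames
```

Under this assumption, blending an envelope with a single peak at one frequency bin into an envelope with a peak at a different bin produces two half-height peaks at the midpoint rather than one peak at an intermediate frequency. This is why a cross-fade cannot reproduce formant movement.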

Sound   Real Transition                Cross-Faded Transition         Morphed Transition Using Graph Flow
/ay/    Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/oy/    Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/ow/    Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/aw/    Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/weh/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/aar/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/ehl/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)

In the following sounds, the formants do not move much; instead, formants appear or disappear. Our audio flow morphing algorithm handles these cases as well.

Sound   Real Transition                Cross-Faded Transition         Morphed Transition Using Graph Flow
/mao/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/ahn/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)

In the following cases, a 20-second corpus was not enough to extract a good transition, so we increased the corpus size to 30 seconds and re-applied the algorithm, yielding the appropriate transitions shown below.

Sound   Real Transition                Cross-Faded Transition         Morphed Transition Using Graph Flow
/yuw/   Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)
/ey/    Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)  Movie (AVI) Spectrogram (jpg)


Last updated August 25, 2005. Send any comments or questions to Tony Ezzat