Extracted Faces for CUAVE and other AV Datasets

CUAVE Group Set

Extracted and aligned faces, along with audio features, for all "group" sequences in the CUAVE database are available for download in MATLAB format:
cuave-group-aligned.zip 611MB.

Note that this is a large ZIP file containing individual MATLAB .mat files, one for each sequence. Once unzipped, you should have the following files:

>> ls
g01_aligned.mat g05_aligned.mat g09_aligned.mat g13_aligned.mat g17_aligned.mat g21_aligned.mat
g02_aligned.mat g06_aligned.mat g10_aligned.mat g14_aligned.mat g18_aligned.mat g22_aligned.mat
g03_aligned.mat g07_aligned.mat g11_aligned.mat g15_aligned.mat g19_aligned.mat
g04_aligned.mat g08_aligned.mat g12_aligned.mat g16_aligned.mat g20_aligned.mat

You can read an individual sequence in MATLAB simply by doing:

>> load('g01_aligned.mat');
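
If you would rather keep the loaded variables out of your workspace, load can also return them as fields of a struct. This is only a sketch: the name s is chosen for this example and is not part of the dataset.

>> s = load('g01_aligned.mat');   % s is a hypothetical name; each saved variable becomes a field
>> fieldnames(s)                  % lists the variables stored in the file

The remaining examples on this page assume the plain load call, so the variables are directly in the workspace.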

The file contains a 2-element cell array, "video", holding grayscale video frames, with one 75x50xnFrames matrix for each person:

>> video
video =
[75x50x957 double] [75x50x957 double]
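
As a small sketch of working with this layout (assuming, as the 75x50xnFrames description suggests, that the third dimension indexes frames), a single frame for the first person can be pulled out and displayed like this. The names frame and nFrames are chosen for this example only.

>> nFrames = size(video{1}, 3);   % number of frames in this sequence
>> frame = video{1}(:,:,1);       % first 75x50 frame for the first person
>> imagesc(frame); colormap(gray); axis image;   % show it in grayscale with square pixels

Replacing video{1} with video{2} gives the same frame for the second person.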

It also has frame-indexed raw audio and MFCCs in the variables "audioIndexed" and "mfccs". Ground-truth labeling from [Besson 06] is in the variable "labels". (Note: I recently added new labels for the period when both people are speaking.) There may also be some extra variables with optical flow statistics I was playing around with at one point.
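
The exact sizes and types of these variables are not listed here, so one quick way to check them, assuming g01_aligned.mat has already been loaded as above, is:

>> whos audioIndexed mfccs labels   % prints the size and class of each variable

Comparing those sizes against the number of frames in "video" should show how the audio, MFCCs, and labels line up with the video frames.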

For any questions please contact siracusa at csail.mit.edu.

Dataset 2

Similarly, the audio and video for a second dataset can be downloaded:
dataset2.mat 41MB (right click and save/download link).
In this dataset, when an individual is speaking, the other person looks at them.