CUAVE Group Set

Extracted and aligned faces, along with audio features, for all "group" sequences in the CUAVE database can be downloaded in MATLAB format: 611MB.

Note that this is a large ZIP file containing an individual MATLAB .mat file for each sequence. Once unzipped, you should have the following files:

>> ls
g01_aligned.mat g05_aligned.mat g09_aligned.mat g13_aligned.mat g17_aligned.mat g21_aligned.mat
g02_aligned.mat g06_aligned.mat g10_aligned.mat g14_aligned.mat g18_aligned.mat g22_aligned.mat
g03_aligned.mat g07_aligned.mat g11_aligned.mat g15_aligned.mat g19_aligned.mat
g04_aligned.mat g08_aligned.mat g12_aligned.mat g16_aligned.mat g20_aligned.mat
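
As a quick sanity check after unzipping, the file names listed above can be generated programmatically and compared against the current directory. This is only a sketch: the variable names (nSeq, files, missing) are illustrative and not part of the dataset.

```matlab
% Build the 22 expected file names, g01_aligned.mat ... g22_aligned.mat.
nSeq  = 22;
files = cell(1, nSeq);
for i = 1:nSeq
    files{i} = sprintf('g%02d_aligned.mat', i);   % zero-padded sequence number
end

% Collect any names that are not present as files in the current folder/path.
isThere = cellfun(@(f) exist(f, 'file') == 2, files);
missing = files(~isThere);
fprintf('%d of %d sequence files found.\n', nnz(isThere), nSeq);
```

If missing is non-empty, the archive was probably extracted into a different folder than the one MATLAB is currently in.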

You can read an individual sequence in MATLAB simply by doing:

>> load('g01_aligned.mat');

The file contains a 2-element cell array, "video", of grayscale video frames, with one 75x50xnFrames matrix per person:

>> video
video =
[75x50x957 double] [75x50x957 double]
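
The following is a minimal sketch of how individual frames might be pulled out of the "video" cell array. It assumes only the layout shown above (one 75x50xnFrames matrix per person); the names frameA, frameB and sideBySide are illustrative, and random placeholder data is used when the file is not available so the snippet runs on its own.

```matlab
% Load the sequence if it is available; otherwise use placeholder data
% with the same layout (NOT real data, only for trying out the indexing).
if exist('g01_aligned.mat', 'file') == 2
    load('g01_aligned.mat');                       % defines 'video'
else
    video = {rand(75, 50, 10), rand(75, 50, 10)};  % placeholder
end

nPeople = numel(video);                 % 2 in the group sequences
[h, w, nFrames] = size(video{1});       % 75, 50, number of frames

k = 1;                                  % frame index to inspect
frameA = video{1}(:, :, k);             % person 1, frame k (75x50)
frameB = video{2}(:, :, k);             % person 2, frame k (75x50)
sideBySide = [frameA, frameB];          % both faces next to each other

% Uncomment to display the pair of faces:
% imagesc(sideBySide); colormap gray; axis image;
```

Curly braces select a person's matrix from the cell array, and the parentheses then index a single frame along the third dimension.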

It also has frame-indexed raw audio and MFCCs in the variables "audioIndexed" and "mfccs". Ground-truth labeling from [Besson 06] is in the variable "labels". (Note: I recently added new labels for the period when both people are speaking.) There may also be some extra variables holding optical flow statistics I was experimenting with at one point.
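
Since the exact set of variables may differ between files, one way to see what a file holds is to load it into a struct and check the field names. This is a sketch only: the list in "names" comes from the description above, the sizes and types of those variables are not assumed, and a small placeholder struct stands in when the file is not available.

```matlab
% Variables named in the description above.
names = {'video', 'audioIndexed', 'mfccs', 'labels'};

if exist('g01_aligned.mat', 'file') == 2
    S = load('g01_aligned.mat');               % fields of S = variables in the file
else
    S = struct('video', {{}}, 'labels', []);   % placeholder, NOT real data
end

present = isfield(S, names);                   % logical flag for each expected name
extras  = setdiff(fieldnames(S), names);       % any additional variables in the file

for i = 1:numel(names)
    fprintf('%-14s present: %d\n', names{i}, present(i));
end
```

Calling whos('-file', 'g01_aligned.mat') lists the same information, including sizes, without loading the data.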

For any questions please contact siracusa at

Dataset 2

Similarly, the audio and video for a second dataset can be downloaded:
dataset2.mat 41MB (right click and save/download link).
In this dataset, when an individual is speaking, the other person looks at them.