Multimodal Communication Error Detection
GOAL/APPROACH
We propose a framework for communication error detection in conversational systems using audio-visual observations of users.

Our approach extracts audio and visual features from the user interacting with a conversational system, analyzes the features of each conversational turn with a temporal sequence model (e.g., HMM, CRF, or HCRF) applied to each modality independently, and then fuses the results of the two modalities.
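As a rough sketch of the per-modality sequence-model step, the code below scores a turn's 1-D feature sequence under two Gaussian-emission HMMs (one for successful turns, one for error turns) and labels the turn by comparing forward-algorithm log-likelihoods. All model parameters here are illustrative placeholders, not values from our experiments, and real features would be multi-dimensional.

```python
import numpy as np

def _logsumexp(v):
    # numerically stable log(sum(exp(v)))
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def hmm_loglik(obs, pi, A, means, var):
    """Forward-algorithm log-likelihood of a 1-D sequence `obs` under a
    Gaussian-emission HMM with start probs `pi`, transition matrix `A`,
    per-state means `means`, and shared variance `var`."""
    def log_emis(x):
        return -0.5 * ((x - means) ** 2 / var + np.log(2 * np.pi * var))
    log_alpha = np.log(pi) + log_emis(obs[0])
    for x in obs[1:]:
        log_alpha = log_emis(x) + np.array(
            [_logsumexp(log_alpha + np.log(A[:, j])) for j in range(len(pi))])
    return _logsumexp(log_alpha)

def classify_turn(obs, model_ok, model_err):
    """Label a turn as an error (1) if the 'error' HMM explains the
    feature sequence better than the 'success' HMM."""
    return int(hmm_loglik(obs, *model_err) > hmm_loglik(obs, *model_ok))
```

For example, with toy two-state models whose emission means differ, a sequence hovering near the error model's means is labeled 1, and one near the success model's means is labeled 0. In practice the per-modality decisions (or likelihood ratios) would then be fused across audio and vision.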

We varied the types of visual and audio features extracted in order to identify discriminative ones for training and testing. Our initial results show that, among visual features, facial motion estimates outperform head pose estimates in detecting errors. Among audio features, lexical prosody features outperform affect-based prosody features. A separate experiment also showed that computing the relative changes in prosody statistics from one utterance to the next is very useful for error detection, and that fusion by voting achieved 83.3% accuracy. We are currently collecting more data and conducting new experiments.
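To illustrate the relative prosody-change features and the voting fusion, here is a minimal sketch. The particular statistics (mean/std of pitch and energy) and the `eps` smoothing term are our own illustrative choices, not necessarily the exact features used in the experiments.

```python
import numpy as np

def prosody_stats(f0, energy):
    # per-utterance prosody summary: mean/std of pitch (F0) and energy
    return np.array([np.mean(f0), np.std(f0), np.mean(energy), np.std(energy)])

def relative_change(prev_stats, curr_stats, eps=1e-8):
    # relative change of each statistic from one utterance to the next;
    # large values can indicate hyperarticulation after a misunderstanding
    return (curr_stats - prev_stats) / (np.abs(prev_stats) + eps)

def fuse_by_voting(decisions):
    # majority vote over binary per-classifier decisions (1 = error detected)
    return int(sum(decisions) > len(decisions) / 2)
```

For instance, if a user repeats an utterance with markedly higher pitch and energy, `relative_change` yields large positive values, and `fuse_by_voting` combines that audio-based decision with the visual one.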


DEMONSTRATION
Face tracking: Demo

This video shows our face tracking algorithm estimating the pose of the user's face while the user interacts with a conversational system (the SLS Galaxy 'Web restaurant' application).
Face tracking and speaking activity: Demo

This video shows our integrated algorithms for estimating face pose/gesture and speaking activity.
Communication Error Detection (Coming soon…)
REFERENCES

- S. Wang, D. Demirdjian, and T. Darrell. Detecting Communication Errors from Visual Cues during the System's Conversational Turn. In Proceedings of the International Conference on Multimodal Interfaces, 2007 (to appear).

- S. Wang, D. Demirdjian, H. Kjellstrom, and T. Darrell. Multimodal Communication Error Detection for Driver-Car Interaction. In Proceedings of the 4th International Conference on Informatics in Control, Automation and Robotics, 2007. [PDF]