Projects


Current projects

• Context-based visual feedback recognition

        Context-based recognition is a new approach to visual feedback recognition in which contextual knowledge from the interactive system is analyzed online to anticipate visual feedback from the human participant. We developed a multi-modal framework for context-based visual recognition that was successfully tested on conversational and non-embodied interfaces for head gesture and eye gesture recognition, showing a statistically significant improvement in recognition performance (see the sketch below). See details: Project webpage
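
        Below is a minimal sketch of this idea, assuming hypothetical context cues (a yes/no question cue and an end-of-turn cue) and hand-picked weights; the actual framework learns how to combine the signals. The contextual prior simply raises or lowers confidence in a vision-only gesture likelihood.

```python
# Minimal sketch of context-based visual feedback recognition.
# The feature names and weights are illustrative assumptions, not the
# parameters of the actual system described above.

import math

def contextual_prior(system_event: dict) -> float:
    """Estimate how likely a head nod is, given the system's last action.

    `system_event` is an assumed structure with two boolean cues:
    whether the system just asked a yes/no question and whether it
    just reached the end of its speaking turn.
    """
    # Hand-tuned illustrative weights; a real system would learn these.
    score = -1.0
    if system_event.get("yes_no_question"):
        score += 2.0
    if system_event.get("end_of_turn"):
        score += 1.0
    return 1.0 / (1.0 + math.exp(-score))   # squash to [0, 1]

def fuse(vision_likelihood: float, prior: float) -> float:
    """Combine two independent probability estimates (simple product-style fusion)."""
    p = vision_likelihood * prior
    return p / (p + (1.0 - vision_likelihood) * (1.0 - prior))

# Example: a weak visual nod detection becomes more credible right after
# the system asks a yes/no question and finishes its turn.
event = {"yes_no_question": True, "end_of_turn": True}
print(fuse(vision_likelihood=0.55, prior=contextual_prior(event)))
```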

• Discriminative model for sequence labeling (gesture recognition)

        The Latent-Dynamic Conditional Random Field (LDCRF) is a new discriminative model for visual gesture recognition that models the sub-structure of a gesture sequence, learns the dynamics between gesture labels, and can be applied directly to label un-segmented sequences. The LDCRF model outperforms previous approaches (e.g., HMM, SVM, CRF and HCRF) for visual gesture recognition and can efficiently learn the contextual information needed for visual feedback anticipation (an inference sketch is shown below). See details: Project webpage
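
        The sketch below illustrates LDCRF-style inference on toy numbers: each gesture label owns a disjoint set of hidden states, a linear chain runs over the hidden states, and per-frame label probabilities come from summing hidden-state marginals. The potentials here are random placeholders, not learned parameters.

```python
# LDCRF-style inference sketch (toy potentials, not learned parameters).

import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log-sum-exp along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

def ldcrf_label_marginals(node_pot, trans_pot, label_states):
    """Per-frame label probabilities from hidden-state marginals.

    node_pot:     (T, H) log-potentials for hidden states at each frame
    trans_pot:    (H, H) log-potentials for hidden-state transitions
    label_states: dict mapping label -> list of hidden-state indices it owns
    """
    T, H = node_pot.shape
    alpha = np.zeros((T, H))
    beta = np.zeros((T, H))
    alpha[0] = node_pot[0]
    for t in range(1, T):                       # forward pass
        alpha[t] = node_pot[t] + logsumexp(alpha[t - 1][:, None] + trans_pot, axis=0)
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = logsumexp(trans_pot + (node_pot[t + 1] + beta[t + 1])[None, :], axis=1)
    log_z = logsumexp(alpha[-1])
    state_marg = np.exp(alpha + beta - log_z)   # (T, H): P(h_t = h | x)
    # Marginalize hidden states into their owning labels.
    return {lab: state_marg[:, idx].sum(axis=1) for lab, idx in label_states.items()}

# Toy example: two labels ("other", "nod"), two hidden states each.
rng = np.random.default_rng(0)
T, H = 6, 4
probs = ldcrf_label_marginals(rng.normal(size=(T, H)),
                              rng.normal(size=(H, H)),
                              {"other": [0, 1], "nod": [2, 3]})
print(probs["nod"])   # per-frame probability of the "nod" label
```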

• User studies: Visual feedback with embodied agents

        During face-to-face conversation, people use visual feedback to communicate relevant information and to synchronize communicative rhythm between participants. While a large literature exists in psychology describing and analyzing visual feedback during human-to-human interaction, there are still many unanswered questions about natural visual feedback for multimodal interfaces.

        In parallel to novel algorithm development, I have investigated the types of visual feedback that are naturally performed by human participants when interacting with multimodal interfaces and which, if automatically recognized, can improve the user experience. My thesis includes user studies to analyze specific head gaze cues, head gestures (head nod and head shake), eye gaze cues, and eye gestures (gaze aversion). I designed my user studies to explore two different axes: embodiment and conversational capabilities. For embodiment, I grouped the multimodal interfaces into three categories: virtual embodied interfaces, physical embodied interfaces and non-embodied interfaces. The conversational capabilities of interactive systems ranged from simple trigger-based interfaces to more elaborate conversational interfaces. In a recent user study, I observed that human participants naturally performed eye gaze aversion gestures (i.e. eye movements to empty or uninformative regions of space) when interacting with an embodied agent [ICMI 2006]. If correctly recognized, these eye gestures can be interpreted by the interface as a “thinking state” of the user and the interface can wait for mutual gaze to be re-established before taking its turn.
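
        As a rough illustration of that last point, the sketch below encodes the turn-taking rule as a simple decision function; the gaze-state names and timing thresholds are illustrative assumptions, not values from the user study.

```python
# Turn-taking sketch: treat recognized gaze aversion as a "thinking" signal
# and hold the agent's turn until mutual gaze returns or a timeout expires.
# Event names and thresholds below are hypothetical.

def agent_may_take_turn(gaze_state: str, silence_ms: int,
                        pause_threshold_ms: int = 800,
                        max_wait_ms: int = 4000) -> bool:
    """Decide whether the embodied agent should start speaking.

    gaze_state: "mutual", "aversion", or "away" from the gaze recognizer
    silence_ms: how long the user has been silent
    """
    if silence_ms < pause_threshold_ms:
        return False                      # user may still be mid-utterance
    if gaze_state == "aversion" and silence_ms < max_wait_ms:
        return False                      # user appears to be thinking: wait
    return True                           # mutual gaze (or timeout): take the turn

# Example: a long pause with gaze aversion keeps the floor with the user.
print(agent_may_take_turn("aversion", silence_ms=1500))   # False
print(agent_may_take_turn("mutual",   silence_ms=1500))   # True
```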

• Appearance model for real-time head pose estimation

        The Adaptive View-based Appearance Model (AVAM) is a new user-independent approach for head pose estimation that merges differential tracking with view-based tracking. AVAMs can track large head motion over long periods of time with bounded drift; we experimentally observed an RMS error within the accuracy limit of an attached inertial sensor. The AVAM is part of a real-time head pose tracking system, available for download, that accurately estimates head position and orientation over extended sequences (a simplified sketch of the idea is shown below). For more information, please visit the project website: Watson
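
        The sketch below illustrates the view-based idea in simplified form, assuming a placeholder differential_register routine that returns a relative pose between two images; the real AVAM fuses the differential and keyframe estimates probabilistically rather than simply overriding one with the other.

```python
# Simplified sketch of an adaptive view-based appearance model (not Watson's
# actual code): pose is propagated by differential tracking frame-to-frame,
# re-anchored against the nearest stored keyframe to bound drift, and new
# keyframes are added when the head reaches a pose not yet covered.

import numpy as np

class ViewBasedModel:
    def __init__(self, new_view_threshold=0.35):
        self.keyframes = []                    # list of (pose, image) pairs
        self.new_view_threshold = new_view_threshold

    def nearest_keyframe(self, pose):
        """Keyframe whose stored pose is closest to the current estimate."""
        return min(self.keyframes,
                   key=lambda kf: np.linalg.norm(kf[0] - pose),
                   default=None)

    def update(self, prev_image, prev_pose, image, differential_register):
        # 1. Differential step: relative motion since the previous frame.
        #    `differential_register(img_a, img_b)` is an assumed pairwise
        #    registration routine returning a relative pose.
        pose = prev_pose + differential_register(prev_image, image)

        # 2. View-based step: re-register against the closest keyframe so
        #    accumulated drift stays bounded when the head revisits a view.
        kf = self.nearest_keyframe(pose)
        if kf is not None:
            kf_pose, kf_image = kf
            pose = kf_pose + differential_register(kf_image, image)

        # 3. Adapt the model: store a new keyframe for previously unseen views.
        if kf is None or np.linalg.norm(kf[0] - pose) > self.new_view_threshold:
            self.keyframes.append((pose.copy(), image))
        return pose
```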


This page was last updated on 12/07/06.