Multi-View Latent Variable Discriminative Models

Multi-View Latent Variable Discriminative Models for Action Recognition

Presented at CVPR 2012

Yale Song¹, Louis-Philippe Morency², Randall Davis¹
¹MIT Computer Science and Artificial Intelligence Laboratory
²USC Institute for Creative Technologies

Figure 1. Graphical representations of multi-view latent variable discriminative models.

Abstract

Many human action recognition tasks involve data that can be factorized into multiple views such as body postures and hand shapes. These views often interact with each other over time, providing important cues to understanding the action. We present multi-view latent variable discriminative models that jointly learn both view-shared and view-specific sub-structures to capture the interaction between views. Knowledge about the underlying structure of the data is formulated as a multi-chain structured latent conditional model, explicitly learning the interaction between multiple views using disjoint sets of hidden variables in a discriminative manner. The chains are tied using a predetermined topology that repeats over time. We present three topologies (linked, coupled, and linked-coupled) that differ in the type of interaction between views that they model. We evaluate our approach on both segmented and unsegmented human action recognition tasks, using the ArmGesture, the NATOPS, and the ArmGesture-Continuous datasets. Experimental results show that our approach outperforms previous state-of-the-art action recognition models.
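The three topologies can be illustrated by listing which hidden variables are connected in each one. The sketch below is only an illustration and is not taken from the HCRF-light code: it assumes that "linked" connects hidden variables of different views at the same time step, that "coupled" connects them across adjacent time steps, and that "linked-coupled" uses both kinds of connection. The function name `multiview_edges` and the `(view, t)` node encoding are hypothetical.

```python
def multiview_edges(num_views, num_steps, topology):
    """Return the set of edges between hidden variables of a multi-chain model.

    A node is a (view, t) pair. Each view always keeps its own chain
    (view, t) -> (view, t + 1); the topology only decides which edges
    tie different views together (assumed definitions, see lead-in).
    """
    if topology not in ("linked", "coupled", "linked-coupled"):
        raise ValueError("unknown topology: %r" % (topology,))

    edges = set()

    # View-specific structure: one chain per view.
    for v in range(num_views):
        for t in range(num_steps - 1):
            edges.add(((v, t), (v, t + 1)))

    # View-shared structure: edges between every pair of distinct views.
    for a in range(num_views):
        for b in range(a + 1, num_views):
            if topology in ("linked", "linked-coupled"):
                # Same time step, different views.
                for t in range(num_steps):
                    edges.add(((a, t), (b, t)))
            if topology in ("coupled", "linked-coupled"):
                # Adjacent time steps, different views, in both directions.
                for t in range(num_steps - 1):
                    edges.add(((a, t), (b, t + 1)))
                    edges.add(((b, t), (a, t + 1)))
    return edges


linked = multiview_edges(2, 4, "linked")
coupled = multiview_edges(2, 4, "coupled")
linked_coupled = multiview_edges(2, 4, "linked-coupled")
```

Under these assumptions the linked-coupled edge set is simply the union of the linked and coupled edge sets, and the same pattern repeats at every time step.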

Publications

CVPR 2012 paper [pdf][bibtex]
CVPR 2012 poster [pdf]
MLSS 2011 poster [pdf]

Code

HCRF-light 2.0 [zip] (5 MB)
Comes with an implementation of hierarchical sequence summarization [3].
HCRF-light 1.1 [zip] (28 MB)
Provides a MATLAB wrapper function and a sample script. Also comes with precompiled executables and mex files for 32-bit and 64-bit Windows machines.
HCRF-light 1.0 [zip] (3 MB)
This light version contains implementations of HCRF [1], LDCRF [2], and their multi-view counterparts, MV-HCRF and MV-LDCRF.
HCRF library [SourceForge.net]
Multi-view latent variable discriminative models are part of the larger hCRF library project, hosted on SourceForge.net.

Datasets

NATOPS Dataset [mat] (4 MB)
This dataset contains three pairs of body-hand gestures used when handling aircraft on the deck of an aircraft carrier. The observation features include automatically tracked 3D body postures and hand shapes. The body features are 3D joint velocities for the left/right elbows and wrists, represented as a 12D feature vector. The hand features are probability estimates of five predefined hand shapes: opened/closed palm, thumb up/down, and "no hand". The fifth shape, "no hand", was dropped in the final representation, resulting in an 8D feature vector covering both hands. The dataset was sampled at 20 FPS.
ArmGesture Dataset [1] [mat] (2.6 MB)
This dataset includes six arm gestures. Observation features include automatically tracked 2D joint angles and 3D Euclidean coordinates for left/right shoulders and elbows; each observation is represented as a 20D feature vector. The dataset was collected from 13 participants with an average of 120 samples per class (exact sample counts per class are [88, 117, 118, 132, 179, 90]).
ArmGesture-Continuous Dataset [mat] (2.4 MB)
This dataset contains unsegmented sequences of gestures based on the ArmGesture dataset. To generate an unsegmented sequence, we randomly selected 3 to 5 (segmented) samples from different classes, and concatenated them in random order. This resulted in 182 samples in total, with an average of 92 frames per sample.
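The generation procedure for ArmGesture-Continuous can be sketched as follows. This is an illustrative reconstruction, not the script used to build the released file: it reads "from different classes" as meaning each selected sample comes from a distinct class, and it also emits a per-frame label, which is an assumption about how an unsegmented sequence would be annotated. The function name `make_unsegmented_sequence` and the toy data are hypothetical.

```python
import random


def make_unsegmented_sequence(samples_by_class, rng, min_segments=3, max_segments=5):
    """Concatenate 3 to 5 segmented samples from distinct classes.

    samples_by_class maps a class label to a list of samples; each sample
    is a list of frames, and each frame is a list of feature values.
    Returns (frames, frame_labels) with one label per frame.
    """
    num_segments = rng.randint(min_segments, max_segments)
    # Sampling without replacement gives distinct classes in random order.
    labels = rng.sample(sorted(samples_by_class), num_segments)

    frames, frame_labels = [], []
    for label in labels:
        segment = rng.choice(samples_by_class[label])
        frames.extend(segment)
        frame_labels.extend([label] * len(segment))
    return frames, frame_labels


# Toy stand-in for the ArmGesture data: 6 classes, 2 samples per class,
# each sample a short sequence of 20D frames.
rng = random.Random(0)
toy = {
    c: [[[float(c)] * 20 for _ in range(rng.randint(5, 10))] for _ in range(2)]
    for c in range(6)
}
frames, frame_labels = make_unsegmented_sequence(toy, rng)
```

Repeating the call 182 times would give a collection of the same size as the released dataset, although the actual sequences depend on the random choices made by the authors.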

References

[1] Ariadna Quattoni, Sy Bor Wang, Louis-Philippe Morency, Michael Collins, Trevor Darrell: Hidden Conditional Random Fields. TPAMI 2007
[2] Louis-Philippe Morency, Ariadna Quattoni, Trevor Darrell: Latent-Dynamic Discriminative Models for Continuous Gesture Recognition. CVPR 2007
[3] Yale Song, Louis-Philippe Morency, Randall Davis: Action Recognition by Hierarchical Sequence Summarization. CVPR 2013

Acknowledgements

This work was funded by the Office of Naval Research Science of Autonomy program #N000140910625, the National Science Foundation #IIS-1018055, and the U.S. Army Research, Development, and Engineering Command (RDECOM).

Last update: May 30, 2012