Multi-View Latent Variable Discriminative Models for Action Recognition
Presented at CVPR 2012
Yale Song1,
Louis-Philippe Morency2,
Randall Davis1
1MIT Computer Science and Artificial Intelligence Laboratory
2USC Institute for Creative Technologies

Figure 1. Graphical representations of multi-view latent variable discriminative models.
Many human action recognition tasks involve data that
can be factorized into multiple views such as body postures
and hand shapes. These views often interact with each other
over time, providing important cues to understanding the
action. We present multi-view latent variable discriminative
models that jointly learn both view-shared and view-specific
sub-structures to capture the interaction between views.
Knowledge about the underlying structure of the data is
formulated as a multi-chain structured latent conditional model,
explicitly learning the interaction between multiple views using
disjoint sets of hidden variables in a discriminative manner.
The chains are tied using a predetermined topology that repeats
over time. We present three topologies (linked, coupled, and
linked-coupled) that differ in the type of interaction between
views that they model. We evaluate our approach on both segmented
and unsegmented human action recognition tasks, using the ArmGesture,
NATOPS, and ArmGesture-Continuous datasets. Experimental results
show that our approach outperforms previous state-of-the-art action
recognition models.
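To make the three topologies concrete, here is a minimal Python sketch that enumerates the edges between hidden variables for C views over T time steps. It assumes that every topology keeps a separate temporal chain per view, that the linked topology adds edges between views at the same time step, and that the coupled topology adds edges from each view at time t to the other views at time t+1. `topology_edges` is a hypothetical helper for illustration, not a function from HCRF-light.

```python
from itertools import combinations

TOPOLOGIES = ("linked", "coupled", "linked-coupled")


def topology_edges(num_views, length, topology):
    """Return the undirected edges between hidden nodes (view, t).

    Hypothetical helper for illustration; not part of HCRF-light.
    """
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology}")
    edges = set()
    # Every topology keeps one temporal chain per view: (c, t) - (c, t+1).
    for c in range(num_views):
        for t in range(length - 1):
            edges.add(((c, t), (c, t + 1)))
    # Cross-view edges, added once per unordered pair of views (c < d).
    for c, d in combinations(range(num_views), 2):
        if topology in ("linked", "linked-coupled"):
            # Linked (assumed): views interact at the same time step.
            for t in range(length):
                edges.add(((c, t), (d, t)))
        if topology in ("coupled", "linked-coupled"):
            # Coupled (assumed): each view connects to the other at t+1.
            for t in range(length - 1):
                edges.add(((c, t), (d, t + 1)))
                edges.add(((d, t), (c, t + 1)))
    return edges


for name in TOPOLOGIES:
    print(name, len(topology_edges(num_views=2, length=4, topology=name)))
```

For two views over four frames this gives 10 edges for linked, 12 for coupled, and 16 for linked-coupled: under these assumptions the linked-coupled topology is simply the union of both kinds of cross-view edges on top of the per-view chains.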
- CVPR 2012 paper [pdf][bibtex]
- CVPR 2012 poster [pdf]
- MLSS 2011 poster [pdf]
- HCRF-light 2.0 [zip] (5 MB)
Comes with an implementation of hierarchical sequence summarization [3].
- HCRF-light 1.1 [zip] (28 MB)
Provides a MATLAB wrapper function and a sample script. Also comes with precompiled executables and mex files for 32-bit and 64-bit Windows.
- HCRF-light 1.0 [zip] (3 MB)
This light version contains implementations of HCRF [1], LDCRF [2], and their multi-view counterparts, MV-HCRF and MV-LDCRF.
- HCRF library Sourceforge [SourceForge.net]
Multi-view latent variable discriminative models are part of a larger project, the hCRF library, hosted on SourceForge.net.
- NATOPS Dataset [mat] (4 MB)
This dataset contains three pairs of body-hand gestures used when handling aircraft on the deck of an aircraft carrier. The observation features include automatically tracked 3D body postures and hand shapes. The body feature includes 3D joint velocities for the left/right elbows and wrists, represented as a 12D feature vector. The hand feature includes probability estimates of five predefined hand shapes for each hand: open/closed palm, thumb up/down, and "no hand". The fifth shape, "no hand", was dropped in the final representation, resulting in an 8D feature vector. The dataset was sampled at 20 FPS.
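The NATOPS dimensions follow from simple bookkeeping: four joints with 3D velocities give 12 values, and dropping "no hand" from five shape estimates leaves four per hand, or eight for two hands. The sketch below assembles the two views for a single frame under those assumptions. The function name `natops_views`, the joint order, and the order of the hand-shape probabilities are hypothetical and not taken from the released .mat file.

```python
# Hypothetical order of the five hand-shape probabilities for one hand.
HAND_SHAPES = ("open_palm", "closed_palm", "thumb_up", "thumb_down", "no_hand")


def natops_views(joint_velocities, left_hand_probs, right_hand_probs):
    """Build the body (12D) and hand (8D) views for a single frame.

    joint_velocities: four 3D velocities (left/right elbow and wrist,
    in an assumed order). *_hand_probs: five shape probabilities each.
    """
    if len(joint_velocities) != 4 or any(len(v) != 3 for v in joint_velocities):
        raise ValueError("expected 3D velocities for 4 joints")
    # Body view: flatten 4 joints x 3 axes into 12 values.
    body = [component for velocity in joint_velocities for component in velocity]

    hand = []
    for probs in (left_hand_probs, right_hand_probs):
        if len(probs) != len(HAND_SHAPES):
            raise ValueError("expected one probability per hand shape")
        # Drop the "no hand" estimate, keeping the other four shapes.
        hand.extend(p for shape, p in zip(HAND_SHAPES, probs) if shape != "no_hand")
    return body, hand


body, hand = natops_views(
    [[0.1, 0.0, -0.2]] * 4,
    [0.6, 0.2, 0.1, 0.05, 0.05],
    [0.0, 0.0, 0.0, 0.0, 1.0],
)
print(len(body), len(hand))  # 12 8
```

Keeping the body and hand vectors separate, rather than concatenating them into one 20D vector, matches the multi-view setting in which each view gets its own chain of hidden variables.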
- ArmGesture Dataset [1] [mat] (2.6 MB)
This dataset includes six arm gestures. Observation features include automatically tracked 2D joint angles and 3D Euclidean coordinates for the left/right shoulders and elbows; each observation is represented as a 20D feature vector. The dataset was collected from 13 participants, with an average of 120 samples per class (per-class sample counts: [88, 117, 118, 132, 179, 90]).
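The figures above can be cross-checked with a few lines of arithmetic. This sketch assumes that each of the four joints contributes two angles and three coordinates, which is one reading consistent with the stated 20D size; the constant names are illustrative only.

```python
# Per-class sample counts as listed for the ArmGesture dataset.
CLASS_COUNTS = [88, 117, 118, 132, 179, 90]

# Assumed breakdown: each of the 4 joints (left/right shoulder and elbow)
# contributes 2 joint angles and 3 Euclidean coordinates.
NUM_JOINTS, ANGLES_PER_JOINT, COORDS_PER_JOINT = 4, 2, 3

total = sum(CLASS_COUNTS)
mean_per_class = total / len(CLASS_COUNTS)
feature_dim = NUM_JOINTS * (ANGLES_PER_JOINT + COORDS_PER_JOINT)

print(total, round(mean_per_class, 1), feature_dim)  # 724 120.7 20
```

The 724 samples average about 120.7 per class, which is where the rounded figure of 120 comes from.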
- ArmGesture-Continuous Dataset [mat] (2.4 MB)
This dataset contains unsegmented sequences of gestures based on the ArmGesture dataset. To generate an unsegmented sequence, we randomly selected 3 to 5 (segmented) samples from different classes and concatenated them in random order. This resulted in 182 samples in total, with an average of 92 frames per sample.
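The generation procedure can be sketched as follows. It assumes that each segmented sample is a list of per-frame feature vectors and that "different classes" means no class repeats within one sequence. `make_unsegmented` is a hypothetical helper, not the script used to build the released data; it also records a label for every frame so each frame's ground-truth class is known.

```python
import random


def make_unsegmented(samples_by_class, rng, min_parts=3, max_parts=5):
    """Concatenate segmented samples from distinct classes into one sequence.

    samples_by_class: dict mapping a class label to a list of sequences,
    where each sequence is a list of per-frame feature vectors.
    Returns (frames, labels) with one ground-truth label per frame.
    """
    num_parts = rng.randint(min_parts, max_parts)  # inclusive: 3, 4, or 5
    if num_parts > len(samples_by_class):
        raise ValueError("not enough classes to draw distinct samples")
    # rng.sample draws distinct classes and returns them in random order.
    classes = rng.sample(sorted(samples_by_class), num_parts)
    frames, labels = [], []
    for label in classes:
        sequence = rng.choice(samples_by_class[label])
        frames.extend(sequence)
        labels.extend([label] * len(sequence))
    return frames, labels


rng = random.Random(0)
toy = {c: [[[float(c)]] * (10 + c)] for c in range(6)}  # toy data, 6 classes
frames, labels = make_unsegmented(toy, rng)
print(len(frames), sorted(set(labels)))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible: the same seed always produces the same concatenated sequence.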
References
[1] Ariadna Quattoni, Sy Bor Wang, Louis-Philippe Morency, Michael Collins, Trevor Darrell: Hidden Conditional Random Fields. TPAMI 2007
[2] Louis-Philippe Morency, Ariadna Quattoni, Trevor Darrell: Latent-Dynamic Discriminative Models for Continuous Gesture Recognition. CVPR 2007
[3] Yale Song, Louis-Philippe Morency, Randall Davis: Action Recognition by Hierarchical Sequence Summarization. CVPR 2013
Acknowledgements
This work was funded by
the Office of Naval Research Science of Autonomy program #N000140910625,
the National Science Foundation #IIS-1018055,
and the U.S. Army Research, Development, and Engineering Command (RDECOM).
Last update: May 30, 2012