Abstract
This paper presents a novel framework for human action recognition based on a newly proposed mid-level feature representation named Lie Algebrized Gaussians (LAG). Because an action sequence can be treated as a 3D object in space-time, we cast action recognition as 3D object recognition and characterize each 3D object by the probability distribution of its local spatio-temporal features. First, for each video, we densely sample local spatio-temporal features (e.g., HOG3D) at multiple scales within the bounding boxes of the human body. Normalized spatial coordinates are appended to each local descriptor to capture spatial position information. The distribution of local features in each video is then modeled by a Gaussian Mixture Model (GMM). To estimate the parameters of the video-specific GMMs, a global GMM is trained on all training data, and the video-specific GMMs are adapted from it. LAG is then used to vectorize the video-specific GMMs. Finally, a linear SVM is employed for classification. Experimental results on the KTH and UCF Sports datasets show that our method achieves state-of-the-art performance.
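As an illustration of two of the steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that coordinates are normalized by the bounding box origin and size, that the GMMs have diagonal covariances, and that only the component means are adapted with a relevance factor, in the style commonly used for adapted GMMs. The function names, the `box` layout, and the `relevance` value are hypothetical. The LAG vectorization and the SVM stage are not shown.

```python
import math


def append_normalized_position(descriptor, x, y, box):
    """Append (x, y), normalized to [0, 1] inside the bounding box.

    `box` is a hypothetical (left, top, width, height) tuple.
    """
    left, top, width, height = box
    return list(descriptor) + [(x - left) / width, (y - top) / height]


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def adapt_means(features, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a global GMM to one video (assumed form).

    Each adapted mean interpolates between the global mean and the
    posterior-weighted average of the video's features.
    """
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in features:
        # Posterior responsibility of every component for this feature,
        # computed in the log domain for numerical stability.
        logs = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(logs)
        post = [math.exp(value - peak) for value in logs]
        total = sum(post)
        for k in range(n_comp):
            resp = post[k] / total
            counts[k] += resp
            for d in range(dim):
                sums[k][d] += resp * x[d]
    # Components that saw little data stay close to the global mean.
    return [
        [(sums[k][d] + relevance * means[k][d]) / (counts[k] + relevance)
         for d in range(dim)]
        for k in range(n_comp)
    ]
```

With this form, a component that receives no features keeps its global mean, so every video yields a GMM with the same number of components in the same order, which is what makes a fixed-length vectorization of the adapted models possible.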
Notes
Software available at http://htk.eng.cam.ac.uk/
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61073094 and U1233119).
Cite this article
Chen, M., Gong, L., Wang, T. et al. Action recognition using lie algebrized gaussians over dense local spatio-temporal features. Multimed Tools Appl 74, 2127–2142 (2015). https://doi.org/10.1007/s11042-013-1746-8
DOI: https://doi.org/10.1007/s11042-013-1746-8