MA-Net: Multi-Attention Network for Skeleton-Based Action Recognition

Author: Yunfei Zhang

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 42, Pages 1 - 7

https://doi.org/10.1145/3595916.3626414

Published: 01 January 2024

Abstract

Graph Convolution Networks (GCNs) have become the mainstream framework for skeleton-based action recognition. To address the redundant spatial-temporal feature information and the neighborhood constraints that arise in GCNs, we propose a novel method called Multi-Attention Network (MA-Net), which explores crucial skeleton information through two main modules: Combined Attention Graph Convolution (CAGC) and Multi-layer Transposed Attention Encoding (MTAE). CAGC uses multi-dimensional combined attention to capture more valuable information and strengthen the learned features. MTAE adopts self-attention to encode feature maps, effectively establishing long-range dependencies and capturing global information. Centered on the attention mechanism, these two modules combine the complementary advantages of GCNs (i.e., local topology and temporal dynamics) and Transformers (i.e., global context and dynamic attention). Extensive experiments on the challenging NTU-RGB+D 60 and Kinetics-Skeleton datasets demonstrate that our model achieves strong performance.
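The contrast the abstract draws between local graph aggregation and global self-attention can be illustrated with a minimal sketch. This is not the paper's CAGC or MTAE module: the 4-joint chain, the feature values, and the function names `graph_aggregate` and `self_attention` are hypothetical, and learned projection weights are omitted for brevity.

```python
import math

def graph_aggregate(features, edges):
    """Local step: average each joint's features with its graph neighbours."""
    n = len(features)
    dim = len(features[0])
    # Every joint is its own neighbour (self-loop), plus its skeleton edges.
    neighbours = {i: {i} for i in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(n)
    ]

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Global step: scaled dot-product self-attention.

    Assumption: queries, keys and values are the raw features
    (identity projections), purely to keep the sketch short.
    """
    n = len(features)
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    weights = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in features])
        for q in features
    ]
    out = [
        [sum(w[j] * features[j][d] for j in range(n)) for d in range(dim)]
        for w in weights
    ]
    return out, weights

# Hypothetical skeleton: four joints connected in a chain 0-1-2-3.
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3)]

local = graph_aggregate(joints, edges)       # each joint sees only its neighbours
global_out, attn = self_attention(joints)    # each joint attends to every joint
```

In this toy setup joint 3 has no edge to joint 0, so `graph_aggregate` never mixes their features in a single step, while the attention row `attn[3]` still assigns joint 0 a non-zero weight. That is the kind of long-range dependency the abstract attributes to self-attention.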

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN: 9798400702051

DOI: 10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MMAsia '23

Sponsor:

SIGMM

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
