research-article
DOI: 10.1145/3595916.3626414

MA-Net: Multi-Attention Network for Skeleton-Based Action Recognition

Published: 01 January 2024

Abstract

Graph Convolution Networks (GCNs) have become the mainstream framework for skeleton-based action recognition. To address the redundant spatial-temporal feature information and the neighborhood constraints of GCNs, we propose a novel method called Multi-Attention Network (MA-Net) to explore crucial skeleton information. It comprises two main modules: Combined Attention Graph Convolution (CAGC) and Multi-layer Transposed Attention Encoding (MTAE). The CAGC uses multi-dimensional combined attention to capture more valuable information and enhance the feature representation. The MTAE adopts self-attention to encode feature maps, effectively establishing long-range dependencies and capturing global information. Centered on the attention mechanism, these two modules combine the complementary advantages of GCNs (i.e., local topology and temporal dynamics) and Transformers (i.e., global context and dynamic attention). Extensive experiments on the challenging NTU-RGB+D 60 and Kinetics-Skeleton datasets demonstrate that our model achieves strong performance.
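
The abstract does not say how MTAE computes its attention, so the following is only a rough illustration and not the authors' implementation. It assumes that "transposed attention" means self-attention computed across feature channels instead of across tokens (joints or frames), so the attention map has size channels x channels. The function name `transposed_attention`, the `temperature` parameter, and the reuse of the input as queries, keys, and values are all assumptions made for this sketch; a real layer would presumably use learned projections and several heads.

```python
import math

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def l2_normalize_rows(m, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row)) + eps
        out.append([x / norm for x in row])
    return out

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def transposed_attention(x, temperature=1.0):
    """Channel-wise self-attention on x of shape (tokens, channels).

    Assumption: queries, keys and values are all x itself (no learned
    projections, single head). This is a hypothetical simplification.
    """
    xc = transpose(x)                     # (channels, tokens)
    q = l2_normalize_rows(xc)             # normalize each channel over tokens
    k = l2_normalize_rows(xc)
    scores = matmul(q, transpose(k))      # (channels, channels) similarities
    scores = [[s * temperature for s in row] for row in scores]
    attn = softmax_rows(scores)           # each row sums to 1
    out = matmul(attn, xc)                # mix channels: (channels, tokens)
    return transpose(out), attn           # back to (tokens, channels)

# Toy input: 4 tokens (e.g. joints) with 3 feature channels each.
x = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0],
     [0.0, 2.0, 1.0],
     [1.5, 0.5, 0.5]]
y, attn = transposed_attention(x)
print(len(y), len(y[0]))        # output keeps the (tokens, channels) shape
print(len(attn), len(attn[0]))  # attention map is channels x channels
```

Because the attention map is indexed by channels, its size does not grow with the number of joints or frames, which is one plausible reason to transpose the attention when encoding long skeleton sequences.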

      Published In

      MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
      December 2023
      745 pages
      ISBN:9798400702051
      DOI:10.1145/3595916
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Action recognition
      2. Attention mechanism
      3. Graph convolution network
      4. Skeleton
      5. Transformer

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MMAsia '23
      Sponsor:
      MMAsia '23: ACM Multimedia Asia
      December 6 - 8, 2023
      Tainan, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 59 of 204 submissions, 29%
