Abstract
Recently, benefiting from the development of detection models, multi-object tracking methods based on the tracking-by-detection paradigm have greatly improved in performance. However, most methods still rely on traditional motion models for position prediction, such as the constant-velocity model and the Kalman filter. Only a few methods adopt deep networks for prediction, and those that do exploit only the simplest recurrent neural networks (RNNs) to predict position, while the position offset caused by camera movement is not considered. Therefore, inspired by the outstanding performance of the Transformer on temporal tasks, this paper proposes a Transformer-based motion model for multi-object tracking. By taking a target's historical position differences and the offset vectors between consecutive frames as input, the model accounts for the motion of both the target and the camera, which improves the prediction accuracy of the motion model and thereby the overall tracking performance. Comparative experiments and tracking results on the MOTChallenge benchmarks demonstrate the effectiveness of the proposed method.
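To illustrate the idea of a two-source input, the following is a minimal sketch of how a target's own frame-to-frame displacement might be concatenated with per-frame camera offset vectors before being fed to a sequence model. The function name, feature layout, and dimensions are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def build_motion_input(centers, cam_offsets):
    """Assemble a two-source input sequence for a motion model.

    centers:     (T, 2) array of a target's box centres over T frames.
    cam_offsets: (T-1, 2) array of per-frame camera offset vectors
                 (e.g. estimated by image alignment between frames).
    Returns a (T-1, 4) sequence [dx, dy, ox, oy]: the target's own
    displacement concatenated with the camera offset for each step.
    """
    deltas = np.diff(centers, axis=0)        # target-motion source
    return np.hstack([deltas, cam_offsets])  # camera-motion source

# A target drifting right/down while the camera also pans slightly:
centers = np.array([[10.0, 20.0], [12.0, 21.0], [15.0, 23.0]])
offsets = np.array([[0.5, 0.0], [0.4, -0.1]])
seq = build_motion_input(centers, offsets)
print(seq.shape)  # (2, 4)
```

A Transformer encoder (or any sequence model) could then consume `seq` and regress the next displacement; the key point the abstract makes is that the camera-offset channel lets the model separate apparent motion from the target's own motion.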
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant 61806006, China Postdoctoral Science Foundation under Grant No. 2019M660149, Graduate Innovation Foundation of Jiangsu Province under Grant No. KYLX16_0781, the 111 Project under Grants No. B12018, and PAPD of Jiangsu Higher Education Institutions.
Cite this article
Yang, J., Ge, H., Su, S. et al. Transformer-based two-source motion model for multi-object tracking. Appl Intell 52, 9967–9979 (2022). https://doi.org/10.1007/s10489-021-03012-y