
Transformer-based two-source motion model for multi-object tracking

Published in Applied Intelligence

Abstract

Recently, benefiting from advances in detection models, multi-object tracking methods based on the tracking-by-detection paradigm have greatly improved in performance. However, most methods still rely on traditional motion models for position prediction, such as the constant-velocity model and the Kalman filter. Only a few methods adopt deep-network-based prediction, and even these exploit only the simplest RNNs (recurrent neural networks) to predict position, without accounting for the position offset caused by camera movement. Therefore, inspired by the outstanding performance of the Transformer on temporal tasks, this paper proposes a Transformer-based motion model for multi-object tracking. By taking as input both the target's historical position differences and the offset vector between consecutive frames, the model accounts for the motion of the target and of the camera simultaneously, improving the prediction accuracy of the motion model used in multi-object tracking and thereby improving tracking performance. Comparative experiments and tracking results on the MOTChallenge benchmarks demonstrate the effectiveness of the proposed method.
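To make the abstract's core idea concrete, below is a minimal PyTorch sketch of a two-source motion model in the spirit described above: a Transformer encoder consumes, for each past frame, the target's bounding-box position difference together with a global camera-offset vector, and regresses the next-frame displacement. All names, dimensions, and hyperparameters here (TwoSourceMotionModel, d_model=64, and so on) are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class TwoSourceMotionModel(nn.Module):
    """Sketch: predict a target's next-frame box displacement from two
    motion sources, its own past displacements and the camera's offsets."""

    def __init__(self, box_dim=4, cam_dim=2, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Embed the concatenated per-frame inputs (target diff + camera offset).
        self.input_proj = nn.Linear(box_dim + cam_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regress the next-frame position difference of the target box.
        self.head = nn.Linear(d_model, box_dim)

    def forward(self, box_diffs, cam_offsets):
        # box_diffs:   (B, T, 4) per-frame differences of (x, y, w, h)
        # cam_offsets: (B, T, 2) per-frame camera offset vectors
        x = self.input_proj(torch.cat([box_diffs, cam_offsets], dim=-1))
        h = self.encoder(x)
        # Use the most recent frame's representation for the prediction.
        return self.head(h[:, -1])


# Usage: predict the next displacement from 8 frames of history for 2 tracks.
model = TwoSourceMotionModel()
history_diffs = torch.randn(2, 8, 4)   # (dx, dy, dw, dh) per past frame
cam_offsets = torch.randn(2, 8, 2)     # global camera offset per frame pair
next_diff = model(history_diffs, cam_offsets)  # -> shape (2, 4)
```

Feeding both sources to the same encoder lets self-attention weigh the target's own motion against the camera-induced offset across the history window, which is the two-source intuition the abstract describes.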





Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61806006, the China Postdoctoral Science Foundation under Grant No. 2019M660149, the Graduate Innovation Foundation of Jiangsu Province under Grant No. KYLX16_0781, the 111 Project under Grant No. B12018, and the PAPD of Jiangsu Higher Education Institutions.

Author information

Correspondence to Hongwei Ge.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, J., Ge, H., Su, S. et al. Transformer-based two-source motion model for multi-object tracking. Appl Intell 52, 9967–9979 (2022). https://doi.org/10.1007/s10489-021-03012-y

