Abstract
We propose a novel network for auxiliary-free video matting. Unlike most existing approaches, which require trimaps or pre-captured backgrounds as auxiliary inputs, our method uses binary segmentation masks as priors, achieving auxiliary-free matting. Furthermore, we design an attention-based memory block that combines ideas from memory networks and self-attention to compute pixel-level temporal coherence among video frames, enhancing overall performance. We also provide direct supervision for the temporal-guided memory module to boost the network's robustness. Validation on various test datasets shows that our method outperforms several state-of-the-art auxiliary-free matting methods in terms of alpha and foreground prediction quality and temporal consistency.
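The abstract describes an attention-based memory block that matches current-frame features against features stored from past frames. The paper's own implementation is not reproduced here; as a rough, simplified illustration of the general idea (a scaled dot-product attention read over per-pixel memory keys and values, with all shapes and names chosen for this sketch), such a memory read can be written as:

```python
import numpy as np

def memory_read(query, mem_keys, mem_values):
    """Attention-based memory read (illustrative sketch only).

    For each query pixel of the current frame, compute softmax
    similarity against memory keys pooled from past frames, then
    return the similarity-weighted sum of the memory values.

    query:      (N, C) features of the current frame (N pixels)
    mem_keys:   (M, C) keys pooled from past frames (M pixels)
    mem_values: (M, C) values pooled from past frames
    returns:    (N, C) temporally aggregated features
    """
    c = query.shape[1]
    scores = query @ mem_keys.T / np.sqrt(c)          # (N, M) similarities
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over memory pixels
    return weights @ mem_values                       # weighted value read
```

In a full network this read would typically be fused with the current frame's features before decoding the alpha matte; the sketch only shows the pixel-level temporal matching step.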
Data availability
All data generated or analyzed during this study are included in this published article and are freely available to any researcher wishing to use them for non-commercial purposes, without breaching participant confidentiality.
Acknowledgements
The authors would like to express their gratitude to Dr. Yi Wang for his valuable suggestions.
Ethics declarations
Conflict of interest
Neither the process of writing nor the content of this article gives grounds for raising the issue of a conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, S., Chau, LP. & Lin, Z. Portrait matting using an attention-based memory network. Vis Comput 40, 3733–3746 (2024). https://doi.org/10.1007/s00371-023-03061-z