Research Article · DOI: 10.1145/3641519.3657499

MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar

Published: 13 July 2024

Abstract

The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent head avatar techniques have exploited explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representations. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and neural implicit representations are limited in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses a 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points of adaptable shape, enabling flexible topology. These points move with a Gaussian deformation field to match the target pose and expression of a person, facilitating efficient deformation. In addition, the Gaussian points have controllable shape, size, color, and opacity, and are rendered with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate that our method achieves state-of-the-art results, outperforming previous approaches. Code and data can be found at https://github.com/aipixel/MonoGaussianAvatar.
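
To make the representation concrete, below is a minimal, hypothetical PyTorch sketch of the idea the abstract describes: a cloud of learnable 3D Gaussian points whose canonical positions are offset by a pose/expression-conditioned deformation MLP before the per-point attributes (position, scale, rotation, color, opacity) are handed to a differentiable Gaussian splatting rasterizer. The class name, attribute parameterization, conditioning size, and MLP architecture are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianPointAvatar(nn.Module):
    """Hypothetical sketch: learnable Gaussian points + deformation field."""

    def __init__(self, num_points: int = 100_000, cond_dim: int = 56):
        super().__init__()
        # Per-point Gaussian attributes: position, anisotropic scale
        # (stored in log space), rotation quaternion, color, opacity.
        self.xyz = nn.Parameter(0.1 * torch.randn(num_points, 3))
        self.log_scale = nn.Parameter(torch.full((num_points, 3), -4.0))
        quat = torch.zeros(num_points, 4)
        quat[:, 0] = 1.0  # initialize to the identity rotation
        self.rotation = nn.Parameter(quat)
        self.color = nn.Parameter(torch.zeros(num_points, 3))
        self.opacity_logit = nn.Parameter(torch.zeros(num_points, 1))
        # Deformation field: canonical position + tracked pose/expression
        # code (e.g. 3DMM coefficients) -> per-point position offset.
        self.deform = nn.Sequential(
            nn.Linear(3 + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, cond: torch.Tensor) -> dict:
        # cond: (cond_dim,) pose/expression vector for the target frame.
        n = self.xyz.shape[0]
        offset = self.deform(torch.cat([self.xyz, cond.expand(n, -1)], dim=-1))
        # These attributes would be passed to a differentiable Gaussian
        # splatting rasterizer to render the posed avatar.
        return {
            "xyz": self.xyz + offset,
            "scale": self.log_scale.exp(),
            "rotation": F.normalize(self.rotation, dim=-1),
            "color": self.color.sigmoid(),
            "opacity": self.opacity_logit.sigmoid(),
        }

avatar = GaussianPointAvatar()
splat_inputs = avatar(torch.zeros(56))  # neutral pose/expression code

In a real pipeline, the returned attributes would be rasterized with a 3D Gaussian splatting renderer (in the spirit of Kerbl et al. 2023) and the photometric error against each video frame back-propagated into both the point attributes and the deformation MLP; the paper's actual deformation field and point insertion/deletion strategy are described in the full text.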

Supplemental Material

MP4 File: Video and appendix
PDF File: Video and appendix


Cited By

  • InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars. ACM SIGGRAPH 2024 Conference Papers, 1–10. https://doi.org/10.1145/3641519.3657478. Online publication date: 13 July 2024.
  • Recent advances in 3D Gaussian splatting. Computational Visual Media 10(4), 613–642. https://doi.org/10.1007/s41095-024-0436-y. Online publication date: 8 July 2024.



Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024, 1106 pages
ISBN: 9798400705250
DOI: 10.1145/3641519
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Deep Learning
  2. Facial Reenactment

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '24

Acceptance Rates

Overall Acceptance Rate: 1,822 of 8,601 submissions, 21%

