research-article

CycleHand: Increasing 3D Pose Estimation Ability on In-the-wild Monocular Image through Cyclic Flow

Authors: Pan Pan, Ping Tan

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 2452 - 2463

https://doi.org/10.1145/3503161.3547828

Published: 10 October 2022

Abstract

Current methods for 3D hand pose estimation generalize poorly to new in-the-wild scenarios because of varying camera viewpoints, self-occlusions, and complex environments. To address this problem, we propose CycleHand, which improves the generalization ability of the model in a self-supervised manner. Our motivation rests on an observation: if one globally rotates the whole hand and then rotates it back, the estimated 3D finger poses should remain consistent before and after the rotation, because wrist-relative hand poses are unchanged by global 3D rotation. Hence, we propose arbitrary-rotation self-supervised consistency learning to improve the model's robustness to varying viewpoints. Another innovation of CycleHand is a high-fidelity texture map used to render photorealistic rotated hands under different lighting conditions, backgrounds, and skin tones, which further enhances the effectiveness of our self-supervised task. To reduce the potential negative effects of the domain shift introduced by synthetic images, we use contrastive learning to learn a synthetic-real consistent feature extractor that extracts domain-irrelevant hand representations. Experiments show that CycleHand substantially improves hand pose estimation performance on both canonical datasets and real-world applications.
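
The rotation-consistency observation in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 21-joint layout with the wrist at index 0, the function names, and the use of a mean per-joint distance as the consistency measure are all assumptions made for illustration. The sketch rotates wrist-relative joints by a rotation R, rotates the result back by the inverse of R, and measures how far the round trip lands from the original pose. A perfect estimator would give an error near zero.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def wrist_relative(joints):
    """Express joints relative to the wrist (assumed to be row 0)."""
    return joints - joints[0]

def cycle_consistency_error(pred_original, pred_rotated, R):
    """Mean per-joint distance between the original estimate and the
    estimate of the rotated view after rotating it back by R^-1.

    Joints are stored as row vectors, so applying R^-1 = R^T to each
    joint is a right-multiplication by R.
    """
    back = wrist_relative(pred_rotated) @ R
    diff = back - wrist_relative(pred_original)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Hypothetical stand-in for a pose estimate: 21 random 3D joints.
rng = np.random.default_rng(0)
joints = rng.normal(size=(21, 3))
R = rotation_matrix([1.0, 2.0, 3.0], 0.7)

# An ideal estimator applied to the rotated hand would return the
# rotated wrist-relative joints exactly.
ideal_rotated = wrist_relative(joints) @ R.T
ideal_error = cycle_consistency_error(joints, ideal_rotated, R)

# A noisy estimate of the rotated view breaks the cycle.
noisy_rotated = ideal_rotated + rng.normal(scale=0.05, size=(21, 3))
noisy_error = cycle_consistency_error(joints, noisy_rotated, R)
```

In a training setting, a quantity like `noisy_error` could serve as a self-supervised penalty, since it needs no ground-truth 3D annotation, only the known rotation applied to the rendered hand.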

Supplementary Material

MP4 File (MM22-358.mp4)

CycleHand aims to enhance current 3D hand pose estimation networks by improving their in-the-wild performance. The core idea is simple: incorporate rendered hard-view images (hand crops taken under severe viewpoints) into the training process. We use neural rendering to achieve this. Moreover, to avoid mesh penetration, we introduce novel mechanical constraints. We hope you enjoy our video!

