CycleHand: Increasing 3D Pose Estimation Ability on In-the-wild Monocular Image through Cyclic Flow

Published: 10 October 2022


Current methods for 3D hand pose estimation generalize poorly to new in-the-wild scenarios because of varying camera viewpoints, self-occlusions, and complex environments. To address this problem, we propose CycleHand, which improves the generalization ability of the model in a self-supervised manner. Our motivation is based on an observation: if one globally rotates the whole hand and then rotates it back, the estimated 3D finger poses should remain consistent before and after the rotation, because wrist-relative hand poses are unchanged by a global 3D rotation. Hence, we propose arbitrary-rotation self-supervised consistency learning to improve the model's robustness to varying viewpoints. Another innovation of CycleHand is a high-fidelity texture map used to render photorealistic rotated hands under different lighting conditions, backgrounds, and skin tones, which further enhances the effectiveness of our self-supervised task. To reduce the potential negative effects of the domain shift introduced by synthetic images, we use contrastive learning to train a synthetic-real consistent feature extractor that extracts domain-irrelevant hand representations. Experiments show that CycleHand can substantially improve hand pose estimation performance on both canonical datasets and real-world applications.
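The rotation observation in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`random_rotation`, `wrist_relative`, `cycle_consistency_loss`), the 21-joint layout with the wrist at index 0, and the squared-error form of the loss are all assumptions made for illustration only.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random proper 3D rotation matrix (hypothetical helper)."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # normalise column signs
    if np.linalg.det(q) < 0:     # flip one axis so that det(q) = +1
        q[:, 0] = -q[:, 0]
    return q


def wrist_relative(joints):
    """Express joints of shape (J, 3) relative to the wrist (assumed at index 0)."""
    return joints - joints[0]


def cycle_consistency_loss(pose_orig, pose_rot_pred, rotation):
    """Mean squared distance between the original wrist-relative pose and
    the rotated-view prediction after rotating it back.

    Joints are stored as row vectors, so applying R^T to each joint
    corresponds to right-multiplying the (J, 3) array by R.
    """
    rotated_back = wrist_relative(pose_rot_pred) @ rotation
    diff = wrist_relative(pose_orig) - rotated_back
    return float(np.mean(np.sum(diff ** 2, axis=1)))


rng = np.random.default_rng(0)
joints = rng.standard_normal((21, 3))         # stand-in for an estimated hand pose
R = random_rotation(rng)
rotated = wrist_relative(joints) @ R.T        # a perfect prediction on the rotated hand
loss = cycle_consistency_loss(joints, rotated, R)
```

With a perfect prediction on the rotated view, the loss is zero up to floating-point error; any disagreement between the two views yields a positive value that could serve as a self-supervised training signal.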

CycleHand aims to enhance current 3D hand pose estimation networks by improving their in-the-wild performance. The core idea of CycleHand is simple: incorporate rendered hard-view images (hand crops taken under severe viewpoints) into the training process. We use neural rendering to achieve this. Moreover, to avoid the mesh penetration problem, we introduce novel mechanical constraints. We hope you enjoy our video!


    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages


    Author Tags

    1. 3D pose estimation
    2. domain adaption
    3. hand
    4. texture


