G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

Xie, Pan; Zhang, Qipeng; Peng, Taiyi; Tang, Hao; Du, Yao; Li, Zexian

arXiv:2208.09141 (cs)

[Submitted on 19 Aug 2022 (v1), last revised 18 Dec 2023 (this version, v3)]

Abstract: The Sign Language Production (SLP) project aims to automatically translate spoken languages into sign sequences. Our approach focuses on the transformation of sign gloss sequences into their corresponding sign pose sequences (G2P). In this paper, we present a novel solution for this task by converting the continuous pose space generation problem into a discrete sequence generation problem. We introduce the Pose-VQVAE framework, which combines Variational Autoencoders (VAEs) with vector quantization to produce a discrete latent representation for continuous pose sequences. Additionally, we propose the G2P-DDM model, a discrete denoising diffusion architecture for length-varied discrete sequence data, to model the latent prior. To further enhance the quality of pose sequence generation in the discrete space, we present the CodeUnet model to leverage spatial-temporal information. Lastly, we develop a heuristic sequential clustering method to predict variable lengths of pose sequences for corresponding gloss sequences. Our results show that our model outperforms state-of-the-art G2P models on the public SLP evaluation benchmark. For more generated results, please visit our project page: this https URL
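
To make the idea of a discrete latent representation concrete, the following is a minimal sketch of generic vector quantization: each continuous latent vector is replaced by the index of its nearest codebook entry. It is not the paper's Pose-VQVAE; the function names, the toy codebook, and the 2-D latents are all assumptions chosen for illustration.

```python
def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Look up the codebook vector for each discrete token."""
    return [codebook[i] for i in indices]


# Hypothetical toy codebook with three 2-D entries (assumed, not from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
# Hypothetical continuous latents, e.g. one per pose frame.
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

tokens = quantize(latents, codebook)
reconstructed = dequantize(tokens, codebook)
```

Under this sketch, a sequence of continuous vectors becomes a sequence of integer tokens, which is the kind of data a discrete sequence model can then be trained on.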

Comments:	Accepted by AAAI2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2208.09141 [cs.CV]
	(or arXiv:2208.09141v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.09141

Submission history

From: Pan Xie
[v1] Fri, 19 Aug 2022 03:49:13 UTC (973 KB)
[v2] Sun, 12 Feb 2023 12:21:37 UTC (2,619 KB)
[v3] Mon, 18 Dec 2023 16:45:30 UTC (3,871 KB)
