Investigating Glyph Phonetic Information for Chinese Spell Checking: What Works and What's Next

Zhang, Xiaotian; Zheng, Yanjun; Yan, Hang; Qiu, Xipeng

arXiv:2212.04068 (cs)

[Submitted on 8 Dec 2022 (v1), last revised 21 May 2023 (this version, v3)]

Abstract: While pre-trained Chinese language models have demonstrated impressive performance on a wide range of NLP tasks, the Chinese Spell Checking (CSC) task remains a challenge. Previous research has explored using information such as glyphs and phonetics to improve the ability to distinguish misspelled characters, with good results. However, the generalization ability of these models is not well understood: it is unclear whether they incorporate glyph-phonetic information and, if so, whether this information is fully utilized. In this paper, we aim to better understand the role of glyph-phonetic information in the CSC task and suggest directions for improvement. Additionally, we propose a new, more challenging, and practical setting for testing the generalizability of CSC models. All code is made publicly available.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2212.04068 [cs.CL] (or arXiv:2212.04068v3 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2212.04068

Submission history

From: Xiaotian Zhang
[v1] Thu, 8 Dec 2022 04:37:29 UTC (11,814 KB)
[v2] Sun, 18 Dec 2022 04:21:45 UTC (11,814 KB)
[v3] Sun, 21 May 2023 14:55:37 UTC (12,004 KB)
