DOI: 10.5555/2390948.2391007

Joint Chinese word segmentation, POS tagging and parsing

Published: 12 July 2012

Abstract

In this paper, we propose a novel decoding algorithm for discriminative joint Chinese word segmentation, part-of-speech (POS) tagging, and parsing. Previous work typically uses a pipeline in which Chinese word segmentation is followed by POS tagging and then parsing; this suffers from error propagation and cannot use information from later modules to inform earlier ones. In our approach, the three models are trained separately and combined in a unified framework at decoding time. We extend the CYK parsing algorithm so that it can handle word segmentation and POS tagging features. To our knowledge, this is the first work on joint Chinese word segmentation, POS tagging, and parsing. Experimental results on the Chinese Treebank 5 corpus show that our approach outperforms the state-of-the-art pipeline system.
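To make the idea of a CYK extension over characters concrete, the following is a minimal sketch, not the authors' decoder. It assumes that chart cells span characters rather than words, that any short character span may be a leaf carrying a POS tag, and that a leaf's score is the sum of a segmentation score and a tagging score. The names `joint_cyk`, `seg_score`, `tag_score`, `LEXICON`, and `RULES`, as well as every score, are hypothetical; the paper's actual features and models are not given in the abstract.

```python
from collections import defaultdict

NEG_INF = float("-inf")

def joint_cyk(chars, seg_score, tag_score, rules, tags, max_word_len=4):
    """Character-level CYK sketch: chart cells span characters, not words."""
    n = len(chars)
    best = defaultdict(dict)  # (i, j) -> {label: best score}
    back = {}                 # (i, j, label) -> word, or (split, left, right)

    # Leaf cells: any short character span may be a word with a POS tag.
    # Its score combines the (hypothetical) segmentation and tagging models.
    for i in range(n):
        for j in range(i + 1, min(n, i + max_word_len) + 1):
            word = chars[i:j]
            for t in tags:
                s = seg_score(word) + tag_score(word, t)
                if s > NEG_INF:
                    best[i, j][t] = s
                    back[i, j, t] = word

    # Internal cells: ordinary CYK recursion over binary rules.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (parent, left, right), r in rules.items():
                    if left in best[i, k] and right in best[k, j]:
                        s = r + best[i, k][left] + best[k, j][right]
                        if s > best[i, j].get(parent, NEG_INF):
                            best[i, j][parent] = s
                            back[i, j, parent] = (k, left, right)
    return best, back

def build(back, i, j, label):
    """Follow back-pointers to recover the best tree rooted at (i, j, label)."""
    entry = back[i, j, label]
    if isinstance(entry, str):
        return (label, entry)
    k, left, right = entry
    return (label, build(back, i, k, left), build(back, k, j, right))

def leaves(tree):
    """Read the segmentation and POS tags off the tree as (word, tag) pairs."""
    if isinstance(tree[1], str):
        return [(tree[1], tree[0])]
    return leaves(tree[1]) + leaves(tree[2])

# Toy models (all entries and scores are made up for illustration).
LEXICON = {"我": {"PN": 0.0}, "爱": {"VV": 0.0}, "北京": {"NR": 0.0},
           "北": {"NN": -1.0}, "京": {"NN": -1.0}}
RULES = {("IP", "PN", "VP"): 0.0, ("VP", "VV", "NP"): -0.5,
         ("VP", "VV", "NR"): 0.0, ("NP", "NN", "NN"): -0.5}
TAGS = ["PN", "VV", "NR", "NN"]

def seg_score(word):
    return 0.0 if word in LEXICON else NEG_INF

def tag_score(word, tag):
    return LEXICON.get(word, {}).get(tag, NEG_INF)

sentence = "我爱北京"
best, back = joint_cyk(sentence, seg_score, tag_score, RULES, TAGS)
tree = build(back, 0, len(sentence), "IP")
print(leaves(tree))
```

Because segmentation, tagging, and bracketing are all read off a single best tree, a later decision (here, which VP rule applies) can change an earlier one (whether 北京 is one word or two), which is the property a pipeline lacks.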


Published In

EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
July 2012
1573 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
