DOI: 10.3115/1119250.1119261

A Chinese efficient analyser integrating word segmentation, part-of-speech tagging, partial parsing and full parsing

Published: 11 July 2003

Abstract

This paper introduces an efficient analyser for Chinese that integrates word segmentation, part-of-speech tagging, partial parsing and full parsing. The analyser is based on a Hidden Markov Model (HMM) and an HMM-based tagger; that is, all the components are built on the same HMM-based tagging engine. One advantage of using a single engine is that it greatly reduces code size and makes maintenance easy. Another is that the code is easy to optimise for speed, which plays a critically important role in many applications. Finally, all the components benefit when the algorithms in the single engine are optimised or replaced with better ones. Experiments show that all the components achieve state-of-the-art performance with high efficiency for Chinese.
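
The idea of running several analysis steps on one tagging engine can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain first-order HMM decoded with the Viterbi algorithm, word segmentation recast as tagging each character B (begins a word) or I (inside a word), and toy probabilities invented for the example. The names `viterbi`, `tags_to_words` and `logs`, and the tag labels PN, VV and NR, are hypothetical choices made for illustration.

```python
import math

FLOOR = -1e9  # stand-in log-probability for events missing from the toy tables


def logs(probs):
    """Convert a {key: probability} table to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence under a first-order HMM."""
    # best[t][s]: best log-score of any path that ends in state s at time t
    best = [{s: log_start.get(s, FLOOR) + log_emit[s].get(observations[0], FLOOR)
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p].get(s, FLOOR)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s].get(observations[t], FLOOR)
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def tags_to_words(chars, tags):
    """Group characters into words: B starts a new word, I extends the current one."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Task 1: word segmentation as character tagging (toy numbers, not trained).
seg_tags = viterbi(
    list("我爱北京"), ["B", "I"],
    logs({"B": 1.0}),  # a sentence cannot start inside a word
    {"B": logs({"B": 0.5, "I": 0.5}), "I": logs({"B": 0.7, "I": 0.3})},
    {"B": logs({"我": 0.3, "爱": 0.3, "北": 0.35, "京": 0.05}),
     "I": logs({"我": 0.05, "爱": 0.05, "北": 0.1, "京": 0.8})},
)
words = tags_to_words("我爱北京", seg_tags)

# Task 2: part-of-speech tagging of the resulting words with the same decoder.
pos_tags = viterbi(
    words, ["PN", "VV", "NR"],
    logs({"PN": 0.6, "VV": 0.1, "NR": 0.3}),
    {"PN": logs({"PN": 0.1, "VV": 0.8, "NR": 0.1}),
     "VV": logs({"PN": 0.4, "VV": 0.1, "NR": 0.5}),
     "NR": logs({"PN": 0.2, "VV": 0.5, "NR": 0.3})},
    {"PN": logs({"我": 0.9}), "VV": logs({"爱": 0.9}), "NR": logs({"北京": 0.9})},
)
print(words, pos_tags)
```

Only the state inventory and the probability tables change between the two calls; the decoder is shared, which is the kind of code reuse the abstract describes. A chunking or parsing step would, under the same assumption, supply its own states and tables to the same function.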

Cited By

  • (2011) Parsing the internal structure of words. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1405-1414. DOI: 10.5555/2002472.2002645. Online publication date: 19-Jun-2011
  • (2005) A lexicon-constrained character model for Chinese morphological analysis. Proceedings of the Second International Joint Conference on Natural Language Processing, pages 542-552. DOI: 10.1007/11562214_48. Online publication date: 11-Oct-2005

Published In

SIGHAN '03: Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
July 2003
193 pages

Publisher

Association for Computational Linguistics

United States
