A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging

Published: 26 May 2021

Abstract

The syllable is the smallest semantic unit of the Burmese language. This study proposes the first neural joint learning model for Burmese syllable segmentation, word segmentation, and part-of-speech (POS) tagging with BERT. The proposed model alleviates error propagation from syllable segmentation. More specifically, it extends the neural joint model for Vietnamese word segmentation, POS tagging, and dependency parsing [28] with pre-trained Burmese character, syllable, and word embeddings and BiLSTM-CRF-based neural layers. To evaluate the proposed model, we fine-tune multilingual BERT and carry out experiments on Burmese benchmark datasets. The results show that the proposed joint model achieves excellent performance.

References

[1]
Chris Alberti, Kenton Lee, and Michael Collins. 2019. A BERT baseline for the natural questions. arXiv: Computation and Language.
[2]
Cunli Mao, Zhibo Man, Zhengtao Yu, Zhenhan Wang, Shengxiang Gao, and Yafei Zhang. 2020. A Burmese dependency parsing method based on transfer learning. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP’20). IEEE, 92–97.
[3]
Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily Pitler, and Joshua Maynez. 2018. Morphosyntactic tagging with a meta-BiLSTM model over context sensitive token encodings. arXiv:1805.08237.
[4]
Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, and Ting Liu. 2018. Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation. arXiv:1807.03121.
[5]
Xinchi Chen, Xipeng Qiu, and Xuanjing Huang. 2016. A feature-enriched neural model for joint Chinese word segmentation and part-of-speech tagging. arXiv:1611.05384.
[6]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
[7]
Chenchen Ding, Hnin Thu Zar Aye, Win Pa Pa, Khin Thandar Nwet, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2019. Towards Burmese (Myanmar) morphological analysis: Syllable-based tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, 1 (2019), 1–34.
[8]
Chenchen Ding, Ye Kyaw Thu, Masao Utiyama, and Eiichiro Sumita. 2016. Word segmentation for Burmese (Myanmar). ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 15, 4 (2016), 1–10.
[9]
Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2018. NOVA: A feasible and flexible annotation system for joint tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18, 2 (2018), 1–18.
[10]
Chenchen Ding, Sann Su Su Yee, Win Pa Pa, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2020. A Burmese (Myanmar) treebank: Guideline and analysis. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, 3 (2020), 1–13.
[11]
Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2012. Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 1045–1053.
[12]
Tin Htay Hlaing and Yoshiki Mikami. 2014. Automatic syllable segmentation of Myanmar texts using finite state transducer. ICTer 6, 2 (2014).
[13]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[14]
Hla Hla Htay and Kavi Narayana Murthy. 2008. Myanmar word segmentation using syllable level longest matching. In Proceedings of the 6th Workshop on Asian Language Resources.
[15]
Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991.
[16]
Wenbin Jiang, Liang Huang, Qun Liu, and Yajuan Lü. 2008. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL-08: HLT. 897–904.
[17]
Zhanming Jie and Wei Lu. 2019. Dependency-guided LSTM-CRF for named entity recognition. arXiv:1909.10148.
[18]
Dan Kondratyuk and Milan Straka. 2019. 75 languages, 1 model: Parsing universal dependencies universally. arXiv:1904.02099.
[19]
Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1. Association for Computational Linguistics, 513–521.
[20]
John Lafferty. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.
[21]
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv:1603.01360.
[22]
Yang Liu. 2019. Fine-tune BERT for extractive summarization. arXiv:1903.10318.
[23]
Zin Maung Maung and Yoshiki Mikami. 2008. A rule-based syllable segmentation of Myanmar text. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages.
[24]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781.
[25]
Aye Myat Mon, Soe Lai Phyue, Myint Myint Thein, Su Su Htay, and Thinn Thinn Win. 2010. Analysis of Myanmar word boundary and segmentation by using statistical approach. In Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE’10), Vol. 5. IEEE, V5-233–V5-237.
[26]
Cynthia Myint. 2011. A hybrid approach for part-of-speech tagging of Burmese texts. In Proceedings of the 2011 International Conference on Computer and Management (CAMAN’11). IEEE, 1–4.
[27]
Phyu Hninn Myint, Tin Myat Htwe, and Ni Lar Thein. 2011. Bigram part-of-speech tagger for Myanmar language. In Proceedings of 2011 International Conference on Information Communication and Management, Singapore. 147–152.
[28]
Dat Quoc Nguyen. [n.d.]. A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing.
[29]
Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT.
[30]
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv:1802.05365.
[31]
Myat Lay Phyu and Kiyota Hashimoto. 2017. Burmese word segmentation with character clustering and CRFs. In Proceedings of the 2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE’17). IEEE, 1–6.
[32]
Tao Qian, Yue Zhang, Meishan Zhang, Yafeng Ren, and Donghong Ji. 2015. A transition-based model for joint segmentation, POS-tagging and normalization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 1837–1846.
[33]
Xian Qian and Yang Liu. 2012. Joint Chinese word segmentation, POS tagging and parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 501–511.
[34]
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. Science China Technological Sciences (2020), 1–26.
[35]
Nuo Qun, Hang Yan, Xipeng Qiu, and Xuanjing Huang. 2020. Chinese word segmentation via BiLSTM+Semi-CRF with relay node. Journal of Computer Science and Technology 35, 5 (2020), 1115–1126.
[36]
Lin Songkai, Mao Cunli, Yu Zhengtao, Guo Jianyi, Wang Hongbin, and Zhang Jiafu. 2018. A method of Myanmar word segmentation based on convolution neural network. Journal of Chinese Information Processing 6 (2018), 8.
[37]
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A joint model for video and language representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7464–7473.
[38]
Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? arXiv:1905.05583.
[39]
Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 1385–1394.
[40]
Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. arXiv:1905.05950.
[41]
Tun Thura Thet, Jin-Cheon Na, and Wunna Ko Ko. 2008. Word segmentation for the Myanmar language. Journal of Information Science 34, 5 (2008), 688–704.
[42]
Ye Kyaw Thu, Andrew Finch, Eiichiro Sumita, and Yoshinori Sagisaka. 2014. Integrating dictionaries into an unsupervised model for Myanmar word segmentation. In Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing. 20–27.
[43]
Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, and Eiichiro Sumita. 2016. Introducing the Asian language treebank (ALT). In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). 1574–1578.
[44]
Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, and Ting Liu. 2019. Cross-lingual BERT transformation for zero-shot dependency parsing. arXiv:1909.06775.
[45]
Liner Yang, Meishan Zhang, Yang Liu, Maosong Sun, Nan Yu, and Guohong Fu. 2017. Joint POS tagging and dependence parsing with transition-based neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 8 (2017), 1352–1358.
[46]
Meishan Zhang, Nan Yu, and Guohong Fu. 2018. A simple and effective neural model for joint word segmentation and POS tagging. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 9 (2018), 1528–1538.
[47]
Meishan Zhang, Yue Zhang, Wanxiang Che, and Ting Liu. 2014. Character-level Chinese dependency parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1326–1336.
[48]
Shaoning Zhang, Cunli Mao, Zhengtao Yu, Hongbin Wang, Zhongwei Li, and Jiafu Zhang. 2018. Word segmentation for Burmese based on dual-layer CRFs. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18, 1 (2018), 1–11.
[49]
Yue Zhang and Stephen Clark. 2008. Joint word segmentation and POS tagging using a single perceptron. In Proceedings of ACL-08: HLT. 888–896.
[50]
Yue Zhang and Stephen Clark. 2010. A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 843–852.
[51]
Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. 2013. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 647–657.

Cited By

  • (2023) Indian Language Analysis with XLM-RoBERTa: Enhancing Parts of Speech Tagging for Effective Natural Language Preprocessing. 2023 Seventh International Conference on Image Information Processing (ICIIP), 850–854. DOI: 10.1109/ICIIP61524.2023.10537689. Online publication date: 22-Nov-2023
  • (2022) The Comparison of Language Models with a Novel Text Filtering Approach for Turkish Sentiment Analysis. ACM Transactions on Asian and Low-Resource Language Information Processing 22, 2, 1–16. DOI: 10.1145/3557892. Online publication date: 27-Dec-2022
  • (2022) A BiLSTM-CRF Based Approach to Word Segmentation in Chinese. 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 1–4. DOI: 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927991. Online publication date: 12-Sep-2022

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 4
July 2021
419 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3465463
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 May 2021
Accepted: 01 November 2020
Revised: 01 July 2020
Received: 01 March 2020
Published in TALLIP Volume 20, Issue 4

Author Tags

  1. Burmese
  2. word segmentation
  3. POS tagging
  4. joint training
  5. BiLSTM-CRF
  6. BERT

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Key Program of National Natural Science Foundation of China
  • National Natural Science Foundation of China
  • Key Project of Natural Science Foundation of Yunnan Province
  • Candidates of the Young and Middle Aged Academic and Technical Leaders of Yunnan Province
