Integrating Multiple Dependency Corpora for Inducing Wide-Coverage Japanese CCG Resources

Published: 30 January 2015

Abstract

A novel method to induce wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese is proposed in this article. For some languages including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammar have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented in chunk-based dependency structures, while previous methods for inducing grammars are dependent on tree corpora. To translate syntactic information presented in chunk-based dependencies to phrase structures as accurately as possible, integration of annotation from multiple dependency-based corpora is proposed. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a similar way to previously proposed methods. The quality of the conversion is empirically evaluated in terms of the coverage of the obtained CCG lexicon and the accuracy of the parsing with the grammar. While the transforming process used in this study is specialized for Japanese, the framework of our method would be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis due to morphosyntactic features.
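
As a rough illustration of the dependency-to-phrase-structure step mentioned in the abstract, the sketch below turns a chunk-based dependency analysis into a binary bracketing. It is not the authors' conversion procedure: it assumes strictly head-final, projective dependencies, ignores predicate-argument information and chunk-internal structure, and the names `Chunk` and `to_binary_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Chunk:
    text: str  # surface string of the chunk (romanized here for readability)
    head: int  # index of the chunk this one depends on; -1 marks the root


# A tree is either a chunk string (leaf) or a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def to_binary_tree(chunks: List[Chunk]) -> Tree:
    """Build a binary bracketing from head-final chunk dependencies.

    Assumption (not from the article): every dependent precedes its head
    and the dependencies are projective.
    """
    # Collect the dependents of each chunk in left-to-right order.
    deps: Dict[int, List[int]] = {i: [] for i in range(len(chunks))}
    root = None
    for i, chunk in enumerate(chunks):
        if chunk.head == -1:
            root = i
        else:
            deps[chunk.head].append(i)
    if root is None:
        raise ValueError("no root chunk (head == -1) found")

    def build(i: int) -> Tree:
        node: Tree = chunks[i].text
        # The nearest (rightmost) dependent attaches first, so it sits lowest.
        for d in reversed(deps[i]):
            node = (build(d), node)
        return node

    return build(root)


# "kare-ga hon-o yonda" (he-NOM book-ACC read): both arguments depend on the verb.
sentence = [Chunk("kare-ga", 2), Chunk("hon-o", 2), Chunk("yonda", -1)]
print(to_binary_tree(sentence))
```

The subject ends up outermost and the object closest to the verb, which is the kind of right-branching structure a head-final language suggests. A real conversion also has to decide attachment inside chunks and distinguish arguments from adjuncts, which is where the integrated predicate-argument annotation described in the abstract would come in.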

Cited By

  • (2022) Reducing Syntactic Complexity for Information Extraction from Japanese Requirement Specifications. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 387-396. DOI: 10.1109/APSEC57359.2022.00051. Online publication date: Dec-2022
  • (2020) Translate Japanese into Formal Languages with an Enhanced Generalization Algorithm. Intelligent Computing, 638-655. DOI: 10.1007/978-3-030-52246-9_47. Online publication date: 4-Jul-2020
  • (2018) Hindi CCGbank. Language Resources and Evaluation 52:1, 67-100. DOI: 10.1007/s10579-017-9379-6. Online publication date: 17-Dec-2018

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 14, Issue 1
    January 2015
    83 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/2730923
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 January 2015
    Accepted: 01 July 2014
    Revised: 01 March 2014
    Received: 01 December 2013
    Published in TALLIP Volume 14, Issue 1

    Author Tags

    1. Combinatory Categorial Grammar
    2. Japanese parsing
    3. dependency annotation
    4. grammar development

    Qualifiers

    • Research-article
    • Research
    • Refereed
