DOI: 10.1145/3209978.3210009
Research article

Multihop Attention Networks for Question Answer Matching

Published: 27 June 2018

Abstract

Attention-based neural network models have been successfully applied to answer selection, an important subtask of question answering (QA). These models often represent a question by a single vector and find its corresponding matches by attending to candidate answers. However, questions and answers may be related to each other in complicated ways that cannot be captured by single-vector representations. In this paper, we propose Multihop Attention Networks (MAN), which aim to uncover these complex relations for ranking question and answer pairs. Unlike previous models, we do not collapse the question into a single vector; instead, we use multiple vectors that focus on different parts of the question for its overall semantic representation, and we apply multiple steps of attention to learn representations for the candidate answers. For each attention step, in addition to common attention mechanisms, we adopt sequential attention, which uses context information to compute context-aware attention weights. Via extensive experiments, we show that MAN outperforms state-of-the-art approaches on popular benchmark QA datasets. Empirical studies confirm the effectiveness of sequential attention over other attention mechanisms.
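The multi-step attention described in the abstract can be sketched roughly as follows: each hop attends over the answer tokens conditioned on the current question summary, scores the pair, and then refines the question summary for the next hop. The mean-pooled initialization, per-hop projection matrices, and cosine scoring below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multihop_attention_score(Q, A, W_list):
    """Score a question/answer pair with multiple attention hops.

    Q: (Lq, d) question encoder states; A: (La, d) answer encoder states;
    W_list: one (d, d) projection per hop (hypothetical parameters).
    """
    q = Q.mean(axis=0)                      # initial question summary
    hop_sims = []
    for W in W_list:
        # attend over answer tokens conditioned on the current summary
        a_weights = softmax(A @ (W @ q))
        a = a_weights @ A                   # attended answer vector
        # cosine similarity between summary and attended answer
        sim = a @ q / (np.linalg.norm(a) * np.linalg.norm(q) + 1e-8)
        hop_sims.append(sim)
        # re-attend over question tokens so the next hop focuses elsewhere
        q_weights = softmax(Q @ (W @ a))
        q = q_weights @ Q
    return sum(hop_sims) / len(hop_sims)    # average over hops
```

In a ranking setting, this score would be computed for each candidate answer and the candidates sorted by it; training would typically use a pairwise hinge loss over positive/negative answers.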




Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. answer selection
  2. attention mechanism
  3. non-factoid qa
  4. representation learning


Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions, 21%
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 1

Reflects downloads up to 14 Sep 2024

Cited By

  • (2023) Multi-head attention based candidate segment selection in QA over hybrid data. Intelligent Data Analysis 27(6), 1839--1852. DOI: 10.3233/IDA-227032
  • (2023) A General Survey on Attention Mechanisms in Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35(4), 3279--3298. DOI: 10.1109/TKDE.2021.3126456
  • (2023) BERT-PG: a two-branch associative feature gated filtering network for aspect sentiment classification. Journal of Intelligent Information Systems 60(3), 709--730. DOI: 10.1007/s10844-023-00785-1
  • (2022) A Survey on Aspect-Based Sentiment Classification. ACM Computing Surveys 55(4), 1--37. DOI: 10.1145/3503044
  • (2022) Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning. IEEE Transactions on Cybernetics 52(1), 252--264. DOI: 10.1109/TCYB.2020.2979803
  • (2022) Exploring Topic Supervision with BERT for Text Matching. 2022 International Joint Conference on Neural Networks (IJCNN), 1--7. DOI: 10.1109/IJCNN55064.2022.9892023
  • (2022) A Transformer-based Attention Flow Model for Intelligent Question and Answering Chatbot. 2022 14th International Conference on Computer Research and Development (ICCRD), 167--170. DOI: 10.1109/ICCRD54409.2022.9730454
  • (2022) MultiHop attention for knowledge diagnosis of mathematics examination. Applied Intelligence 53(9), 10636--10646. DOI: 10.1007/s10489-022-04033-x
  • (2022) BertHANK: hierarchical attention networks with enhanced knowledge and pre-trained model for answer selection. Knowledge and Information Systems 64(8), 2189--2213. DOI: 10.1007/s10115-022-01703-7
  • (2022) Regularizing Deep Text Models by Encouraging Competition. Knowledge Graph and Semantic Computing: Knowledge Graph Empowers the Digital Economy, 161--173. DOI: 10.1007/978-981-19-7596-7_13
