DOI: 10.1145/3209978.3210009
Research article

Multihop Attention Networks for Question Answer Matching

Published: 27 June 2018

Abstract

Attention-based neural network models have been successfully applied to answer selection, an important subtask of question answering (QA). These models often represent a question by a single vector and find its corresponding matches by attending to candidate answers. However, questions and answers may be related to each other in complicated ways that cannot be captured by single-vector representations. In this paper, we propose Multihop Attention Networks (MAN), which aim to uncover these complex relations for ranking question and answer pairs. Unlike previous models, we do not collapse the question into a single vector; instead, we use multiple vectors that focus on different parts of the question for its overall semantic representation, and we apply multiple steps of attention to learn representations for the candidate answers. For each attention step, in addition to common attention mechanisms, we adopt sequential attention, which uses context information to compute context-aware attention weights. Via extensive experiments, we show that MAN outperforms state-of-the-art approaches on popular benchmark QA datasets. Empirical studies confirm the effectiveness of sequential attention over other attention mechanisms.
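The multi-step attention described in the abstract can be sketched roughly as follows: each hop attends over the answer tokens conditioned on the current question summary, scores the pair, and then refines the question summary for the next hop. The mean-pooled initialization, per-hop projection matrices, and cosine scoring below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multihop_attention_score(Q, A, W_list):
    """Score a question/answer pair with multiple attention hops.

    Q: (Lq, d) question encoder states; A: (La, d) answer encoder states;
    W_list: one (d, d) projection per hop (hypothetical parameters).
    """
    q = Q.mean(axis=0)                      # initial question summary
    hop_sims = []
    for W in W_list:
        # attend over answer tokens conditioned on the current summary
        a_weights = softmax(A @ (W @ q))
        a = a_weights @ A                   # attended answer vector
        # cosine similarity between summary and attended answer
        sim = a @ q / (np.linalg.norm(a) * np.linalg.norm(q) + 1e-8)
        hop_sims.append(sim)
        # re-attend over question tokens so the next hop focuses elsewhere
        q_weights = softmax(Q @ (W @ a))
        q = q_weights @ Q
    return sum(hop_sims) / len(hop_sims)    # average over hops
```

In a ranking setting, this score would be computed for each candidate answer and the candidates sorted by it; training would typically use a pairwise hinge loss over positive/negative answers.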




Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. answer selection
  2. attention mechanism
  3. non-factoid qa
  4. representation learning


Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions, 21%
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 1

Reflects downloads up to 14 Sep 2024

Cited By

  • (2023) Multi-head attention based candidate segment selection in QA over hybrid data. Intelligent Data Analysis 27(6), 1839--1852. DOI: 10.3233/IDA-227032
  • (2023) A General Survey on Attention Mechanisms in Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35(4), 3279--3298. DOI: 10.1109/TKDE.2021.3126456
  • (2023) BERT-PG: a two-branch associative feature gated filtering network for aspect sentiment classification. Journal of Intelligent Information Systems 60(3), 709--730. DOI: 10.1007/s10844-023-00785-1
  • (2022) A Survey on Aspect-Based Sentiment Classification. ACM Computing Surveys 55(4), 1--37. DOI: 10.1145/3503044
  • (2022) Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning. IEEE Transactions on Cybernetics 52(1), 252--264. DOI: 10.1109/TCYB.2020.2979803
  • (2022) Exploring Topic Supervision with BERT for Text Matching. 2022 International Joint Conference on Neural Networks (IJCNN), 1--7. DOI: 10.1109/IJCNN55064.2022.9892023
  • (2022) A Transformer-based Attention Flow Model for Intelligent Question and Answering Chatbot. 2022 14th International Conference on Computer Research and Development (ICCRD), 167--170. DOI: 10.1109/ICCRD54409.2022.9730454
  • (2022) MultiHop attention for knowledge diagnosis of mathematics examination. Applied Intelligence 53(9), 10636--10646. DOI: 10.1007/s10489-022-04033-x
  • (2022) BertHANK: hierarchical attention networks with enhanced knowledge and pre-trained model for answer selection. Knowledge and Information Systems 64(8), 2189--2213. DOI: 10.1007/s10115-022-01703-7
  • (2022) Regularizing Deep Text Models by Encouraging Competition. Knowledge Graph and Semantic Computing: Knowledge Graph Empowers the Digital Economy, 161--173. DOI: 10.1007/978-981-19-7596-7_13
