short-paper

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

Authors:

Ling Luo

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1793 - 1796

https://doi.org/10.1145/3397271.3401250

Published: 25 July 2020

Abstract

A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which requires more than language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for end-to-end training of a chatbot to maximize the long-term return from offline multi-turn dialogue datasets. Our framework uses hierarchical reinforcement learning (HRL), where the high-level policy determines sub-goals that guide the conversation towards the final goal, and the low-level policy fulfills each sub-goal by generating the corresponding response utterance. In experiments on a real-world anti-fraud dialogue dataset from the financial domain, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
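The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sub-goal names, the canned utterances, the turn-counting rule in `high_level_policy`, and the discount factor are all hypothetical stand-ins for the learned policies and reward design, which the abstract does not specify. The sketch only shows how a high-level choice of sub-goal conditions a low-level response, and how a long-term (discounted) return could be computed over the turns of a logged dialogue.

```python
from typing import List, Tuple

# Hypothetical sub-goal inventory; the paper's actual sub-goals are not
# given in the abstract.
SUB_GOALS = ["greet", "verify_identity", "warn_about_fraud", "close"]

# Hypothetical canned utterances standing in for a learned generator.
TEMPLATES = {
    "greet": "Hello, how can I help you today?",
    "verify_identity": "Could you confirm who contacted you?",
    "warn_about_fraud": "Please do not transfer any money yet.",
    "close": "Thank you, we will follow up shortly.",
}


def high_level_policy(context: List[str]) -> str:
    """Toy high-level policy: advance one sub-goal per completed exchange.

    A learned policy would score sub-goals from an encoding of the context.
    """
    step = min(len(context) // 2, len(SUB_GOALS) - 1)
    return SUB_GOALS[step]


def low_level_policy(context: List[str], sub_goal: str) -> str:
    """Toy low-level policy: return an utterance for the chosen sub-goal."""
    return TEMPLATES[sub_goal]


def respond(context: List[str]) -> Tuple[str, str]:
    """Run both levels: choose a sub-goal, then produce the response."""
    sub_goal = high_level_policy(context)
    return sub_goal, low_level_policy(context, sub_goal)


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Long-term return G_t = r_t + gamma * G_{t+1} for each turn."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    context: List[str] = []
    for user_turn in ["Hi", "A caller said he is from my bank", "He wants a transfer"]:
        context.append(user_turn)
        sub_goal, reply = respond(context)
        context.append(reply)
        print(f"[{sub_goal}] {reply}")
    # Reward only at the final turn (goal accomplished), as one possible design.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))
```

In an actual HRL setup both policies would be neural networks trained from the logged dialogues; the sketch fixes them by hand so that the control flow between the two levels is easy to follow.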

Supplementary Material

MP4 File (3397271.3401250.mp4), 12.53 MB

A presentation video for SIGIR 2020. Our paper title is GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning.

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2020

2548 pages

ISBN: 9781450380164

DOI: 10.1145/3397271

General Chairs: Jimmy Huang (York University, Canada), Yi Chang (Jilin University, China), Xueqi Cheng (Chinese Academy of Sciences, China)

Program Chairs: Jaap Kamps (University of Amsterdam, Netherlands), Vanessa Murdock (Amazon, U.S.A.), Ji-Rong Wen (Renmin University of China, China), Yiqun Liu (Tsinghua University, China)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

National Natural Science Foundation of China

Conference

SIGIR '20

Sponsor:

SIGIR

SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval

July 25 - 30, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 13
Total Downloads: 393
Downloads (Last 12 months): 48
Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Sep 2024

Citations

Cited By

Izadi S, Forouzanfar M (2024). Error Correction and Adaptation in Conversational AI: A Review of Techniques and Applications in Chatbots. AI 5:2 (803-841). Online publication date: 4-Jun-2024.
https://doi.org/10.3390/ai5020041
Wang J, Lin D, Li W (2024). A Target-Driven Planning Approach for Goal-Directed Dialog Systems. IEEE Transactions on Neural Networks and Learning Systems 35:8 (10475-10487). Online publication date: Aug-2024.
https://doi.org/10.1109/TNNLS.2023.3242071
Tiwari A, S S, Saha S, Bhattacharyya P, Dhar M, Tiwari S (2024). Learning from Failure: Towards Developing a Disease Diagnosis Assistant That Also Learns from Unsuccessful Diagnoses. Cognitive Computation 16:5 (2222-2240). Online publication date: 27-Jun-2024.
https://doi.org/10.1007/s12559-024-10274-4
Li W, Wang X, Jin B, Zha H (2023). Hierarchical diffusion for offline decision making. In Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (Eds.), Proceedings of the 40th International Conference on Machine Learning (20035-20064). Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619235
Zhu Y, Li Y, Cui Y, Zhang T, Wang D, Zhang Y, Feng S (2023). A Knowledge-Enhanced Hierarchical Reinforcement Learning-Based Dialogue System for Automatic Disease Diagnosis. Electronics 12:24 (4896). Online publication date: 5-Dec-2023.
https://doi.org/10.3390/electronics12244896
Drozda P, Żmijewski T, Osowski M, Krasnodębska A, Talun A (2023). Goal-oriented conversational bot for employment domain. Technical Sciences 26. Online publication date: 8-Nov-2023.
https://doi.org/10.31648/ts.9333
Kim H, Yang H, Lee K (2023). Confident Action Decision via Hierarchical Policy Learning for Conversational Recommendation. Proceedings of the ACM Web Conference 2023 (1386-1395). Online publication date: 30-Apr-2023.
https://dl.acm.org/doi/10.1145/3543507.3583536
Tiwari A, Raj R, Saha S, Bhattacharyya P, Tiwari S, Dhar M (2023). Toward Symptom Assessment Guided Symptom Investigation and Disease Diagnosis. IEEE Transactions on Artificial Intelligence 4:6 (1752-1766). Online publication date: Dec-2023.
https://doi.org/10.1109/TAI.2023.3236897
Tiwari A, Saha S, Bhattacharyya P (2022). A knowledge infused context driven dialogue agent for disease diagnosis using hierarchical reinforcement learning. Knowledge-Based Systems 242 (108292). Online publication date: Apr-2022.
https://doi.org/10.1016/j.knosys.2022.108292
Jadhav H, Mulani A, Jadhav M (2022). Design and Development of Chatbot Based on Reinforcement Learning. Machine Learning Algorithms for Signal and Image Processing (219-229). Online publication date: 18-Nov-2022.
https://doi.org/10.1002/9781119861850.ch12
