DOI: 10.1145/3397271.3401250

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

Published: 25 July 2020

Abstract

A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which requires more than language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training a chatbot end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework uses hierarchical reinforcement learning (HRL), in which the high-level policy determines sub-goals that guide the conversation towards the final goal, and the low-level policy fulfills each sub-goal by generating the corresponding response utterance. In experiments on a real-world dialogue dataset for anti-fraud in finance, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
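The two-level control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the sub-goal labels, the canned utterances, the per-turn score table, and the discount factor below are all hypothetical, and a real system would use learned neural policies trained from offline dialogues rather than lookup tables. The sketch only shows how a high-level policy might pick a sub-goal at each turn, how a low-level policy might realize it as an utterance, and how a long-term (discounted) return is accumulated.

```python
import math
import random

# Hypothetical sub-goals and canned utterances, for illustration only.
SUB_GOALS = ["greet", "ask_identity", "verify_transaction"]
UTTERANCES = {
    "greet": ["Hello, this is the security team."],
    "ask_identity": ["Could you confirm your name?", "Who am I speaking with?"],
    "verify_transaction": ["Did you authorize this transfer?"],
}


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def high_level_policy(turn, weights, rng):
    """Sample a sub-goal; the 'state' here is just the turn index (toy assumption)."""
    probs = softmax(weights[turn % len(weights)])
    return rng.choices(SUB_GOALS, weights=probs, k=1)[0]


def low_level_policy(sub_goal, rng):
    """Realize the chosen sub-goal as an utterance (a stand-in for a generator)."""
    return rng.choice(UTTERANCES[sub_goal])


def discounted_return(rewards, gamma=0.9):
    """Long-term return: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def run_episode(weights, num_turns=3, seed=0):
    """Roll out one dialogue: the high level picks a sub-goal, the low level speaks."""
    rng = random.Random(seed)
    trace = []
    for turn in range(num_turns):
        sub_goal = high_level_policy(turn, weights, rng)
        trace.append((sub_goal, low_level_policy(sub_goal, rng)))
    return trace


if __name__ == "__main__":
    # One row of sub-goal scores per turn (hypothetical values).
    weights = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    for sub_goal, utterance in run_episode(weights):
        print(f"{sub_goal}: {utterance}")
    # A reward that arrives only when the final goal is reached.
    print(discounted_return([0.0, 0.0, 1.0]))
```

Separating the two levels means that a reward arriving only at the end of a dialogue can be credited to the sequence of sub-goal choices, while the utterance generator is judged on how well it realizes each chosen sub-goal.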

Supplementary Material

MP4 File (3397271.3401250.mp4)
A presentation video for SIGIR 2020 for the paper "GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning".

    Published In

    SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2020
    2548 pages
    ISBN: 9781450380164
    DOI: 10.1145/3397271
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dialogue system
    2. goal-oriented chatbot
    3. reinforcement learning

    Qualifiers

    • Short-paper

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    SIGIR '20
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
