DOI: 10.1145/3570945.3607312
research-article
Open access

Who's next?: Integrating Non-Verbal Turn-Taking Cues for Embodied Conversational Agents

Published: 22 December 2023

Abstract

Taking turns in a conversation is a delicate interplay of various signals that we as humans can easily decipher. Embodied conversational agents (ECAs) communicating with humans should leverage this ability for smooth and enjoyable conversations. Extensive research has analyzed human turn-taking cues, and attempts have been made to predict turn-taking from observed cues. These cues range from prosodic, semantic, and syntactic modulation, through adapted gesture and gaze behavior, to actively controlled respiration. However, when such behavior is generated for social robots or ECAs, often only a single modality is considered, e.g., gaze. We strive to design a comprehensive system that produces cues for all non-verbal modalities: gestures, gaze, and breathing. The system provides valuable cues without requiring adaptation of the speech content. We evaluated our system in a VR-based user study in which N = 32 participants completed two consecutive tasks. First, we asked them to listen to two ECAs taking turns in several conversations. Second, participants took turns with one of the ECAs directly. We examined the system's usability and the perceived social presence of the ECAs' turn-taking behavior, both with respect to each individual non-verbal modality and to their interplay. While we found effects of gesture manipulation in interactions with the ECAs, we found no effects on social presence.
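To make the multimodal design concrete, here is a minimal sketch of how such cues could be scheduled, assuming a per-frame update loop and a predicted end time for the agent's utterance (e.g., taken from the TTS timeline). The Cue and TurnYieldScheduler names, the timing offsets, and the callback interface are illustrative assumptions, not the paper's implementation:

    # Hypothetical sketch of a turn-yielding cue scheduler; not the authors' code.
    # Near the predicted end of the agent's utterance it fires non-verbal cues:
    # gesture retraction, a gaze shift to the next speaker, and an inhalation.
    from dataclasses import dataclass
    from typing import Callable, List, Set

    @dataclass
    class Cue:
        name: str                 # e.g. "gaze_to_addressee"
        offset_s: float           # fire this many seconds before the turn end
        fire: Callable[[], None]  # animation/audio backend call (assumed)

    class TurnYieldScheduler:
        def __init__(self, cues: List[Cue]) -> None:
            self.cues = cues
            self.fired: Set[str] = set()

        def update(self, now_s: float, predicted_turn_end_s: float) -> None:
            """Call once per frame while the agent is speaking."""
            remaining = predicted_turn_end_s - now_s
            for cue in self.cues:
                if cue.name not in self.fired and remaining <= cue.offset_s:
                    cue.fire()
                    self.fired.add(cue.name)

    # Example wiring with placeholder callbacks standing in for the backend:
    if __name__ == "__main__":
        say = lambda msg: (lambda: print(msg))
        scheduler = TurnYieldScheduler([
            Cue("gesture_retraction", 0.8, say("retract beat gesture")),
            Cue("gaze_to_addressee", 0.5, say("gaze at next speaker")),
            Cue("pre_speech_inhale", 0.0, say("next speaker inhales")),
        ])
        for t in (0.0, 0.5, 1.0, 1.5, 2.0):  # simulated frame times
            scheduler.update(now_s=t, predicted_turn_end_s=2.0)

Turn-claiming cues on the listener's side (e.g., an inhalation and a gaze shift toward the current speaker shortly before self-selecting) could be driven by the same mechanism, keyed to a predicted turn start instead of a turn end.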



Published In

IVA '23: Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents
September 2023
376 pages
ISBN:9781450399944
DOI:10.1145/3570945
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2023


Author Tags

  1. ECA
  2. breathing
  3. embodied conversational agents
  4. gaze
  5. gesture
  6. non-verbal
  7. social presence
  8. turn-taking
  9. virtual agents

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IVA '23

Acceptance Rates

Overall Acceptance Rate 53 of 196 submissions, 27%


Article Metrics

  • Downloads (last 12 months): 319
  • Downloads (last 6 weeks): 45
Reflects downloads up to 14 Sep 2024


Cited By

  • (2024) Sensing the Intentions to Speak in VR Group Discussions. Sensors 24(2), 362. https://doi.org/10.3390/s24020362. Online publication date: 7-Jan-2024.
  • (2024) StudyFramework: Comfortably Setting up and Conducting Factorial-Design Studies Using the Unreal Engine. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 442-449. https://doi.org/10.1109/VRW62533.2024.00087. Online publication date: 16-Mar-2024.
  • (2024) Audiovisual Coherence: Is Embodiment of Background Noise Sources a Necessity? 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 61-67. https://doi.org/10.1109/VRW62533.2024.00017. Online publication date: 16-Mar-2024.
  • (2024) Authentication in Immersive Virtual Environments through Gesture-Based Interaction with a Virtual Agent. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 54-60. https://doi.org/10.1109/VRW62533.2024.00016. Online publication date: 16-Mar-2024.
  • (2024) A lecturer's voice quality and its effect on memory, listening effort, and perception in a VR environment. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-63097-6. Online publication date: 30-May-2024.
