DOI: 10.1145/3570945.3607312
research-article
Open access

Who's next?: Integrating Non-Verbal Turn-Taking Cues for Embodied Conversational Agents

Published: 22 December 2023

Abstract

Taking turns in a conversation is a delicate interplay of various signals that we as humans can easily decipher. Embodied conversational agents (ECAs) communicating with humans should leverage this ability for smooth and enjoyable conversations. Extensive research has analyzed human turn-taking cues, and attempts have been made to predict turn-taking from observed cues. These cues range from prosodic, semantic, and syntactic modulation, through adapted gesture and gaze behavior, to actively controlled respiration. However, when such behavior is generated for social robots or ECAs, often only a single modality is considered, e.g., gaze. We strive to design a comprehensive system that produces cues for all non-verbal modalities: gestures, gaze, and breathing. The system provides valuable cues without requiring adaptation of the speech content. We evaluated our system in a VR-based user study in which N = 32 participants completed two consecutive tasks. First, we asked them to listen to two ECAs taking turns in several conversations. Second, participants took turns with one of the ECAs directly. We examined the system's usability and the perceived social presence of the ECAs' turn-taking behavior, both with respect to each individual non-verbal modality and to their interplay. While we found effects of gesture manipulation in interactions with the ECAs, we found no effects on social presence.
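To make the multimodal design concrete, here is a minimal sketch of how such cues could be scheduled, assuming a per-frame update loop and a predicted end time for the agent's utterance (e.g., taken from the TTS timeline). The Cue and TurnYieldScheduler names, the timing offsets, and the callback interface are illustrative assumptions, not the paper's implementation:

    # Hypothetical sketch of a turn-yielding cue scheduler; not the authors' code.
    # Near the predicted end of the agent's utterance it fires non-verbal cues:
    # gesture retraction, a gaze shift to the next speaker, and an inhalation.
    from dataclasses import dataclass
    from typing import Callable, List, Set

    @dataclass
    class Cue:
        name: str                 # e.g. "gaze_to_addressee"
        offset_s: float           # fire this many seconds before the turn end
        fire: Callable[[], None]  # animation/audio backend call (assumed)

    class TurnYieldScheduler:
        def __init__(self, cues: List[Cue]) -> None:
            self.cues = cues
            self.fired: Set[str] = set()

        def update(self, now_s: float, predicted_turn_end_s: float) -> None:
            """Call once per frame while the agent is speaking."""
            remaining = predicted_turn_end_s - now_s
            for cue in self.cues:
                if cue.name not in self.fired and remaining <= cue.offset_s:
                    cue.fire()
                    self.fired.add(cue.name)

    # Example wiring with placeholder callbacks standing in for the backend:
    if __name__ == "__main__":
        say = lambda msg: (lambda: print(msg))
        scheduler = TurnYieldScheduler([
            Cue("gesture_retraction", 0.8, say("retract beat gesture")),
            Cue("gaze_to_addressee", 0.5, say("gaze at next speaker")),
            Cue("pre_speech_inhale", 0.0, say("next speaker inhales")),
        ])
        for t in (0.0, 0.5, 1.0, 1.5, 2.0):  # simulated frame times
            scheduler.update(now_s=t, predicted_turn_end_s=2.0)

Turn-claiming cues on the listener's side (e.g., an inhalation and a gaze shift toward the current speaker shortly before self-selecting) could be driven by the same mechanism, keyed to a predicted turn start instead of a turn end.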



Published In

IVA '23: Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents
September 2023
376 pages
ISBN:9781450399944
DOI:10.1145/3570945
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2023


Author Tags

  1. ECA
  2. breathing
  3. embodied conversational agents
  4. gaze
  5. gesture
  6. non-verbal
  7. social presence
  8. turn-taking
  9. virtual agents

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IVA '23

Acceptance Rates

Overall Acceptance Rate 53 of 196 submissions, 27%


Article Metrics

  • Downloads (last 12 months): 319
  • Downloads (last 6 weeks): 45
Reflects downloads up to 14 Sep 2024


Cited By

  • (2024) Sensing the Intentions to Speak in VR Group Discussions. Sensors 24(2), 362. https://doi.org/10.3390/s24020362. Online publication date: 7-Jan-2024.
  • (2024) StudyFramework: Comfortably Setting up and Conducting Factorial-Design Studies Using the Unreal Engine. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 442-449. https://doi.org/10.1109/VRW62533.2024.00087. Online publication date: 16-Mar-2024.
  • (2024) Audiovisual Coherence: Is Embodiment of Background Noise Sources a Necessity? 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 61-67. https://doi.org/10.1109/VRW62533.2024.00017. Online publication date: 16-Mar-2024.
  • (2024) Authentication in Immersive Virtual Environments through Gesture-Based Interaction with a Virtual Agent. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 54-60. https://doi.org/10.1109/VRW62533.2024.00016. Online publication date: 16-Mar-2024.
  • (2024) A lecturer's voice quality and its effect on memory, listening effort, and perception in a VR environment. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-63097-6. Online publication date: 30-May-2024.
