research-article

Multimodal human behavior analysis: learning correlation and interaction across modalities

Authors: Yale Song, Louis-Philippe Morency, Randall Davis

ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction

Pages 27 - 30

https://doi.org/10.1145/2388676.2388684

Published: 22 October 2012

Abstract

Multimodal human behavior analysis is a challenging task because of the complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We then use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on the task of recognizing agreement and disagreement from nonverbal audio-visual cues, using the Canal 9 dataset. Experimental results show that KCCA makes it easier to capture nonlinear hidden dynamics and that MV-HCRF helps to learn interaction across modalities.
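The KCCA step described in the abstract (kernel mapping followed by a projection that maximizes cross-modal correlation) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes an RBF kernel, centered kernel matrices, and one common regularized KCCA variant in which each view's variance term is replaced by alpha^T (K + reg*I)^2 alpha, which reduces the problem to a singular value decomposition; the regularization used in the paper may differ. The function names `rbf_kernel`, `center_kernel` and `kcca`, the parameters `gamma` and `reg`, and the synthetic two-view data are all hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(Kx, Ky, reg=0.1, n_components=2):
    """Regularized kernel CCA on two centered n x n kernel matrices (assumed variant).

    Maximizes alpha^T Kx Ky beta subject to alpha^T (Kx + reg*I)^2 alpha = 1
    and beta^T (Ky + reg*I)^2 beta = 1. Substituting a = (Kx + reg*I) alpha and
    b = (Ky + reg*I) beta turns this into the top singular pairs of
    C = (Kx + reg*I)^{-1} Kx Ky (Ky + reg*I)^{-1}.
    Returns dual weights alpha, beta (n x k) and regularized correlations (k,).
    """
    n = Kx.shape[0]
    Ax = Kx + reg * np.eye(n)
    Ay = Ky + reg * np.eye(n)
    left = np.linalg.solve(Ax, Kx @ Ky)   # Ax^{-1} Kx Ky
    C = np.linalg.solve(Ay, left.T).T     # (Ax^{-1} Kx Ky) Ay^{-1}, using Ay = Ay^T
    U, s, Vt = np.linalg.svd(C)           # singular values come sorted descending
    alpha = np.linalg.solve(Ax, U[:, :n_components])
    beta = np.linalg.solve(Ay, Vt[:n_components].T)
    return alpha, beta, s[:n_components]


# Hypothetical two-view data: a shared latent angle t seen nonlinearly by both views.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(-np.pi, np.pi, size=n)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.1 * rng.normal(size=(n, 2))  # view 1
Y = (t ** 2)[:, None] + 0.1 * rng.normal(size=(n, 1))                         # view 2

# Kernel widths and the regularizer are illustrative guesses, not tuned values.
Kx = center_kernel(rbf_kernel(X, X, gamma=1.0))
Ky = center_kernel(rbf_kernel(Y, Y, gamma=0.1))
alpha, beta, corr = kcca(Kx, Ky, reg=1.0, n_components=2)

# Projected (canonical) variates of the training data for the top component.
u = Kx @ alpha[:, 0]
v = Ky @ beta[:, 0]
print("regularized canonical correlations:", corr)
print("empirical corr(u, v):", np.corrcoef(u, v)[0, 1])
```

The regularizer is not optional in practice: without it (`reg` close to 0), KCCA tends to overfit and report correlations close to 1 even for unrelated views, so `reg` and the kernel widths would normally be chosen by validation. In the approach summarized above, projections of this kind are the "projected data" that the MV-HCRF stage then models; how the paper applies the projection to audio-visual sequences is not specified in this abstract.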

Index Terms

1. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems

Information

Published In

ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction

October 2012

636 pages

ISBN:9781450314671

DOI:10.1145/2388676

General Chairs:
Louis-Philippe Morency, University of Southern California, USA
Dan Bohus, Microsoft Research, USA
Hamid Aghajan, Stanford University, USA

Program Chairs:
Justine Cassell, Carnegie Mellon University, USA
Anton Nijholt, University of Twente, Netherlands
Julien Epps, The University of New South Wales, Australia

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 October 2012

Conference

ICMI '12

Sponsor:

SIGCHI

ICMI '12: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

October 22 - 26, 2012

Santa Monica, California, USA

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Bibliometrics

Article Metrics

Total Citations: 64
Total Downloads: 739
Downloads (last 12 months): 57
Downloads (last 6 weeks): 5

Reflects downloads up to 21 Sep 2024
