DOI: 10.1145/2388676.2388684
research-article

Multimodal human behavior analysis: learning correlation and interaction across modalities

Published: 22 October 2012

Abstract

Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on the task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes it easier to capture nonlinear hidden dynamics and that MV-HCRF helps learn interactions across modalities.
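The first stage described in the abstract, mapping each modality into a kernel feature space and finding directions that maximize cross-modal correlation, can be illustrated with a minimal regularized kernel CCA sketch in NumPy. This is not the authors' implementation: the RBF kernel, the `gamma` and `kappa` values, the ridge-style regularization (replacing Kx^2 with (Kx + kappa I)^2), the helper names `rbf_gram`, `center_gram` and `kcca`, and the synthetic "audio" and "visual" views are all hypothetical choices made for illustration. The sketch only projects the training samples; projecting unseen sequences would also require centering the test-versus-train kernel, and the MV-HCRF stage is not shown.

```python
import numpy as np


def rbf_gram(X: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) Gram matrix with k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def center_gram(K: np.ndarray) -> np.ndarray:
    """Center the mapped data in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(X, Y, n_components=2, gamma=1.0, kappa=0.1):
    """Regularized kernel CCA between two views X (n x p) and Y (n x q).

    Maximizes a^T Kx Ky b subject to a^T (Kx + kappa I)^2 a = 1 and
    b^T (Ky + kappa I)^2 b = 1. With u = (Kx + kappa I) a and
    v = (Ky + kappa I) b this becomes the SVD of
    M = (Kx + kappa I)^-1 Kx Ky (Ky + kappa I)^-1.
    """
    n = X.shape[0]
    Kx = center_gram(rbf_gram(X, gamma))
    Ky = center_gram(rbf_gram(Y, gamma))
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    Ax = np.linalg.solve(Rx, Kx)  # (Kx + kappa I)^-1 Kx, a symmetric "shrunk" Kx
    Ay = np.linalg.solve(Ry, Ky)  # equals Ky (Ky + kappa I)^-1 since the factors commute
    U, s, Vt = np.linalg.svd(Ax @ Ay)
    alpha = np.linalg.solve(Rx, U[:, :n_components])  # dual weights, view X
    beta = np.linalg.solve(Ry, Vt[:n_components].T)   # dual weights, view Y
    # Projected training samples, one column per canonical component,
    # plus the regularized canonical correlations (singular values of M).
    return Kx @ alpha, Ky @ beta, s[:n_components]


# Toy example: two hypothetical "modalities" driven by one shared latent signal t.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = (t + 0.05 * rng.standard_normal(200))[:, None]     # e.g. an audio feature
Y = (t**3 + 0.05 * rng.standard_normal(200))[:, None]  # e.g. a visual feature
px, py, corrs = kcca(X, Y, n_components=2)
print("regularized canonical correlations:", corrs)
print("correlation of first projected pair:", np.corrcoef(px[:, 0], py[:, 0])[0, 1])
```

The substitution u = (Kx + kappa I) a turns the generalized eigenproblem into a single SVD. The regularizer matters: without it, kernel CCA on n training samples can find near-perfect correlations even between unrelated views, so in practice `kappa` (and the kernel width) would be tuned, for example by cross-validation.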

    Published In

    ACM Conferences
    ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction
    October 2012
    636 pages
    ISBN: 9781450314671
    DOI: 10.1145/2388676
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. canonical correlation analysis
    2. kernel methods
    3. multi-view latent variable discriminative models
    4. multimodal signal processing

    Conference

    ICMI '12: International Conference on Multimodal Interaction
    October 22-26, 2012
    Santa Monica, California, USA

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 57
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 21 Sep 2024.

    Cited By

    • (2023) DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation. Proceedings of the 31st ACM International Conference on Multimedia, pp. 9521-9525. DOI: 10.1145/3581783.3612857. Online publication date: 26-Oct-2023.
    • (2023) Design of a Desktop Virtual Reality-Based Collaborative Activities Simulator (ViRCAS) to Support Teamwork in Workplace Settings for Autistic Adults. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 31, pp. 2184-2194. DOI: 10.1109/TNSRE.2023.3271139. Online publication date: 2023.
    • (2023) Fine Aligned Discriminative Hashing for Remote Sensing Image-Audio Retrieval. IEEE Transactions on Geoscience and Remote Sensing, 61, pp. 1-12. DOI: 10.1109/TGRS.2023.3269300. Online publication date: 2023.
    • (2023) Implications of Emotion Recognition Technologies: Balancing Privacy and Public Safety. IEEE Technology and Society Magazine, 42(3), pp. 69-75. DOI: 10.1109/MTS.2023.3306530. Online publication date: Sep-2023.
    • (2022) FedMSplit. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 87-96. DOI: 10.1145/3534678.3539384. Online publication date: 14-Aug-2022.
    • (2022) Deep Quadruple-Based Hashing for Remote Sensing Image-Sound Retrieval. IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1-14. DOI: 10.1109/TGRS.2022.3155283. Online publication date: 2022.
    • (2022) Interactive Reinforcement Learning With Bayesian Fusion of Multimodal Advice. IEEE Robotics and Automation Letters, 7(3), pp. 7558-7565. DOI: 10.1109/LRA.2022.3182100. Online publication date: Jul-2022.
    • (2022) Capturing Environmental Distress of Pedestrians Using Multimodal Data: The Interplay of Biosignals and Image-Based Data. Journal of Computing in Civil Engineering, 36(2). DOI: 10.1061/(ASCE)CP.1943-5487.0001009. Online publication date: Mar-2022.
    • (2022) Cross-Modal Retrieval Using Deep Learning. Proceedings of Third Doctoral Symposium on Computational Intelligence, pp. 725-734. DOI: 10.1007/978-981-19-3148-2_62. Online publication date: 10-Nov-2022.
    • (2021) HetMAML. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 191-200. DOI: 10.1145/3459637.3482262. Online publication date: 26-Oct-2021.
