DOI: 10.1145/2660114.2660116
Research article

A Multi-task Learning Framework for Time-continuous Emotion Estimation from Crowd Annotations

Published: 07 November 2014

Abstract

We propose multi-task learning (MTL) for time-continuous (dynamic) emotion estimation, in terms of valence and arousal, in movie scenes. Since compiling annotated training data for dynamic emotion prediction is tedious, we employ crowdsourcing to collect the annotations. Even though the crowdworkers come from diverse demographics, we demonstrate that MTL can effectively discover (1) consistent patterns in their dynamic emotion perception, and (2) the low-level audio and video features that contribute to their valence and arousal (VA) elicitation. Finally, we show that MTL-based regression models, which jointly learn the relationship between low-level audio-visual features and high-level VA ratings from a collection of movie scenes, predict VA ratings for time-contiguous snippets of each scene more effectively than scene-specific models.
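
The MTL formulation described above — jointly regressing VA ratings over many movie scenes so that related tasks share a common set of predictive low-level features — can be sketched with an l2,1-regularized multi-task least-squares model solved by proximal gradient descent. The code below is a minimal illustration, not the authors' implementation: the one-task-per-scene setup, the synthetic data, and every parameter value are assumptions made for this example.

    import numpy as np

    def l21_prox(W, t):
        # Row-wise group soft-thresholding: shrinks each feature's weight
        # row across all tasks, zeroing out features that no task needs.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

    def mtl_fit(Xs, ys, lam=0.05, lr=0.1, iters=1000):
        # Proximal gradient descent on
        #   sum_t ||X_t w_t - y_t||^2 / (2 n_t) + lam * ||W||_{2,1},
        # where column t of W is the regressor for task (scene) t.
        # lr is assumed to satisfy lr <= 1/L for the smooth part's
        # Lipschitz constant L (true for the toy data below).
        d, T = Xs[0].shape[1], len(Xs)
        W = np.zeros((d, T))
        for _ in range(iters):
            G = np.zeros_like(W)
            for t, (X, y) in enumerate(zip(Xs, ys)):
                G[:, t] = X.T @ (X @ W[:, t] - y) / len(y)
            W = l21_prox(W - lr * G, lr * lam)
        return W

    # Toy check: 3 "scenes" whose ratings depend on the same 5 of 20 features.
    rng = np.random.default_rng(0)
    w_true = np.zeros(20)
    w_true[:5] = rng.normal(size=5)
    Xs = [rng.normal(size=(80, 20)) for _ in range(3)]
    ys = [X @ w_true + 0.1 * rng.normal(size=80) for X in Xs]
    W = mtl_fit(Xs, ys)
    print("shared features:", np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-3))

Because the l2,1 penalty acts on whole rows of W, a feature is either used by all tasks or by none, which is one way to surface the audio-visual features that annotators respond to consistently; a "dirty-model" variant would add a per-task sparse component to absorb scene- or annotator-specific idiosyncrasies.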


Cited By

  • Affect Estimation in 3D Space Using Multi-Task Active Learning for Regression. IEEE Transactions on Affective Computing, 2019. DOI: 10.1109/TAFFC.2019.2916040
  • Multi-task Feature Learning for EEG-based Emotion Recognition Using Group Nonnegative Matrix Factorization. 26th European Signal Processing Conference (EUSIPCO), pages 91-95, 2018. DOI: 10.23919/EUSIPCO.2018.8553390
  • Crowdsourcing Empathetic Intelligence. ACM Transactions on Intelligent Systems and Technology, 7(4):1-27, 2016. DOI: 10.1145/2897369


Published In

CrowdMM '14: Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia
November 2014, 84 pages
ISBN: 9781450331289
DOI: 10.1145/2660114
General Chairs: Judith Redi, Mathias Lux
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. crowd annotation
  2. movie clips
  3. multi-task learning
  4. time-continuous emotion estimation

Qualifiers

  • Research-article

Conference

MM '14: 2014 ACM Multimedia Conference
November 7, 2014
Orlando, Florida, USA

Acceptance Rates

CrowdMM '14 paper acceptance rate: 8 of 26 submissions (31%)
Overall acceptance rate: 16 of 42 submissions (38%)


