DOI: 10.1145/2660114.2660116
Research article

A Multi-task Learning Framework for Time-continuous Emotion Estimation from Crowd Annotations

Published: 07 November 2014

Abstract

We propose multi-task learning (MTL) for time-continuous (dynamic) emotion estimation, in terms of valence and arousal, in movie scenes. Since compiling annotated training data for dynamic emotion prediction is tedious, we employ crowdsourcing to collect the annotations. Even though the crowdworkers come from diverse demographics, we demonstrate that MTL can effectively discover (1) consistent patterns in their dynamic emotion perception, and (2) the low-level audio and video features that contribute to their valence and arousal (VA) elicitation. Finally, we show that MTL-based regression models, which jointly learn the relationship between low-level audio-visual features and high-level VA ratings from a collection of movie scenes, predict VA ratings for time-contiguous snippets of each scene more effectively than scene-specific models.
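
The MTL formulation described above — jointly regressing VA ratings over many movie scenes so that related tasks share a common set of predictive low-level features — can be sketched with an l2,1-regularized multi-task least-squares model solved by proximal gradient descent. The code below is a minimal illustration, not the authors' implementation: the one-task-per-scene setup, the synthetic data, and every parameter value are assumptions made for this example.

    import numpy as np

    def l21_prox(W, t):
        # Row-wise group soft-thresholding: shrinks each feature's weight
        # row across all tasks, zeroing out features that no task needs.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return W * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

    def mtl_fit(Xs, ys, lam=0.05, lr=0.1, iters=1000):
        # Proximal gradient descent on
        #   sum_t ||X_t w_t - y_t||^2 / (2 n_t) + lam * ||W||_{2,1},
        # where column t of W is the regressor for task (scene) t.
        # lr is assumed to satisfy lr <= 1/L for the smooth part's
        # Lipschitz constant L (true for the toy data below).
        d, T = Xs[0].shape[1], len(Xs)
        W = np.zeros((d, T))
        for _ in range(iters):
            G = np.zeros_like(W)
            for t, (X, y) in enumerate(zip(Xs, ys)):
                G[:, t] = X.T @ (X @ W[:, t] - y) / len(y)
            W = l21_prox(W - lr * G, lr * lam)
        return W

    # Toy check: 3 "scenes" whose ratings depend on the same 5 of 20 features.
    rng = np.random.default_rng(0)
    w_true = np.zeros(20)
    w_true[:5] = rng.normal(size=5)
    Xs = [rng.normal(size=(80, 20)) for _ in range(3)]
    ys = [X @ w_true + 0.1 * rng.normal(size=80) for X in Xs]
    W = mtl_fit(Xs, ys)
    print("shared features:", np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-3))

Because the l2,1 penalty acts on whole rows of W, a feature is either used by all tasks or by none, which is one way to surface the audio-visual features that annotators respond to consistently; a "dirty-model" variant would add a per-task sparse component to absorb scene- or annotator-specific idiosyncrasies.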


Cited By

  • Affect Estimation in 3D Space Using Multi-Task Active Learning for Regression. IEEE Transactions on Affective Computing, 2019. DOI: 10.1109/TAFFC.2019.2916040
  • Multi-task Feature Learning for EEG-based Emotion Recognition Using Group Nonnegative Matrix Factorization. 26th European Signal Processing Conference (EUSIPCO), pages 91-95, 2018. DOI: 10.23919/EUSIPCO.2018.8553390
  • Crowdsourcing Empathetic Intelligence. ACM Transactions on Intelligent Systems and Technology, 7(4):1-27, 2016. DOI: 10.1145/2897369


Published In

CrowdMM '14: Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia
November 2014, 84 pages
ISBN: 9781450331289
DOI: 10.1145/2660114
General Chairs: Judith Redi, Mathias Lux
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. crowd annotation
  2. movie clips
  3. multi-task learning
  4. time-continuous emotion estimation

Qualifiers

  • Research-article

Conference

MM '14: 2014 ACM Multimedia Conference
November 7, 2014
Orlando, Florida, USA

Acceptance Rates

CrowdMM '14 paper acceptance rate: 8 of 26 submissions (31%)
Overall acceptance rate: 16 of 42 submissions (38%)


