Authors: Hassan Hayat; Carles Ventura; Agata Lapedriza
Publication name: Affective Computing & Intelligent Interaction (ACII) 2021
Understanding the emotional impact of movies has become important for affective movie analysis, ranking, and indexing. Methods for recognizing evoked emotions are usually trained on human annotated data. Concretely, viewers watch video clips and have to manually annotate the emotions they experienced while watching the videos. Then, the common practice is to aggregate the different annotations, by computing average scores or majority voting, and train and test models on these aggregated annotations. With this procedure a single aggregated evoked emotion annotation is obtained per each video. However, emotions experienced while watching a video are subjective: different individuals might experience different emotions. In this paper, we model the emotions evoked by videos in a different manner: instead of modeling the aggregated value we jointly model the emotions experienced by each viewer and the aggregated value using a multi-task learning approach. Concretely, we propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture. Our results show that the MT approach can more accurately model each viewer and the aggregated annotation when compared to methods that are directly trained on the aggregated annotations. Furthermore, our approach outperforms the current state-of-the-art results on the COGNIMUSE benchmark.