
Page 1: MediaEval 2015 - Emotion in Music: Task Overview

Emotion in Music: Task Overview

Anna Aljanaki (1), Mohammad Soleymani (2), Yi-Hsuan Yang (3)

(1) Utrecht University, Netherlands
(2) University of Geneva, Switzerland
(3) Academia Sinica, Taiwan

14-15 September, MediaEval 2015

Page 2:

Find me a song...

Page 3:

...like this

Page 4:

...or like this

Page 5:

Emotion in Music Task

• Focuses on audio analysis (optionally, metadata)
• Recognizes that the mood can change over the duration of a song
• Uses the valence/arousal model

Page 6:

Valence/Arousal model

Page 7:

We look at emotion over time (over the duration of a piece)

Pages 8-10:

Our history

From 2013 to now:
• Emotion in Music 2013 (Brave New Task)
  • Dynamic (over-time) emotion prediction
  • Static (per whole clip) emotion prediction
• Emotion in Music 2014
  • Dynamic task
  • Feature design (evaluated on static data)
• Emotion in Music 2015
  • Dynamic task

Page 11:

Ground truth. Development set music

• In 2013 and 2014 we annotated 1744 excerpts of 45 seconds on Mechanical Turk
• Music from the Free Music Archive (freemusicarchive.org)
• Licensed under Creative Commons
• 10 genres: Rock, Pop, Electronic, Hip-Hop, Classical, Soul and RnB, Country, Folk, International, Jazz

Page 12:

Ground truth

Cleaning of development set data

• The data was cleaned based on inter-annotator agreement
• This reduced the set from 1744 to 431 songs
• The average Cronbach's alpha is 0.73 ± 0.12 for valence and 0.76 ± 0.12 for arousal
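Cronbach's alpha is the agreement statistic quoted above; a minimal sketch of how it is computed, treating annotators as "items" and songs as observations (the function and variable names are illustrative, not from the task's actual tooling):

```python
import numpy as np

def cronbach_alpha(ratings):
    """ratings: 2-D array, rows = songs, columns = annotators."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of annotators
    item_vars = ratings.var(axis=0, ddof=1)       # per-annotator variance
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

Perfectly agreeing annotators give alpha = 1; disagreement drives it down (it can even go negative, which is consistent with the wide spread reported for test-set valence).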

Page 13:

Ground truth. Test set music

• 26 complete songs from the MedleyDB multitrack dataset: http://marl.smusic.nyu.edu/medleydb
• 26 complete songs from the Jamendo music website
• Automatically and manually checked for emotional variety
• The same genres as in the development set
• Cronbach's alpha is 0.29 ± 0.94 for valence and 0.65 ± 0.28 for arousal

Page 14:

Ground truth

Such a small test set?

• Duration of the train set: 323 minutes
• Duration of the test set: 227 minutes

Page 15:

Ground truth - evaluation set

Collecting annotations in 2015

• 5 annotators per song: 2 people from the lab and 3 Mechanical Turk workers
• Preliminary listening round
• Mechanical Turk workers were supervised and only received full payment after the quality of their work was confirmed

Page 16:

Ground truth. Annotations.

Annotation Interface

Page 17:

Baseline features

• Baseline features from the openSMILE framework (260 low-level features)
• 65 low-level acoustic descriptors (LLDs), their first-order derivatives, and the mean and standard deviation functionals of each LLD over 1 s time windows with 50% overlap
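The windowing arithmetic above can be sketched as follows. This is a hypothetical re-implementation, not the actual openSMILE configuration; with 65 LLDs it yields 65 LLDs + 65 deltas, each with mean and std, i.e. 260 features per window:

```python
import numpy as np

def baseline_functionals(lld, frame_rate=100, win_s=1.0, overlap=0.5):
    """lld: (n_frames, 65) low-level descriptors sampled at frame_rate Hz.
    Returns per-window mean and std of each LLD and its first-order delta."""
    delta = np.diff(lld, axis=0, prepend=lld[:1])   # first-order derivative
    series = np.hstack([lld, delta])                # 65 LLDs + 65 deltas
    win = int(win_s * frame_rate)                   # 1 s window
    hop = int(win * (1 - overlap))                  # 50% overlap
    windows = []
    for start in range(0, series.shape[0] - win + 1, hop):
        seg = series[start:start + win]
        windows.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.array(windows)
```

The frame rate here is an assumption for illustration; only the 1 s / 50% overlap windowing and the mean/std functionals come from the slide.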

Page 18:

Emotion in Music 2015 obligatory runs

Every submission has to include:
• Predictions using baseline features
• Custom feature set (if applicable)
• Free-style run (if desired)

Page 19:

Evaluation

Dynamic subtask evaluation

We use RMSE and Pearson's correlation coefficient as metrics, in the following steps:

1. Calculate RMSE between predictions and ground truth for each song separately.

2. Average across songs, separately for valence and for arousal.

3. Rank all submissions for each dimension based on the averaged RMSE.

4. If the difference under a one-sided Wilcoxon test is not significant (p > 0.05), use rho to break the tie.

5. If the ranking changed, run the significance test between neighbouring pairs again.
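The steps above can be sketched with SciPy. This is an illustrative simplification (single pass over neighbouring pairs, one affect dimension); the function and dictionary names are assumptions, not the task's evaluation code:

```python
import numpy as np
from scipy.stats import pearsonr, wilcoxon

def per_song_metrics(preds, truths):
    """preds, truths: lists of 1-D arrays, one per song, one dimension."""
    rmse = np.array([np.sqrt(np.mean((p - t) ** 2))
                     for p, t in zip(preds, truths)])
    rho = np.array([pearsonr(p, t)[0] for p, t in zip(preds, truths)])
    return rmse, rho

def rank_submissions(rmse_by_team, rho_by_team):
    """Rank by mean per-song RMSE; where a one-sided Wilcoxon test finds no
    significant difference between neighbours (p > 0.05), break the tie
    with the mean rho instead."""
    order = sorted(rmse_by_team, key=lambda t: rmse_by_team[t].mean())
    for i in range(len(order) - 1):
        a, b = order[i], order[i + 1]
        p = wilcoxon(rmse_by_team[a], rmse_by_team[b],
                     alternative='less').pvalue
        if p > 0.05 and rho_by_team[b].mean() > rho_by_team[a].mean():
            order[i], order[i + 1] = b, a
    return order
```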

Page 20:

Baseline

There is a baseline for participants to compete with:
• Baseline features
• Linear regression
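A minimal sketch of such a baseline, using plain least squares on synthetic stand-in data (the slide specifies only "baseline features + linear regression"; the data shapes and training details here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 260))    # 260 baseline features per window
y_train = rng.uniform(-1, 1, size=500)   # stand-in per-window annotation

# Ordinary least squares with a bias column; one such model per dimension
# (valence and arousal are predicted independently)
A = np.hstack([X_train, np.ones((500, 1))])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

X_test = rng.normal(size=(100, 260))
y_pred = np.hstack([X_test, np.ones((100, 1))]) @ w
```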

Page 21:

Results - Arousal

12 teams crossed the finish line and submitted their papers.

Rank  Team              Arousal RMSE   ρ

1     THUHCSIL          0.23 ± 0.11    0.66 ± 0.25
2     ICL               0.23 ± 0.11    0.63 ± 0.27
3     SAILUSC           0.24 ± 0.11    0.65 ± 0.22
4     HKPOLYU           0.24 ± 0.11    0.56 ± 0.27
5     PKU-AIPL          0.24 ± 0.10    0.54 ± 0.27
6     IRIT-SAMOVA       0.24 ± 0.11    0.63 ± 0.22
7     JKU-Tinnitus      0.25 ± 0.11    0.53 ± 0.23
8     UNIZA             0.25 ± 0.10    0.49 ± 0.23
9     NCUTom            0.25 ± 0.12    0.34 ± 0.25
10    MIRUtrecht        0.26 ± 0.13    0.40 ± 0.34
11    Baseline          0.27 ± 0.11    0.36 ± 0.26
12    Average baseline  0.28 ± 0.13    0
13    UoA               0.39 ± 0.17    0.58 ± 0.24

Page 22:

Results - Valence

Rank  Team              Valence RMSE   ρ

1     JUNLP             0.29 ± 0.14    −0.03 ± 0.02
2     Average baseline  0.29 ± 0.15    0
3     THU-HCSIL         0.31 ± 0.17    0.15 ± 0.47
4     MIRUtrecht        0.29 ± 0.15    0.08 ± 0.39
5     PKU-AIPL          0.33 ± 0.18    0.01 ± 0.43
6     NCUTom            0.34 ± 0.16    0.01 ± 0.34
7     SAILUSC           0.35 ± 0.18    0.00 ± 0.5
8     Baseline          0.36 ± 0.18    0.01 ± 0.38
9     IRIT-SAMOVA       0.36 ± 0.19    0.04 ± 0.49
10    UNIZA             0.36 ± 0.17    0.01 ± 0.4
11    ICL               0.37 ± 0.19    0.02 ± 0.49
12    JKU-Tinnitus      0.39 ± 0.19    0.01 ± 0.41
13    UoA               0.49 ± 0.24    0.02 ± 0.46

Page 23:

Evaluation

So, what happened to valence?

Between arousal and valence in the train set, rho = 0.51 ± 0.65 and RMSE = 0.24 ± 0.17.

Page 24:

Evaluation

So, what happened to valence?

Between arousal and valence in the test set, rho = 0.00 ± 0.59 and RMSE = 0.40 ± 0.25.

Page 25:

Evaluation

And what about submissions?

In the submissions, the correlation between valence and arousal was even stronger than in the train set:

• THUHCSIL: rho = 0.79 ± 0.32 and RMSE = 0.11 ± 0.07
• ICL: rho = 0.99 ± 0.00 and RMSE = 0.05 ± 0.01
• SAILUSC: rho = 0.88 ± 0.18 and RMSE = 0.09 ± 0.04

Page 26:

Feature sets evaluated on arousal

Rank  Team                Feature set RMSE   ρ

1     ICL                 0.25               0.49
2     MIRUtrecht          0.25               0.48
3     HKPOLYU             0.26               0.50
4     JUNLP-run1          0.26               0.49
5     UNIZA-run1          0.26               0.51
6     UNIZA-run2          0.26               0.51
7     IRIT-SAMOVA         0.26               0.50
8     JUNLP-run2-arousal  0.27               0.35
9     THU-HCSIL           0.27               0.41

Page 27:

Evaluation. Baseline features

Teams' results using baseline features

Page 28:

Acknowledgments

Page 29:

Missing teams' presentations

• PKU-AIPL
• HKPOLYU
• NCUTom
• SAILUSC

Page 30:

PKU-AIPL

Kang Cai, Wanyi Yang, Yao Cheng, Deshun Yang, Xiaoou Chen
Institute of Computer Science and Technology, Peking University, Beijing, China

• Features: MFCC, edge orientation histograms on spectrograms, low-level spectral features
• Continuous conditional random fields with SVR as the base predictor

Page 31:

HKPOLYU

Yang Liu, Yan Liu, Zhonglei Gu
Hong Kong Baptist University, Hong Kong Polytechnic University, Hong Kong SAR

• Features: the 260 baseline features
• The main contribution is a supervised feature reduction technique that takes similarity between items into account
• SVR as the regression model

Page 32:

MIRUtrecht

Anna Aljanaki, Frans Wiering, Remco C. Veltkamp
Utrecht University

• Features: Essentia features, extracted over larger frames (several seconds)
• Gaussian processes
• Based on segmenting audio by emotion
• There is a poster!

Page 33:

Predicting affect in music using regression methods on low level features

Rahul Gupta, Shrikanth Narayanan
Signal Analysis and Interpretation Lab, University of Southern California

Page 34:

Approach: Regression methods

Pipeline: baseline features → regression → smoothing → valence/arousal prediction

Regression methods:
1. Linear regression
2. Least squares boosting

Smoothing: moving average filter
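The moving-average smoothing step might look like this; the window length k is a free parameter for illustration, not a value taken from the paper:

```python
import numpy as np

def smooth(pred, k=5):
    """Moving-average filter over a per-window prediction sequence.
    Edge-pads so the output has the same length as the input (odd k)."""
    padded = np.pad(pred, k // 2, mode='edge')
    return np.convolve(padded, np.ones(k) / k, mode='valid')
```

Edge padding is one of several reasonable boundary choices; it keeps the smoothed sequence aligned with the ground-truth annotations.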

Page 35:

Approach: Regression methods

Pipeline: baseline features → regression → smoothing → valence/arousal prediction

Boosted Ensemble of Single-feature Filters (BESiF) = gradient-boosting-based combination of regression + smoothing

Page 36:

Approach: Regression methods

Method                               Valence        Arousal
                                     RMSE    r      RMSE    r
Baseline                             .37     .01    .27     .36
Linear regression + smoothing        .35     .01    .24     .65
Least squares boosting + smoothing   .35     .05    .24     .59
BESiF                                .37     −.04   .28     .50

Page 37:

Approach: Regression methods

Future investigations
● Annotation biases due to longer songs
● Differences in features for valence and arousal prediction
● Generalization of models trained on smaller segments to longer segments

Page 38:

MediaEval 2015: Recurrent Neural Network Approach to Emotion in Music Task

Yu-Hao Chin and Jia-Ching Wang

Department of Computer Science and Information Engineering

National Central University, Taiwan, R.O.C

• This paper adopts a deep recurrent neural network to predict the valence and arousal at each moment of a song; the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm is used to update the weights during back-propagation. 10-fold cross-validation is used to evaluate the performance.

• Approach 1: the MIR feature set (see Table 1) is adopted, with an RNN model.

• Approach 2: the baseline feature set is adopted, with an RNN model.

Pipeline: music database → feature extraction → recurrent neural network → valence and arousal
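The feature-extraction → RNN → valence/arousal pipeline can be sketched as a minimal Elman-style forward pass; the layer sizes and weights here are illustrative, and the L-BFGS training loop is omitted — this is not the authors' implementation:

```python
import numpy as np

def rnn_forward(x_seq, Wx, Wh, Wo, bh, bo):
    """One tanh hidden layer; a linear 2-D output (valence, arousal)
    is emitted for every time step of the feature sequence."""
    h = np.zeros(Wh.shape[0])
    out = []
    for x in x_seq:                        # one feature vector per window
        h = np.tanh(Wx @ x + Wh @ h + bh)  # recurrent hidden state
        out.append(Wo @ h + bo)            # per-step (valence, arousal)
    return np.array(out)
```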

Page 39:

Technical retreat

Today, 14:15-15:15. Everyone is welcome!