MediaEval 2016 - Emotion in Music Task: Lessons Learned

Post on 09-Jan-2017

Emotion in Music Task: Lessons Learned

Anna Aljanaki (1), Yi-Hsuan Yang (2), Mohammad Soleymani (1)

(1) University of Geneva, Switzerland
(2) Academia Sinica, Taiwan

20-21 October, MediaEval 2016

Emotion in Music Task

- 2013: Emotion in Music Brave New Task
  - Organized by M. Soleymani, M.N. Caro, E.M. Schmidt and Y.-H. Yang
  - 2 subtasks: dynamic (per-second) music emotion recognition and song-level emotion recognition
  - 3 participating teams

Emotion in Music Task

- Focused on audio analysis (optionally, metadata)
- Most attention was paid to recognizing how emotion changes over time
- Used the valence/arousal model

Valence/Arousal model

Dynamic emotion tracking (over duration of a piece)

Emotion in Music Task

- 2013: Emotion in Music Brave New Task
  - Organized by M. Soleymani, M.N. Caro, E.M. Schmidt and Y.-H. Yang
  - 2 tasks: dynamic (per-second) music emotion recognition and song-level emotion recognition
  - 3 participating teams

- 2014: Emotion in Music Task, Second Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 2 tasks: dynamic (per-second) music emotion recognition and feature design
  - 7 participating teams

- 2015: Emotion in Music Task, Third Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 1 task: dynamic (per-second) music emotion recognition, with three submissions: features, prediction on baseline features, prediction on custom features
  - 11 participating teams

Quality of the annotations

Year                        2013         2014         2015
Total length                9h 18min     12h 30min    3h 46min
Cronbach’s α for arousal    .28 ± 0.28   .31 ± 0.30   .66 ± 0.26
GAM’s R² for arousal        .13 ± 0.10   .14 ± 0.11   .44 ± 0.19
Cronbach’s α for valence    .28 ± 0.29   .20 ± 0.24   .51 ± 0.35
GAM’s R² for valence        .13 ± 0.10   .10 ± 0.08   .37 ± 0.21

- 2013 and 2014: 45-second excerpts. 2015: full songs.
- 2013 and 2014: Amazon Mechanical Turk (AMT) workers. 2015: both lab and AMT workers.
- 2015: introduced preliminary listening.
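
As an illustration of the agreement measure reported in the table, here is a minimal sketch of Cronbach's α for a single song. It assumes that each annotator is treated as an "item" and each time step as an observation; the slides do not say exactly how α was computed, so this layout and the function name `cronbach_alpha` are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for one song.

    ratings: list of per-annotator time series (equal length).
    Assumption: annotators are the 'items', time steps the observations.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two annotators")
    n = len(ratings[0])
    if any(len(r) != n for r in ratings):
        raise ValueError("all annotation series must have the same length")

    # Sum of the per-annotator (sample) variances over time.
    item_variance_sum = sum(variance(r) for r in ratings)
    # Variance of the per-time-step totals across annotators.
    totals = [sum(column) for column in zip(*ratings)]
    total_variance = variance(totals)

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Two annotators who mostly agree on the ordering of four time steps.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 3, 2, 4]])
print(round(alpha, 3))
```

The value is undefined when the per-time-step totals have zero variance (for example, two annotators who exactly cancel each other out), so a real pipeline would need to handle that case.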

Quality of the annotations - Arousal

Quality of the annotations - Valence

Continuous annotation interface

Continuous annotation problems

- Absolute scale
- Reaction time
- Scaling ('zoom' levels)

Continuous annotation problems

Absolute scale ratings

Continuous annotation problems

We tried to scale each annotation to the dynamic mean of the song: $a_{j,i} = a_{j,i} + (A_j - A)$
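
The correction above can be sketched as a constant shift applied to one annotator's time series. The slide does not define the symbols, so this sketch assumes that A_j is a reference mean for song j and A is the mean of the annotation being corrected; both the interpretation and the name `shift_annotation` are hypothetical.

```python
from statistics import mean


def shift_annotation(annotation, song_mean):
    """Shift one annotator's dynamic series by a constant offset.

    Assumption: song_mean plays the role of A_j and the mean of
    `annotation` plays the role of A in the slide's formula.
    The shape of the curve is unchanged; only its level moves.
    """
    offset = song_mean - mean(annotation)
    return [value + offset for value in annotation]


# An annotator who rated systematically lower than the song-level reference.
shifted = shift_annotation([0.1, 0.3, 0.5], song_mean=0.5)
print([round(v, 2) for v in shifted])
```

Under this reading the shift removes each annotator's personal bias in absolute level while keeping the relative changes over time.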

Continuous annotation problems

There is a reaction time in the annotations. Before listeners can give judgements on the emotional content of music, they need to listen to it for some time.

Continuous annotation problems

There is a scaling problem: the unit of emotional expression can be a structural section, a phrase, or a single note.

Best solutions

Method             ρ            RMSE
2013, BLSTM-RNN    .31 ± .37    .08 ± .05
2014, LSTM         .35 ± .45    .10 ± .05
2015, BLSTM-RNN    .66 ± .25    .12 ± .06

Table: Winning algorithms on arousal, ordered by Spearman’s ρ. BLSTM-RNN: Bi-directional Long Short-Term Memory Recurrent Neural Networks.

Method             ρ            RMSE
2013, BLSTM-RNN    .19 ± .43    .08 ± .04
2014, LSTM         .20 ± .49    .08 ± .05
2015, BLSTM-RNN    .17 ± .09    .12 ± .54

Table: Winning algorithms on valence, ordered by Spearman’s ρ.
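
For readers unfamiliar with the two metrics in these tables, the following is a minimal pure-Python sketch of Spearman's ρ and RMSE for one song's predicted and annotated time series. The function names are illustrative, and the assumption that the metrics are computed per song and then averaged (which would explain the ± values) is not stated on the slides.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(pred, truth):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(pred), average_ranks(truth))


def rmse(pred, truth):
    """Root mean squared error between prediction and annotation."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


# Hypothetical per-second values for one short excerpt.
predicted = [0.10, 0.15, 0.30, 0.25]
annotated = [0.05, 0.20, 0.35, 0.30]
print(round(spearman_rho(predicted, annotated), 3), round(rmse(predicted, annotated), 3))
```

The two metrics are complementary: ρ rewards getting the direction of change right even if the absolute level is off, while RMSE penalizes level errors even when the trajectory shape is correct.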

Categorical interface

Possible solutions and modifications

- Change the task from emotion tracking to dynamics tracking (diminuendo, crescendo, rallentando)
- Change the data collection interface
- Find a practical task where continuous tracking is necessary:
  - Retrieval by an emotional trajectory
  - Thumbnailing
  - Emotion prediction from physiological signals and audio

Acknowledgements

We thank Erik M. Schmidt, Mike N. Caro, Cheng-Ya Sha, Alexander Lansky, Sung-Yen Liu and Eduardo Coutinho for their contributions to task development, and anonymous Turkers for their work.
