The Audiovisual Emotion Gaussians Model for Automatic
Generation of Music Video
Ju-Chiang Wang, Yi-Hsuan Yang, I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang
Academia Sinica, Taiwan
Introduction
• Generate music videos based on the emotion content recognized by machine
• The novel Audiovisual Emotion Gaussians (AVEG) framework learns the tripartite relationship among music, video, and emotion
• Project music pieces and video sequences into the three-dimensional emotion space (3DES), and perform cross-modal matching via the predicted emotion distributions
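The matching step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each clip's predicted emotion distribution is a single Gaussian with diagonal covariance over (valence, activation, potency), and it assumes symmetric KL divergence as the distance. The slides do not state which measure is used, and every name and number below is hypothetical.

```python
import math

def kl_diag(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (assumed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total

def sym_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) pairs."""
    return kl_diag(p[0], p[1], q[0], q[1]) + kl_diag(q[0], q[1], p[0], p[1])

def rank_candidates(query, candidates):
    """Sort candidate names from the closest to the farthest emotion distribution."""
    return sorted(candidates, key=lambda name: sym_kl(query, candidates[name]))

# Hypothetical predicted distributions: (mean, variance) per emotion dimension.
music_query = ([0.6, 0.4, 0.1], [0.05, 0.04, 0.06])
video_pool = {
    "video_a": ([0.5, 0.5, 0.0], [0.05, 0.05, 0.05]),
    "video_b": ([-0.7, 0.2, 0.3], [0.04, 0.06, 0.05]),
    "video_c": ([0.6, 0.4, 0.1], [0.05, 0.04, 0.06]),
}
ranking = rank_candidates(music_query, video_pool)
```

Swapping the roles of `music_query` and `video_pool` gives the video-to-audio direction under the same assumptions.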
System Diagram
• Utilize the DEAP dataset (valence, activation, and potency)
– 3D emotion-annotated music videos
• Extend the AEG model to handle video (VEG)
– Wang et al. (2012), “The Acoustic Emotion Gaussians model for emotion-based music annotation and retrieval,” Proc. ACM MM (full paper)
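One way to picture how a Gaussian-based model yields a single emotion distribution per clip is moment matching over a weighted mixture. The sketch below is an assumption for illustration only: it supposes a clip is described by weights over latent components, each carrying a diagonal-covariance Gaussian in the emotion space, and it collapses the mixture into one Gaussian. The function name, inputs, and values are hypothetical and not taken from the slides.

```python
def summarize_mixture(weights, means, variances):
    """Collapse a weighted mixture of diagonal Gaussians into one Gaussian
    by matching the mixture's mean and per-dimension variance."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize weights to sum to 1
    dims = len(means[0])
    mean = [sum(wk * mk[d] for wk, mk in zip(w, means)) for d in range(dims)]
    var = [
        sum(wk * (vk[d] + (mk[d] - mean[d]) ** 2)
            for wk, mk, vk in zip(w, means, variances))
        for d in range(dims)
    ]
    return mean, var

# Hypothetical two-component example in (valence, activation, potency).
clip_mean, clip_var = summarize_mixture(
    weights=[0.5, 0.5],
    means=[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    variances=[[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
)
```

The same summary could be applied to audio-side and video-side predictions so that both modalities end up as comparable distributions in the emotion space.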
Preliminary Result
• Perform the cross-modal retrieval experiment on the 120 music and video clips of DEAP
• Evaluate the NDCG@P for the ranking
• Measure the average Top 1 Relevance Score
Scenario (NDCG@P)         P=5      P=10     P=15     P=20
Audio to Video Ranking    0.8748   0.8316   0.8221   0.8172
Video to Audio Ranking    0.8737   0.8204   0.8105   0.8073
Random Permutation        0.8035   0.7604   0.7441   0.7370
Scenario             A to V    V to A    Random
Average Relevance    0.4881    0.4826    0.3837
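For readers unfamiliar with the two measures, the sketch below shows how NDCG@P and an average top-1 relevance could be computed from ranked relevance scores. It assumes the common NDCG form with linear gain and a log2(rank + 1) discount. The slides do not specify which NDCG variant or relevance definition was used, and the relevance values here are hypothetical.

```python
import math

def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the top-p items (assumed log2 discount)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:p], start=1))

def ndcg_at_p(ranked_relevances, p):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_p(sorted(ranked_relevances, reverse=True), p)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_p(ranked_relevances, p) / ideal_dcg

def average_top1(rankings):
    """Mean relevance of the first-ranked item across all queries."""
    return sum(r[0] for r in rankings) / len(rankings)

# Hypothetical relevance scores, listed in the order the system ranked the items.
query_rankings = [
    [0.9, 0.2, 0.7, 0.1, 0.5],
    [0.3, 0.8, 0.4, 0.6, 0.2],
]
ndcg_scores = [ndcg_at_p(r, 5) for r in query_rankings]
top1 = average_top1(query_rankings)
```

A random-permutation baseline, as in the tables above, could be obtained by shuffling each relevance list before scoring it.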