Transcript
Page 1: ACM Multimedia 2012 Grand Challenge: Music Video Generation


The Audiovisual Emotion Gaussians Model for Automatic Generation of Music Video

Ju-Chiang Wang, Yi-Hsuan Yang, I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang

Academia Sinica, Taiwan

Page 2:

Introduction

• Generate the music video based on the emotion content recognized by machine

• The novel Audiovisual Emotion Gaussians (AVEG) framework learns the tripartite relationship among music, video, and emotion

• Project music pieces and video sequences into the multi-dimensional emotion space (3DES), and perform the cross-modal matching via the predicted emotion distributions
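The cross-modal matching step can be illustrated with a minimal sketch. It assumes each clip's predicted emotion distribution is a single Gaussian (mean and covariance) in the emotion space and that candidates are ranked by symmetric KL divergence. The slides do not state which distance is used, so that choice and the names `gaussian_kl`, `symmetric_kl`, and `rank_candidates` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    trace_term = np.trace(cov_q_inv @ cov_p)
    maha_term = diff @ cov_q_inv @ diff
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - d + logdet_q - logdet_p)


def symmetric_kl(a, b):
    """Symmetrized KL; a and b are (mean, covariance) pairs.

    Assumption: the slides do not name the matching distance.
    """
    return gaussian_kl(a[0], a[1], b[0], b[1]) + gaussian_kl(b[0], b[1], a[0], a[1])


def rank_candidates(query, candidates):
    """Return candidate indices ordered from closest to farthest emotion match."""
    distances = [symmetric_kl(query, c) for c in candidates]
    return [int(i) for i in np.argsort(distances, kind="stable")]


# Hypothetical predictions: one music clip (query) and three video clips.
music = (np.zeros(3), np.eye(3))
videos = [
    (np.array([2.0, 2.0, 2.0]), np.eye(3)),
    (np.array([0.1, 0.0, 0.0]), np.eye(3)),
    (np.array([1.0, 0.0, 0.0]), np.eye(3)),
]
ranking = rank_candidates(music, videos)
```

Swapping the roles of the query and the candidates gives the video-to-audio direction with the same code.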

Page 3:

System Diagram

• Utilize the DEAP dataset (valence, activation, and potency)

– 3D emotion-annotated music videos

• Extend the AEG model to handle video (VEG)

– Wang et al. (2012), “The Acoustic Emotion Gaussians model for emotion-based music annotation and retrieval,” Proc. ACM MM (full paper)

Page 4:

Preliminary Result

• Perform the cross-modal retrieval experiment on the 120 music and video clips of DEAP

• Evaluate the NDCG@P for the ranking

• Measure the average Top 1 Relevance Score

Scenario                 P=5     P=10    P=15    P=20
Audio to Video Ranking   0.8748  0.8316  0.8221  0.8172
Video to Audio Ranking   0.8737  0.8204  0.8105  0.8073
Random Permutation       0.8035  0.7604  0.7441  0.7370

Scenario           A to V  V to A  Random
Average Relevance  0.4881  0.4826  0.3837
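For readers unfamiliar with the two measures reported above, the following is a minimal sketch of NDCG@P and a top-1 relevance score. It uses the common log2(rank + 1) discount, which may differ from the exact variant used in this evaluation. The slides do not say how the relevance between a query and a retrieved clip is defined, so the relevance values below are hypothetical.

```python
import math


def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the top-p items (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:p], start=1)
    )


def ndcg_at_p(ranked_relevances, p):
    """DCG of the produced ranking divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_p(ideal, p)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_p(ranked_relevances, p) / ideal_dcg


def top1_relevance(ranked_relevances):
    """Relevance of the first retrieved item."""
    return ranked_relevances[0]


# Hypothetical relevance scores of retrieved clips, in ranked order.
ranked = [0.6, 0.9, 0.2, 0.4]
score = ndcg_at_p(ranked, p=3)
first = top1_relevance(ranked)
```

Averaging `ndcg_at_p` and `top1_relevance` over all queries would yield per-scenario figures of the kind shown in the tables.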

