ACM Multimedia 2012 Grand Challenge: Music Video Generation

Upload: ju-chiang-wang

Post on 03-Jul-2015

DESCRIPTION

These slides present a content-based system that uses the perceived emotion of multimedia content as a bridge to connect music and video. Specifically, we propose a novel machine learning framework, called Audiovisual Emotion Gaussians (AVEG), to jointly learn the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. For a music piece (or a video sequence), the AVEG model predicts its emotion distribution in a stochastic emotion space from the corresponding low-level acoustic (resp. visual) features. Finally, music and video are matched by measuring the similarity between the two corresponding emotion distributions, using a distance measure such as KL divergence.
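The matching step can be illustrated with a minimal sketch. It assumes that each clip's predicted emotion distribution is summarized as a single 3-D Gaussian (mean vector and covariance matrix), which the slides do not state; the actual AVEG output may be richer, for example a mixture. The function names `gaussian_kl` and `rank_candidates` are hypothetical and chosen only for this example.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(P || Q) between two multivariate Gaussians."""
    mu_p, mu_q = np.asarray(mu_p, dtype=float), np.asarray(mu_q, dtype=float)
    cov_p, cov_q = np.asarray(cov_p, dtype=float), np.asarray(cov_q, dtype=float)
    k = mu_p.shape[0]                      # dimensionality of the emotion space
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    # slogdet returns (sign, log|det|); the log-determinant is all we need here.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - k
        + logdet_q
        - logdet_p
    )

def rank_candidates(query, candidates):
    """Return candidate indices ordered from smallest to largest KL divergence.

    `query` and each candidate are (mean, covariance) pairs (an assumed format).
    """
    distances = [gaussian_kl(*query, *cand) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: distances[i])
```

For audio-to-video matching, `query` would hold the emotion distribution predicted for a music piece and `candidates` those predicted for the video sequences. KL divergence is asymmetric, so a symmetrized variant may be preferable in practice; the slides do not say which form is used.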

TRANSCRIPT

Page 1: ACM Multimedia 2012 Grand Challenge: Music Video Generation

The Audiovisual Emotion Gaussians Model for Automatic Generation of Music Video

Ju-Chiang Wang, Yi-Hsuan Yang, I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang

Academia Sinica, Taiwan

Page 2: ACM Multimedia 2012 Grand Challenge: Music Video Generation

Introduction

• Generate music videos based on the emotional content recognized by machine

• The novel Audiovisual Emotion Gaussians (AVEG) framework learns the tripartite relationship among music, video, and emotion

• Project music pieces and video sequences into the 3-D emotion space (3DES), and perform cross-modal matching via the predicted emotion distributions

Page 3: ACM Multimedia 2012 Grand Challenge: Music Video Generation

System Diagram

• Utilize the DEAP dataset (valence, activation, and potency)

– 3-D emotion-annotated music videos

• Extend the AEG model to handle video (VEG)

– Wang et al. (2012), “The Acoustic Emotion Gaussians model for emotion-based music annotation and retrieval,” Proc. ACM MM (full paper)

Page 4: ACM Multimedia 2012 Grand Challenge: Music Video Generation

Preliminary Result

• Perform the cross-modal retrieval experiment on the 120 music and video clips of DEAP

• Evaluate the NDCG@P for the ranking

• Measure the average Top 1 Relevance Score

NDCG@P

Scenario                  P=5     P=10    P=15    P=20
Audio to Video Ranking    0.8748  0.8316  0.8221  0.8172
Video to Audio Ranking    0.8737  0.8204  0.8105  0.8073
Random Permutation        0.8035  0.7604  0.7441  0.7370

Average Top 1 Relevance Score

Scenario             A to V   V to A   Random
Average Relevance    0.4881   0.4826   0.3837
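The two evaluation measures named on this slide can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: it assumes the common NDCG formulation with a log2(rank + 1) discount and assumes that each ranked list comes with graded relevance scores. The slides do not specify the exact discount or how relevance is defined, and the function names are hypothetical.

```python
import math

def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the first p items of a ranked list."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:p], start=1)
    )

def ndcg_at_p(ranked_relevances, p):
    """NDCG@P: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_p(ideal, p)
    return dcg_at_p(ranked_relevances, p) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_top1_relevance(rankings):
    """Average relevance of the top-ranked item across all queries."""
    return sum(ranking[0] for ranking in rankings) / len(rankings)
```

Here `ranked_relevances` lists the relevance of each retrieved clip in the order the system ranked it, one list per query. Averaging `ndcg_at_p` over all queries at P = 5, 10, 15, and 20 would give figures comparable in form to the NDCG@P table, under the stated assumptions.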