Multi-modal Music Mood Classification Using Audio, Lyrics and Social Tags
Xiao Hu, National Institute of Informatics
July 5, 2011

DESCRIPTION

Research Seminar at National Institute of Informatics, Japan

TRANSCRIPT

  • 1. Multi-modal Music Mood Classification Using Audio, Lyrics and Social Tags. Xiao Hu, National Institute of Informatics, July 5, 2011
  • 2. Outline: Multimodal Music Mood Classification; Research Questions; Methodology; Findings and Contributions; Future Research
  • 3. Music Mood Classification Exercise: What do you feel about "Here comes the sun, here comes the sun, and I say it's all right / Little darling, it's been a long cold lonely winter / Little darling, it feels like years since it's been here / Here comes the sun, here comes the sun..."? How do people categorize music mood? How well can a computer do it?
  • 4. Why Mood?
  • 5. State of the Art: Mood categories are adopted directly from music psychology models and lack the social context of music listening (Juslin & Laukka, 2004); can social tags help? Evaluation datasets are small, with low consistency across assessors (Skowronek et al., 2006; Hu et al., 2008). Automatic music mood classification systems perform suboptimally and are mostly audio-based; can lyrics help?
  • 6. Research Questions. Q1: Can social tags help develop a mood taxonomy? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples, length of audio data)? Q2-5 aim at improving classification performance by combining lyrics and audio.
  • 7. Q1: Mood Categories. A new topic in information science. Influential models in music psychology: categorical (Hevner, 1936) and dimensional (Russell, 1980), often used in previous research on music mood classification.
  • 8. Russell's 2D Model
  • 9. Can Social Tags Help? Last.fm is one of the largest tagging sites for Western popular music.
  • 10. Social Tags. Pros: users' perspectives; large quantity. Cons: noisy ("I aaaaam lovin it"), ambiguous ("love"), synonyms ("calm", "serene"), long tail. Cleaned with linguistic resources (WordNet-Affect) and human expertise (two music retrieval experts, native English speakers).
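As an aside on what that tag cleaning involves: the sketch below is a minimal, hypothetical illustration (the actual study relied on WordNet-Affect plus two human experts) of normalizing noisy Last.fm-style tags and merging obvious synonyms. The SYNONYMS map and the example tags are made up.

```python
import re
from collections import Counter

SYNONYMS = {"serene": "calm", "chill": "calm", "cheerful": "happy"}  # hypothetical synonym map

def normalize(tag: str) -> str:
    tag = tag.lower().strip()
    tag = re.sub(r"(.)\1{2,}", r"\1", tag)    # collapse letter runs: "aaaaam" -> "am"
    tag = re.sub(r"[^a-z\s]", "", tag)        # drop punctuation and digits
    tag = re.sub(r"\s+", " ", tag).strip()
    return SYNONYMS.get(tag, tag)

raw_tags = ["Calm", "serene", "chill ", "I aaaaam lovin it", "happy", "Cheerful"]
print(Counter(normalize(t) for t in raw_tags))
# e.g. Counter({'calm': 3, 'happy': 2, 'i am lovin it': 1})
```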
  • 12. Distances between Categories: calculated from song co-occurrences; categories associated with the same songs are similar; plotted in 2-D space using Multidimensional Scaling.
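A minimal sketch of the procedure this slide describes, under assumed data: a small song-by-category membership matrix stands in for the real tag data, co-occurrence counts are turned into distances, and scikit-learn's MDS projects the categories into 2-D. The matrix, category names, and the cosine-style normalization are illustrative choices, not the study's exact computation.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical song-category membership matrix: rows = songs, columns = categories.
membership = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
categories = ["happy", "calm", "sad", "angry"]

cooc = membership.T @ membership                          # co-occurrence counts
norm = np.sqrt(np.outer(np.diag(cooc), np.diag(cooc)))
similarity = cooc / norm                                  # cosine-style normalization
distance = 1.0 - similarity                               # more co-occurrence -> smaller distance
np.fill_diagonal(distance, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(distance)
for name, (x, y) in zip(categories, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```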
  • 13. Identified Categories
  • 14. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples, length of audio data)?
  • 15. What do they feel about
  • 16. Multi-modal framework (diagram): Social Tags → Mood Categories → Ground Truth; Music → Audio + Lyrics → Automatic Classification. Q2-5: improving classification performance by combining lyrics and audio.
  • 17. Classification Experiments. Evaluation task: binary classification. Evaluation measures and tests: accuracy; Friedman's ANOVA. Classification algorithm: SVM (LIBSVM implementation).
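To make the experimental setup concrete, here is a hedged sketch using scikit-learn's LIBSVM-backed SVC, 10-fold cross-validated accuracy, and SciPy's Friedman test. The random feature matrices and the split into "audio" and "lyric" columns are placeholders, not the actual MIREX pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # placeholder feature vectors
y = rng.integers(0, 2, size=200)        # placeholder binary mood labels

def accuracies(feature_slice):
    """10-fold cross-validated accuracy for one feature set."""
    clf = SVC(kernel="linear", C=1.0)   # LIBSVM-backed SVM in scikit-learn
    return cross_val_score(clf, X[:, feature_slice], y, cv=10, scoring="accuracy")

acc_audio = accuracies(slice(0, 10))     # pretend columns 0-9 are audio features
acc_lyrics = accuracies(slice(10, 20))   # pretend columns 10-19 are lyric features
acc_hybrid = accuracies(slice(0, 20))    # all features

stat, p = friedmanchisquare(acc_audio, acc_lyrics, acc_hybrid)
print(f"mean accuracies: audio={acc_audio.mean():.3f}, "
      f"lyrics={acc_lyrics.mean():.3f}, hybrid={acc_hybrid.mean():.3f}")
print(f"Friedman test: chi2={stat:.2f}, p={p:.3f}")
```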
  • 18. Ground Truth Dataset: built from social tags; has audio, lyrics and social tags; 5,296 unique songs; 18 mood categories; equal positive and negative examples; 12,980 examples in total (figure: number of positive examples per category).
  • 19. Baseline System (audio-based). The AMC tasks in MIREX (MIREX: Music Information Retrieval Evaluation eXchange; AMC: Audio Mood Classification). A leading system in AMC 2007 and 2008: Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals), led by Prof. George Tzanetakis. Uses audio spectral features.
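Marsyas has its own feature-extraction toolchain; purely as an illustration of the kind of spectral features such audio-based systems compute, the sketch below uses librosa (an assumption, not what the baseline used) to summarize spectral centroid, rolloff and MFCCs over a clip. The file path is hypothetical.

```python
import numpy as np
import librosa

def audio_feature_vector(path, duration=30.0):
    # Load a clip (e.g. 30 seconds) and compute frame-level spectral features.
    y, sr = librosa.load(path, duration=duration)
    feats = [
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
    ]
    stacked = np.vstack(feats)                          # (n_features, n_frames)
    # Summarize each feature by its mean and standard deviation over time.
    return np.concatenate([stacked.mean(axis=1), stacked.std(axis=1)])

# vec = audio_feature_vector("some_song.mp3")  # hypothetical file path
```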
  • 20. Lyric-based System. Very little existing work; it only used basic text features (bag of words, part of speech) and performed worse than audio-based approaches. This research extracted and compared a range of novel lyric features.
  • 21. Best Lyric Features. Basic features: content words, part of speech, function words. Psycholinguistic features: psychological categories in GI (General Inquirer); scores in ANEW (Affective Norms for English Words). Stylistic features: punctuation marks; interjection words. Text statistics: e.g., words per minute. Combinations: 255 of them! The most comprehensive study on lyric classification so far.
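The sketch below illustrates, with a tiny made-up valence lexicon standing in for ANEW/GI, how a few of these feature families (bag of words, stylistic counts, text statistics, lexicon scores) might be computed from raw lyrics; it is an assumption-laden toy, not the feature extractor used in the study.

```python
import re
from collections import Counter

TOY_VALENCE = {"sun": 7.9, "lonely": 2.2, "winter": 4.1, "darling": 6.8}  # made-up scores

def lyric_features(lyrics: str, duration_minutes: float):
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(words)                                  # bag-of-words (basic feature)
    interjections = sum(counts[w] for w in ("oh", "yeah", "hey", "ah"))
    exclamations = lyrics.count("!")                         # stylistic features
    scored = [TOY_VALENCE[w] for w in words if w in TOY_VALENCE]
    return {
        "n_words": len(words),
        "words_per_minute": len(words) / duration_minutes,   # text statistic
        "n_interjections": interjections,
        "n_exclamations": exclamations,
        "mean_valence": sum(scored) / len(scored) if scored else None,
    }

print(lyric_features("Here comes the sun, little darling!", duration_minutes=3.1))
```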
  • 22. Lyric Feature Example (figure)
  • 23. No significant difference between top combinations
  • 27. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples, length of audio data)?
  • 28. Combining Lyrics and Audio: two hybrid methods. Late fusion: the lyric classifier and the audio classifier each produce a prediction, and the two predictions are combined into a final prediction. Feature concatenation: lyric and audio features are concatenated and fed to a single classifier.
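A minimal sketch of the two hybrid strategies named above, under assumed placeholder data: late fusion averages the probability estimates of separate lyric and audio SVMs, while feature concatenation trains one SVM on the joined feature vectors. The data, split and threshold are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_audio = rng.normal(size=(300, 10))     # placeholder audio features
X_lyrics = rng.normal(size=(300, 15))    # placeholder lyric features
y = rng.integers(0, 2, size=300)
train, test = slice(0, 250), slice(250, 300)

# Late fusion: combine per-classifier probability estimates.
clf_a = SVC(probability=True).fit(X_audio[train], y[train])
clf_l = SVC(probability=True).fit(X_lyrics[train], y[train])
p_fused = (clf_a.predict_proba(X_audio[test])[:, 1] +
           clf_l.predict_proba(X_lyrics[test])[:, 1]) / 2
pred_late = (p_fused >= 0.5).astype(int)

# Feature concatenation: one classifier over the joined feature vectors.
X_all = np.hstack([X_audio, X_lyrics])
pred_concat = SVC().fit(X_all[train], y[train]).predict(X_all[test])

print("late fusion accuracy:", (pred_late == y[test]).mean())
print("concatenation accuracy:", (pred_concat == y[test]).mean())
```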
  • 29. System Performances (figure: audio + lyrics vs. lyrics vs. audio)
  • 30. Effectiveness (figure)
  • 31. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples, length of audio data)?
  • 32. Automatic Classification (supervised learning): a classifier for "Happy" is trained on labeled examples ("Here comes the sun": Y; "I will be back": N; "Down with the sickness": N) and then labels new examples (Song A: Y; Song B: N).
  • 33. Learning Curves (figure)
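One way such learning curves can be produced (an assumed setup, not the original experiment) is scikit-learn's learning_curve: cross-validated accuracy as a function of the number of training examples, computed here for a placeholder audio-only feature set and a placeholder hybrid set.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X_hybrid = rng.normal(size=(400, 25))        # placeholder audio + lyric features
X_audio = X_hybrid[:, :10]                   # placeholder audio-only features
y = rng.integers(0, 2, size=400)

for name, X in [("audio", X_audio), ("hybrid", X_hybrid)]:
    sizes, _, test_scores = learning_curve(
        SVC(kernel="linear"), X, y,
        train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="accuracy")
    # Mean cross-validated accuracy at each training-set size.
    print(name, dict(zip(sizes, test_scores.mean(axis=1).round(3))))
```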
  • 34. Conclusions. Q1: Can social tags help identify mood categories that are more realistic? Q2: The most useful lyric features are a combination of words, linguistic features and text stylistic features. Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification?
  • 35. What does the computer feel about
  • 36. Contributions. Methodology: mood categories identified from social tags complement psychological models; established an example of using empirical data to refine/adapt theoretical models; improved lyric affect analysis and multi-modal mood classification. Evaluation: proposed an efficient method for building ground truth datasets; the largest dataset with ternary information sources to date, made available to the MIR community via MIREX 2009 (http://www.music-ir.org/mirex/2009/index.php/Audio_Tag_Classification). Application: provided a practical reference for MIR systems; Moodydb.com.
  • 37. Application
  • 38. Feature Analysis
  • 39. Audio vs. Lyrics
  • 40. Top Lyric Features
  • 41. Top Lyric Features in Calm
  • 42. Top Affective Words vs. ...
  • 43. Future Research Directions
  • 44. Affect Analysis for Information Studies. Affect is an important factor in information behavior and information access. NLP techniques have been applied to attitude, sentiment and opinion analysis. I am interested in its applications to human cognition and learning: English and Chinese; text and music. Paper accepted to ISMIR: "Exploring the Relationship Between Mood and Creativity in Rock Lyrics".
  • 45. Future Research Directions: multimedia, multimodal (audio-visual-textual)
  • 46. Summary. Multimodal music mood classification: combining lyrics and audio helps improve effectiveness and efficiency. Contributions; feature analysis. Future research: the affect factor in informatics; multimodal, multimedia. (Photo mining seminar on Thursday: Prof. Winston Hsu from Taiwan.)
  • 48. References:
    Hu, X. and Downie, J. S. (2010). When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), Aug. 2010, Utrecht, Netherlands.
    Hu, X. and Downie, J. S. (2010). Improving Mood Classification in Music Digital Libraries by Combining Lyrics and Audio. In Proceedings of the Joint Conference on Digital Libraries (JCDL), June 2010, Surfers Paradise, Australia. (Best Student Paper Award.)
    Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference, University of Illinois at Urbana-Champaign, Feb. 2010, Champaign, IL. (Best Student Paper Award.)
    Hu, X., Downie, J. S. and Ehmann, A. (2009). Lyric Text Mining in Music Mood Classification. In Proceedings of the International Conference on Music Information Retrieval (ISMIR 2009).
    Hu, X. (2009). Combining Text and Audio for Music Mood Classification in Music Digital Libraries. IEEE Bulletin of the Technical Committee on Digital Libraries (TCDL), 5(3).
    Hu, X. (2010). Multi-modal Music Mood Classification. Presented in the Jean Tague-Sutcliffe Doctoral Research Poster session at the ALISE Annual Conference, Jan. 2010, Boston, MA. (3rd Place Award.)
    Hu, X. (2009). Categorizing Music Mood in Social Context. In Proceedings of the Annual Meeting of ASIS&T (CD-ROM), Nov. 2009, Vancouver, Canada.
  • 49. References (2):
    Hu, X., Downie, J. S., Laurier, C., Bay, M. and Ehmann, A. (2008a). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), Sept. 2008, Philadelphia, USA.
    Juslin, P. N. and Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3): 217-238.
    Juslin, P. N. and Sloboda, J. A. (2001). Music and emotion: introduction. In P. N. Juslin and J. A. Sloboda (Eds.), Music and Emotion: Theory and Research. New York: Oxford University Press.
    Skowronek, J., McKinney, M. F. and van de Par, S. (2006). Ground truth for automatic music mood classification. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR 2006), Oct. 2006, Victoria, Canada.