

VECTOR REPRESENTATION OF EMOTION FLOW FOR POPULAR MUSIC

Chia-Hao Chung and Homer Chen

National Taiwan University

Emails: {b99505003, homer}@ntu.edu.tw

ABSTRACT

The flow of emotion expressed by music through time is a useful feature for music information indexing and retrieval. In this paper, we propose a novel vector representation of emotion flow for popular music. It exploits the repetitive verse-chorus structure of popular music and connects a verse (represented by a point) and its corresponding chorus (another point) in the valence-arousal emotion plane. The proposed vector representation visually gives users a snapshot of the emotion flow of a popular song in an intuitive and instant manner, which is more effective than the point and curve representations of music emotion flow. Because many other genres also have repetitive music structure, the vector representation has a wide range of applications.

Index Terms—Affective content, emotion flow, music emotion representation, music structure.

1. INTRODUCTION

It is commonly agreed that music listening is an appealing experience for most people because music evokes emotion in listeners. As the emotion conveyed by music is important to music listening, there is a strong need for effective extraction and representation of music emotion from the music organization and retrieval perspective. This paper focuses on music emotion representation.

A typical approach to music emotion representation condenses the entire emotion flow of a song into a single emotion. This approach is adopted by most music emotion recognition (MER) systems [1]-[3]. It works by selecting a certain segment from the song and mapping the musical features extracted from the segment to a single emotion. The emotion representation is either a label, such as happy, angry, sad, or relaxed, or the coordinates of a point in, for example, the valence-arousal (VA) emotion plane [4]. The former is a categorical representation, while the latter is a dimensional representation [5]. A user can query songs through either form of single-point music emotion representation, and a music retrieval system responds to the query with songs that match the emotion specified by the user [6], [7].

However, the emotion of a music piece varies as it unfolds in time [8]. This dynamic nature has not been fully explored for music emotion representation, perhaps because the emotion flow of music is difficult to qualify or quantify in data collection and model training [1]. The work that comes closest is music emotion tracking [9]-[12], which generates a sequence of points at regular intervals to form an affect curve in the emotion plane [13]. Four examples are shown in Fig. 1, where each curve is generated by dividing a full song into 30-second segments with a 10-second hop size and predicting the VA values of all segments. Each curve depicts the emotion of a song from the beginning to the end. We can see that the variation of music emotion can be quite complex and that a point representation cannot properly capture the dynamics of music emotion.
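To make the windowing concrete, here is a minimal sketch (not the authors' code) of how such an affect curve could be assembled from a waveform; `predict_va` stands in for a trained VA regressor and is an assumption, not part of the paper.

```python
import numpy as np

def affect_curve(audio, sr, predict_va, win_s=30.0, hop_s=10.0):
    """Slice a song into 30-second windows with a 10-second hop and predict one
    (valence, arousal) point per window, yielding an (n, 2) affect curve."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    points = []
    for start in range(0, max(len(audio) - win, 0) + 1, hop):
        segment = audio[start:start + win]        # one 30-second window
        points.append(predict_va(segment, sr))    # assumed regressor call
    return np.asarray(points)
```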

The representation of emotion flow for music should be easy to visualize, yet sufficiently informative to convey the dynamics of music emotion. The conventional point representation of music emotion is the simplest one; however, it does not contain any dynamic information about music emotion. On the other hand, the affect curve can show the dynamics of music emotion fairly well, but it is too complex for users to specify. Clearly, simplicity and informativeness are two competing criteria, and a certain degree of tradeoff between them is necessary in practice.

Fig. 1. Affect curves of four songs in the VA plane, where diamonds indicate the beginning and circles indicate the end of the songs. The black curve is Smells Like Teen Spirit by Nirvana. The blue curve is Are We the Waiting by Green Day. The green curve is Dying in the Sun by The Cranberries. The red curve is Barriers by Aereogramme.

It has been reported that the emotion expressed by a music piece is related to its structure. Schubert et al. [14] showed that music emotion flow can be attributed to changes of music structure. Yang et al. [15] reported that the boundaries between contrasting segments of a music piece exhibit rapid changes of VA values. Wang et al. [16] showed that exploiting the music structure of popular music for segment selection improves the performance of an MER system. For popular music, the music structure usually consists of a number of repetitive musical sections [17]. Each musical section refers to a song segment that has its own musical role, such as verse or chorus. As shown in Fig. 2, popular music typically has a repetitive verse-chorus structure, and its emotion flow changes significantly during the transition between verse and chorus sections.

The burgeoning evidence of the strong relation between music structure and emotion flow motivates us to develop an effective representation of emotion flow for music retrieval. The proposed emotion flow representation of a song is a vector in the VA emotion plane, pointing from the emotion of a verse to the emotion of its corresponding chorus. This representation is simple and intuitive, which is made possible by exploiting the repetitive property of the music structure of popular music. We focus on popular music in this paper because it has perhaps the largest user base on a daily basis and because its structure normally falls within a finite set of well-known patterns [18]-[22].

In summary, the primary contributions of this paper include:

• A study on the music structure of popular music, such as pop, R&B, and rock songs, is conducted to demonstrate the repetitive property of the music structure of popular music (Section 2).

• A novel vector representation of emotion flow for popular music is proposed. A comprehensive comparison of the proposed vector representation with the point and curve representations is presented (Sections 3 and 4).

• A performance study is conducted to demonstrate the accuracy and effectiveness of the vector representation in capturing the emotion flow of a song (Section 5).

2. MUSIC STRUCTURE OF POPULAR MUSIC

Music is an art form of organized sound. A popular song can be divided into a number of musical sections, such as introduction (intro), verse, chorus, bridge, instrumental solo, and ending (outro) [18]. These sections are arranged (possibly with repetition) in a particular pattern referred to as the musical form. Recovering the musical form is called music structure analysis, which can be considered a segmentation process that detects the temporal position and duration of each segment [19]. Here, we briefly review the common musical sections and their musical roles.

The intro and outro indicate the beginning and ending sections of a song, respectively, and usually contain only instrumental sounds without singing voice or lyrics. However, not every song has an intro or outro; for example, composers may place a verse or a chorus at the beginning or end of a song to make it sound special. The sections corresponding to verse or chorus normally express a flow of emotion as the music unfolds. The verse usually has low energy and is the place where the story of the song is narrated. Compared to the verse, the chorus is emotive and leaves a significant impression on listeners [20]. Other structural elements, such as the bridge and the instrumental solo, are optional and function as transitional sections that avoid monotonous composition and make the song colorful. A bridge is a transition between other types of sections, and an instrumental solo is a transitional section consisting mainly of instrumental sounds.

To investigate music structure, we conduct an analysis of NTUMIR-60, a dataset consisting of 60 English popular songs [23]. Because state-of-the-art automatic music structure analysis is not as accurate as expected [19], [21], we perform the analysis manually. The results are shown in Table 1. We can see that verse and chorus indeed make up a large portion of a song and on average appear 3.13 and 2.37 times per song, respectively. This is consistent with the finding by musicologists that verse-chorus is a widely used musical form (aka the verse-chorus form) for songwriters of popular music [20]. It also suggests that verse and chorus are the most memorable sections of a song [22] and represent its main affective content. The corresponding emotion flow gives listeners an affective sensation.

3. MUSIC EMOTION REPRESENTATION

In either the categorical or the dimensional approach, the typical representation of music emotion represents the affective content of a song by a single emotion. The categorical approach describes emotion using a finite number of discrete affective terms [24], [25], whereas the dimensional approach defines emotion in a continuous space, such as the VA plane [26], [27]. In this section, we first review the point and curve representations in the dimensional approach and then present the vector representation in detail.

Table 1. Music structure statistics of the 60 English popular songs of the NTUMIR-60 dataset.

                      Intro   Verse   Chorus   Others   Outro
Times per song         0.93    3.13     2.37     1.28    0.48
Proportion of song     0.09    0.44     0.29     0.11    0.07

Fig. 2. (a) Music structure of Smells Like Teen Spirit by Nirvana, with alternating verse and chorus sections. (b) The arousal values and (c) the valence values of all 30-second segments of the song.


3.1. Point and curve representation

In the dimensional approach, the VA values of a music segment can be predicted from the features extracted from the segment through a regression formulation of the MER problem [26]. The emotion of the music segment is then represented by a point in the VA plane. Given all the music segments of a song, one may select one of them to represent the whole song. This gives rise to the single-point representation of music emotion in the VA plane, and a user only has to specify the coordinates of the point in the VA plane to retrieve the corresponding song. Although this method provides an intuitive way for music retrieval, as discussed in Section 1, it is impossible to represent the emotion flow of a whole song by a single point in the VA plane. In addition, it is difficult for a computer to automatically determine which music segment best represents the entire song.

By dividing a song into a number of segments and predicting the VA values of each segment [11], [12], the collection of VA points forms an affect curve of the song in the VA plane. One may also represent the valence and arousal of the song separately, each as a function of time. Although such affect curves can indeed show the emotion flow of a song, the representation is too complex to be adopted in a music retrieval system, because most users are unable to precisely specify the affect curve of a song, even a familiar one. In addition, how to measure the similarity (or distance) between two affect curves of different lengths is an open issue. Therefore, a simple approach is desirable.

3.2. Vector representation

By exploiting the repetitive property of the music structure of popular music, we can represent the characteristics of emotion flow in a much simpler way than the affect curve representation. As discussed in Section 2, the verse-chorus form is a common music structure of popular music and has a strong relation to the emotion flow of a song. Therefore, we leverage it to construct the emotion flow representation of a song. The resulting representation is a vector pointing from a verse to its corresponding chorus in the VA emotion plane, as illustrated in Fig. 3.

Besides the positional information of the verses and choruses in the VA plane, the vector representation indicates the direction and strength of the emotion flow of a song. Therefore, the vector representation is more informative than the point representation. Since the two terminals of a vector represent the emotions of a verse and its corresponding chorus of a song, this representation is more intuitive and simpler to use than the affect curve, which does not explicitly present the structural information of a song.
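As a small illustration of the quantities this representation exposes, the sketch below computes the flow vector, its strength (magnitude), and its direction (angle) from a verse point and a chorus point; the VA values are invented for the example, not taken from the paper.

```python
import numpy as np

# Illustrative (not measured) VA coordinates of a verse and its chorus.
verse = np.array([-0.2, -0.1])    # (valence, arousal) of the verse
chorus = np.array([0.3, 0.6])     # (valence, arousal) of the chorus

flow = chorus - verse                                  # emotion-flow vector
strength = np.linalg.norm(flow)                        # magnitude of the emotion change
direction = np.degrees(np.arctan2(flow[1], flow[0]))   # angle in the VA plane
print(flow, strength, direction)
```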

Indeed, the vector representation expresses the main emotion flow of a song as characterized by the verse-chorus form. Table 2 shows a qualitative comparison of the point representation, the affect curve representation, and the proposed vector representation. We can see that the vector representation of emotion flow is novel, simple, and intuitive. Users can easily search for songs by specifying a vector in the VA plane as the query, and a music retrieval system can quickly respond to the query according to the proximity of a candidate song to the vector. In practice, a set of candidate songs can be generated and ordered according to this proximity when presented to the user. With this representation of music emotion flow, many innovative music retrieval mechanisms can be developed to match the needs of a specific application.
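A possible retrieval loop along these lines is sketched below; the proximity measure used here (Euclidean distance between the stacked verse/chorus endpoints of the query and of a candidate) is our assumption, since the paper does not fix a particular measure.

```python
import numpy as np

def proximity(query, candidate):
    """Lower is closer; both arguments are ((vx, vy), (cx, cy)) endpoint pairs."""
    return float(np.linalg.norm(np.concatenate(query) - np.concatenate(candidate)))

def rank_songs(query, library):
    """library maps song id -> (verse_point, chorus_point); returns ids, closest first."""
    return sorted(library, key=lambda sid: proximity(query, library[sid]))
```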

Although we focus on popular music in this paper, the repetitive property of music structure can also be found in other genres, such as the sonata form and the rondo form of classical music [28]. The vector representation is thus well suited to visualizing the emotion flow of such music as well.

4. IMPLEMENTATION

The MER system described in [26] serves as the platform to generate the VA values of musical sections (segments). The MER system consists of two main steps, as shown in Fig. 4. The first step performs regression model training, and the second step takes musical sections as inputs and generates their VA values. The details of regression model training and vector representation generation are described in this section.

Fig. 3. Illustration of the vector representation of music emotion flow. The two terminals of the vector represent a verse and its corresponding chorus in the VA plane.

Table 2. A comparison of the point, the curve, and the proposed vector representations.

                          Point   Curve   Vector
Locational information     X*      X       X
Dynamic information                X       X
Structural information                     X
Complexity                 Low     High    Medium

* A checked box (X) means yes.


4.1. Regression model training

Adopting the dimensional approach for MER, we define valence and arousal as real values in [-1, 1] and formulate the prediction of VA values as a regression problem. Denote the input training data by (x_i, y_i), where 1 ≤ i ≤ N, x_i is the feature vector of the ith input sample, and y_i is the real value to be predicted for the ith input sample. A regression model (regressor) is trained by minimizing the mean squared difference between the prediction and the annotated value [26].
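Written out, the training objective described above is the familiar least-squares criterion (the SVR actually used below adds an ε-insensitive loss and a regularization term, omitted here for brevity):

```latex
\hat{f} = \arg\min_{f} \; \frac{1}{N} \sum_{i=1}^{N} \left( f(\mathbf{x}_i) - y_i \right)^2
```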

The NTUMIR-60 dataset, which is composed of 60 English popular songs, is used for training and testing. For fair comparison, each song is converted to a uniform format (22,050 Hz, 16-bit, mono-channel PCM WAV) and normalized to the same volume level. Each song is then manually trimmed to a 30-second segment for the subjective test and feature extraction. In the subjective test, each segment is annotated by 40 participants, and the mean of the annotated VA values is used as the ground truth of the segment.

The MIRToolbox [29] is then applied to extract 177 features of the following five types: two dynamic features (the mean and standard deviation of the root-mean-squared energy); five rhythmic features (fluctuation peak, fluctuation centroid, tempo, pulse clarity, and event density); 142 spectral features (the mean and standard deviation of the centroid, brightness, spread, skewness, kurtosis, 85% rolloff, 95% rolloff, entropy, flatness, roughness, irregularity, 20 MFCCs, 20 delta MFCCs, and 20 delta-delta MFCCs); six timbre features (the mean and standard deviation of the zero-crossing rate, low energy, and spectral flux); and 22 tonal features (the 12-bin chromagram concatenated with the mean and standard deviation of the chromagram peak, chromagram centroid, key clarity, HCDF, and mode). The quality of NTUMIR-60 for MER is evaluated and reported in [23].
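As a quick sanity check on the inventory above, the five feature groups sum to the stated 177 dimensions:

```python
# Feature-group sizes as reported in the text, with how each count decomposes.
feature_groups = {
    "dynamic": 2,     # mean and std of RMS energy
    "rhythmic": 5,    # fluctuation peak/centroid, tempo, pulse clarity, event density
    "spectral": 142,  # mean and std of 11 spectral descriptors + 3 x 20 MFCC-type coefficients
    "timbre": 6,      # mean and std of zero-crossing rate, low energy, spectral flux
    "tonal": 22,      # 12-bin chromagram + mean and std of 5 chroma-derived descriptors
}
assert sum(feature_groups.values()) == 177
```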

The regression models of arousal and valence are trained independently. For accuracy, support vector regression (SVR) [30], [31] with a radial basis kernel function is adopted to train the regressors. A grid search is applied to find the best kernel parameter γ and penalty parameter C [32], where γ ∈ {10^-4, 10^-3, 10^-2, 10^-1} and C ∈ {1, 10^1, 10^2, 10^3, 10^4}.
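For readers who want to reproduce this step, the following is a minimal sketch assuming scikit-learn rather than the LIBSVM implementation cited above; the parameter grid is the one stated in the text, and the grid-search scoring criterion is our choice.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# RBF-kernel and penalty parameter grid as given in the paper.
param_grid = {
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
    "C": [1, 1e1, 1e2, 1e3, 1e4],
}

def train_regressor(X, y):
    """X: (N, 177) feature matrix; y: valence or arousal ground truth in [-1, 1]."""
    search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                          scoring="neg_mean_squared_error", cv=10)
    search.fit(X, y)
    return search.best_estimator_
```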

To evaluate the performance of the regressors, ten-fold cross validation is conducted: the whole dataset is randomly divided into 10 parts, nine of which are used for training and the remaining one for testing. This process is repeated 50 times. The average performance in terms of the R-squared value [33] is 0.21 for valence and 0.76 for arousal. These results are comparable to those reported in previous work [23], [26].
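A sketch of this evaluation protocol, again assuming scikit-learn: ten-fold cross validation repeated over different random splits, scored by the R-squared value.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

def repeated_cv_r2(model, X, y, repeats=50):
    """Average R-squared over `repeats` random ten-fold splits of the dataset."""
    scores = []
    for seed in range(repeats):
        folds = KFold(n_splits=10, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(model, X, y, scoring="r2", cv=folds))
    return float(np.mean(scores))
```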

4.2. Generating vector representation

The audio segmentation method proposed in [34] is applied to segment each song of the NTUMIR-60 dataset. All verses and choruses are manually selected from each song based on the segmentation result, and their VA values are estimated independently. In our current implementation, the vector representation of a song in the VA plane is generated by connecting the point representing the average verse with the point representing the average chorus. Fig. 5 shows the resulting vector representations of all songs of the NTUMIR-60 dataset.
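A minimal sketch of this averaging step, reflecting our reading of the description above: the verse endpoint is the mean VA point over a song's verses, the chorus endpoint is the mean over its choruses, and the vector connects the two.

```python
import numpy as np

def song_vector(verse_va, chorus_va):
    """verse_va, chorus_va: (n, 2) arrays of predicted [valence, arousal] values
    for the verses and choruses of one song; returns (verse_end, chorus_end)."""
    verse_end = np.mean(np.asarray(verse_va), axis=0)
    chorus_end = np.mean(np.asarray(chorus_va), axis=0)
    return verse_end, chorus_end      # the vector points from verse to chorus
```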

We can see that each vector clearly describes the emotion flow of a song. For example, a vector in the first quadrant pointing toward the upper right corner indicates that the corresponding song drives listeners toward a positive and exciting feeling, whereas a vector in the second quadrant pointing toward the upper left corner indicates that the song it represents would drive listeners toward a negative and aggressive mood. We also see that, for most songs, the arousal value of the representative chorus is higher than that of the corresponding verse; that is, the emotion vectors usually point upward. This reflects the fact that the chorus is typically more exciting than its corresponding verse [20].

5. EVALUATION

An experiment is conducted to evaluate the effectiveness of the proposed vector representation of music emotion flow in comparison with two ad hoc methods. The effectiveness of a method is measured in terms of the approximation error between the representation it produces and the emotion flow of a song. All songs of the NTUMIR-60 dataset are considered in this experiment.

Fig. 5. The proposed vector representation provides an intuitive visualization of music emotion flow in the VA plane. This chart shows the emotion flows of all songs in the NTUMIR-60 dataset. Each blue diamond represents the emotion of the verses, and each red circle represents the emotion of the choruses, connected to the corresponding verses by a line segment.

Fig. 4. Overview of an MER system.


As discussed in Section 1, the emotion flow of a song is difficult for a subject to specify; therefore, we use the affect curve generated by MER as the ground truth. Specifically, the affect curve of each song is generated by dividing the full song into 30-second segments with a 10-second hop size and predicting the VA values of all segments. Then, the k-means algorithm [35] is applied to partition the collection of VA points into two clusters. The center points of these two clusters are used as the reference to calculate the approximation error of the proposed vector representation and to compare it with that of the two ad hoc methods.

The first ad hoc method randomly selects two 30-second segments from a song and constructs a vector representation from them. The second ad hoc method selects the first segment from the 30th to the 60th second of a song and the second segment from the last 60th to the last 30th second of the song. The VA values of the two selected segments are predicted independently.

Two distance measures are considered: Euclidean distance and cosine similarity [36]. The former is applied to measure the difference between two vectors in length, and the latter to measure the angular difference between two vectors.
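The sketch below, assuming scikit-learn and SciPy, shows one way to compute these approximation errors: the affect-curve points are clustered into two groups, the difference of the two cluster centers serves as the reference vector, and the candidate verse-to-chorus vector is compared to it by Euclidean and cosine distance. How the two centers are matched to the verse and chorus ends is not specified in the paper, so the pairing with the smaller error is taken here (an assumption).

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.cluster import KMeans

def approximation_error(curve_points, verse_end, chorus_end):
    """curve_points: (n, 2) affect-curve VA points; returns (euclidean, cosine)."""
    centers = KMeans(n_clusters=2, n_init=10).fit(np.asarray(curve_points)).cluster_centers_
    candidate = np.asarray(chorus_end) - np.asarray(verse_end)
    errors = []
    for reference in (centers[1] - centers[0], centers[0] - centers[1]):
        errors.append((np.linalg.norm(candidate - reference),   # length/position error
                       cosine(candidate, reference)))           # 1 - cosine similarity
    return min(errors)   # keep the better-matched center-to-endpoint pairing
```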

The experimental results are shown in Table 3. Note that the process of randomly selecting two segments from a song is repeated 100 times, and the average results are presented in the first column of Table 3. Compared with the two ad hoc methods, the vector representation has the smallest approximation error in both Euclidean distance and cosine distance. This shows the effectiveness of the vector representation in capturing the emotion flow of popular music.

In Fig. 6, the vector representation of the emotion flow of each song is plotted together with the affect curve of the song and the emotion of each verse and chorus identified for the song. We can see that most vectors are located in the repetitive region of the affect curves. The dangling parts of an affect curve normally correspond to the intro and outro sections of the song and hence are of no concern. We can also see that the verses are located on one side of the affect curve of a song while the choruses are located on the other side. Thus, using the average verse and average chorus for the vector representation can effectively characterize the affect curve and the emotion flow.

Fig. 6. Most vectors (represented by a diamond-circle pair) generated by our method are in the repetitive region of the affect curves (shown in grey). The hollow diamond represents the emotion of a verse, and the hollow circle represents the emotion of a chorus of a song.

Table 3. Euclidean and cosine distances between the ground truth and three different approaches.

                      Random   F30L30¹   Vector
Euclidean distance     0.10     0.10      0.07
Cosine distance²       0.21     0.20      0.14

¹ F30L30 means that the first segment is from the 30th to the 60th second and the second segment is from the last 60th to the last 30th second of a song.
² Cosine distance is defined as 1 minus cosine similarity.

6. CONCLUSION

In this paper, we have investigated the repetitive property of music structure and described a novel approach that represents the emotion flow of popular music by a vector in the VA plane. The vector emerges from a representative verse of a song and ends at the corresponding chorus. We have also compared the proposed vector representation with the point and curve representations of music emotion and shown that the proposed method is an intuitive and effective representation of emotion flow for popular music, a property supported by our experimental results. This work is motivated by the increasing need for effective music content representation and analysis in response to explosive content growth. With the proposed vector representation, the proximity of emotion flow between two songs can be easily measured, which is essential to music retrieval, and many innovative music retrieval applications can be developed.

REFERENCES

[1] Y.-H. Yang and H. H. Chen, Music Emotion Recognition, CRC Press, 2011.
[2] Y.-H. Yang and H. H. Chen, "Machine recognition of music emotion: A review," ACM Trans. Intell. Syst. Technol., vol. 3, no. 3, article 40, 2012.
[3] Y. E. Kim, E. M. Schmidt, R. Migneco, B. G. Morton, P. Richardson, J. Scott, J. A. Speck, and D. Turnbull, "Music emotion recognition: A state of the art review," in Proc. 11th Int. Soc. Music Inform. Retrieval Conf., pp. 255-266, Utrecht, Netherlands, 2010.
[4] J. A. Russell, "A circumplex model of affect," J. Pers. Soc. Psychol., vol. 39, no. 6, pp. 1161-1178, 1980.
[5] T. Eerola and J. K. Vuoskoski, "A comparison of the discrete and dimensional models of emotion in music," Psychol. Music, vol. 39, no. 1, pp. 18-49, 2010.
[6] X. Zhu, Y.-Y. Shi, H.-G. Kim, and K.-W. Eom, "An integrated music recommendation system," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 917-925, 2006.
[7] Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, and H. H. Chen, "Mr. Emo: Music retrieval in the emotion plane," in Proc. ACM Multimedia, pp. 1003-1004, Vancouver, Canada, 2008.
[8] E. Schubert, "Measurement and time series analysis of emotion in music," Ph.D. dissertation, School of Music & Music Education, University of New South Wales, Sydney, Australia, 1999.
[9] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 1, pp. 5-18, 2006.
[10] M. D. Korhonen, D. A. Clausi, and M. E. Jernigan, "Modeling emotional content of music using system identification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 3, pp. 588-599, 2006.
[11] R. Panda and R. P. Paiva, "Using support vector machines for automatic mood tracking in audio music," in Audio Engineering Soc. Convention 130, London, UK, 2011.
[12] E. M. Schmidt, D. Turnbull, and Y. E. Kim, "Feature selection for content-based, time-varying musical emotion regression," in Proc. ACM Int. Conf. Multimedia Inform. Retrieval, pp. 267-274, Philadelphia, USA, 2010.
[13] A. Hanjalic and L.-Q. Xu, "Affective video content representation and modeling," IEEE Trans. Multimedia, vol. 7, no. 1, pp. 143-154, 2005.
[14] E. Schubert, S. Ferguson, N. Farrar, D. Taylor, and G. E. McPherson, "Continuous response to music using discrete emotion faces," in Proc. 9th Int. Symp. Computer Music Modelling and Retrieval, pp. 1-17, London, UK, 2012.
[15] Y.-H. Yang, C.-C. Liu, and H. H. Chen, "Music emotion classification: A fuzzy approach," in Proc. ACM Multimedia, pp. 81-84, Santa Barbara, USA, 2006.
[16] X. Wang, Y. Wu, X. Chen, and D. Yang, "Enhance popular music emotion regression by importing structure information," in Proc. Asia-Pacific Signal and Inform. Process. Association Annu. Summit and Conf., pp. 1-4, Kaohsiung, Taiwan, 2013.
[17] B. Horner and T. Swiss, Key Terms in Popular Music and Culture, Blackwell Publishing, 1999.
[18] N. C. Maddage, C. Xu, M. S. Kankanhalli, and X. Shao, "Content-based music structure analysis with applications to music semantics understanding," in Proc. ACM Multimedia, pp. 112-119, NY, USA, 2004.
[19] J. Paulus, M. Müller, and A. Klapuri, "Audio-based music structure analysis," in Proc. 11th Int. Soc. Music Inform. Retrieval Conf., pp. 625-636, Utrecht, Netherlands, 2010.
[20] D. Christopher, "Rockin' out: Expressive modulation in verse-chorus form," Music Theory Online, vol. 17, 2011.
[21] J. B. L. Smith, C.-H. Chuan, and E. Chew, "Audio properties of perceived boundaries in music," IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1219-1228, 2014.
[22] M. Cooper and J. Foote, "Summarizing popular music via structural similarity analysis," in Proc. IEEE Workshop on Applications of Signal Process. to Audio and Acoustics, pp. 127-130, New Paltz, NY, USA, 2003.
[23] Y.-H. Yang, Y.-F. Su, Y.-C. Lin, and H. H. Chen, "Music emotion recognition: The role of individuality," in Proc. ACM Int. Workshop on Human-centered Multimedia, pp. 13-21, Augsburg, Bavaria, Germany, 2007.
[24] X. Hu, J. S. Downie, C. Laurier, M. Bay, and A. F. Ehmann, "The 2007 MIREX audio mood classification task: Lessons learned," in Proc. 9th Int. Conf. Music Inform. Retrieval, pp. 462-467, Philadelphia, USA, 2008.
[25] C. Laurier, J. Grivolla, and P. Herrera, "Multimodal music mood classification using audio and lyrics," in Proc. IEEE 7th Int. Conf. Machine Learning and Applications, pp. 688-693, San Diego, California, USA, 2008.
[26] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 2, pp. 448-457, 2008.
[27] E. M. Schmidt and Y. E. Kim, "Projection of acoustic features to continuous valence-arousal mood labels via regression," in Proc. 10th Int. Soc. Music Inform. Retrieval Conf., Kobe, Japan, 2009.
[28] M. Hickey, "Assessment rubrics for music composition," Music Educators Journal, vol. 85, no. 4, pp. 26-33, 1999.
[29] O. Lartillot and P. Toiviainen, "A MATLAB toolbox for musical feature extraction from audio," in Proc. Int. Conf. Digital Audio Effects, pp. 237-244, Bordeaux, France, 2007.
[30] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Stat. Comput., vol. 4, no. 3, pp. 199-222, 2004.
[31] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, article 27, 2011.
[32] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A practical guide to support vector classification," Technical report, National Taiwan University, 2010. Available at: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
[33] A. Sen and M. S. Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer Science & Business Media, 1990.
[34] J. Foote and M. Cooper, "Media segmentation using self-similarity decomposition," in Proc. SPIE Storage and Retrieval for Multimedia Databases, vol. 5021, pp. 167-175, 2003.
[35] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. 28, no. 2, pp. 129-137, 1982.
[36] L. Lee, "Measures of distributional similarity," in Proc. 37th Annu. Meeting of the Association for Computational Linguistics, pp. 25-32, PA, USA, 1999.

