[ieee 2009 international conference on advances in social network analysis and mining (asonam) -...

6
Mining the Dynamics of Music Preferences from a Social Networking Site Nico Schlitter Faculty of Computer Science Otto-v.-Guericke-University Magdeburg Magdeburg, Germany [email protected] Tanja Falkowski Faculty of Computer Science Otto-v.-Guericke-University Magdeburg Magdeburg, Germany [email protected] Abstract—In this paper we present an application of our incremental graph clustering algorithm (DENGRAPH) on a data set obtained from the music community site Last.fm. The aim of our study is to determine the music preferences of people and to observe how the taste in music changes over time. Over a period of 130 weeks, we extract for each interval user profiles of 1,800 users that represent their music listening behavior. By building and incrementally clustering a graph of similar users, we obtain groups of people with similar music preferences. We label these clusters with genres according to the user profiles of the cluster members. Due to the incremental nature of DENGRAPH we show how clusters evolve over time. Besides the growth and decrease of clusters we observe how new clusters emerge and old clusters die. Furthermore, we show the merge and split of clusters. The results of our experiments indicate that DENGRAPH is particularly useful to efficiently detect groups of similar users and to track them over time. Keywords-Social Networks; Network Dynamics; Community Analysis; Music Preferences I. I NTRODUCTION Music is a constant companion for most people. In 2000, Eurostat, the statistical office of the European Communities, carried out a survey on Europeans’ participation in cultural activities, one part of which concerned music. The results show, for example, that over 60% of Europeans listen to music every day [1]. Music is known to induce emotions such as ’happiness’, ’sadness’ and ’fear’ [2] and studies indicate that the meaning of music in the lives of people changes over time. Younger people tend to link music preferences to a variety of psychological characteristics such as personality, personal qualities and values. Since, individuals have clearly defined stereotypes about the fans of various music genres, music is used to convey information about oneself to observers [3]. On the other hand, for older people, music often contributes to positive aging by providing ways to maintain positive self- esteem, feel independent and avoid feelings of isolation or loneliness [4]. Based on this observation, the hypothesis is that the music preferences of people change as well over their lifetime. Changes in music preferences are often triggered by the environment a person lives in, such as media or friends. Many people appreciate of being introduced to works that they do not know of yet. This is a reason why recommendation systems for music that use e.g. collaborative filtering techniques recently became more and more popular (see, e.g., [5]). Collaborative filtering is a procedure that detects the preferences of users for items such as books, DVDs or music. These preferences are often represented in user profiles. The profiles can be collected either explicitly or implicitly. In the first case, a user is explicitly asked to rate items that he has purchased or that he likes. An implicit profile is based on observations that have been made based on interaction data (such as purchases). In this study we build for each user in different points in time, implicit profiles which represent the current music listening preferences. In this paper, we apply DENGRAPH, a clustering algorithm we presented in [6], to study the temporal dynamics of the music preferences of 1,800 individuals. We use data obtained from the social networking site Last.fm, which provides lists of the most listened artists for each week over the lifetime of a user. Based on this information we build user profiles by extracting the genres of the most listened artists. The artist’s genre is determined by the tags that the community members use to characterize the artist. The profile of a user is then represented by a vector of the weighted tags of his favorite artists. We represent users as nodes in a graph and connect them with an edge, if their profile similarity reaches a predefined threshold. Due to the incremental nature of DENGRAPH, temporal changes invoke an clustering update. By this we are able to observe and analyze when and how the taste in music of an individual user changes over time and how his membership to a certain community changes accordingly. The results of our study indicate that DENGRAPH is capable to efficiently extract communities of users with a similar taste in music and that it furthermore allows us to monitor and analyze the changes in the listening behavior of individuals and the resulting shifts in the membership to a certain community. The rest of the paper is organized as follows. In Section II we present our approach to represent the music preferences of users in user profiles. Section III discusses how clusters of similar users can be detected and observed over time using the density-based clustering algorithm DENGRAPH. The characteristics of the used data set and the results of our experiments are presented in Section IV. We conclude the paper in Section V. 2009 Advances in Social Network Analysis and Mining 978-0-7695-3689-7/09 $25.00 © 2009 IEEE DOI 10.1109/ASONAM.2009.26 243 2009 Advances in Social Network Analysis and Mining 978-0-7695-3689-7/09 $25.00 © 2009 IEEE DOI 10.1109/ASONAM.2009.26 243

Upload: tanja

Post on 14-Mar-2017

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

Mining the Dynamics of Music Preferences from aSocial Networking Site

Nico Schlitter

Faculty of Computer Science

Otto-v.-Guericke-University Magdeburg

Magdeburg, Germany

[email protected]

Tanja Falkowski

Faculty of Computer Science

Otto-v.-Guericke-University Magdeburg

Magdeburg, Germany

[email protected]

Abstract—In this paper we present an application of ourincremental graph clustering algorithm (DENGRAPH) on a dataset obtained from the music community site Last.fm. The aim ofour study is to determine the music preferences of people and toobserve how the taste in music changes over time. Over a periodof 130 weeks, we extract for each interval user profiles of 1,800users that represent their music listening behavior. By buildingand incrementally clustering a graph of similar users, we obtaingroups of people with similar music preferences. We label theseclusters with genres according to the user profiles of the clustermembers. Due to the incremental nature of DENGRAPH we showhow clusters evolve over time. Besides the growth and decrease ofclusters we observe how new clusters emerge and old clusters die.Furthermore, we show the merge and split of clusters. The resultsof our experiments indicate that DENGRAPH is particularlyuseful to efficiently detect groups of similar users and to trackthem over time.

Keywords-Social Networks; Network Dynamics; CommunityAnalysis; Music Preferences

I. INTRODUCTION

Music is a constant companion for most people. In 2000,

Eurostat, the statistical office of the European Communities,

carried out a survey on Europeans’ participation in cultural

activities, one part of which concerned music. The results

show, for example, that over 60% of Europeans listen to musicevery day [1]. Music is known to induce emotions such as

’happiness’, ’sadness’ and ’fear’ [2] and studies indicate that

the meaning of music in the lives of people changes over time.

Younger people tend to link music preferences to a variety

of psychological characteristics such as personality, personal

qualities and values. Since, individuals have clearly defined

stereotypes about the fans of various music genres, music is

used to convey information about oneself to observers [3]. On

the other hand, for older people, music often contributes to

positive aging by providing ways to maintain positive self-

esteem, feel independent and avoid feelings of isolation or

loneliness [4]. Based on this observation, the hypothesis is

that the music preferences of people change as well over their

lifetime.

Changes in music preferences are often triggered by the

environment a person lives in, such as media or friends. Many

people appreciate of being introduced to works that they do not

know of yet. This is a reason why recommendation systems for

music that use e.g. collaborative filtering techniques recentlybecame more and more popular (see, e.g., [5]). Collaborative

filtering is a procedure that detects the preferences of users

for items such as books, DVDs or music. These preferences

are often represented in user profiles. The profiles can becollected either explicitly or implicitly. In the first case, a user

is explicitly asked to rate items that he has purchased or that

he likes. An implicit profile is based on observations that have

been made based on interaction data (such as purchases). In

this study we build for each user in different points in time,

implicit profiles which represent the current music listening

preferences.

In this paper, we apply DENGRAPH, a clustering algorithm

we presented in [6], to study the temporal dynamics of the

music preferences of 1,800 individuals. We use data obtained

from the social networking site Last.fm, which provides listsof the most listened artists for each week over the lifetime

of a user. Based on this information we build user profiles

by extracting the genres of the most listened artists. The

artist’s genre is determined by the tags that the community

members use to characterize the artist. The profile of a user

is then represented by a vector of the weighted tags of his

favorite artists. We represent users as nodes in a graph and

connect them with an edge, if their profile similarity reaches

a predefined threshold. Due to the incremental nature of

DENGRAPH, temporal changes invoke an clustering update.

By this we are able to observe and analyze when and how the

taste in music of an individual user changes over time and how

his membership to a certain community changes accordingly.

The results of our study indicate that DENGRAPH is capable

to efficiently extract communities of users with a similar taste

in music and that it furthermore allows us to monitor and

analyze the changes in the listening behavior of individuals and

the resulting shifts in the membership to a certain community.

The rest of the paper is organized as follows. In Section II

we present our approach to represent the music preferences

of users in user profiles. Section III discusses how clusters

of similar users can be detected and observed over time

using the density-based clustering algorithm DENGRAPH.

The characteristics of the used data set and the results of

our experiments are presented in Section IV. We conclude

the paper in Section V.

2009 Advances in Social Network Analysis and Mining

978-0-7695-3689-7/09 $25.00 © 2009 IEEE

DOI 10.1109/ASONAM.2009.26

243

2009 Advances in Social Network Analysis and Mining

978-0-7695-3689-7/09 $25.00 © 2009 IEEE

DOI 10.1109/ASONAM.2009.26

243

Page 2: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

II. REPRESENTING MUSIC PREFERENCES

Last.fm1 is a music community with over 20 million activeusers based in more than 200 countries. After a user signs

up, a plugin is installed and all tracks a user listens to are

submitted to a database. To study the temporal evolution

of music listening behaviors, we use this information and

represent a user’s taste in music in form of user profiles. This

process is described in the following section. Furthermore, to

find groups of similar users, we discuss how a graph of similar

users is constructed.

A. Building User Profiles

In affiliation networks, users are connected based on a

certain property that they share. This property can be for

example a similar taste in music. A community is then defined

as a group of people who share similar music preferences. We

define a user by a profile that describes his music preferences

and determine communities of users by clustering similar

profiles.

Genre labels are often used as ground-truth for describing

song or artist similarity in music information retrieval (MIR)

systems. A genre is a rather high level label for an artist

and depends on a variety of aspects such as the rhythm,

melody or used instruments. Genres are usually predefined by

producers or the artist and the resulting classes are highly over-

lapping. Developing automatic genre classification systems is

a controversial (see, e.g., [7]) but still popular topic in music

information retrieval and remains a challenging task (cf., e.g,

[8], [9], [10]).

In contrast to using a single label, McKay and Fujinaga

[11] suggest to use a ranked list of multiple genres to label

one recording. Similar to this suggestions, Geleijnse et al.

[12] propose to use tags that result from community Websites

where users collectively attribute artists with labels. Contrary

to expert-defined genre classification, a community-based tag-

ging results in a ranked list of terms that is richer than a

single genre. Experiments have shown that the Last.fm tagsare descriptive and consistent with respect to their similarity

and that they present a valuable characterization of artists and

their music [12].

For each artist, Last.fm provides the tags that are used byusers to describe the artist. For example, the band ’ABBA’

has 41 tags. The most often used tags are ’pop’, ’disco’,

’swedish’ and ’70s’. The singer ’Amy Winehouse’ has 34 tags,

the most often used tags are ’soul’, ’jazz’, ’female vocalists’

and ’british’. We use the most often assigned genre tags for

artists, merge tags that are semantically identical (such as ’hip

hop’ and ’hip-hop’) and obtain a set G of 222 genre tags gi.The provided tags gi are used to describe the artists in a

genre vector as follows:

Definition 1: A = {�a1, . . . ,�an} is a set of artists. An artistai is defined as a genre vector �ai = (g1, g2, . . . , gk) of thek-most used genre tags gi ∈ G = {g1, g2, . . . , gk}, where Gis the set of genres.

1http://www.last.fm/

For each user p we obtain from Last.fm for each interval ta list of the most listened to artists including the number of

times the respective artist was heard by the user (playcount).The user profile in interval t is defined as follows:

Definition 2: For each user p in interval t, a user profile isdefined as

�pt =m∑

i=1

�ai · playcounti, (1)

where m is the number of artists listened to in the interval t.Thus, each genre element is weighted by the number of times

played. The weighted vectors are added and the profile is a

vector representing the user’s music preferences.

B. Building a Graph of Similar Users

To find groups of users with similar music preferences, we

first build a similarity graph. An edge in the similarity graph

indicates that two users are similar to a certain extend. We

determine the similarity of two profiles by using the cosinesimilarity. The cosine of the angle θ between two vectors a, bcan be interpreted as a measure of similarity and is calculated

as follows: cosθ = (a · b)/(|a||b|). A cosine value of zeroindicates that the two profile vectors are orthogonal, a value

of one indicates an exact match. The similarity between the

user p1 and p2 is then defined as follows:Definition 3: The similarity between two users p1 and p2

is defined as

sim(p1, p2) =∑m

i=1 p1i · p2i√∑mi=1(p1i)2 ·

∑mi=1(p2i)2

(2)

Using this formula, we calculate the pairwise similarity of

user profiles for each interval t. Since we are only interestedin profiles that are similar to a certain extend, we establish

links between nodes only if a similarity threshold δ is reached.Finally, we obtain an unweighted and undirected graph of

similar users.

Definition 4: The similarity graph G is represented by a listof nodes V = {p1, . . . , pn} and a list of edges E = {(px, py)},while sim(px, py) ≥ δ, ∀(px, py) ∈ E.

III. DETECTING COMMUNITIES AND EVOLUTION

To find and track communities of similar users with respect

to their music preferences, we apply our incremental DEN-

GRAPH clustering algorithm presented in [6]. We extend the

algorithm to allow for individuals being a member of more

than one cluster. In the following section, we briefly describe

the clustering algorithm and the incremental update procedure

to observe clusters over time. Furthermore, we present how

the cluster labels are determined.

A. Incremental Graph Clustering

The intention of DENGRAPH is to cluster similar nodes in

a graph into communities. The density-based approach applies

a local cluster criterion. Clusters are regarded as regions in the

graph in which the nodes are dense, and which are separated

from regions of low node density. To detect regions of higher

density, DENGRAPH computes neighborhoods which have a

244244

Page 3: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

directly density reachable

p is

p

q

from q

density reachable

p is

pq

from q

density connected

p is

from q

core /border

core

p

q

m

p (q) and (q)|

Fig. 1. The concepts directly density reachability, density reachability anddensity connectedness to determine whether nodes are density connected.

given radius (ε) and must contain a minimum number of nodes(η) to ensure that the neighborhood is dense. A node that hassuch a neighborhood is termed a core node. Nodes that haveno such neighborhood are either border nodes if they are inthe neighborhood of a core node or noise nodes.To build a cluster, DENGRAPH traverses the graph by

randomly picking nodes and places all density connected (cf.Figure 1) nodes it encounters to the same cluster. If a node is

not density connected to the nodes seen thus far, it is assigned

to the next cluster candidate. Not each node becomes member

of a cluster: If a node does not have an adequately dense

neighborhood w.r.t. ε and η and is not density connected toany other node, then it is termed a noise node and its clustercandidate is dropped.

In Figure 1, the concepts of direct density reachability,

density reachability and direct connectivity are illustrated. For

a more detailed description cf. [13], [14], [6].

To allow for tracking and analyzing the temporal dynam-

ics of clusters, DENGRAPH is designed as an incremental

procedure: The clustering is updated incrementally based on

the changes that are observed in the graph structure from one

interval to another. Changes in the graph structure may evoke

one of the following clustering updates: (1) creation of a newcluster, (2) removal of a cluster, (3) absorption of a new clustermember, (4) reduction of a cluster member, (5) merge of twoor mores clusters and (6) split of a cluster into two or moreclusters. For a more detailed description of the incremental

graph clustering algorithm cf. [6].

A potential split of a cluster is handled before the update

method shown in Figure 2 is invoked if an edge between two

core nodes has been removed.

If no split has been observed, we check for each node

that has changed, whether the respective node p has an ε-neighborhood with more than η neighbors (Np(ε) ≥ η). Ifis has a neighborhood, we check whether (i) a new cluster is

established, (ii) the node becomes a member of another cluster

or if (iii) two or more clusters merge. If the node has no ε-neighborhood, we examine whether (i) a cluster is removed,

(ii) a cluster member has been removed, or (iii) a node has

been absorbed by another cluster. An overview of the updatefunctions is shown in Figure 2.

number of cores in N( )with distinct clusterID

Np( )

Creation MergeAbsorption

number of cores in N( )with distinct clusterID

p is core node p is core node

Removal Reduction Reduction Removal / Reduction / Absorption

n0

falsetrue

0 n1

falsetrue falsetrue

Fig. 2. The update Method to detect cluster transitions. (A potential split ishandled beforehand.)

B. Cluster Labeling

After determining groups of users with similar music pref-

erences in each interval, we label each cluster with the most

often used genre tags to characterize the cluster. For this, we

combine the genre vectors of the user profiles in the cluster.

Definition 5: A cluster label cl of cluster i is defined as

�cli =∑n

u=1 �pun

, (3)

where n is the number of users in cluster i.Each detected cluster is thus labeled with a genre vector just

as a user profile.

IV. EXPERIMENTS AND RESULTS

In the following section, we briefly present the data set and

its characteristics. Furthermore, we discuss how we determined

the parameters ε and η for the clustering procedure anddescribe the results obtained by applying the incremental

DENGRAPH algorithm.

A. The Data Set

From the Last.fm Website we obtained the user listeningbehavior of 1,800 users over an interval of 130 weeks (from

March 2005 to May 2008). Last.fm provides for each userand interval (here one week) a list of the most listened to

artists and the number of times the artist was played. Based

on this information we determine user profiles for each interval

according to Definition 2.

B. Determining the Parameters ε and η

Before building the similarity graph according to Definition

4, the parameters ε and η have to be determined. For this,we conduct experiments using data from 16 intervals with

400 parameter combinations regarding the number of clusters.

Our aim is to determine a parameter combination that yields

an average number of clusters. The number of clusters varies

between six and nine and the average is obtained with ε = 0.12and η = 7.The similarity graph consists of edges (px, py) where (1−

sim(px, py)) ≤ ε. Thus, two nodes are only connected with anedge, if their distance is lower than ε. By this we achieve that

245245

Page 4: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

only users with similar music preferences are in each other’s

ε-neighborhood. (Note, that the similarity would be one if twousers have equal profiles and zero if their profile vectors are

orthogonal. Thus, by subtracting the similarity from one, we

obtain the distance between two users.)

C. Data Set Characteristics

In the following, we present characteristics of the similarity

graphs over all weeks. In Figure 3, the number of nodes and

edges per interval are shown. The number of nodes and edges

increases significantly in the first 60 intervals and decreases

slowly after interval 18/07. The increase in the beginning is

due to the fact that many of the observed users were not yet

active on the platform at this time. The slow decrease could be

explained by a rising diversity in the genres that the users listen

to since this would result in lower similarity values between

users.

In Figure 4, the number of nodes per cluster and the fraction

of noise nodes per interval are shown. The clustering reveals

in each interval one giant component and two to eight smaller

clusters. The large component consists user profiles with a high

weight for the genres ’indie’, ’indie rock’ and ’alternative’.

The percentage of unclustered nodes (noise) is constantly

high and varies between 0.63 and 0.75. This observation

stresses the need for a clustering procedure that is able to

remove noise objects in linear time such as DENGRAPH.

Fig. 3. Number of nodes and edges

D. Results

We applied DENGRAPH on the data set to analyze the

evolution of clusters. As discussed in the previous section,

we obtain on average four clusters in each week. Due to the

labeling of the clusters we observe four main clusters in our

data set: ’indie’, ’hip-hop’, ’metal’ and ’rock’.

In the experiments, several cluster transitions can be ob-

served. To illustrate the observations, we chose a period of

thirty weeks and depict the clustering results of six weeks

as shown schematically in Figure 5. In each week, five to

Fig. 4. Number of clustered nodes (top); Fraction of unclustered nodes(bottom)

seven clusters were detected. The graph visualizations of the

clusterings for four weeks of our example period are shown

in Figures 6 and 7.

One cluster was stable over all periods (cf. Cluster 1 in

Figure 5). The label of this cluster revealed that it consists

of people who listen mainly to artists tagged with ’indie’ and

’alternative’. The number of people in this cluster varies over

the observation period only slightly (around 500) as shown in

Figure 4. Another cluster that can be observed over a long

period is labeled with ’hip-hop’ (cf. Cluster 4 in Figure 5).

This cluster can be observed in most weeks and does never

overlap with any other cluster. An explanation could be that

the genre ’hip-hop’ is very unique and has a low similarity to

other observed genres.

a) Week 29/06: In the first week of the period, fourclusters are detected. The labels of these cluster can be seen

in the legend of Figure 5. The four clusters represent mainly

the four genres that we observe in our data set: Cluster 1 is

the ’indie’ cluster, cluster 4 the ’hip-hop’ cluster and cluster 2

the ’metal’ cluster. At this time, cluster 3 is tagged with both

genres ’rock’ and ’metal’. We will see at a later point (week

48/06) that this cluster finally splits to a ’rock’ and ’metal’

cluster.

b) Week 33/06: The four clusters detected in week 29/06are still observable and two new clusters are created: Cluster

5 labeled with the genres ’metal core’ and ’hard rock’ and

cluster 6 is labeled ’power metal’.

c) Week 37/06: Five of the six clusters are still observ-able, only cluster 6 with the tags ’power metal’ and ’heavy

metal’ has been removed. However, the cluster will show up

again in week 45/06 (cluster 9).

d) Week 40/06: After one month, clusters 2 and 5 havemerged to cluster 7. This transition can also be seen in the

screenshot of the visualization in Figure 6. Both clusters

were labeled with tags related to the genre ’metal’ and the

new cluster combines these labels. For example cluster 2 had

246246

Page 5: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

Week29/06

Week33/06

Week37/06

Week40/06

Week45/06

Week48/06

Cluster Label

1

2

3

4

1

2

3

4

1

2

3

4

1

2

3

4

1

2

3

4

5

6

7

8

2

3

4

2

3

4

9

10

11

indie rock, alternative

melodic death metal, metal, death metal

progressive rock, progressive metal, rock, classic rock

hip-hop

metal core, hard rock, rock, emo, screamo, punk, metal

power metal, heavy metal, metal, symphonic metal

metal core, hard core, death metal, metal, melodic metal

electronic, ambient, idm, electronica, indie, chill out

power metal, metal, symphonic metal

progressive metal, progressive rock, metal, progressive

progressive rock, progressive metal, rock, metal

1

2

3

4

Cluster Evolution

11 cluster unchanged

new cluster

cluster removal

cluster merge

cluster split

5

6 ./.

1 1 1 1

3

4

3

4

7 7

4 4

10

8 8 8

9 9

4

3

2

1

5

2

7

3 11

Fig. 5. Genre Evolution

among others the label ’death metal’ and cluster 5 the label

’metal core’. The new cluster 7 has both labels. Furthermore,

in this week a new cluster has been detected (cluster 8). This

cluster is stable over the rest of our observation period and is

labeled among others with the tags ’electronic’, ’ambient’ and

’chill out’. The subgenres ’ambient’ and ’chill out’ indicate

that this cluster is closer to the genre ’indie’ than for example

to other subgenres of the electronic music genre such as

’house’ or ’techno’. This explains, why the cluster has in week

45/06 an overlap with cluster 1, the ’indie’ cluster (see Figure

7(a)).

e) Week 45/06: In week 45/06, a new (old) cluster hasbeen (re-)established (cluster 9). This is the ’power metal’

cluster that we already observed in week 33/06. Besides this

creation, the clustering remains unchanged.

f) Week 48/06: In the next month, the previously men-tioned split occurs. Cluster 3 which was labeled with both

’rock’ and ’metal’ tags is split into the clusters 10 and 11.

Cluster 10 is labeled with ’metal’-related tags and cluster 11

with ’rock’-related tags. Both clusters are still overlapping

since they have members in common. Interestingly, we can

also see that the ’rock’-genre has a higher similarity to the

’indie’-genre than the genre ’metal’: Members of cluster 11

(’rock’) are connected to members that again have neighbors

in the ’indie’-cluster, while there is no connection between

members of cluster 10 (’metal’) to members in cluster 1

(’indie’).

2

4

3

5

1

(a) Clustering in Week 37/06.

3

1

4

8

7

(b) Clustering in Week 40/06.

Fig. 6. Cluster 2 and 5 in (a) merge to cluster 7 in (b) and a new cluster iscreated (cluster 8 in (b)).

g) : Our analysis showed that DENGRAPH detects

groups of users with similar music preferences by clustering

their profile. We obtain groups of users with a similar music

listening behavior. Some of these groups overlap with each

other and others are separated over the entire period of

observation. Clusters that overlap are more similar, because

they have tags in common. On the other hand, we detect groups

that are separated, because they have no similarity with others,

such as the ’hip-hop’ cluster.

Furthermore, we could show that the incremental procedure

allows to track and observe temporal dynamics of user clusters.

We observe the growth and decline of clusters as well as the

creation of new clusters and the removal of formerly existing

clusters.

Using DENGRAPH, we can also observe the evolution

of genre clusters: Transitions such as merges and splits are

247247

Page 6: [IEEE 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM) - Athens, Greece (2009.07.20-2009.07.22)] 2009 International Conference on Advances in

8

4

93

1

7

(a) Clustering in Week 45/06.

9

8

4

1

7

11

10

(b) Clustering in Week 48/06.

Fig. 7. Cluster 3 in (a) is split into cluster 10 and 11 in (b).

detected and due to the cluster labeling we are able to verify

that these transitions are semantically reasonable. For example,

we observe that a cluster consisting of members listening to

music tagged with ’rock’ and ’metal’ splits at some point into

two separate clusters, while one has the label ’rock’ and the

other ’metal’. On the other hand, we observe the merge of

clusters with similar tags.

V. CONCLUSION

In this paper we presented an application of the DEN-

GRAPH clustering algorithm on a data set obtained from the

social networking site Last.fm. Over a period of 130 weeks, wedetermined for each interval user profiles that represent music

preferences. By building and incrementally clustering a graph

of similar users for each interval, we obtain groups of users

with similar music preferences. We label all clusters according

to the user profiles of the members of the respective cluster.

Due to the incremental nature of DENGRAPH we could show,

how clusters evolve over time. We observe how clusters grow

and shrink, how new clusters are born and old clusters die.

In all these cases we notice that the listening behavior of

people has changed and that these changes result in a joining

or leaving of communities. Furthermore, we showed the merge

and split of clusters. By using the labels of clusters, we could

verify that the observed merges and splits are semantically

explainable since in both cases clusters with similar labels

were involved.

The experiments indicate that DENGRAPH is a clustering

algorithm capable to efficiently detect and track communities

in large, noisy graph structures. This is an important advantage

over, for example, widely used hierarchical clustering tech-

niques (see, e.g., [15]) as they do not scale to large social

networks. Due to its incremental update function, clusters

are observable over time and the temporal dynamics in these

networks can be revealed.

REFERENCES

[1] Eurostat, “Cultural statistics, http://epp.eurostat.ec.europa.eu/,(retrieved 23/08/2008),” 2007. [Online]. Available:http://epp.eurostat.ec.europa.eu/

[2] G. Kreutz, U. Ott, D. Teichmann, P. Osawa, and D. Vaitl, “Using musicto induce emotions: Influences of musical preference and absorption,”Psychology of Music, vol. 36, no. 1, pp. 101–126, 2008. [Online].Available: http://pom.sagepub.com/cgi/content/abstract/36/1/101

[3] P. J. Rentfrow and S. D. Gosling, “The content and validityof music-genre stereotypes among college students,” Psychology ofMusic, vol. 35, no. 2, pp. 306–326, 2007. [Online]. Available:http://pom.sagepub.com/cgi/content/abstract/35/2/306

[4] T. Hays and V. Minichiello, “The meaning of music in thelives of older people: a qualitative study,” Psychology ofMusic, vol. 33, no. 4, pp. 437–451, 2005. [Online]. Available:http://pom.sagepub.com/cgi/content/abstract/33/4/437

[5] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-basedcollaborative filtering recommendation algorithms,” in WWW ’01:Proceedings of the 10th international conference on World Wide Web.New York, NY, USA: ACM, 2001, pp. 285–295. [Online]. Available:http://portal.acm.org/citation.cfm?id=372071

[6] T. Falkowski, A. Barth, and M. Spiliopoulou, “Studying communitydynamics with an incremental graph mining algorithm,” in Proc. ofthe 14 th Americas Conference on Information Systems (AMCIS 2008),2008.

[7] J.-J. Aucouturier and F. Pachet, “Representing musical genre: A state ofthe art,” Journal of New Music Research, vol. 32, pp. 83–93, 2003.

[8] M. Schedl, T. Pohle, P. Knees, and G. Widmer, “Assigning and visualiz-ing music genres by web-based co-occurrence analysis,” in Proceedingsof the Seventh International Conference on Music Information Retrieval(ISMIR’06), Victoria, Canada, October 2006.

[9] R. Basili, A. Serafini, and A. Stellato, “Classification of musical genre:a machine learning approach.” in ISMIR, 2004.

[10] M. Levy and M. Sandler, “A semantic space for music derived fromsocial tags,” in 8th International Conference on Music InformationRetrieval (ISMIR 2007), 2007.

[11] C. McKay and I. Fujinaga, “Musical genre classification: Is it worthpursuing and how can it be improved?” in ISMIR, 2006, pp. 101–106.

[12] G. Geleijnse, M. Schedl, and P. Knees, “The quest for ground truth inmusical artist tagging in the social web era,” in Proceedings of the EighthInternational Conference on Music Information Retrieval (ISMIR’07),S. Dixon, D. Bainbridge, and R. Typke, Eds., Vienna, Austria, September2007, pp. 525 – 530.

[13] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, “Incrementalclustering for mining in a data warehouse environment,” in Proc. of 24thVLDB Conference, 1998.

[14] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithmfor discovering clusters in large spatial databases with noise,” in Proc. of2nd International Conference on Knowledge Discovery and Data Mining(KDD-96), 1996, pp. 226–231.

[15] M. E. J. Newman and M. Girvan, “Finding and evaluating communitystructure in networks,” Physical Review, vol. E 69, no. 026113, 2004.

248248