headphones on the wire - arxiv.org · french national syndicate of phonographic publishing [29],...

Headphones on the wireStatistical patterns of music listening practices

Thomas Louail1, 2, ∗ and Marc Barthelemy3, 4

1CNRS, UMR 8504 Geographie-cites, 13 rue du four, FR-75006 Paris2Instituto de Fısica Interdisciplinar y Sistemas Complejos

IFISC (CSIC-UIB), Campus UIB, ES-07122 Palma de Mallorca3Institut de Physique Theorique, CEA, CNRS-URA 2306, FR-91191 Gif-sur-Yvette

4CAMS (CNRS/EHESS) 190-198, avenue de France, FR-75244 Paris Cedex 13

We analyze a dataset providing the complete information on the effective plays of thousands ofmusic listeners during several months. Our analysis confirms a number of properties previouslyhighlighted by research based on interviews and questionnaires, but also uncover new statisticalpatterns, both at the individual and collective levels. In particular, we show that individuals fol-low common listening rhythms characterized by the same fluctuations, alternating heavy and lightlistening periods, and can be classified in four groups of similar sizes according to their temporalhabits - “early birds”, “working hours listeners”, “evening listeners” and “night owls”. We providea detailed radioscopy of the listeners’ interplay between repeated listening and discovery of newcontent. We show that different genres encourage different listening habits, from Classical or Jazzmusic with a more balanced listening among different songs, to Hip Hop and Dance with a moreheterogeneous distribution of plays. Finally, we provide measures of how distant people are fromeach other in terms of common songs. In particular, we show that the number of songs S a DJshould play to a random audience of size N such that everyone hears at least one song he/shecurrently listens to, is of the form S ∼ Nα where the exponent depends on the music genre and isin the range [0.5, 0.8]. More generally, our results show that the recent access to virtually infinitecatalogs of songs does not promote exploration for novelty, but that most users favor repetition ofthe same songs.

The reasons why human beings like listening to mu-sic, the variety of emotions music can arouse, its usesand functions in human societies: those are some longlasting questions which have been discussed by musiccritics and by scientists belonging to a wide range ofdisciplines. From the early musicology [1] to popularmusic studies [2] through sociology of cultural prac-tices [3], geography [4, 5], music history [6, 7], cul-tural economics [8, 9], educational and cognitive psy-chology [10–13], physiology and neurosciences [14, 15],an eclectic scientific litterature has illuminated manydifferent facets of music listening. At a collectivelevel, it has been demonstrated several times thatstatistical relations between inherited social charac-teristics of individuals and their musical preferencesexist [3, 16–18]. At the individual level, studies rely-ing on questionnaires, interviews or experiments con-ducted in controled environments have documentedboth the functions attributed to listening and theemotions aroused, in various situations of daily lifeand in different contexts [10, 13, 15, 19]. The influ-ence of the device on the listening practice [20], theeffects of listening on a number of daily activities –e.g. performance at work [21], driving [22], copingand regulating emotions [12, 23] – or the rewarding as-pects of music-evoked sadness [24] are other examplesof listening-related research. Classifications of listen-ers have been proposed, with some authors concludingabout the existence of a direct relation between musi-cal preferences and cognitive styles [25], other stress-

∗ Correspondence and requests for materials should be ad-dressed to TL (email: [email protected])

ing the uses of music and their self-declared impor-tance as relevant classifiers [26, 27]. Statistical physi-cists also contributed by highlighting structural prop-erties of artists and genres communities that emergefrom the analysis of personal libraries of audio files[28].

However, relatively little is known about how pre-cisely we listen to recorded music on a daily basis. Byhow we refer here to some kind of detailed, quantifiedradioscopy of our contemporary listening practices ofrecorded music, an important aspect of the relationwe entertain with music.

Until recently, any empirical research willing to an-swer to questions pertaining to daily listening prac-tices had to rely on surveys and interviews. Thetechnological and societal evolutions have sustainedthe development of new mobile devices, online toolsand listening possibilities, as well as new actors inthe music industry. Music-on-demand services havequickly gained in popularity over the last few years,and for example, according to a recent report of theFrench national syndicate of phonographic publishing[29], more than three million of French residents (ap-prox. 4% of the total population) were subscribing toan on-demand streaming music platform in 2016, androughly 1/3 of the total french population regularlystream audio content. The data recorded by stream-ing platforms offer great possibilities to analyze andhopefully better understand individual and collectivelistening practices.

Whenever an individual plays a song through sucha service, with a web browser or dedicated applica-tion, online or offline, all known information associ-ated with the stream are logged in the company’s

arX

iv:1

704.

0581

5v1

[ph

ysic

s.so

c-ph

] 1

9 A

pr 2

017

2

database. For example, when Amelie plays The Roots’Kool On on her mobile phone, a new entry is added tothe database, informing that at 8:23 am on 2/10/2015she played the song (Kool On) by The Roots, fromthe album Undun. Incidentally, we also know thegenres tags associated to the song, the stream’s du-ration (and consequently if she listened to the songentirely or not), the information declared during reg-istration, including her age and city of residence, andpossibly some additional contextual information (e.g.if the song was part of one of Amelie’s playlists, if shewas online or offline when listening to the song, possi-bly the city where Amelie was located when listening,etc.). Such a record of information is added to thedatabase for each single play of the millions of regis-tered users of the service. Once anonymized, users’listening history data can be processed and analyzed,and those digital traces passively produced while lis-tening consequently constitute an unprecedented em-pirical source to study quantitatively contemporarylistening practices.

In the following we analyze the listening historydata of one of the major streaming platforms. Thedata correspond to the entire listening history of aboutfive thousands users during one hundred days (see Ma-terial for details). We note that for some of thesepeople the music streamed may represent only a lim-ited subset of all the recorded music they listened toduring that period, and their data might not be rep-resentative of their entire listening practice [20]. Inorder to control this bias we selected a set of anony-mous users among those who displayed a frequent useof the service (see Appendix). For these listeners wecan reasonably assume that the streaming platform,if not exclusive, constitutes a daily source of music.

RESULTS

Rhythms

We start by quantifying the relation that individu-als have with music listening as a daily activity, andthe rhythms and typical hours at which they performthis activity. For all individuals, we compute the totaltime they spent listening during the 100 days periodunder study. Fig. 1a represents the cumulative distri-bution of the average daily listening time t computedover all individuals and all days. This quantity variestheoretically from 0 to 1440 (number of minutes in aday) and the empirical measure reveals a range from4 to 1200 minutes per day, a median value of approx-imately 80 minutes, and about 75% of individuals lis-tening more that one hour per day (and 25% listeningmore than 2 hours per day). In order to understandhow listeners behave over periods of several months,we extract the distribution of the daily listening timesti for all individuals (the index i refers to the individ-ual) and for the whole period, splitting listeners intothree groups according to their total listening time(“light”, “medium” and “heavy” listeners). We com-

pute for each individual its average ti over all daysand we show on Fig. 1b the normalized (ti/ti) distri-butions for the three groups. These normalized dis-tributions collapse onto a single curve, indicating thatwhatever how much music they listen to, individu-als display a common behavior characterized by thesame fluctuations in listening times, alternating dayswith relatively few music listened and days of heavylistening. These distribution are peaked and can befitted by an exponential function, suggesting a Pois-son nature of the listening behavior, in contrast withprevious results on daily human behavior [30].

We then wonder at what time of the day peoplelisten to music. We introduce u(h) which countsthe number of individuals listening to music at timeh. In order to highlight the collective rhythm we

plot the normalized values∫ h+1

hu(h)/

∫ 23

0u(h), where∫ 23

0u(h) is the total number of unique individuals that

listened to music during the day (see SupplementaryFigure 1). Like other daily activities which have beenheavily studied from individual traces [31, 32] the ag-gregated curves of activity display two characteristicpatterns, one for weekdays and another for weekends.However not everyone listens to music at the samehours, and for each listener i we calculate his/her pro-portion phi of plays that occurred between h and h+1,averaged over the 100 days period. We then constructthe 24-values vector (p0

i , . . . , p23i ) and using these time

profiles we cluster the listeners, and find four typicalgroups whose average profiles are shown in Fig. 1c(see the Appendix for details on the clustering). Thesetime profiles are very specific: in one group individualslisten to music mostly in the morning (“Early birds”);in another we observe two listening peaks, one in themorning and the other in the afternoon (“Workinghours listeners”); there are also individuals listeningto music mostly at the end of the day (“Evening lis-teners”); and finally those whose listening peak is latein the evening and during the night (“Night owls”).

Difference and repetition

We now investigate how individuals are distributedalong various dimensions of music listening. We de-note by Pi, Si, and Ai the total numbers of plays,unique songs and unique artists listened by individ-ual i during the 100 days period, respectively. Weshow on Fig. 2a the distribution p(P/P ), p(S/S) andp(A/A) of the normalized variables (where the aver-age values are computed over all individuals and thewhole 100 days period). These distributions collapseon a curve that can be fitted by a log-normal distri-bution of parameters µ ≈ 3.9 and σ ≈ 1.1. It is hereanother occurrence of the lognormal distribution insocial dynamics, although its origin here is not clearand would deserve further investigation. Also, whilewe can understand a priori that P and S display thesame behavior, it is more surprising that the distri-bution of the number of unique artists listened also

3

0.00

0.25

0.50

0.75

1.00

10 60 80 120 1200

t (minutes)

CDF

10−5

10−4

10−3

10−2

101 102 103

ba

"Early birds" "Working hours listeners" "Evening listeners" "Night owls"

0.00

0.03

0.06

0.09

0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21

Hour

⌠ ⌡ t

t+1

p(t)

⌠ ⌡

day

p(t)

17% 30% 34% 20%c

10−6

10−5

10−4

10−3

10−2

10−1

100

10−3 10−2 10−1 100 101

ti tiPDF

allLightMediumHeavy

FIG. 1. Music listening rhythms. (A) Cumulative distribution of the average daily listening time t (in minutes)computed over all listeners and all day. The y-value of each point gives the fraction of users that listened at most t minutesof music each day on average, during the 100 days period under study – inset: corresponding probability distribution.(B) The normalized distributions of daily listening times (ti/ti) for three different groups of listeners: heavy listeners(more than 2 hours per day on average, blue triangles), medium (between 1 and 2 hours, green diamonds), and light(less than 1 hour per day, orange squares).(C) Average profiles of different groups of listeners obtained by clustering ontheir listening temporal habits.

collapses on the same curve.In order to understand how individuals distribute

their plays among the songs they listen to, we com-pute the aggregated distributions p(Pi(α)) which con-tain the numbers of plays per song α for each listeneri (we then have

∑α Pi(α) = Pi) for the three groups

defined above – heavy, medium and light listeners,according to their total number of plays. We rep-resent in Fig. 2b these distributions p(Pi(α)/Pi(α))

(with Pi(α) = Pi/Si) which also collapse on a curvewhose tail can be fitted by a power law with an ex-

ponent ≈ 3.0 (the distributions p(Pi(α)) are shownin the inset). The fluctuations around the averagevalue P/S are therefore the same whatever the group:no matter how much music people listen to, they dis-tribute similarly their attention on the different songslistened. Patterns of Fig. 2a and b might result froma simple relation between P , S and A common to alllisteners and that would allow to make predictions forany of these variables knowing the value of another.As shown on Supplementary Figure 4, there are how-ever no clear relations among these variables and we

4

observe large fluctuations from one person to another.

We now focus on the quantity Si/Pi which can beseen as the exploration rate of individual i among thecatalog. By definition this ratio varies between 0 –the extreme case of an individual that would listen toone and only song again and again – and 1 – a pureexplorer that would never listen a song twice. Thebinned scatterplot of Fig. 2c shows that there is noclear relation betweeen the weight of exploration S/Pand the total time spent listening to music (charac-terized by P ). We also see that the average value ofS/P is around 1/3, indicating that the average num-ber of plays per song is ≈ 3. We observe a trend(despite large fluctuations) between the average rate

Si/Pi and the listeners’ age (shown in Fig. 2d; errorsbars correspond to one standard deviation), indicatinga decrease of repetition and an increase of explorationwith age (see also [33]).

Streaming audio is a different experience than lis-tening to the radio or browsing in a personnal col-lection of records or audio files. Several modes areoffered to users: they can search and play songs oneafter another, listen to an entire album or listen to aplaylist previously compiled by themselves, someoneelse or automatically generated (the latter gaining inimportance thanks to increasingly sophisticated rec-ommendation systems [34]). Vinyle records favor a se-quential listening from the first to the last track, CDsallowed direct access to any song of the record butstill contain albums which are “meant to” be playedentirely. Streaming platforms offer listeners an imme-diate access to any song. The possibility to pick songsamong a practically infinite catalog suggests the naiveassumption that we should observe a high versatilityin plays, and listening sessions mixing album-centeredhabits with handmade sequences of songs (in playlistsor not). In order to check this hypothesis we firstcompute for each listener the percentage of albumslistened entirely (while not necessarily in sequentialorder), and plot the distribution of this percentageamong the population of listeners on SupplementaryFigure 5. More than 50% of users played in their en-tirety less than 5% of the albums. We then computethe distribution of the number of songs played per al-bum, and compare it to the distribution of the numberof songs per album. If listeners had album-centeredpractices, then both distributions should match. Theresults shown on Fig. 2e tell a different story. The dis-tribution of the number of songs per album in the cat-alog (blue curve) has several small peaks around typi-cal values that correspond to different types of records:1 (singles), 10–12 (the typical number of songs on al-bums), and then 20/30/40 (likely corresponding todouble/triple albums and compilations/anthologies).In contrast we observe a very different distribution(red curve) when we consider the number of songsplayed per album, that displays a regular decay witha smaller peak around 10, corresponding to the re-mainder of album-centered listening practices.

Music genres and listening habits

Each song is indexed with one or several genre tags.While there are hundreds of unique tags in such songsdatabases, most of them are associated to a very smallproportion of songs only, and concern an even smallerproportion of plays. The distribution of the listen-ers’ plays per genre shows us first that over a periodof several months, most individuals listen to songs ofvery different genres (see the histogram of the num-ber of genres listened at least once on SupplementaryFigure 8). This first impression of broad eclecticism ischallenged by a closer look at the individuals’ distri-butions of plays among genres. For each individual iwe compute the Gini normalized coefficient (see Sup-plementary Methods in Appendix) of his/her distribu-tion p(Pg(i)) of plays in each music genre g, and ploton Supplementary Figure 9A the distribution of thisGini coefficient among listeners. We first observe thatthere are no listeners displaying small Gini values. Onthe contrary, most individuals have large Gini values,indicating that even eclectic individuals who listen tomany different genres tend to strongly favor a subsetof them. From the value of the Gini coefficient we ex-tract a typical number of dominant genres among thelistener’s plays (see Supplementary Methods in Ap-pendix). It appears that most individuals have 2 or3 genres that they clearly favor (cf. SupplementaryFigure 9B and C). This observation naturally leadsus to determine the couples of genres which often gotogether. We then determine for each listener his/hertwo most listened genres, and estimate the probabilityP (g2|g1) to have g2 as second favorite genre when g1 isthe favorite one. These probabilities are representedon Supplementary Figure 10. Beyond the leading roleof Pop music on streaming platforms, we recognizeclassic proximities, such as Metal-Rock, the Hip Hopfamily or the Classical-Jazz tandem. We also observedthat the temporal patterns of the music genres do notstrongly differ from each other (see SupplementaryFigure 11) [35]. Similarly, we could not distinguishgroups of artists that are preferentially listen to atcertain hours [36].

We now focus on 9 broad genres (Classical music,Jazz, Rock, Metal/Hard Rock, Reggae, Electro, Pop,Hip Hop, and Dance) to see if this classic, “recordstore alleys” classification allows to discern variouslistening habits. Considering how recorded music isproduced and distributed, some music genres favorthe emergence of “hits” and are more “inegalitarian”than others when it comes to the repartition of thecrowd’s attention towards songs (see SupplementaryFigure 12). The heterogeneity of the number of playsin each genre is represented by a “violin plot” shownin Fig. 3. Each genre is represented by a violin, whichgives vertically and symetrically the smoothed dis-tribution (kernel density estimation) of the listeners’Gini coefficient for songs in that genre. The Gini co-efficient of a given individual for a given genre en-codes the inequality of his/her distribution of playsamong the songs of this particular genre. The violins

5

10−810−710−610−510−410−310−210−1100

10−1 100 101 102

Pi Pi

PD

F

All

Light

Medium

Heavy

10−5

10−4

10−3

10−2

10−1

10−0

0 10 20 30 40 50Number of songs per album

De

nsity

in catalogplayed

10−4

10−3

10−2

10−1

100

10−1 100 101

A A

S S

P P

a b c

0.00

0.25

0.50

0.75

1.00

0 2500 5000 7500 10000

P

S/P

0.00

0.25

0.50

0.75

1.00

%listeners

0.2

0.3

0.4

0.5

0.6

20 30 40 50Age of listeners

S/P

ed

FIG. 2. Dimensions of contemporary listening practices. (A) Histogram of the total numbers of plays (P ), songs(S) and artists (A) listened among users during the 100 days period. (B) Normalized distributions of plays per song P/Pfor three groups of listeners (light, medium and heavy – according to their total number of plays). Inset: distributionp(Pi(α)) for the three groups of listeners. (C) Binned scatterplot of the listeners’ exploratory ratio S/P vs. their totalnumber of plays P . (D) Exploratory ratio S/P vs. listeners’age. For each age, the error bars correspond to a standarddeviation. (E) Histograms of the number of songs per album in the music catalog available to listeners (blue curve) andof the number of songs played per album listened (red curve).

are shown from left to right according to the aver-age value of the individuals’ Gini coefficients. Theseviolins show that the Gini coefficient is usually dis-tributed over the whole range [0, 1], indicating thatfor most genres there is a strong heterogeneity of lis-tening practices. We however observe that individu-als, when they listen to the most popular genres (Pop,Hip Hop or Dance), are more homogeneous and seemmore focused on a subset of songs. In contrast, inother genres we observe a greater variety of listen-ing practices (for example for Jazz or Classical). Wethen group listeners according to their favorite genre:Rock, Rap, Jazz, etc. We calculate the average ageof the people in each group, along with the averageexploration rates S/P inside and outside the favoritegenre (shown on the Supplementary Figure 13). Wenote that in all groups the exploration rate is largeroutside the favorite genre than inside, an indicationthat for most individuals what contributes to make agenre their favorite is the repetitive listening of a smallnumber songs of that genre. We also note substantialdifferences in terms of the average age of listeners inthe different groups, an expected observation in agree-ment with previous work [9, 33, 37].

Playing music at a party

Finally, we provide measures about the “distance”between individuals in terms of the music they lis-ten to. The size of the online music library is prac-tically infinite which implies that individuals’ playswould have a very small overlap if random. Musicalchoices however depend on many things, are stronglyinfluenced by social factors [3] and we could expect alarger value of the overlap than the one obtained bychance. We first estimate from the dataset the prob-ability that two randomly selected individuals sharea given number of songs. Fig. 4a shows the proba-bilities p = Prob(s ≥ S, δt) that they shared at leastS songs during a period δt varying between one hourand one month. This probability obviously increasesas one considers longer time periods, and the proba-bility p(S ≥ 1, δt = 30 days) that two people havelistened at least one common song in 30 consecutivedays is large p ≥ 0.7 (p ≥ 0.9 for a 100 days period).The four curves on Fig. 4a are very similar and displaya cut-off value of ≈ 10− 15 songs.

A related problem is to determine the minimumnumber of songs that a DJ should play to an audience

6

0.00

0.25

0.50

0.75

1.00

Classical Jazz Rock Metal/HardRock Reggae Electro Pop Hip Hop Dance

Gin

i nor

mal

ized

coe

ffici

ent o

f the

list

ener

s' d

istr

ibut

ion

of p

lays

am

ong

song

s of

a g

iven

gen

re

FIG. 3. Inequality of music genres. Each violin is a vertical and symetric representation of the smoothed distribution(kernel density estimation) of its listeners’ Gini coefficient of plays per song, for each genre. Each violin shows to whatextent individuals have an heterogeneous attention towards the songs they listen to, when they listen to that genre.

so that everyone hears at least one song that he/sherecently listened to. This problem is well-known bynon-professional DJs who have to deal with peopleharassing them to play specific songs at parties. Thisminimum number of songs S obviously depends of thesize N of the audience, and we call the function thatrelates S to N the DJ function. In order to evaluateempirically this function, we randomly sample differ-ent sets of listeners of increasing size N , and for eachset we determine the smallest number of songs S thatallow to satisfy them all (in other words, S is the sizeof the minimum 1-mode vertex cover in the bipartitesubgraph connecting the individuals sampled and thesongs they listened). When plotting this number S

vs. N , we observe a behavior of the form S ∼√N , as

shown by the black curve on Fig. 4b. We repeat thesame calculation by focusing on specific music genresto cover the case of “specialized” DJs. We select songs(and their listeners) of a given genre only, allowing usto evaluate a DJ function per genre. Each of them hasthe same general form S ∼ Nα, with 0.64 ≤ α ≤ 0.8(R2 > 0.99). These exponents give a lower bound tothe size of the setlist that one has to play if she/hewants to “satisfy” everybody in a random crowd fullof strangers. In particular, if the venue is big andthe audience large (≥ 104 individuals), the requirednumber of songs will be too large to be played dur-ing a single event, making the challenge impossiblewhatever the DJ. In reality, the crowd attending to agig is not random and gather individuals with similartaste. We then reproduce the same experiment butthis time by considering specialized audiences, com-posed of people whose favorite genre is the one playedby the DJ. As expected we obtain smaller exponents,

with 0.47 ≤ α ≤ 0.8 (R2 > 0.99), showing that spe-cialized audiences are easier to “satisfy”.

DISCUSSION

Streaming platforms contribute to increase possi-bilities of access to recorded music, and might possi-bly change the listening habits of their users. Thesecan experience legal and unlimited access to a giganticcatalogue containing more years of unique music thanwhat one could listen to in an entire life. Our resultschallenge a number of naıve assumptions about thecontemporary forms of an old and widespread culturalpractice. Considering the available catalog, one couldthink that it would encourage listeners to continuouslysearch for novelty, and browse in many genres, artistsand albums. Our results on the weight of repetitionand the small number of dominant genres per listenerindicate that it is currently not the case. This obser-vation takes place in the context of discussions aboutthe psychological function of repetition when listen-ing to music. These considerations are however be-yond the scope of this article, and we refer the read-ers to more specific work relying either on detailedinterviews with listeners [10, 38], self-reports [26] orneuroimaging [14]. We remind however that “talk ischeap” [39], and our observations that (i) heavy andlight listening days identically alternate whatever theperceived importance of music in the listeners’ dailylives, and (ii) that the weight of repetition is indepen-dent of the amount of music listened, challenge previ-ous results based on questionnaires and interviews.

It would be wise nonetheless not to generalize toohastily our results to music listening practices in gen-

7

1

10

100

1000

10 100 1000

N

S

Audience

random

specialized

DJ mix

Jazz

Electro

Hip Hop

All

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

1 10 100

S: Number of songs in common

Pro

b(s

>=

S)

hour

day

week

month

a b

FIG. 4. Organizing a party. (A) Estimated probabilities that two randomly chosen individuals listened at leastto S common songs, for different periods of time. (B) DJs’ functions capturing the relation between the size N of anaudience and the minimal number of songs S one should play so that everybody in the room hears at least one song thathe/she recently listened to. Colours correspond to DJs playing specific music genres, while shape of points correspondto random or specialized audiences.

eral, whatever the listening device. The availability ofpre-existing playlists and the automated generationof playlists fitted to the users’ taste might encouragea less involved listening process, resulting in distinctstatistical properties between “active” and “passive”listeners. For example, the authors of [40] concludedthat those who declare listening more music are alsomore involved in the choice of the music they listento. The data we analyzed include no contextual infor-mation that let us know if the songs listened were vol-untarily played after a proper search, or if they wererecommended and queued by the service itself. Weshould also mention that there is so far a limited pro-portion of individuals who use streaming platforms astheir main source of recorded music, but this propor-tion is constantly increasing [29]. We have restrictedour analysis to listeners whose activity suggests thatthey favor streaming. But considering that these maynot representative of the entire population (in termsof social and demographic criteria), our results needto be confirmed with richer datasets providing morecontextual information.

The collection of individual data by companiesraises privacy issues and legitimate concerns aboutsurveillance. For obvious reasons these companies arereluctant to share the raw data they collect, even af-ter proper anonymisation. This policy partly explainswhy very few results obtained from such data havebeen published so far in the scientific literature. Ques-tions similar to those we addressed are nonethelessstudied internally in a product-oriented research (seefor example Spotify Insights [41] or Music Machinery,,collections of blog posts discussing data-driven analy-

sis of listening practices, e.g [42]). However, “digitalfootprints” alone do not give researchers clues aboutthe individuals’ intentions explaining their behaviorand choices. More generally, these traces poorly in-form about the context of use, suffer several uncon-trolled bias, and might lead to misinterpretating theresults [43] (e.g. was the user really listening – or evenin the room – when the song was played?). Conse-quently any “blind” analysis of logs alone is doomedto be limited in scope, and in some cases may leadto wrong conclusions (e.g. see [44] for a discussionof the case of individual human trajectories recon-structed from unconventional data sources). Whilean increasing part of daily human activities produceelectronic traces, designing information collection pro-tocols which articulate the strengths of both traditions(detailed surveys and interviews in one hand and digi-tal footprints in the other) is a contemporary challengefaced by social research.

MATERIAL AND METHODS

Dataset

The dataset analyzed contains the raw streamingdata of 10,000 anonymous and registered users of amajor streaming platform. They inform us on theirentire listening history during the 6-months periodspanning from June 1st 2013 till December 1st 2013.These users were randomly selected among all the reg-istered French users of the service. We know noth-ing about how important is the streaming service for

8

these users, who might also heavily rely on other mu-sic sources and devices (their own personal library ofrecords or audio files, the radio, etc.), and might havedistinct practices depending on the source and device[10, 20, 38]. In order to mitigate this bias, we choseto focus on users who made a frequent use of their ac-count. We selected a set of 4,615 anonymised Frenchusers who actively listened to music (at least everyother day in average) during a 100 days period, from2013/8/15 to 2013/11/23. For these individuals wecan reasonably make the hypothesis that streaming,if not their unique, was one of their main music sourceduring this period (see the Appendix for details on thecleaning and filtering of the data).

Normalized Gini coefficient and extraction of thenumber of dominant terms

We assume that we have K classes and in each class,we have a random number Xi. The Gini coefficientcan then be computed as follows

GK(X) =1

2K2X

∑p,q

|Xp −Xq| (1)

This coefficient is a priori in the interval [0, 1] butwe will see that for finite K the maximum value isactually different from 1 and depends on K.

Case K and D = 1

We denote by D the number of dominant terms. IfD = 1 we have only one dominant term that we callX1 = a and all the other terms are much smaller thana and for simplicity – without any loss of generality –we take them Xi 6=1 = 1. The Gini coefficient is then

GK(a) =1

K2

(a− 1)(K − 1)a+K−1K

=K − 1

K

a− 1

a− 1 +K

In the limit a � K where the heterogeneity is maxi-mal and the Gini coefficient maximum, we then obtain

G∗K =K − 1

K(2)

We see on this formula that G∗ can actually be muchsmaller than 1 if K is not too large. In the case ofa small K we thus have to compute the normalizedGini coefficient G/G∗K which is in the interval [0, 1]and reaches 1 for the most heterogeneous distribution.

General K and D case

We assume here that we have D dominant termsand for simplicity we assume that X1 = X2 = · · · =

XD = a and Xi>D = 1. We then have

X =Da+ (K −D)

K(3)

We also obtain∑p,q

|Xp −Xq| = 2D(K −D)(a− 1) (4)

and the Gini coefficient is then

GK,D(X) =1

2K2

2D(K −D)(a− 1)Da+(K−D)

K

(5)

=K −DK

D(a− 1)

Da+K −D(6)

In the limit where a� 1 we then obtain the maximumvalue

G∗K,D =K −DK

(7)

Extracting the number of dominant terms

For a given observation of the Gini coefficient for{X}, we can ask what is the equivalent configurationwith Deff dominant terms ? In other words, if wemeasure G, this value is bounded

K −DK

< G <K −D + 1

K(8)

where the number of effective dominant terms is givenby

Deff = E[K(1−G)] (9)

where E[x] denotes the nearest (lower) integer of x.

Data availability

The raw data that support the findings of this studywere provided by a third-party company. Restrictionsapply to the availability of these data, which were usedunder license for the current study, and are not pub-licly available. Derived, aggregated data supportingthe findings presented in this article are however avail-able from the corresponding author upon request.

ACKNOWLEDGMENTS

The authors thank A. Sinton and A. Herault forsupporting the research, and A. Sinton for his helpwhen processing the data. T.L. thanks G. Ramelet,P. Chapron, Y. Renisio, A. Beaumont, R. Gallotti,R. Louf, J.-L. Giavitto, F. Lamanna, W. Nowak, N.Vallet and L. P. Vallet for inspiring discussions on theanalysis of listening practices. TL gives another highfive to G. Ramelet and L. Nahassia for their feedbackon an early version of the manuscript. Final thanks goto C.A. Burnett, T. Caruana, M.D. McCready, J.N.Osterberg, A.L. Peebles and D. Robitaille.

9

[1] Theodor W. Adorno. On the Fetish-Character in Mu-sic and the Regression of Listening. Zeitschrift furSozialforschung, 7, 1938.

[2] Philippe Le Guern and Hugh Dauncey, editors.Stereo: Comparative Perspectives on the SociologicalStudy of Popular Music in France and Britain. Rout-ledge, Farnham, Surrey, England ; Burlington, VT,December 2010.

[3] Pierre Bourdieu. Distinction: A Social Critique of theJudgement of Taste. Harvard University Press, 1984.

[4] George Carney. Music Geography. Journal of CulturalGeography, 18(1):1–10, September 1998.

[5] Conrad Lee and Padraig Cunningham. The Ge-ographic Flow of Music. In Proceedings of the2012 International Conference on Advances in So-cial Networks Analysis and Mining (ASONAM 2012),ASONAM ’12, pages 691–695, Washington, DC, USA,2012. IEEE Computer Society.

[6] Nik Cohn. Awopbopaloobop Alopbamboom: TheGolden Age of Rock. Grove Press, New York, 1972.

[7] Peter Guralnick. Sweet Soul Music: Rhythm AndBlues And The Southern Dream Of Freedom. Canon-gate Books, 1986.

[8] Raymond AK Cox, James M. Felton, and Kee H.Chung. The concentration of commercial success inpopular music: An analysis of the distribution of goldrecords. Journal of cultural economics, 19(4):333–340,1995.

[9] Juan Prieto-Rodrıguez and Vıctor Fernandez-Blanco.Are Popular and Classical Music Listeners the SamePeople? Journal of Cultural Economics, 24(2):147–164, May 2000.

[10] J. A. Sloboda. Everyday Uses of Music Listening: APreliminary Study. In S.W. Yi, editor, Music, Mind& Science, pages 354–369. Seoul National UniversityPress, 1999.

[11] Adrian C. North, David J. Hargreaves, and Susan A.O’Neill. The importance of music to adolescents.British Journal of Educational Psychology, 70(2):255–272, June 2000.

[12] Dave Miranda and Michel Claes. Music listening,coping, peer affiliation and depression in adolescence.Psychology of Music, 37(2):215–233, January 2009.

[13] Susan Hallam, Ian Cross, and Michael Thaut. Ox-ford Handbook of Music Psychology. Oxford Univer-sity Press, May 2011.

[14] Daniel J. Levitin. This Is Your Brain on Music: TheScience of a Human Obsession. Dutton, 2006.

[15] Valorie N. Salimpoor, Mitchel Benovoy, GregoryLongo, Jeremy R. Cooperstock, and Robert J. Za-torre. The Rewarding Aspects of Music Listening AreRelated to Degree of Emotional Arousal. PLOS ONE,4(10):e7487, October 2009.

[16] Nick Prior. Critique and Renewal in the Sociologyof Music: Bourdieu and Beyond. Cultural Sociology,5(1):121–138, January 2011.

[17] Bethany Bryson. ”Anything But Heavy Metal”: Sym-bolic Exclusion and Musical Dislikes. American Soci-ological Review, 61(5):884–899, 1996.

[18] Koen van Eijck. Social Differentiation in Musi-cal Taste Patterns. Social Forces, 79(3):1163–1185,March 2001.

[19] Patrik N. Juslin and Petri Laukka. Expression, Per-ception, and Induction of Musical Emotions: A Re-

view and a Questionnaire Study of Everyday Listen-ing. Journal of New Music Research, 33(3):217–238,September 2004.

[20] Amanda E. Krause, Adrian C. North, and Lauren Y.Hewitt. Music-listening in everyday life: Devices andchoice. Psychology of Music, August 2013.

[21] Teresa Lesiuk. The effect of music listening on workperformance. Psychology of Music, 33(2):173–191,January 2005.

[22] Nicola Dibben and Victoria J. Williamson. An ex-ploratory survey of in-vehicle music listening. Psy-chology of Music, 35(4):571–589, January 2007.

[23] Suvi Saarikallio. Music as emotional self-regulationthroughout adulthood. Psychology of Music,39(3):307–327, October 2010.

[24] Liila Taruffi and Stefan Koelsch. The Paradox ofMusic-Evoked Sadness: An Online Survey. PLOSONE, 9(10):e110490, October 2014.

[25] David M. Greenberg, Simon Baron-Cohen, David J.Stillwell, Michal Kosinski, and Peter J. Rentfrow.Musical Preferences are Linked to Cognitive Styles.PLOS ONE, 10(7):e0131151, July 2015.

[26] Tom F. M. Ter Bogt, Juul Mulder, Quinten A. W.Raaijmakers, and Saoirse Nic Gabhainn. Moved bymusic: A typology of music listeners. Psychology ofMusic, 39(2):147–163, January 2011.

[27] Thomas Schafer. The Goals and Effects of Music Lis-tening and Their Relationship to the Strength of Mu-sic Preference. PLOS ONE, 11(3):e0151634, March2016.

[28] R. Lambiotte and M. Ausloos. Uncovering collectivelistening habits and music genres in bipartite net-works. Physical Review E, 72(6):066107, December2005.

[29] Patricia Sarran. Barometre musicusages (annualreport on contemporary listening practices). SyndicatNational de l’edition Phonographique, 2016. availableonline at http://www.snepmusique.com/actualites-du-snep/barometre-musicusages-le-premier-tableau-de-bord-de-la-consommation-de-musique-en-ligne/.

[30] Albert-Laszlo Barabasi. The origin of burstsand heavy tails in human dynamics. Nature,435(7039):207–211, 2005.

[31] Maxime Lenormand, Miguel Picornell, Oliva Gar-cia Cantu-Ros, Antonia Tugores, Thomas Louail,Ricardo Herranz, Marc Barthelemy, Enrique Frıas-Martınez, and Jose J. Ramasco. Cross-Checking Dif-ferent Sources of Mobility Information. PLoS ONE,9(8):e105184, August 2014.

[32] Maxime Lenormand, Thomas Louail, Oliva Gar-cia Cantu Ros, Miguel Picornell, Ricardo Herranz,Juan Murillo Arias, Marc Barthelemy, Maxi SanMiguel, and Jose J. Ramasco. Influence of sociode-mographics on human mobility. Scientific Reports,5:10075, May 2015.

[33] Paul Lamere. Exploring age-specific prefer-ences in listening. Music Machinery blog post.https://musicmachinery.com/2014/02/13/age-specific-listening/

[34] Geoffray Bonnin and Dietmar Jannach. AutomatedGeneration of Music Playlists: Survey and Ex-periments. ACM Comput. Surv., 47(2):26:1–26:35,November 2014.

10

[35] M. Jagger and K. Richards. You can’t always getwhat you want. In Let It Bleed. Decca, 1969.

[36] F. Sinatra. In the Wee Small Hours. Capitol, 1955.[37] Albert LeBlanc, Young Chang Jin, Lelouda Stamou,

and Jan McCrary. Effect of Age, Country, and Gen-der on Music Listening Preferences. Bulletin of theCouncil for Research in Music Education, (141):72–76, 1999.

[38] Mohsen Kamalzadeh, Dominikus Baur, and TorstenMoller. Listen or interact? A Large-scale survey onmusic listening and management behaviours. Journalof New Music Research, 45(1):42–67, January 2016.

[39] Colin Jerolmack and Shamus Khan. Talk is cheapethnography and the attitudinal fallacy. SociologicalMethods & Research, 43(2):178–209, 2014.

[40] Alinka E. Greasley and Alexandra Lamont. Explor-ing engagement with music in everyday life using ex-perience sampling methodology. Musicae Scientiae,15(1):45–71, March 2011.

[41] Spotify Insights. https://insights.spotify.com[42] Paul Lamere. The Skip. Music Machinery blog post,

https://musicmachinery.com/2014/05/02/the-skip/[43] Kevin Lewis. Three fallacies of digital footprints.

Big Data & Society, 2(2):2053951715602496, Decem-ber 2015.

[44] Riccardo Gallotti, Armando Bazzani, Sandro Ram-baldi, and Marc Barthelemy. A stochastic model ofrandomly accelerated walkers for human mobility. Na-ture Communications, 7:12600, August 2016.

11

APPENDIX

Data pre-processing

After signing a non-disclosure aggrement (NDA)with a major music-on-demand company, we weregiven access to a dataset containing the raw streamingdata of 10,000 anonymous users of the service. Theseanonymous users were randomly selected among allthe registered French users of the service. The stream-ing data correspond to their entire listening historyduring the 6-months period spanning from June 1st2013 till December 1st 2013. The users were sampleduniformly among all their French users (no bias re-garding suscription plan, listening activity, sex, age,geographical location or years of use). Consequentlythe listening data are those of users with differenttypes of usage of the service. In particular, some ofthem are paying suscribers, while some other aren’t(”free” suscribing users). The latter have to listen ad-vertisement between songs, cannot listen offline, theaudio quality is lower, and for some of them the totallistening time is bounded. We note that these com-bined aspects can obviously influence the listening ac-tivity, probably decreasing the time spent using theservice each day.

Our goal was to focus on individuals who use themusic-on-demand streaming platform as one of theirmain music sources (if not the main one). Conse-quently we applied a number of filters to select rele-vant users and eliminate those whose streaming datamay not be representative of their listening habits ingeneral, whatever the source and device. The datainclude a few information on the users’ profiles (self-declared age, sex and city of residence), but we donot know if the user is a paying suscriber or not, andcould not get this information from the company. Itprevented us to simply filter them.

From the data we inspected the number of uniqueusers per day and realized that it displays large fluc-tuations. Not all users were active during the entire 6-month period (some appearing only after a given date,some disappearing). To circumvent this we focused ona 100 days period (from 15/08/2013 to 22/11/2013)during which the number of unique listeners per dayremained stable. We then selected users who dis-played regularity in their use of the service duringthis 100 days period and used the service one everytwo days in average. Hence the filter is not on thetotal activity (listening time) of the users but on theirfrequency of use. We ended with 4,615 anonymoususers.

Supplementary Methods

Clustering listeners according to their daily streamingrhythms

For each user i we construct a 24-values vector(p0i , . . . , p

23i ), where phi is the proportion of the user’s

plays that took place between hour h and hour h+ 1,during the 100 days period. We clusters these vectorswith the k-means method. A usual question whenclustering n individuals into k clusters (with k fixeda priori) is to determine an appropriate value of k. Asmall k value will result is a very simple picture butwhich may poorly capture the fluctuations in the data,while k ∼ n will capture the variance almost perfectlybut will be useless. To determine a reasonnable rangeof values for k we plotted on Supplementary Figure2 the averaged percentage of variance captured by kclusters resulting from the application of k-means tothe n listeners. From this curve we use the ’elbowmethod’ to determine the range of reasonnable valuesfor k. It appears that 4 =< k <= 12 are candidates,and we looked at the average time profiles resultingfrom the clustering with each value of k. From k >= 5we observed less distinct average profiles, which is whywe kept k = 4 for the figure discussed in the main text.

Characterizing individual distributions of plays

Supplementary Figure 3 gives the distribution ofthe listeners’ Gini coefficient resuming the heterogene-ity of their distribution of plays among songs (see theMethods section of the main text). Listeners who havea small Gini value (typically < 0.3) are those whosedistribution is almost flat, indicating that they lis-tened their songs approximately the same number oftimes. For such listeners the average value P/S (withP the total number of plays and S the total numberof unique songs listened) gives a clear picture of theirlistening practice and of their repetition/discovery be-havior. But for a large proportion of individuals theGini coefficient is large (> 0.5), revealing that theseusers have concentrated their plays on a limited num-ber of songs.

S vs. P. On Supplementary Figure 4A we plot foreach user i her/his number Si of distinct songs listenedversus her/his total number of plays Pi (here limitedto the individuals with P < 5000 – which captures95% of listeners). Each single point represents an in-dividual, and we see points distributed all across thetriangle (by construction we have S <= P ), whichindicates a wide variety of profiles in terms of repe-tition/exploration. Some listen to a relatively smallnumber of songs and listen to them a lot (small S andlarge P case), while to the opposite some other lis-ten to many different songs and listen each of thema few number of times (the points near the dashedline S = P ). Another way to capture the tendencyof individuals to concentrate their plays on a limited

12

fraction of the songs listened is to calculate the rela-tive dispersion σi/µi of their distribution of plays persong (Pi). The histogram of σi/µi on SupplementaryFigure 3B shows that there exists several types of lis-teners. Individuals with σ/µ� 1 have a distributionof plays per song peaked around the mean, and theaverage number of plays per song pi is hence an in-formative value. To the contrary for individuals withσ/µ� 1, the average number of plays per song Pi/Siwould not be representative of their listening practice.

Selection and aggregation of music genres

Each song of a music-on-demand service is taggedwith one or several tags (in the following we namethem basic genres tags and the histogram of the num-ber of tags per song is displayed on SupplementaryFigure 7). The weight of these basic genres – mea-sured through their total number of songs in thestreams – is very different from one genre to an-other. Furthermore, the analysis of the network ofbasic genre tags reveal different types of tags, andsome hierarchical relations between them (some en-tirely include/contain others). We build the weightedand directed network Sij of genre tags. It is the 1-mode projection of the bipartite network linking songsand genres tags. In this network the nodes representthe basic genres, and Si→j = k means that there are ksongs tagged with genre i which are also tagged withgenre j (the network is directed and in most casesSi→j 6= Sj→i). Some basic genre tags correspond tovery broad, higher-order categories (e.g. ’Pop’, ’Rock’’Alternative’, ’Dance’) and serve as coarse-grainedclassifiers. The analysis of this network reveals thatit has a hierarchical structure and that some of thesegenres tags entirely “contain” smaller, more informa-tive tags. For example all “Blues” songs are alsotagged as being “Rock” songs, and all “Metal/Hard-Rock” songs are also tagged as “Rock”. On the con-trary, there are some “Rock” songs that are taggedonly as “Rock”. The purpose of the filtering we per-formed and detail below was then to keep the mostinformative/precise tag(s) for each song, whenever itis possible and relevant.

To perform our genre analysis, we start with thesame dataset D used in subsections 1 (Rhythms), 2(Difference and repetition) and 4 (Playing music ata party) of the results section in the main text. Thisdataset contains the entire streams history of the 4,615users during 100 days. We first merge this datasetwith the songs database provided by the music-on-demand company, which contains the genre tags as-sociated to each song of the catalog. It results in adataset D′ giving us the complete streams associatedto each of the basic genre tags. We filter this datasetof ≈ 2 ∗ 107 entries by applying the following rules:

• we remove the purely “geographical” tags(World, France, Europe, North America,Central America/Caribean, South America,

Brazil, Africa, Maghreb, Middle East, Aus-tralia/Pacific) which give limited information onthe music genre itself;

• we filter out the basic genre tags which accountfor less than 0.01% (nb: arbitrary parameterchoice) of the songs listened – including ”Bolly-wood”, ”Finnish folk”, ”Medieval”, ”Chaabi”,”Comptines/Chansons”, ”Mento/Calypso”,”K-Pop”, ”Celtic music, ”Bachata”, ”Classi-cal turkish music”, ”Axe/Forro”, ”Regionalmexicain”, ”Instrumental Hip Hop, ”Mari-achi”, ”Argentinian folklore”, ”German rap”,”Banda”, ”Nederlandstalige volksmuziek”,”Brasilian rock”, ”South-African House”,”Teenthaı”, etc. –, in order to reduce the number ofgenres compared and focus on the most listenedones; after this step we have discarded ≈ 40%of the basic genre tags.

• we keep all the streams of songs which aretagged with one basic genre tag only (≈ 3× 105

songs); the basic genres tags that are used assingle tags are the following: Hip Hop, Dance,Pop, R&B/Soul/Funk, Reggae, Electro, Rock,Alternative, International pop, Jazz, Classical,Variete, Country, Brasil, Music for kids, Frenchchanson, Movies/Video games, Tropical, Latinrock.

• for the remaining streams of songs tagged withtwo or more basic genres tags (≈ 2.5 × 105

songs), we inspect the statistic of the pairs ofbasic genres tags among songs. For each pairof basic genres (g, g′), if it appears that g en-tirely contains g′ (i.e. all songs tagged as g′

are also tagged as g), then we remove g fromthe corresponding streams (i.e. streams whichwere previously associated to these two basicgenres will now be associated only with genreg′). For example, let’s say we wish to calcu-late aggregated statistics for “Rock”, “Blues”and “Metal” streams. Since all songs taggedas “Blues” (resp. “Metal”) are also tagged as“Rock”, to come up with more relevant statis-tics for the three genres we consider that itis more significant to discard streams associ-ated to “Blues” and “Metal” songs when com-puting reference statistics for “Rock” streams(and keep in the “Rock” category limited tothe songs tagged solely as “Rock” ). We endup in removing basic genre tags such as Pop,Rock, Electro, Hip Hop, Jazz, Classical, etc.among the tags that qualify songs with 2 or moretags, because they are systematically associatedwith more informative tags (e.g. indie rock,Metal/hard-rock, Blues, Techno/House, rock’nroll/rockabilly, Funk, chill out/trip-hop, instru-mental jazz, instrumental hip-hop, opera, etc.).

We end up with more than 7 × 106 streams. Thestatistics of the number of basic genre tags per song

13

#genres #songs weight

1 439582 9.002220e-01

2 38348 7.853304e-02

3 9384 1.921754e-02

4 970 1.986467e-03

5 20 4.095809e-05

TABLE SI. Number and proportion of songs having a givennumber k of genre tags, after filtering and removing higher-order genre tags.

in the remaining filtered streams are given in Table I.More than 90% of the songs are tagged with 1 genreonly, and almost all songs (98%) are tagged with 1 or2 genres, limiting the risks of confusion in the analysisof listening patterns associated to various genres.

14

SUPPLEMENTARY FIGURES

0.1

0.2

0 3 6 9 12 15 18 21

t

⌠ ⌡ t

t+1 U

(t)

⌠ ⌡ O23

U(t)

weekdays

weekends

Supplementary Figure 1. Hourly evolution of the proportion of active listeners, during an average weekdayand an average weekend day.During weekdays the number of individuals listening to music increases over the dayto reach a maximum around 7-8 p.m. We notice two small peaks, one in the morning aroung 8 a.m. and the other oneat lunchtime around 1 p.m. Most of the plays occur during the afternoon and evening. During weekends the collectiverhythm is different with two peaks, one around 12 p.m. and the other in the early evening, around 7-8 p.m. (error barsare small indicating a high regularity of temporal listening habits

15

0

20

40

60

0 10 20 30 40 50

Number of clusters

Per

cent

age

of v

aria

nce

expl

aine

d

Supplementary Figure 2. Percentage of variance explained when clustering listeners in k groups according to theirlistening time profile average over 100 days.

(a) (b)

0.00

0.05

0.10

0.15

0.00 0.25 0.50 0.75 1.00Gini(Pi)

Fre

quen

cy

0.00

0.04

0.08

0.12

0.7 1.3 3.11.0

σ(Pi) µ(Pi)

Fre

qu

en

cy

Supplementary Figure 3. (a) Histogram of the individuals’ Gini normalized coefficient resuming their distribution ofplays among the songs they listened (b) Histogram of the users’ relative dispersion of plays. For each user i, µi is themean value of his/her distribution of plays per song, while σi is the standard deviation of this distribution.

16

(a) (b) (c)

0

1000

2000

3000

4000

5000

0 1000 2000 3000 4000 5000

P

S

0

1000

2000

3000

4000

5000

0 1000 2000 3000 4000 5000

P

A

0

1000

2000

3000

0 1000 2000 3000

S

A

Supplementary Figure 4. On these scatterplots each dot represents a listener (a) Number of unique songs listened Svs. total number of plays P (b) Number of unique artists listened A vs. total number of plays P (c) A vs S. Thedata appears clouded, we see a lot of fluctuations and no clear relation linking P , S and A that would allow to makepredictions.

0

10

20

30

40

0 20 40 60% of albums listened

which are listened entirely

% o

f lis

tene

rs

Supplementary Figure 5. Proportion of albums listened entirely (but not necessarily in just one time andsequential order). More than 50% of the users listened entirely less than 5% of all the albums in which they grabbedsongs, underlining that album-oriented listening sessions are clearly not the standard practice on streaming platforms.

17

0

2

4

6

8

20 25 30 35Users Age

Ave

rage

son

gs a

ge

Supplementary Figure 6. Do we stay true to the music of our generation? Average age of songs listened as afunction of the listener’s age. While there are of course a lot of fluctuations and different listening behaviours inside agiven ’demographic group’, we observe a tendency: as people get older, they tend to listen to older music as well.

0.0

0.1

0.2

0.3

0.4

0.5

1 5 10

# basic genres tags

Pro

port

ion

of s

ongs

Supplementary Figure 7. Histogram of the number of basic genre tags per song, before filtering. The songsconsidered for building the histogram are those which have been listened to during the 100 days period under scrutinity.

18

0

10

20

30

0 25 50 75 100

Percentage of the main popular music genreslistened at least once in a 100 days period

Pe

rce

nta

ge

of in

div

idu

als

0

5

10

15

20

0 5 10 15

# among 15 popular genreslistened at least once

Supplementary Figure 8. Listeners seem eclectic at first sight. Histogram of the proportion of the main genreslistened at least once in a 100 days period ; (inset) Histogram of the number of genres listened at least once in a 100 daysperiod among 15 classic popular broad music genres (Pop, Rock, Hip Hop, Dance, R&B/Soul/Funk, Reggae, Electro,Alternative, Country, Jazz, Classical music, Chanson, ’Variete’, Tropical).

19

0.0

0.1

0.2

0.00 0.25 0.50 0.75 1.00

Normalized Gini coefficient of individuals'distribution of plays per genre

Pro

po

rtio

n o

f in

div

idu

als

0.0

0.1

0.2

0.3

1 2 3 4 5

Number of dominant genres among plays

Pro

po

rtio

n o

f in

div

idu

als

A B

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00

#plays in the 3 favorite genres

#plays total

Cu

mu

lativ

e f

ractio

n o

f in

div

idu

als

C

Supplementary Figure 9. The listeners’ inegal repartition of plays among the music genres they listen to. (a)Distribution of the Gini coefficient values of individuals that resume the inequality of their distribution of plays amongthe different main music genres (b) Histogram of the number of dominant genres D among individuals (see Methods inthe main text for details about the calculation of the number of dominant terms in a distribution from its Gini coefficient)(c) CDF of the weight of the 3 favorite genres (determined from the number of plays) among listeners.

20

Classique

Dance

Electro

French pop

French rap

Hip Hop

Indie pop

Indie rock

Jazz

Metal/HardRock

Pop

Reggae

Rock

Alternative

Classique

Contemporary R&BDance

Electro

French pop

French rap

Hip Hop

Indie pop

Indie rock

Jazz

Metal/HardRock

Pop

R&B/Soul/Funk

Rap misc

ReggaeRock

Rock & Roll/Rockabilly

Techno/House

Second favorite genre

Favo

rite

genr

e

Supplementary Figure 10. Conditional probabilities to observe pairs of favorite genres among listeners. Thecolor intensity of each cell is indexed on the probability p(g2/g1) that a listener having g1 as favorite genre has g2 assecond favorite genre.

21

weekday friday saturday sunday

Classical

Blues

JazzR

ockR

&B

/Soul

Metal

Electro

Hip H

opD

ance

0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21 0 3 6 9 12 15 18 21

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

0.000.020.040.060.08

Hour

⌠ ⌡ tt+1 P

(t)

⌠ ⌡

day P

(t)

Supplementary Figure 11. Relative proportion of plays as a function of the hour of the day and day of theweek, for different music genres. Let alone micro variations – which might be due to the relatively small numberof listeners of certain genres at certain hours (nb: the total number of listeners under study is 4615) and to statisticalfluctuations – we see no significative differences between the average listening time profiles of the genres.

22

Pop Hip Hop Dance

Rock Alternative Electro

R&B/Soul/Funk Jazz Classical

10−810−610−410−2100

10−810−610−410−2100

10−810−610−410−2100

1 10102103104 1 10102103104 1 10102103104

#plays

Frequency

Pop Hip Hop Dance

Rock Alternative Electro

R&B/Soul/Funk Jazz Classical

101102103104

101102103104

101102103104

1 10 102 1031 10 102 1031 10 102 103

rank

#plays

A B

Supplementary Figure 12. The hierarchy of songs in various genres. (a) Histogram of the number of plays persong for the same set of genres (b) Rank-size plot for the 1000 most popular songs in the streams of nine classic musicgenres.

23

PopHip Hop

Dance

Indie rock

ElectroMetal

Rock

Jazz

Classical

0.4

0.5

0.6

0.7

0.2 0.3 0.4 0.5

Exploration rate S/Pin the favorite genre

Exp

lora

tion

ra

te S

/Pin

oth

er

ge

nre

s

25

30

35

40

Average ageof fans

Supplementary Figure 13. Exploration statistics for groups of fans of various music genres, along with the average ageof the group members (fill colour) and its importance among the population of listeners (circle size).

headphones on the wire - arxiv.org · french national syndicate of phonographic publishing [29],...

Documents