bbcsoundindex
DESCRIPTION
Slides accompanying the VLDB Journal 2010 paper: Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, "Multimodal Social Intelligence in a Real-Time Dashboard System", to appear in a special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media".
TRANSCRIPT
BBC SoundIndex: Pulse of the Online Music Populace
http://www.almaden.ibm.com/cs/projects/iis/sound/
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, "Multimodal Social Intelligence in a Real-Time Dashboard System", to appear in a special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
The Vision: What is ‘really’ hot?
! Netizens do not always buy their music, let alone buy it in a CD store.
! Traditional sales figures are a poor indicator of music popularity.
BBC: Are online music communities good proxies for popular music listings?!
IBM: Well, let’s build and find out!
BBC SoundIndex - “A pioneering project to tap into the online buzz surrounding artists and songs, by leveraging several popular online sources”
http://www.almaden.ibm.com/cs/projects/iis/sound/
“one chart for everyone” is so old!
“Multimodal Social Intelligence in a Real-Time Dashboard System”, VLDB Journal 2010 Special Issue: Data Management and Mining for Social Networks and Social Media.
User metadata, Artist/Track metadata
Unstructured and structured attention metadata
Right data source, right crowd, timeliness of data..?
UIMA Analytics Environment
Album/Track identification, Sentiment identification, Spam and off-topic comments
Extracted concepts into explorable data structures
What are 18 year olds in London listening to?
Validating crowd-sourced preferences
Pulse of the ‘foodie’ populace! Where are 20-something year olds going? Why?
Imagine doing this for a local business!
Fig. 2 SoundIndex architecture. Data sources are ingested and, if necessary, transformed into structured data using the MusicBrainz RDF and data miners. The resulting structured data is stored in a database and periodically extracted to update the front end.
Is the radio still the most popular medium for music? In 2007 less than half of all teenagers purchased a CD [6], and with the advent of portable MP3 players, fewer people are listening to the radio.
With the rise of new ways in which communities are exposed to music, the BBC saw a need to rethink how popularity is measured. Could the wealth of information in online communities be leveraged by monitoring online public discussions, examining the volume and content of messages left for artists on their pages by fans, and looking at what music is being requested, traded and sold in digital environments? Furthermore, could the notion of “one chart for everyone” be replaced with a notion that different groups might wish to generate their own charts reflecting the popularity of people like themselves (as the Guardian [7] put it: “What do forty-something female electronica fans in the US rate?”)? Providing the data in a way that users can explore the data of interest to them was a critical goal of our project.
[6] NPD Group: Consumers Acquired More Music in 2007, But Spent Less
[7] http://www.guardian.co.uk/music/2008/apr/18/popandrock.netmusic
2.2 Basic Design
The basic design of the SoundIndex, which we will describe in more detail in the remainder of the paper, is shown in Figure 2. We looked at several multimodal sources of social data, including “discussion sites” such as MySpace, Bebo and the comments on YouTube, sources of “plays” such as YouTube videos and LastFM tracks, and purely structured data such as the number of sales from iTunes. We then cleaned this data using spam removal and off-topic detection, and performed analytics to spot songs, artists and sentiment expressions. The cleaned and annotated data is combined, adjudicating for different sources, and uploaded to a front end for use in analysis.
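To make the shape of this pipeline concrete, here is a minimal Python sketch; it is illustrative only, with made-up comment data and toy annotators standing in for the UIMA chain and the DB2 store:

```python
# Toy pipeline: ingest comments, run annotators (entities, sentiment, spam),
# and keep only clean, annotated records for aggregation. All names here
# (Comment, the annotator functions, the gazetteers) are illustrative.
from dataclasses import dataclass, field

@dataclass
class Comment:
    text: str
    age: int
    location: str
    entities: list = field(default_factory=list)
    sentiment: str | None = None
    is_spam: bool = False

def spot_entities(c: Comment, catalog: set[str]) -> Comment:
    c.entities = [t for t in catalog if t.lower() in c.text.lower()]
    return c

def tag_sentiment(c: Comment, positive: set[str]) -> Comment:
    c.sentiment = "pos" if any(w in c.text.lower() for w in positive) else None
    return c

def flag_spam(c: Comment, spam_phrases: set[str]) -> Comment:
    # Rule from the paper: a spam phrase alone marks spam, but a spotted
    # entity plus positive sentiment rescues the comment.
    hit = any(p in c.text.lower() for p in spam_phrases)
    c.is_spam = hit and not (c.entities and c.sentiment == "pos")
    return c

def pipeline(comments, catalog, positive, spam_phrases):
    for c in comments:
        c = flag_spam(tag_sentiment(spot_entities(c, catalog), positive),
                      spam_phrases)
        if not c.is_spam:
            yield c

clean = list(pipeline(
    [Comment("check us out!! add us!!", 17, "London"),
     Comment("your new song Umbrella is wicked", 18, "London")],
    catalog={"Umbrella"}, positive={"wicked", "love"},
    spam_phrases={"check us out", "add us"}))
print([c.text for c in clean])  # only the non-spam comment survives
```

Note the annotator order (entities, then sentiment, then spam) mirrors the paper's finding that the spam filter benefits from earlier annotations.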
2.3 Challenges in Continuous Operation
Real world events often impact systems attempting to use social data. Every time the news reports that MySpace has been the subject of a DDoS attack, one can be assured that those trying to pull its data are suffering as well. When sites report wildly inconsistent numbers for views or listens (e.g., the total number of “all time” listens today is less than the total number yesterday), it also causes problems for the kind of aggregation we wish to do.
Combine these unforeseeable events with the data scale, multiple daily updates to the front end, and running a distributed system with components on different continents, and it becomes clear why this kind of system is challenging to build and operate.
2.4 Testing and Validation
Is the social network you are looking at the right one for the questions you are asking? Is the data timely and relevant? Is there enough signal? All of these questions need to be asked about any data source, and even more
SoundIndex Architecture
UIMA Annotators
MySpace user comments (~ Twitter)
Named Entity Recognition
Sentiment Recognition
Spam Elimination
I heart your new song Umbrella..
madge..ur pics on celebration concert with jesus r awesome!
Challenges, intuitions, findings, results..
Recognizing Named Entities
Cultural entities, informal text, context-poor utterances, restricted to the music sense..
“Ohh these sour times... rock!”
Problem Definition: Semantic annotation of album/track names (using MusicBrainz) [at ISWC ‘09]
NER
Got to Be There; Ben; Music and Me; Forever, Michael; Off the Wall; Thriller; Bad; Dangerous
Spot and Disambiguate Paradigm
Generate candidate entities
“Thriller was my most fav MJ album”
“this is bad news, ill miss you MJ”
Disambiguate spots/mentions in context
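The spot-and-disambiguate paradigm can be sketched in a few lines of Python; the catalog, cue words, and context test below are toy stand-ins for the MusicBrainz data and the paper's actual disambiguation features:

```python
# Toy "spot, then disambiguate" sketch: a naive dictionary spotter proposes
# every catalog match, and a second pass keeps only spots with supporting
# music-domain context in the same comment.
import re

CATALOG = ["Thriller", "Bad", "Dangerous", "Off the Wall"]
MUSIC_CUES = {"album", "song", "track", "tune"}

def naive_spot(comment: str) -> list[str]:
    """Propose every catalog entry that literally appears in the comment."""
    return [e for e in CATALOG
            if re.search(rf"\b{re.escape(e)}\b", comment, re.IGNORECASE)]

def disambiguate(comment: str, spots: list[str]) -> list[str]:
    """Keep spots only if a music-domain cue occurs in the same comment."""
    words = set(comment.lower().split())
    return spots if words & MUSIC_CUES else []

for c in ["Thriller was my most fav MJ album",
          "this is bad news, ill miss you MJ"]:
    print(c, "->", disambiguate(c, naive_spot(c)))
# The first comment keeps "Thriller"; in the second, "bad" is spotted by
# the naive pass but dropped for lack of any music cue.
```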
Disambiguation Intensive Approach
Challenge 1: Multiple Senses, Same Domain
60 songs with Merry Christmas
3600 songs with Yesterday
195 releases of American Pie
31 artists covering American Pie
“Happy 25th! Loved your song Smile ..”
This new Merry Christmas tune.. SO GOOD!
Which ‘Merry Christmas’? ‘So Good’ is also a song!
Intuition: Scoped Graphs
Scoped relationship graphs, using cues from the content, webpage title, url..
Reduce potential entity spot size
Generate candidate entities
Disambiguate in context
What Content Cues to Exploit?
“I heart your new album Habits”
“Happy 25th lilly, love ur song smile”
“Congrats on your rising star award”..
Releases that are not new; artists who are at least 25; new careers..
Experimenting with Restrictions
Career Length, Album Release, No. of Albums, ...Specific Artist
Gold Truth Dataset: 1800+ spots in MySpace user comments from artist pages
Keep your SMILE on! (good spot, bad spot, inconclusive spot?)
4-way annotator agreement across spots:
Madonna 90% agreement
Rihanna 84% agreement
Lily Allen 53% agreement
Sample Restrictions, Spot Precision (3 artists, 1800+ spots)
[Figure: spot precision vs. percentage of the taxonomy used by the spotter, on log-log axes, for restrictions ranging from the entire MusicBrainz taxonomy down to the songs of a single artist.]
From all of MusicBrainz (281,890 artists, 6,220,519 tracks) to tracks of one artist
Spot precision closely follows the distribution of random restrictions, conforming loosely to a Zipf distribution
Choosing which constraints to implement is simple: pick the easiest first
Scoped Entity Lists
User comments are on MySpace artist pages
Restriction: Artist name
Assumption: no other artist/work mention
Madonna’s tracks
“this is bad news, ill miss you MJ”
The naive spotter has the advantage of spotting all possible mentions (modulo spelling errors), but generates several false positives
Challenge 2: Disambiguating Spots
Got your new album Smile. Loved it!
Keep your SMILE on!
Separating valid and invalid mentions of music named entities.
Intuition: Using Natural Language Cues
Generic syntactic, spot-level, domain features
1. Sentiment expressions: slang sentiment gazetteer using Urban Dictionary
2. Domain-specific terms: music, album, concert..
Got your new album Smile. Loved it!

Table 6. Features used by the SVM learner
Syntactic features (Notation-S)
+POS tag of s | s.POS
POS tag of one token before s | s.POSb
POS tag of one token after s | s.POSa
Typed dependency between s and sentiment word | s.POS-TDsent*
Typed dependency between s and domain-specific term | s.POS-TDdom*
Boolean typed dependency between s and sentiment | s.B-TDsent*
Boolean typed dependency between s and domain-specific term | s.B-TDdom*
Word-level features (Notation-W)
+Capitalization of spot s | s.allCaps
+Capitalization of first letter of s | s.firstCaps
+s in quotes | s.inQuotes
Domain-specific features (Notation-D)
Sentiment expression in the same sentence as s | s.Ssent
Sentiment expression elsewhere in the comment | s.Csent
Domain-related term in the same sentence as s | s.Sdom
Domain-related term elsewhere in the comment | s.Cdom
+ Refers to basic features; others are advanced features.
* These features apply only to one-word-long spots.

Table 7. Typed Dependencies Example
Valid spot: “Got your new album Smile. Simply loved it!” Encoding: nsubj(loved-8, Smile-5), implying that Smile is the nominal subject of the expression loved.
Invalid spot: “Keep your smile on. You’ll do great!” Encoding: no typed dependency between smile and great.

Typed Dependencies: We also captured the typed dependency paths (grammatical relations) via the s.POS-TDsent and s.POS-TDdom features. These were obtained between a spot and co-occurring sentiment and domain-specific words by the Stanford parser [12] (see example in Table 7). We also encode a boolean value indicating whether a relation was found at all, using the s.B-TDsent and s.B-TDdom features. This allows us to accommodate parse errors given the informal and often non-grammatical English in this corpus.

5.2 Data and Experiments
Our training and test data sets were obtained from the hand-tagged data (see Table 3). Positive and negative training examples were all spots that all four annotators had confirmed as valid or invalid respectively, for a total of 571 positive and 864 negative examples. Of these, we used 550 positive and 550 negative examples for training. The remaining spots were used for test purposes.
Our positive and negative test sets comprised all spots that three annotators had confirmed as valid or invalid spots, i.e. had 75% agreement. We also included spots where 50% of the annotators had agreement on the validity of the
Got your new album Smile. Loved it!
Binary Classification
SVM Binary Classifiers
Training Set:
Positive examples (+1): 550 valid spots
Negative examples (-1): 550 invalid spots
Test Set:
120 valid spots, 458 invalid spots
Got your new album Smile. Loved it!
Valid Music mention or not?
Keep ur SMILE on!
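A minimal sketch of this classifier setup follows, assuming scikit-learn and a handful of Table 6-style word-level and domain features; the gazetteers and training examples below are invented for illustration, not the paper's data:

```python
# Toy SVM spot classifier: encode each (comment, spot) pair with a few
# word-level and domain features from Table 6 and train a binary SVM.
from sklearn.svm import SVC

SENTIMENT = {"loved", "love", "awesome", "wicked"}
DOMAIN = {"album", "song", "track", "music", "tune"}

def features(comment: str, spot: str) -> list[int]:
    words = comment.lower().replace(".", " ").split()
    return [
        int(spot.isupper()),                # s.allCaps
        int(spot[0].isupper()),             # s.firstCaps
        int(f'"{spot}"' in comment),        # s.inQuotes
        int(bool(SENTIMENT & set(words))),  # sentiment in comment (~ s.Csent)
        int(bool(DOMAIN & set(words))),     # domain term in comment (~ s.Cdom)
    ]

train = [("Got your new album Smile. Loved it!", "Smile", 1),
         ("Keep your SMILE on!", "SMILE", 0),
         ("this new tune Umbrella is awesome", "Umbrella", 1),
         ("have a great day", "day", 0)]
X = [features(c, s) for c, s, _ in train]
y = [label for _, _, label in train]

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([features("ur song Smile is wicked", "Smile")]))  # -> [1]
```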
Efficacy of Features
- Feature combinations were the most stable and best performing
- Gazetteer-matched domain words and sentiment expressions proved to be useful
- PR tradeoffs: choose feature combinations depending on the end application
[Chart: precision-recall pairs for feature settings ranging from precision-intensive to recall-intensive: 78-50, 42-91, 90-35]
Identified 90% of valid spots; eliminated 35% of invalid spots
How did we do overall?
[Figure: classifier accuracy at varying valid/invalid splits; precision for Madonna, Rihanna and Lily Allen, and recall, compared against the naive spotter baseline.]
Step 1: Spot with naive spotter, restricted knowledge base
Madonna’s track spots: 23% precision
Step 2: Disambiguate using NL features (SVM classifier)
Madonna’s track spots: ~60% precision
42-91: “All features” setting
Lessons Learned..
Using domain knowledge, especially for cultural entities, is non-trivial
Computationally intensive NLP is prohibitive
Two-stage approach: NL learners over dictionary-based naive spotters
- allows more time-intensive NL analytics to run on less than the full set of input data
- control over precision and recall of the final result
Sentiment Expressions
NER allows us to track and trend online mentions
Popularity assessment: sentiment associated with the target entity
Exploratory project: what kinds of sentiments? What distribution of positive and negative expressions?
Observations, Challenges
Negations, sarcasm, refutations are rare
Short sentences: OK to overlook target attachment
Target demographic: teenagers!
Slang expressions: wicked, tight, “the shit”, tropical, bad..
What are the common positive and negative expressions this demographic uses?
Urbandictionary.com to the rescue!
Urbandictionary.com
Related tags: bad appears with terrible and good!
Glossary
Slang Dictionary!
Mining a Slang Sentiment Dictionary
Map frequently used sentiment expressions to their orientations
Seed positive- and negative-oriented dictionary entries: good, terrible..
Step 1: For each seed, query UD and get related tags
Good -> awesome, sweet, fun, bad, rad..
Terrible -> bad, horrible, awful, shit..
Step 2: Calculate the semantic orientation of each related tag [Turney]
SO(rad) = PMI_UD(rad, “good”) - PMI_UD(rad, “terrible”)
PMI over the Web vs. UD: SO_web(rock) = -0.112, SO_UD(rock) = 0.513
Step 3: Record orientation; add to dictionary: {good, rad}, {terrible}
Continue for other related tags and all new entries..
Mined dictionary: 300+ entries (manually verified for noise)
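The Turney-style bootstrapping loop above can be sketched as follows; the co-occurrence counts are invented, and PMI here is computed from toy frequencies rather than live Urban Dictionary related-tag queries:

```python
# Toy semantic-orientation bootstrap: SO(term) = PMI(term, "good") -
# PMI(term, "terrible"), with terms adopted into the positive or negative
# bucket depending on the sign. All counts below are made up.
import math

# co[x][y]: co-occurrence count of term x with seed y in related-tag lists
co = {"rad":  {"good": 40, "terrible": 2},
      "shit": {"good": 5,  "terrible": 30}}
freq = {"good": 1000, "terrible": 400, "rad": 120, "shit": 300}
N = 10_000  # notional corpus size

def pmi(x: str, y: str) -> float:
    """Pointwise mutual information from the toy counts."""
    return math.log2((co[x][y] / N) / ((freq[x] / N) * (freq[y] / N)))

def semantic_orientation(term: str) -> float:
    return pmi(term, "good") - pmi(term, "terrible")

dictionary = {"positive": {"good"}, "negative": {"terrible"}}
for term in ("rad", "shit"):
    so = semantic_orientation(term)
    dictionary["positive" if so > 0 else "negative"].add(term)
    print(f"SO({term}) = {so:+.2f}")
print(dictionary)  # "rad" lands in positive, "shit" in negative
```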
Sentiment Expressions in Text
“Your new album is wicked”
Shallow NL parse: verbs, adjectives [Hatzivassiloglou 97]
Your/PRP$ new/JJ album/NN is/VBZ wicked/JJ
Look up the word in the mined dictionary, record its orientation
Why don’t we just spot using the dictionary!?
fuuunn! (coverage issues)
“super man is in town”; “i heard amy at tropical cafe yday..”
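A minimal sketch of this tag-then-lookup step, assuming NLTK and its tagger models are installed; the mined dictionary here is a three-entry stand-in:

```python
# Toy sentiment spotter: POS-tag a comment, keep adjectives and verbs,
# and read their orientation from the mined slang dictionary.
import nltk  # requires the punkt and averaged_perceptron_tagger models

MINED = {"wicked": "positive", "tight": "positive", "terrible": "negative"}

def sentiment_spots(comment: str):
    tokens = nltk.word_tokenize(comment)
    tagged = nltk.pos_tag(tokens)  # e.g. [('Your','PRP$'), ('wicked','JJ')]
    return [(w, MINED[w.lower()])
            for w, tag in tagged
            if tag.startswith(("JJ", "VB")) and w.lower() in MINED]

print(sentiment_spots("Your new album is wicked"))
# -> [('wicked', 'positive')]
```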
Dictionary Coverage Issues
Resort to the corpus (transliterations)
Co-occurrence in corpus (tight : top-scoring dictionary entries)
“tight-awesome”: 456, “tight-sweet”: 136, “tight-hot”: 429
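One plausible way to implement the transliteration and corpus-support checks is sketched below; the letter-collapsing heuristic and the support threshold are illustrative choices, not the paper's exact method, and the counts are invented:

```python
# Toy coverage helpers: map a stretched word like "fuuunn" onto a known
# dictionary entry, and adopt an unknown slang word if it co-occurs
# strongly with top-scoring dictionary entries in the corpus.
import re

MINED = {"fun", "sweet", "awesome", "hot"}

def candidates(word: str) -> set[str]:
    """Possible normalizations of a stretched word."""
    w = word.lower()
    return {re.sub(r"(.)\1+", r"\1", w),       # fuuunn -> fun
            re.sub(r"(.)\1{2,}", r"\1\1", w)}  # sweeeet -> sweet

def is_transliteration(word: str) -> bool:
    return bool(candidates(word) & MINED)

# corpus co-occurrence counts of a candidate with dictionary entries
cooc = {"tight": {"awesome": 456, "sweet": 136, "hot": 429}}
SUPPORT = 300  # minimum total co-occurrence to adopt a new word

def has_corpus_support(word: str) -> bool:
    return sum(cooc.get(word, {}).values()) >= SUPPORT

print(is_transliteration("fuuunn"))  # True: collapses to "fun"
print(has_corpus_support("tight"))   # True: 1021 total co-occurrences
```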
Miscellaneous
Presence of an entity improves confidence in the identified sentiment
Short sentences, high coherence: associate sentiment with all entities
If no entity is spotted, associate with the artist whose page we are on
“you are awesome!” on Madonna’s page
Evaluations - Mined Dictionary + Identifying Orientation
Negative sentiments: slang orientations out of context!
“you were terrible last night at SNL but I still <3 you!”
Precision excluding corpus transliterations: incorrect NL parses, selectivity

presence and orientation of sentiment expressions. Tunable cut-off thresholds for annotators were determined based on experiments. Table 3 shows the accuracy of the annotator and illustrates the importance of using transliterations in such corpora.

Table 3: Transliteration accuracy impact
Annotator Type | Precision | Recall
Positive Sentiment | 0.81 | 0.9
Negative Sentiment | 0.5 | 1.0
PS excluding transliterations | 0.84 | 0.67

These results indicate that the syntax and semantics of sentiment expression in informal text is difficult to determine. Words that were incorrectly identified as sentiment bearing, because of incorrect parses due to sentence structure, resulted in inaccurate transliterations which contributed to low precision, especially in the case of the Negative Sentiment annotator. We also experimented with advanced NL features using dependency parses of comments, but found them to be expensive and minimally effective due to poor sentence constructions.

Low recall was consistently traced back to one of the following reasons: a failure to catch sentiment indicators in a sentence (inaccurate NL parses), or a word that was not included in our mined dictionary, either because it did not appear in UD or because it had insufficient support in the corpus for the transliteration (e.g., ‘funnn’). Evaluating the annotator without the mined dictionary, i.e., only with the corpus support, significantly reduced the recall (owing to sparse co-occurrences in the corpus). However, it also slightly improved the precision, indicating the need for more selective transliterations. This speaks to our method for mining a dictionary of transliterations, which does not take into account the context of the sentiment expression in the comment, an important near-term investigation for this work. Empirical analysis also revealed that the use of negation words such as ‘not’ and ‘never’ was rare in this data and therefore ignored at this point. However, as this may not be true for other data sources, this is an important area of future work.

7.3 Dealing with Spam
Like many online data sets today, this corpus suffers from a fair amount of spam: off-topic comments that are often a kind of advertising. A preliminary analysis shows that for some artists more than half of the comment postings are spam (see Table 4). This level of noise could significantly impact the data analysis and ordering of artists if it is not accounted for.

Table 4: Percentage of total comments that are spam for several popular artists
Gorillaz | 54%
Coldplay | 42%
Lily Allen | 40%
Keane | 40%
Placebo | 39%
Amy Winehouse | 38%
Lady Sovereign | 37%
Joss Stone | 36%

A particular type of spam that is irrelevant to our popularity-gauging exercise is comments unrelated to the artist’s work, for example those making references to an artist’s personal life. Eliminating such comments and only counting those relevant to an artist or their work is an important goal in generating ranked artist lists.

Since many of the features in such irrelevant content overlap with the ones of interest to us, classifying spam comments using only spam phrases or features was not very effective. However, a manual examination of a random set of comments yielded useful cues to use in eliminating spam:
1. The majority of spam comments were related to the domain, had the same buzz words as many non-spam comments, and were often less than 300 words long.
2. Like any auto-generated content, there were several patterns in the corpus indicative of spam. Some examples include “Buy our cd”, “Come check us out..”, etc.
3. Comments often had spam and appreciative content in the same sentence, which implied that a spam annotator would benefit from being aware of the previous annotation results.
4. Empirical observations also suggested that the presence of sentiment is pivotal in distinguishing spam content. Figure 8 illustrates the difference in distribution of sentiments in spam and non-spam content.

These observations also speak to the order in which our annotators are applied. We first perform named entity spotting, followed by the sentiment annotator and finally the spam filter.

7.3.1 Spotting Spam: Approach and Findings
Our approach to identifying spam (including comments irrelevant to an artist’s work) differs from past work, because the small size of individual comments and the use of slang posed new challenges. Typical content-based techniques work by testing content on patterns or regular expressions (Mason, 2002) that are indicative of
Crucial Finding
Sentiment expression frequencies in 60k comments over 26 weeks (-ve vs. +ve): < 4%
Forget about the sentiment annotator! Just use entity mention volumes!
Spam, Off-topic, Promotions
Special type of spam: unrelated to artists’ work
Paul McCartney’s divorce; Rihanna’s abuse; Madge and Jesus
Self-promotions: “check out my new cool sounding tracks..”
music domain, similar keywords, harder to tell apart
Standard spam: “Buy cheap cellphones here..”
Observations
60k comments, 26 weeks
Common 4-grams pulled out several ‘spammy’ phrases
Phrases -> spam and non-spam buckets
Fig. 8 Examples of sentiment in spam and non-spam comments.
SPAM (80% have 0 sentiments):
CHECK US OUT!!! ADD US!!!
PLZ ADD ME!
IF YOU LIKE THESE GUYS ADD US!!!
NON-SPAM (50% have at least 3 sentiments):
Your music is really bangin! You’re a genius! Keep droppin bombs!
u doin it up 4 real. i really love the album. keep doin wat u do best. u r so bad!
hey just hittin you up showin love to one of chi-town’s own. MADD LOVE.
spam, or by training Bayesian models over spam and non-spam content (Blosser and Josephsen, 2004). Recent investigations on removing spam in blogs use similar statistical techniques with good results (Thomason, 2007). These techniques were largely ineffective on our corpus because comments are rather short (1 or 2 sentences), share similar buzz words with non-spam content, are poorly formed, and contain frequent variations of word/slang usage. Our approach to filtering spam is an aggregate function that uses a finite set of mined spam patterns from the corpus and other non-spam content, such as artist names and sentiments, that are spotted in a comment.
Our spam annotator builds off a manually assembled seed of 45 phrases found in the corpus that are indicative of spam content. This seed was assembled by empirical analysis of frequent 4-grams in comments that are indicative of spam content. First, the algorithm spots these possible spam phrases and their variations in text by sliding a window of words over the user comment and computing the string similarity between the spam phrase and the window of words. These spots, along with a set of rules over the results of the previous entity and sentiment annotators, are used to decide if a posting is spam. An example of such a rule would be that if a spam phrase, an artist/track name and a positive sentiment were spotted, the comment is probably not spam. By looking for previously spotted meaningful entities we ensure that we only discard spam comments that make no mention of an artist or their work.
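A minimal sketch of this aggregate rule follows, with an invented seed list, difflib string similarity standing in for the paper's matcher, and a toy veto rule over prior annotations:

```python
# Toy spam annotator: fuzzy-match seed spam phrases with a sliding word
# window, then let previously spotted entities and positive sentiment
# veto the spam flag.
from difflib import SequenceMatcher

SPAM_SEEDS = ["buy our cd", "come check us out", "add us"]

def has_spam_phrase(comment: str, threshold: float = 0.8) -> bool:
    words = comment.lower().split()
    for seed in SPAM_SEEDS:
        n = len(seed.split())
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            if SequenceMatcher(None, seed, window).ratio() >= threshold:
                return True
    return False

def is_spam(comment: str, entities: list[str], sentiment: str | None) -> bool:
    # Rule from the paper: spam phrase + spotted artist/track + positive
    # sentiment probably means a genuine fan comment, so keep it.
    if not has_spam_phrase(comment):
        return False
    return not (entities and sentiment == "pos")

print(is_spam("come check us outt!!", [], None))                          # True
print(is_spam("add us if u like umbrella, love it", ["Umbrella"], "pos")) # False
```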
Table 5 shows the accuracy of the spam and non-spam annotators for the hand-tagged comments. Our analysis indicates that lowered precision or recall in the spam annotator was a direct consequence of deficiencies in the preceding named entity and sentiment annotators. For example, in cases where the comment did not have a spam pattern from our manually assembled list, and the first annotator spotted incorrect tracks, the spam annotator interpreted the comment to be related to music and classified it as non-spam. Other cases included more clever promotional comments that included the actual artist tracks, genuine sentiments and very limited spam content (e.g., “like umbrella ull love this song. . . ”). As is evident, the amount of information available in a comment (one or two sentences), in addition to poor grammar, necessitates more sophisticated techniques for spam identification. This is an important focus of our ongoing research.

Table 5: Spam annotator performance
Annotator Type | Precision | Recall
Spam | 0.76 | 0.8
Non-Spam | 0.83 | 0.88
8 Generation of the Hypercube
Many modern Business Intelligence systems are designed to support exploration of the underlying dimensions of “facts” and the correlations and relationships between them. This can be modeled as a high-dimensionality data hypercube (also known as an OLAP cube (Codd et al, 1993)). The system stores this in a relational database (e.g., DB2) to allow the analyst to explore the relative importance of various dimensions for the task at hand (for the BBC, the popularity of musical topics). The dimensions of the cube are generated in two ways: from the structured data (e.g., age, gender, user location, timestamp, artist), and from the measurements generated by a chain of annotator methods that annotate artist, entity and sentiment mentions in each comment [16]. The resulting tuple is then placed into a star schema in which the primary measure is a relevance with regard to musical topics. This is equivalent to defining a function

M : (Age, Gender, Location, Time, Artist, ...) → M

In other words, we define a multidimensional grid where the dimensions are quantities of interest. What those are depends on the domain; for music, they are values such as age, gender, etc. We “fill in” each point in the grid with the number of occurrences of non-spam comments made by a poster of that age, gender and location posting at that time about that artist, song,
[16] Due to the scale of data being examined, these annotations are generated automatically by the UIMA analysis chain. While the results are spot-checked by a human periodically to assure language drift has not occurred, in general they run unsupervised.
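"Filling in" the grid can be sketched with a counter keyed by dimension tuples; this toy stands in for the star schema rows the paper stores in DB2, and the comment rows are invented:

```python
# Toy cube fill: count non-spam comments at each point
# (age, gender, location, week, artist).
from collections import Counter

cube: Counter = Counter()

comments = [  # (age, gender, location, week, artist), non-spam only
    (19, "M", "New York City", 38, "Rihanna"),
    (19, "M", "New York City", 38, "Rihanna"),
    (18, "F", "London", 38, "Lily Allen"),
]
for point in comments:
    cube[point] += 1  # M(Age, Gender, Location, Time, Artist) += 1

print(cube[(19, "M", "New York City", 38, "Rihanna")])  # 2
```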
The spam annotator should be aware of other annotator results!
Spam Elimination
Aggregate function:
- Phrases indicative of spam (mined from frequent 4-grams)
- Rules over previous annotator results: if a spam phrase, an artist/track name and a positive sentiment were spotted, the comment is probably not spam.
Performance
Directly proportional to the quality of previous annotator results (incorrect entity, sentiment spots)
Some self-promotions are just clever! “like umbrella, ull love this song. . . ”
Given the amount of effort involved, the obvious question arises: why filter spam? For our end goal of using comment counts to lead to positions on the list, filtering spam is key. This is corroborated by the volume of spam and non-spam content observed over a period of 26 weeks for 8 artists; see the percentages in Table 4 above. Some artists had at least half as many spam as non-spam comments on their page. This level of noise would significantly impact the ordering of artists if it were not accounted for.
4.3 Generation of the Hypercube
We use a data hypercube (also known as an OLAP cube [5]) stored in a DB2 database to explore the relative importance of various dimensions to the popularity of musical topics. The dimensions of the cube are generated in two ways: from the structured data in the posting (e.g., age, gender, location of the user commenting, timestamp of the comment, artist), and from the measurements generated by the above annotator methods. This annotates each comment with a series of tags from unstructured and structured data. The resulting tuple is then placed into a star schema in which the primary measure is a relevance with regard to musical topics. This is equivalent to defining a function

M : (Age, Gender, Location, Time, Artist, ...) → M    (1)

In our case, we have stored the aggregation of occurrences of non-spam comments at the intersecting dimension values of the hypercube. Storing the data this way makes it easy to examine rankings over various time intervals, weight various dimensions differently, etc. Once (and if) a total ordering approach is fixed, this intermediate data staging step might be eliminated.

4.4 Projecting to a List
Ultimately we are seeking to generate a “one-dimensional” ordered list from the various contributing dimensions of the cube. In general we project to a one-dimensional ranking which is then used to sort the artists, tracks, etc. We can aggregate and analyze the hypercube using a variety of multidimensional data operations on it to derive what are essentially custom popular lists for particular musical topics. In addition to the traditional billboard “Top Artist” lists, we can slice and project (marginalize dimensions) the cube for lists such as “What is hot in New York City for 19 year old males?” and “Who are the most popular artists from San Francisco?” These translate to the following mathematical operations:

L1(X) = Σ_{T,...} M(A = 19, G = M, L = “New York City”, T, X, ...)    (2)

L2(X) = Σ_{T,A,G,...} M(A, G, L = “San Francisco”, X, ...)    (3)

where
X = Name of the artist
T = Timestamp
A = Age of the commenter
G = Gender
L = Location

Note that for the remainder of this paper we aggregate tracks and albums to artists, as we wanted as many comments as possible for our experiments. Clearly track-based projections are equally possible and the rule for our ongoing work.
This tremendous flexibility is one advantage of the cube approach.
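The two projections can be sketched with pandas as a stand-in for the DB2 roll-ups; the cube rows below are invented:

```python
# Toy projection: marginalize unwanted dimensions of the cube by summing
# comment counts, then sort to get a one-dimensional popularity list.
import pandas as pd

cube = pd.DataFrame([
    # age, gender, location, week, artist, non-spam comment count
    (19, "M", "New York City", 38, "Rihanna", 120),
    (19, "M", "New York City", 39, "Rihanna", 90),
    (19, "M", "New York City", 38, "Keane", 40),
    (23, "F", "San Francisco", 38, "Lily Allen", 75),
], columns=["age", "gender", "location", "week", "artist", "count"])

# L1(X): what is hot in New York City for 19 year old males? (sum over time)
l1 = (cube.query("age == 19 and gender == 'M' and location == 'New York City'")
          .groupby("artist")["count"].sum().sort_values(ascending=False))
print(l1)  # Rihanna 210, Keane 40

# L2(X): most popular artists from San Francisco (sum over time, age, gender)
l2 = (cube.query("location == 'San Francisco'")
          .groupby("artist")["count"].sum().sort_values(ascending=False))
print(l2)
```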
5. EXPERIMENTS
To test the effectiveness of our popularity ranking system we conducted a series of experiments. We prepared a new top-N list of popular music to contrast with the most recent Billboard list. To validate the accuracy of our lists, we then conducted a study.

5.1 Generating our Top-N List
We started with the top-50 artists in Billboard’s singles chart during the week of September 22nd through 28th, 2007. If an artist had multiple singles in the chart and appeared multiple times, we only kept the highest-ranked single to ensure a unique list of artists. MySpace pages of the 45 unique artists were crawled, and all comments posted in the corresponding week were collected.
We loaded the comments into DB2 as described in Section 4.1. The crawled comments were passed through the three annotators to remove spam and identify sentiments. The tables below show statistics on the crawling and annotation processes.

Table 6: Crawl Data
Number of unique artists | 45
Total number of comments collected | 50489
Total number of unique posters | 33414
Do not Confuse Activity with Popularity!
Artist popularity rankings changed dramatically after excluding spam!
% spam in 60k comments over 26 weeks (spam vs. non-spam)
presence and orientation of sentiment expressions. Tun-able cut-off thresholds for annotators were determinedbased on experiments. Table 3 shows the accuracy ofthe annotator and illustrates the importance of usingtransliterations in such corpora.
Annotator Type Precision RecallPositive Sentiment 0.81 0.9Negative Sentiment 0.5 1.0
PS excluding transliterations 0.84 0.67
Table 3 Transliteration accuracy impact
These results indicate that the syntax and seman-tics of sentiment expression in informal text is difficultto determine. Words that were incorrectly identified assentiment bearing, because of incorrect parses due tosentence structure, resulted in inaccurate translitera-tions which contributed to low precision, especially inthe case of the Negative Sentiment annotator. We alsoexperimented with advanced NL features using depen-dency parses of comments, but found them to be ex-pensive and minimally effective due to poor sentenceconstructions.
Low recall was consistently traced back to one ofthe following reasons: A failure to catch sentiment in-dicators in a sentence (inaccurate NL parses) or a wordthat was not included in our mined dictionary, eitherbecause it did not appear in UD or because it hadinsufficient support in the corpus for the translitera-tion (e.g., ‘funnn’). Evaluating the annotator withoutthe mined dictionary, i.e., only with the corpus sup-port, significantly reduced the recall (owing to sparseco-occurrences in the corpus). However, it also slightlyimproved the precision, indicating the need for more se-lective transliterations. This speaks to our method formining a dictionary of transliterations that does nottake into account the context of the sentiment expres-sion in the comment – an important near-term inves-tigation for this work. Empirical analysis also revealedthat the use of negation words such as ‘not’, ‘never’was rare in this data and therefore ignored at this point.However, as this may not be true for other data sources,this is an important area of future work.
7.3 Dealing with Spam
Like many online data sets today, this corpus suffersfrom a fair amount of spam — off-topic comments thatare often a kind of advertising. A preliminary analy-sis shows that for some artists more than half of thecomment postings are spam (see Table 4). This level of
noise could significantly impact the data analysis andordering of artists if it is not accounted for.
Gorillaz 54% Placebo 39%Coldplay 42% Amy Winehouse 38%Lily Allen 40% Lady Sovereign 37%Keane 40% Joss Stone 36%
Table 4 Percentage of total comments that are spam for severalpopular artists.
A particular type of spam that is irrelevant to our popularity-gauging exercise is comments unrelated to the artist's work, for example those referencing an artist's personal life. Eliminating such comments and counting only those relevant to an artist or their work is an important goal in generating ranked artist lists.
Since many features of such irrelevant content overlap with those of interest to us, classifying spam comments using spam phrases or features alone was not very effective. However, a manual examination of a random set of comments yielded useful cues for eliminating spam:
1. The majority of spam comments were related to the domain, used the same buzzwords as many non-spam comments, and were often less than 300 words long.
2. Like any auto-generated content, the corpus contained several patterns indicative of spam, e.g., "Buy our cd" and "Come check us out..".
3. Comments often mixed spam and appreciative content in the same sentence, implying that a spam annotator would benefit from being aware of previous annotation results.
4. Empirical observations also suggested that the presence of sentiment is pivotal in distinguishing spam content. Figure 8 illustrates the difference in the distribution of sentiments in spam and non-spam content.
These observations also dictate the order in which our annotators are applied: we first perform named-entity spotting, then sentiment annotation, and finally the spam filter. A minimal sketch of this ordering appears below.
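The sketch below illustrates the three-stage ordering under stated assumptions: the pattern list, sentiment cue set, and decision rule are hypothetical stand-ins, not the UIMA annotators the system actually runs.

```python
import re

# Hypothetical spam patterns modeled on observation 2 above; the real
# annotator uses a richer feature set.
SPAM_PATTERNS = [re.compile(p, re.IGNORECASE)
                 for p in (r"buy our cd", r"come check us out")]
POSITIVE_WORDS = {"love", "like", "great", "fun"}  # toy sentiment cues

def spot_entities(comment, known_tracks):
    """Stage 1: naive track spotting by case-insensitive substring match."""
    low = comment.lower()
    return [t for t in known_tracks if t.lower() in low]

def annotate_sentiment(comment):
    """Stage 2: toy sentiment score (count of positive cue words)."""
    return sum(1 for w in re.findall(r"[a-z']+", comment.lower())
               if w in POSITIVE_WORDS)

def is_spam(comment, entities, sentiment_score):
    """Stage 3: spam filter that sees the earlier annotations
    (observation 3). Explicit spam patterns trump other evidence;
    absent sentiment plus absent track mentions (observation 4) is
    weak evidence of spam in this toy rule."""
    if any(p.search(comment) for p in SPAM_PATTERNS):
        return True
    return sentiment_score == 0 and not entities

comment = "like umbrella ull love this song... come check us out"
tracks = spot_entities(comment, ["Umbrella"])
score = annotate_sentiment(comment)
print(is_spam(comment, tracks, score))  # True: the spam pattern dominates
```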
7.3.1 Spotting Spam: Approach and Findings
Our approach to identifying spam (including comments irrelevant to an artist's work) differs from past work because the small size of individual comments and the use of slang posed new challenges. Typical content-based techniques work by testing content against patterns or regular expressions (Mason, 2002) that are indicative of spam.
Workflow
Unstructured chatter to explorable hypercube
Hypercube, List Projection
Exploring dimensions of popularity
Data Hypercube (DB2) from
structured data in posts (user demographics, location, age, gender, timestamp)
unstructured data (entity, sentiment, spam flag)
Intersection of dimensions => non-spam comment counts
Hypercube to One-Dimensional List
Roll-ups
What is hot in New York for 19 year old males?
Annotator Type  Precision  Recall
Spam            0.76       0.80
Non-Spam        0.83       0.88
Table 4: Spam annotator performance
When the comment did not have a spam pattern and the first annotator spotted incorrect tracks, the spam annotator interpreted the comment as related to music and classified it as non-spam. Other cases included cleverer promotional comments that mixed actual artist tracks, genuine sentiment, and very limited spam content (e.g., 'like umbrella ull love this song...'). Evidently, the limited information available, combined with grammatically poor sentences, calls for more sophisticated spam-identification techniques.
Given the effort involved, an obvious question arises: why filter spam at all? Because our end goal is to turn comment counts into list positions, filtering spam is key. This is corroborated by the volume of spam and non-spam content observed over a period of 26 weeks for 8 artists; see Table 5. Some artists had at least half as many spam as non-spam comments on their page, a level of noise that would significantly distort the ordering of artists if left unaccounted for.
Gorillaz    54%    Placebo         39%
Coldplay    42%    Amy Winehouse   38%
Lily Allen  40%    Lady Sovereign  37%
Keane       40%    Joss Stone      36%
Table 5: Percentage of total comments that are spam for several popular artists.
4.3 Generation of the hypercube
We use a data hypercube (also known as an OLAP cube [5]) stored in a DB2 database to explore the relative importance of various dimensions to the popularity of musical topics. The dimensions of the cube are generated in two ways: from the structured data in the posting (e.g., age, gender, and location of the commenting user, timestamp of the comment, artist), and from the measurements generated by the annotators described above. Each comment is thus annotated with a series of tags from unstructured and structured data. The resulting tuple is placed into a star schema in which the primary measure is relevance with regard to musical topics. This is equivalent to defining a function
M : (Age, Gender, Location, Time, Artist, ...) → M    (1)
In our case, we store the aggregated count of non-spam comments at each intersection of dimension values in the hypercube. Storing the data this way makes it easy to examine rankings over various time intervals, weight dimensions differently, and so on. Once (and if) a total ordering approach is fixed, this intermediate data-staging step might be eliminated.
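As a rough sketch of this staging step, assuming an in-memory counter in place of the DB2 star schema: each annotated comment contributes one count to its (age, gender, location, week, artist) cell, and spam comments contribute nothing. The field names here are illustrative assumptions, not the system's actual schema.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class AnnotatedComment:
    # Structured post metadata plus annotator outputs (field names
    # are assumptions for illustration).
    age: int
    gender: str
    location: str
    week: str        # coarsened timestamp
    artist: str
    is_spam: bool    # from the spam annotator
    sentiment: str   # from the sentiment annotator

def build_cube(comments):
    """Store the non-spam comment count at each intersection of
    dimension values, mirroring the star schema's measure M."""
    cube = Counter()
    for c in comments:
        if not c.is_spam:
            cube[(c.age, c.gender, c.location, c.week, c.artist)] += 1
    return cube

cube = build_cube([
    AnnotatedComment(19, "M", "New York City", "2007-W39", "Rihanna", False, "positive"),
    AnnotatedComment(19, "M", "New York City", "2007-W39", "Rihanna", True, "none"),
])
print(cube)  # only the non-spam comment is counted
```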
4.4 Projecting to a list
Ultimately we seek to generate a "one-dimensional" ordered list from the various contributing dimensions of the cube. In general we project to a one-dimensional ranking, which is then used to sort the artists, tracks, etc. We can aggregate and analyze the hypercube using a variety of multi-dimensional operations to derive what are essentially custom popularity lists for particular musical topics. In addition to traditional Billboard-style "Top Artist" lists, we can slice and project (marginalize dimensions of) the cube to answer questions such as "What is hot in New York City for 19-year-old males?" and "Who are the most popular artists from San Francisco?" These translate to the following mathematical operations:
L1(X) = \sum_{T,...} M(A = 19, G = M, L = "New York City", T, X, ...)    (2)

L2(X) = \sum_{T,A,G,...} M(A, G, L = "San Francisco", X, ...)    (3)

where
X = name of the artist
T = timestamp
A = age of the commenter
G = gender
L = location
Note that for the remainder of this paper we aggregate tracks and albums up to artists, as we wanted as many comments as possible for our experiments. Track-based projections are equally possible and are the rule in our ongoing work. This flexibility is one advantage of the cube approach; a roll-up sketch appears below.
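The following is a sketch of the L1 roll-up under the same toy-cube assumption as the previous sketch: fix A, G, and L, marginalize T, and rank artists by the summed measure. It illustrates the projection, not the production DB2 query.

```python
from collections import Counter

def project_l1(cube, age=19, gender="M", location="New York City", top_n=10):
    """L1(X): fix A, G, L; sum the measure over T; rank artists."""
    totals = Counter()
    for (a, g, loc, week, artist), count in cube.items():
        if (a, g, loc) == (age, gender, location):
            totals[artist] += count  # marginalizes the time dimension
    return totals.most_common(top_n)

# Toy cube keyed by (age, gender, location, week, artist) with non-spam
# comment counts as the measure.
cube = {
    (19, "M", "New York City", "2007-W38", "Rihanna"): 3,
    (19, "M", "New York City", "2007-W39", "Soulja Boy"): 5,
    (22, "F", "San Francisco", "2007-W38", "Keane"): 2,
}
print(project_l1(cube))  # [('Soulja Boy', 5), ('Rihanna', 3)]
```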
5. EXPERIMENTS
To test the effectiveness of our popularity ranking system, we conducted a series of experiments. We prepared a new top-N list of popular music to contrast with the most recent Billboard list. To validate the accuracy of our lists, we then conducted a study.
5.1 Generating our Top-N list
We started with the top 50 artists in Billboard's singles chart for the week of September 22-28, 2007. If an artist appeared multiple times with different singles, we kept only the highest-ranked single so as to obtain a unique list of artists. We crawled the MySpace pages of the resulting 45 unique artists and collected all comments posted during that week.
We loaded the comments into DB2 as described in Section
4.1. The crawled comments were passed through the three
annotators to remove spam and identify sentiments. The
tables below show statistics on the crawling and annotation
processes.
Number of unique artists              45
Total number of comments collected    50,489
Total number of unique posters        33,414
Table 6: Crawl Data
The Word on the Street
Billboard's Top 50 Singles chart during the week of Sept 22-28, 2007
(Unique) Top 45 MySpace artist pages
Crawl, Annotate, Build Hypercube, Project lists
38% of total comments were spam
61% of total comments had positive sentiments
4% of total comments had negative sentiments
35% of total comments had no identifiable sentiments
Table 7: Annotation Statistics
As described in Section 8, the structured metadata (artist name, timestamp, etc.) and annotation results (spam/non-spam, sentiment, etc.) were loaded into the hypercube.
Each cell of the cube holds the number of comments for a given artist. The dimensionality of the cube depends on which variables we examine in our experiments. Timestamp, age and gender of the poster, geography, and other factors are all dimensions in the hypercube, in addition to the measures derived from the annotators (spam, non-spam, number of positive sentiments, etc.).
For the purposes of creating a top-N list, all dimensions except artist name are collapsed. The cube is then sliced along the spam axis (to retain only non-spam comments) and the comment counts are projected onto the artist-name axis. Since the percentage of negative comments was very small (4%), the top-N list was prepared by sorting artists by the number of non-spam comments they received, independent of sentiment scoring; a sketch of this collapse follows.
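Under the same toy-cube assumptions as the sketches in Sections 4.3 and 4.4, the full top-N collapse marginalizes every dimension except artist (spam comments never entered the cube) and sorts by count. A hedged illustration, not the production query:

```python
from collections import Counter

def top_n_artists(cube, n=10):
    """Collapse every dimension except artist; rank by non-spam count."""
    totals = Counter()
    for (*_dims, artist), count in cube.items():
        totals[artist] += count
    return totals.most_common(n)

cube = {  # (age, gender, location, week, artist) -> non-spam count
    (19, "M", "New York City", "2007-W38", "Soulja Boy"): 7,
    (22, "F", "San Francisco", "2007-W38", "T.I."): 9,
    (17, "F", "London", "2007-W38", "Soulja Boy"): 4,
}
print(top_n_artists(cube))  # [('Soulja Boy', 11), ('T.I.', 9)]
```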
In Table 8 we show the top 10 most popular Billboard artists alongside the list generated by our analysis of MySpace for the week of the survey. While some top artists appear on both lists (e.g., Soulja Boy, Timbaland, 50 Cent, and Pink), there are important differences. In some cases, our MySpace Analysis list identified rising artists before they reached the Top 10 on Billboard (e.g., Fall Out Boy and Alicia Keys both climbed to #1 on Billboard.com shortly after we produced these lists). Overall, the Billboard.com list contains more artists with a long history and a large body of work (e.g., Kanye West, Fergie, Nickelback), whereas our MySpace Analysis list is more likely to identify "up and coming" artists. This is consistent with our expectations, particularly in light of the aforementioned industry reports indicating that teenagers are the biggest music influencers (Mediamark, 2004).

Billboard.com     MySpace Analysis
Soulja Boy        T.I.
Kanye West        Soulja Boy
Timbaland         Fall Out Boy
Fergie            Rihanna
J. Holiday        Keyshia Cole
50 Cent           Avril Lavigne
Keyshia Cole      Timbaland
Nickelback        Pink
Pink              50 Cent
Colbie Caillat    Alicia Keys
Table 8: Billboard's Top Artists vs. our generated list
11.1.2 The Word on the Street
Using the above lists, we performed a casual preference poll of 74 people in the target demographic. We conducted the survey among students of an after-school program (Group 1), Wright State (Group 2), and Carnegie Mellon (Group 3). Group 1 comprised respondents between the ages of 8 and 15, while Groups 2 and 3 primarily comprised college students in the 17-22 age group. Table 9 shows statistics for the three survey groups.

Groups and Age Range    Male respondents    Female respondents
Group 1 (8-15)          8                   9
Group 2 (17-22)         21                  26
Group 3 (17-22)         7                   3
Table 9: Survey Group Statistics
The survey was conducted as follows: the 74 respondents were asked to study the two lists shown in Table 8, one generated by Billboard and the other from the crawl of MySpace. They were then asked: "Which list more accurately reflects the artists that were more popular last week?" Each response was recorded along with the respondent's age, gender, and reason for preferring a list.
The sources used to prepare the lists were not shown to the respondents, so they would not be influenced by the popularity of MySpace or Billboard. In addition, we periodically switched the order of the two lists during the study to avoid any bias based on which list was presented first.
11.1.3 Results
The raw results of our study immediately suggest the validity of the system, as seen in Table 10. The list generated from MySpace data is preferred over the Billboard list by more than 2 to 1 among our 74 test subjects, and the preference is consistently in favor of our list across all three survey groups.
More precisely, 68.9 ± 5.4% of subjects preferred the SI-derived list to the Billboard list. Looking specifically at Group 1, the youngest survey group (ages 8-15), our list is even more successful. Even with a smaller sample group (resulting in
* Top artists appear in both lists; several overlaps
* Artists with a long history/body of work vs. 'up and coming' artists
* Predictive power of MySpace: Billboard next week looked a lot like MySpace this week..
* Teenagers are big music influencers [MediaMark2004]
Showing Top 10
Casual Preference Poll Results
"Which list more accurately reflects the artists that were more popular last week?"
74 participants
Overall 2:1 preference for MySpace list
Younger age groups: 6:1 (8-15 yrs)
Challenging traditional polling methods!
Lessons Learned, Miles to go
Informal (teen-authored) text is not your average blog content..
Quality checking happens not at a day-to-day, per-comment precision level but at the system level: are we missing a source or artist, is the crawl behaving, what adjudication techniques to use across multiple data sources..?
Leveraging SI for BI is difficult: validation is key, with continuous fine-tuning alongside experts