social media & sentiment analysis splunk conf2012

Upload: michael-wilde

Post on 02-Dec-2014

DESCRIPTION

This presentation was delivered at Splunk's User Conference (conf2012). It covers social media data, how to index and use it with Splunk, and a good deal of material on sentiment analysis.

TRANSCRIPT

Page 1: Social media & sentiment analysis   splunk conf2012

Copyright © 2012 Splunk Inc.

Social Media & Sentiment Analysis

@michaelwilde | David Carasso
Chief Mouth | Chief Mind

How I learned to stop worrying and love the internets

Page 2: Social media & sentiment analysis   splunk conf2012

What is social data?

Page 3: Social media & sentiment analysis   splunk conf2012

data generated from human activity on social networks

Page 4: Social media & sentiment analysis   splunk conf2012

Oh yeah, right, Twitter. But I work in IT… so, who cares, right?

Page 5: Social media & sentiment analysis   splunk conf2012

Social Data Should be in Splunk!
•  easy to analyze with fields
•  easy to create realtime/historical dashboards and views
•  easy to translate many word problems into questions

{
  "checkin": {
    "badges": [],
    "created": 1345093539,
    "geolat": "41.7686007592",
    "geolong": "-72.621648",
    "mayor": { "type": "nochange" },
    "primarycategory": {
      "fullpathname": "Food:Mexican Restaurants",
      "iconurl": "https://foursquare.com/img/categories/food/mexican_32.png",
      "id": "4bf58dd8d48988d1c1941735",
      "nodename": "Mexican Restaurants"
    },
    "timezone": "America/New_York",
    "user": { "gender": "female" },
    "venue": { "id": "4b2adbd8f964a520fdb024e3", "name": "Margaritas" }
  }
}
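To make "easy to analyze with fields" concrete outside Splunk, here is a minimal Python sketch that flattens the check-in event above into dotted field names. The `flatten` helper and the abridged `raw_event` string are assumptions made for illustration; they are not part of the talk or of Splunk.

```python
import json

# Abridged copy of the check-in event from the slide, written as plain JSON.
raw_event = """
{"checkin": {"badges": [], "created": 1345093539,
  "geolat": "41.7686007592", "geolong": "-72.621648",
  "mayor": {"type": "nochange"},
  "primarycategory": {"fullpathname": "Food:Mexican Restaurants",
                      "nodename": "Mexican Restaurants"},
  "timezone": "America/New_York",
  "user": {"gender": "female"},
  "venue": {"id": "4b2adbd8f964a520fdb024e3", "name": "Margaritas"}}}
"""

def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dotted field names (hypothetical helper)."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        else:
            fields[name] = value
    return fields

fields = flatten(json.loads(raw_event))
print(fields["checkin.venue.name"])
print(fields["checkin.primarycategory.nodename"])
```

Once every event is a flat set of named fields, questions such as "which venue categories are most popular" become simple counts over a field.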

Page 6: Social media & sentiment analysis   splunk conf2012

Wilde, we just said we work in IT and don’t care about Twitter!

Page 7: Social media & sentiment analysis   splunk conf2012

Except when we search on the words “site” AND “is down”

Page 8: Social media & sentiment analysis   splunk conf2012

Page 9: Social media & sentiment analysis   splunk conf2012

Except when I search on the words “site” AND “is down”

IT and the brand collide at times.
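As a rough illustration of the “site” AND “is down” search, the sketch below filters a list of tweets in plain Python. The tweets are made up and `matches_all` is a hypothetical helper; this shows the idea of an AND search and is not Splunk's search language.

```python
# Hypothetical tweets, invented for illustration.
tweets = [
    "Ugh, the checkout site is down again",
    "New site design is great",
    "Is down jackets season here yet?",
    "Our SITE IS DOWN, anyone else?",
]

def matches_all(text, terms):
    """True when every term occurs in the text, ignoring case (a simple AND search)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)

outage_tweets = [t for t in tweets if matches_all(t, ["site", "is down"])]
for tweet in outage_tweets:
    print(tweet)
```

Only tweets containing both terms survive the filter, which is where an IT outage and the brand's public image meet.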

Page 10: Social media & sentiment analysis   splunk conf2012

Getting Social Data

Network
Method: Push, Pull
Frequency: Real-time, Scheduled
3rd Parties

Page 11: Social media & sentiment analysis   splunk conf2012

Best thing about Social Data?

It’s almost always structured JSON!

Page 12: Social media & sentiment analysis   splunk conf2012

What can you do with it?

Map Conversations

Analyze People

Page 13: Social media & sentiment analysis   splunk conf2012

What can you do with it?

Enrich it with lookups

Track Olympians
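Enrichment with a lookup amounts to joining each event against a reference table. A minimal Python sketch, assuming a hypothetical lookup keyed by screen name (the names, sports and countries are invented placeholders, not data from the talk):

```python
# Hypothetical lookup table: screen name -> extra fields to add to an event.
athlete_lookup = {
    "athlete_a": {"sport": "swimming", "country": "USA"},
    "athlete_b": {"sport": "sprint", "country": "JAM"},
}

events = [
    {"user": "athlete_a", "text": "Great race today!"},
    {"user": "someone_else", "text": "Watching the games"},
]

def enrich(event, lookup, key="user"):
    """Return a copy of the event with the matching lookup row merged in, if any."""
    enriched = dict(event)
    enriched.update(lookup.get(event.get(key), {}))
    return enriched

enriched_events = [enrich(e, athlete_lookup) for e in events]
print(enriched_events[0])
```

Events with no matching row pass through unchanged, so the enriched stream can still be searched as a whole.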

Page 14: Social media & sentiment analysis   splunk conf2012

Indexing the social mother lode

A single stream of big data

@itayNeeman’s curl splitter scripted input (TBR)

Multiple forwarders installed on a single server streaming to multiple indexers
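The slide does not say how the splitter divides the stream. One plausible approach, shown here purely as an assumption, is to deal incoming lines round-robin across a fixed number of outputs, each of which a separate forwarder could then read. `split_stream` is a hypothetical name.

```python
from itertools import cycle

def split_stream(lines, n_outputs):
    """Distribute lines round-robin into n_outputs buckets (assumed strategy)."""
    buckets = [[] for _ in range(n_outputs)]
    for bucket, line in zip(cycle(buckets), lines):
        bucket.append(line)
    return buckets

# Example: five events spread over two outputs.
print(split_stream(["a", "b", "c", "d", "e"], 2))
```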

Page 15: Social media & sentiment analysis   splunk conf2012

…whoa.

Sir Bill, I believe the demos cometh..

Page 16: Social media & sentiment analysis   splunk conf2012

When it comes to “numbers”, the search language rocks!

The Double Rainbow

“…what does it mean?!”

In social, what people “mean” matters. For that you’ll need some new tools that understand words and language.

Page 17: Social media & sentiment analysis   splunk conf2012

Analyzing Sentiment

Extract linguistic, subjective information about opinions, attitudes, emotions, and perspectives

Page 18: Social media & sentiment analysis   splunk conf2012

…and there are perspectives

Page 19: Social media & sentiment analysis   splunk conf2012

…and there are perspectives

Page 20: Social media & sentiment analysis   splunk conf2012

Understanding brings…

Empathy with customers and prospects
Intelligent business and design decisions

Page 21: Social media & sentiment analysis   splunk conf2012

Brand Perception Impacts Stock

In 2011, our friends at Netflix announced that it would be increasing its subscription prices. The feedback on its Facebook page was outrage, and the impact on its stock price was dramatic.

Page 22: Social media & sentiment analysis   splunk conf2012

Sentiment complements and informs

“We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages… …as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling.”

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series (CMU: O'Connor, Balasubramanyan, Routledge, and Smith 2010)

Page 23: Social media & sentiment analysis   splunk conf2012

Twitter vs. Traditional Polling

Page 24: Social media & sentiment analysis   splunk conf2012

Box Office Revenue Forecasting

“We use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media.”

Asur and Huberman 2010

Page 25: Social media & sentiment analysis   splunk conf2012

Easy  

Page 26: Social media & sentiment analysis   splunk conf2012

What’s in a word?

Terms have many context-dependent meanings.
•  Meanings depend on the writer, the reader, and their relationship, history, goals and preferences
•  “unpredictable” is bad in general, but good in movie reviews
•  “jobs” data was affected by the iPhone release

Page 27: Social media & sentiment analysis   splunk conf2012

How are you feeling right now?

Plutchik's Wheel of Emotions

Ekman’s Six Basic Emotions

Page 28: Social media & sentiment analysis   splunk conf2012

Sentiment analysis gone wrong

When Anne Hathaway is mentioned, it’s almost always in a positive context, and as a result some trading algorithms seem to purchase Berkshire Hathaway.

When she is mentioned in the news, the stock goes up.

Page 29: Social media & sentiment analysis   splunk conf2012

Page 30: Social media & sentiment analysis   splunk conf2012

Bags of Words and Phrases

Many sentiment words and expressions are not directly influenced by what is around them:

That was fun :)

But certainly they can be!

They said it would be wonderful, but they were wrong.

This "wonderful" movie turned out to be boring.
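A bag-of-words scorer makes both halves of this slide visible. The sketch below is a minimal illustration, assuming tiny hand-made word lists (`POSITIVE` and `NEGATIVE` are hypothetical, real lexicons are far larger); it counts words and ignores order and context entirely.

```python
import re

# Hypothetical hand-made word lists, for illustration only.
POSITIVE = {"fun", "wonderful", "great", "good"}
NEGATIVE = {"boring", "wrong", "bad", "awful"}

def bag_of_words_score(text):
    """Count positive minus negative words, ignoring word order and context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(bag_of_words_score("That was fun :)"))
print(bag_of_words_score("They said it would be wonderful, but they were wrong."))
print(bag_of_words_score('This "wonderful" movie turned out to be boring.'))
```

The first sentence scores positive, as expected. The other two both read as negative to a person, yet the positive and negative words cancel and the scorer calls them neutral.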

Page 31: Social media & sentiment analysis   splunk conf2012

Human Engineering vs. Machine Learning

Human engineering: hand-built expert systems and parse rules. Similarly, human-engineered lists of good and bad words (e.g., “good”, “great”, “bad”, “awful”).

Machine learning: natural language processing and speech understanding are statistical and data-driven. Sentiment analysis generally uses statistics and training sets.

Page 32: Social media & sentiment analysis   splunk conf2012

Machine Learning Choices
•  Learning Type
   –  Supervised: + straightforward. – lots of training data.
   –  Unsupervised: + no training data. – may not find what you want.
   –  Semi-Supervised: + small initial training data. – interactive feedback.
•  Algorithms
   –  Naïve Bayes: + simplest probabilistic classifier model. – assumes words are independent.
   –  EM: + performs better, doesn’t assume independence. – more complicated, over-fitting a problem.

Page 33: Social media & sentiment analysis   splunk conf2012

Supervised Learning

Labeled Training Data → Learn Model → Model
Labeled Test Data → Validate Model
New Unlabeled Data → Predict Labels → New Labeled Data
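The supervised workflow (learn from labeled data, validate on held-out labeled data, then label new data) can be sketched with a tiny naïve Bayes classifier in plain Python. This is a minimal illustration with add-one smoothing and invented toy data; it is not the Splunk app's implementation, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Words are treated as independent given the label.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data.
training = [("great fun movie", "pos"), ("wonderful and good", "pos"),
            ("boring and bad", "neg"), ("awful boring plot", "neg")]
test = [("good fun", "pos"), ("bad plot", "neg")]

model = train(training)                                             # learn model
accuracy = sum(predict(model, t) == y for t, y in test) / len(test)  # validate model
new_labels = [predict(model, t) for t in ["what a wonderful film"]]  # predict labels
print(accuracy, new_labels)
```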

Page 34: Social media & sentiment analysis   splunk conf2012

The Effect of Negation

“The food was not good”

Strategies:
•  Negating sentiment for all terms up to a breaking punctuation (i.e., comma or sentence end)
•  Negation effect is dependent on the term.
   –  Mild words negate about the same: not bad ≈ good
   –  Extreme words negate towards neutral: not horrible ≈ average
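Both strategies can be combined in a few lines. The sketch below is an illustration under stated assumptions: the word scores in `SCORES`, the negator list, and the rule that terms scored beyond ±1 count as "extreme" are all invented for the example.

```python
import re

# Hypothetical word scores; magnitude 2 marks an "extreme" term.
SCORES = {"good": 1, "tasty": 1, "bad": -1, "horrible": -2}
NEGATORS = {"not", "never", "no"}
BREAKS = set(".,;!?")

def negate(value):
    """Mild terms flip sign; extreme terms are pulled to neutral."""
    return -value if abs(value) <= 1 else 0

def score_with_negation(text):
    """Sum word scores, negating terms after a negator until breaking punctuation."""
    total, negated = 0, False
    for token in re.findall(r"[a-z']+|[.,;!?]", text.lower()):
        if token in BREAKS:
            negated = False      # breaking punctuation ends the negation scope
        elif token in NEGATORS:
            negated = True
        else:
            value = SCORES.get(token, 0)
            total += negate(value) if negated else value
    return total

print(score_with_negation("The food was not good"))
print(score_with_negation("not horrible"))
```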

 

Page 35: Social media & sentiment analysis   splunk conf2012

Learning Bias

A common feature of online user-supplied reviews is that the positive reviews vastly outnumber the negative ones.

Movie reviews at IMDB:

More occurrences of “bad” in 10-star reviews than in 2-star ones. Normalize by accounting for relative frequencies.
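Normalizing means dividing each raw count by the amount of text in its rating class. The numbers below are hypothetical, chosen only to show the effect; they are not real IMDB figures.

```python
# Hypothetical counts for illustration only (not real IMDB data).
occurrences_of_bad = {"2-star": 300, "10-star": 500}
total_words = {"2-star": 100_000, "10-star": 1_000_000}

# Relative frequency: occurrences of the word per word of review text.
relative_freq = {
    rating: occurrences_of_bad[rating] / total_words[rating]
    for rating in occurrences_of_bad
}
print(relative_freq)
```

With these made-up counts, “bad” appears more often in 10-star reviews in absolute terms, but is several times more frequent per word in 2-star reviews, which is the signal a classifier should learn from.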

Page 36: Social media & sentiment analysis   splunk conf2012

Sentiment in Social Media
•  Emoticons: :-) ;( :/
   –  Reliable measure of sentiment
   –  Simple regex can cover more than 95% of emoticons on Twitter
   –  Ignores complex emotions
•  Lengthening
   –  This talk is greeeeeat! David is the beeeeeeest! Ahhhhhhhhh!
   –  In English, 3 or more of the same char in a row doesn’t exist, except for 7 obscure terms in the unix dict.
   –  Can indicate heightened emotion, but actual lengths are probably not meaningful.
   –  Useful to normalize because of how common they are (hiiii → hi)
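Both ideas fit in two regular expressions. The patterns below are assumptions made for illustration, not the ones used in the talk or the app: `EMOTICON` covers common Western emoticons with an optional nose, and `LENGTHENED` matches any character repeated three or more times.

```python
import re

# Assumed pattern: eyes, optional nose, mouth, e.g. :-) ;( :/ :D
# Note: it would also match the ":/" inside URLs such as "http://".
EMOTICON = re.compile(r"[:;=][-']?[()/\\DPp|]")
# Three or more repeats of the same character.
LENGTHENED = re.compile(r"(.)\1{2,}")

def find_emoticons(text):
    return EMOTICON.findall(text)

def normalize_lengthening(text):
    """Collapse runs of 3+ identical characters to a single character."""
    return LENGTHENED.sub(r"\1", text)

print(find_emoticons("fun :-) meh ;( hmm :/"))
print(normalize_lengthening("This talk is greeeeeat!"))
```

Collapsing to a single character follows the slide's hiiii → hi example, but it mangles lengthened words that legitimately contain a double letter ("goood" becomes "god"), so a real normalizer might collapse to two characters and check a dictionary.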

Page 37: Social media & sentiment analysis   splunk conf2012

Maybe it’s not so hard?

“We are only interested in aggregate sentiment. A high error rate merely implies the sentiment detector is a noisy measurement instrument. With a fairly large number of measurements, these errors will cancel out relative to the quantity we are interested in estimating…”

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
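One way to see why a noisy detector can still track aggregate sentiment: assume, as a simplification not stated in the paper quote, that the detector is right with the same probability for positive and negative messages. Then the expected share it labels positive is a straight-line function of the true share, so trends survive even though individual labels are often wrong.

```python
def expected_observed_positive(true_positive_share, accuracy):
    """Expected share labeled positive by a detector that errs symmetrically (assumption)."""
    return accuracy * true_positive_share + (1 - accuracy) * (1 - true_positive_share)

# Hypothetical daily shares of truly positive messages.
true_shares = [0.40, 0.50, 0.60, 0.70]
observed = [expected_observed_positive(p, accuracy=0.75) for p in true_shares]
print(observed)
```

With 75% accuracy the observed series is compressed toward 50%, but it rises whenever the true series rises.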

 

Page 38: Social media & sentiment analysis   splunk conf2012

Splunk Sentiment Analysis App  

Page 39: Social media & sentiment analysis   splunk conf2012

Design Decisions
•  Use supervised learning. Why? Doesn’t require interactive feedback. Learners get almost as good as they are going to get with only a few hundred or perhaps a few thousand documents.
•  Use naïve Bayes. Why? Dirt simple and understandable. The difference between the best algorithms and a simple naïve Bayes is generally only a few percent.

Page 40: Social media & sentiment analysis   splunk conf2012

Design Decisions
•  Handle lengthening. Greeeat!
•  Ignore negation. In the aggregate it won’t matter much.
•  Supply multiple trained models:
   –  Movie reviews (using IMDB ratings)
   –  Tweets (using emoticons to create training sets)
   –  Please suggest more
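Using emoticons to create a training set means treating the emoticon as the label and the remaining text as the document. A minimal sketch, assuming simple hypothetical patterns for positive and negative emoticons (the app's actual rules are not given in the slides):

```python
import re

# Assumed patterns; ":/" would also match inside URLs, which a real labeler should exclude.
POSITIVE_EMOTICON = re.compile(r"[:;=]-?[)D]")
NEGATIVE_EMOTICON = re.compile(r"[:;=]-?[(/]")

def label_by_emoticon(tweet):
    """Return (text_without_emoticons, label), or None when the signal is absent or mixed."""
    pos = bool(POSITIVE_EMOTICON.search(tweet))
    neg = bool(NEGATIVE_EMOTICON.search(tweet))
    if pos == neg:
        return None
    text = POSITIVE_EMOTICON.sub("", NEGATIVE_EMOTICON.sub("", tweet)).strip()
    return text, "pos" if pos else "neg"

print(label_by_emoticon("love this talk :)"))
print(label_by_emoticon("site is down :("))
```

Stripping the emoticon from the text keeps the classifier from simply memorizing the label it was trained on.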

Page 41: Social media & sentiment analysis   splunk conf2012

Summary
•  Sentiment analysis helps you understand your customers and marketplace.
•  True sentiment analysis is hard.
•  Aggregate sentiment analysis is easier but still very valuable.
•  The simplest algorithms work almost as well as the most complex, given a few thousand training points.
•  Splunk has a Sentiment App.
•  Download it and give feedback.
•  Integrate social data into your existing corporate data.
•  Share your trained models with others.

Page 42: Social media & sentiment analysis   splunk conf2012

Teh End

“Worst talk. Ever.”

“I actually learned something! Not.”

“#splunk #sentiment niiice.”

“splunk now knows when you’ve been naughty or nice #sentiment”

“keep-it-simple sentiment works #conf2012”

Golf clapping at #sentiment_talk

If you’re reading this, start clapping. The talk is over.