social media & sentiment analysis splunk conf2012
DESCRIPTION
This presentation was delivered at Splunk's User Conference (conf2012). It covers social media data, how to index and use it with Splunk, and sentiment analysis in depth.
TRANSCRIPT
Copyright © 2012 Splunk Inc.
Social Media & Sentiment Analysis
@michaelwilde, Chief Mouth | David Carasso, Chief Mind
How I learned to stop worrying and love the internets
"What is social data"?
2
What is social data?
3
data generated from human activity on social networks
4
Oh yeah, right Twitter. But I work in IT… so,
who cares, right?
Social Data Should be in Splunk!
• easy to analyze with fields
• easy to create realtime/historical dashboards and views
• easy to translate many word problems into questions
5
{
  "checkin": {
    "badges": [],
    "created": 1345093539,
    "geolat": "41.7686007592",
    "geolong": "-72.621648",
    "mayor": { "type": "nochange" },
    "primarycategory": {
      "fullpathname": "Food:Mexican Restaurants",
      "iconurl": "https://foursquare.com/img/categories/food/mexican_32.png",
      "id": "4bf58dd8d48988d1c1941735",
      "nodename": "Mexican Restaurants"
    },
    "timezone": "America/New_York",
    "user": { "gender": "female" },
    "venue": { "id": "4b2adbd8f964a520fdb024e3", "name": "Margaritas" }
  }
}
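The nested check-in event above becomes easy to analyze once every leaf value can be addressed by a dotted field name such as checkin.venue.name. The sketch below illustrates that idea in plain Python. The flatten helper and the trimmed sample event are assumptions made for illustration, not Splunk's own extraction code.

```python
import json

# Trimmed copy of the check-in event shown above (illustration only).
RAW_EVENT = '''
{"checkin": {"badges": [], "created": 1345093539,
             "primarycategory": {"nodename": "Mexican Restaurants"},
             "user": {"gender": "female"},
             "venue": {"id": "4b2adbd8f964a520fdb024e3", "name": "Margaritas"}}}
'''


def flatten(obj, prefix=""):
    """Hypothetical helper: map nested JSON to {dotted.path: leaf value}."""
    if not isinstance(obj, dict):
        # Scalars and lists are kept as leaf values.
        return {prefix: obj}
    fields = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        fields.update(flatten(value, path))
    return fields


fields = flatten(json.loads(RAW_EVENT))
for name, value in sorted(fields.items()):
    print(name, "=", value)
```

Once the event is flat, a question like "which venues do people check in to?" is just a count over checkin.venue.name.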
"What is social data"?
6
Wilde, we just said we work
in IT and don’t care about
Twitter!
7
Except when we search on the words “site”
AND “is down”
8
9
Except when I search on the words “site”
AND “is down”
IT and the brand collide at times.
10
Getting Social Data
Network
Method: Push, Pull, 3rd Parties
Frequency: Real-time, Scheduled
Best thing about Social Data?
11
It’s almost always structured JSON!
What can you do with it?
12
Map Conversations
Analyze People
13
Enrich it with lookups
Track Olympians
Indexing the social mother lode
14
A single stream of big data
@itayNeeman’s curl splitter scripted input (TBR)
Multiple forwarders installed on a single server streaming to multiple indexers
15
…whoa.
Sir Bill, I believe the demos cometh..
16
When it comes to “numbers”, the search language rocks!
The Double Rainbow
“…what does it mean?!”
In social, what people “mean” matters. For that you’ll need some new tools that understand words and language
17
Analyzing Sentiment
Extract subjective linguistic information: opinions, attitudes, emotions, and perspectives
18
…and there are perspectives
19
Understanding brings…
Empathy with customers and prospects
Intelligent business and design decisions
20
Brand Perception Impacts Stock
In 2011, our friends at Netflix announced that they would be increasing subscription prices. The feedback on the company's Facebook page was outrage, and the impact on its stock price was dramatic.
21
Sentiment complements and informs
“We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages… as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling.”
22
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series (CMU: O'Connor, Balasubramanyan, Routledge, and Smith 2010)
Twitter vs. Traditional Polling
23
Box Office Revenue Forecasting
24
“We use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media.”
Asur and Huberman 2010
25
Easy
What’s in a word?
Terms have many context-dependent meanings.
• Meanings depend on the writer, the reader, and their relationship, history, goals and preferences
• “unpredictable” is bad in general, but good in movie reviews.
• “jobs” data was affected by the iPhone release
26
How are you feeling right now?
27
Plutchik's Wheel of Emotions
Ekman’s Six Basic Emotions
28
Sentiment analysis gone wrong
When Anne Hathaway is mentioned, it’s almost always in a positive context, and as a result some trading algorithms seem to purchase Berkshire Hathaway.
When she is mentioned in the news, the stock goes up.
29
Bags of Words and Phrases
Many sentiment words and expressions are not directly influenced by what is around them:
That was fun :)
30
But certainly they can be!
They said it would be wonderful, but they were wrong.
This "wonderful" movie turned out to be boring.
Human Engineering vs. Machine Learning
Human engineering: hand-built expert systems and parse rules. Similarly, human-engineered lists of good and bad words (e.g., “good”, “great”, “bad”, “awful”).
Natural Language Processing & Speech Understanding: statistical and data driven. Sentiment analysis generally uses statistics and training sets.
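The hand-built word-list approach mentioned above can be sketched in a few lines. The word lists and the lexicon_score name are hypothetical and far too short for real use. The second call shows the context problem from the previous slide: "wonderful" and "boring" cancel out even though the sentence is clearly negative.

```python
import re

# Hypothetical hand-built word lists; a real list would be far longer.
GOOD_WORDS = {"good", "great", "fun", "wonderful"}
BAD_WORDS = {"bad", "awful", "boring", "wrong"}


def lexicon_score(text):
    """Count good words minus bad words, ignoring all context."""
    words = re.findall(r"[a-z']+", text.lower())
    good = sum(word in GOOD_WORDS for word in words)
    bad = sum(word in BAD_WORDS for word in words)
    return good - bad


print(lexicon_score("That was fun :)"))
print(lexicon_score('This "wonderful" movie turned out to be boring.'))
```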
31
Machine Learning Choices
• Learning Type
  – Supervised: + straightforward. - lots of training data.
  – Unsupervised: + no training data. - may not find what you want.
  – Semi-Supervised: + small initial training data. - interactive feedback.
• Algorithms
  – Naïve Bayes: + simplest probabilistic classifier model. - assumes words are independent.
  – EM: + performs better, doesn’t assume independence. - more complicated, over-fitting a problem.
32
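To make the Naïve Bayes option concrete, here is a minimal sketch of a word-based Naïve Bayes classifier with add-one smoothing. The class name, the whitespace tokenizer, and the four-document training set are assumptions for illustration, and nothing here is taken from the Splunk app itself.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) + sum of log P(word | label); the sum over words
            # is where the "words are independent" assumption comes in.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
model = NaiveBayes()
model.train([
    ("great fun movie", "pos"),
    ("loved it great acting", "pos"),
    ("awful boring movie", "neg"),
    ("bad plot awful acting", "neg"),
])
print(model.predict("great fun"))
print(model.predict("boring awful plot"))
```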
Supervised Learning
33
Labeled Training Data → Learn Model → Model
Labeled Test Data → Validate Model
New Unlabeled Data → Predict Labels → New Labeled Data
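The supervised-learning flow above (learn from labeled training data, validate on labeled test data, then label new data) can be walked through with a deliberately tiny model. Everything below is a hypothetical sketch: the word-vote "model" stands in for whatever learner is actually used, and the documents are made up.

```python
from collections import Counter, defaultdict


def learn_model(labeled_docs):
    """Learn step: count how often each word appears under each label."""
    model = defaultdict(Counter)
    for text, label in labeled_docs:
        for word in text.lower().split():
            model[word][label] += 1
    return model


def predict_label(model, text, default="pos"):
    """Predict step: every known word votes with its label counts."""
    votes = Counter()
    for word in text.lower().split():
        if word in model:
            votes.update(model[word])
    return votes.most_common(1)[0][0] if votes else default


def validate(model, labeled_docs):
    """Validate step: fraction of held-out documents labeled correctly."""
    correct = sum(predict_label(model, text) == label for text, label in labeled_docs)
    return correct / len(labeled_docs)


training = [("great movie", "pos"), ("loved the acting", "pos"),
            ("awful movie", "neg"), ("boring plot", "neg")]
test = [("great acting", "pos"), ("awful boring", "neg")]
unlabeled = ["loved it", "boring and awful"]

model = learn_model(training)
accuracy = validate(model, test)
new_labeled = [(text, predict_label(model, text)) for text in unlabeled]
print(accuracy, new_labeled)
```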
The Effect of Negation
“The food was not good”
Strategies:
• Negating sentiment for all terms up to a breaking punctuation (i.e., comma or sentence end)
• Negation effect is dependent on the term.
34
• Mild words negate about the same: not bad ≈ good
• Extreme words negate towards neutral: not horrible ≈ average
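Both negation strategies above can be combined in one small scorer: a negator switches on a "negating" state that lasts until the next breaking punctuation, and the effect on each word depends on how extreme it is. The lexicon scores, the negator list, and the mild/extreme cutoff are all assumptions chosen for illustration.

```python
import re

# Hypothetical scores: magnitude 1 = mild, magnitude 3 = extreme.
LEXICON = {"good": 1.0, "bad": -1.0, "wonderful": 3.0, "horrible": -3.0}
NEGATORS = {"not", "no", "never"}
BREAKS = {",", ".", "!", "?", ";"}


def negate(word_score):
    """Mild words flip sign; extreme words collapse to neutral."""
    return -word_score if abs(word_score) <= 1.0 else 0.0


def score(text):
    tokens = re.findall(r"[a-z']+|[,.!?;]", text.lower())
    total, negating = 0.0, False
    for token in tokens:
        if token in BREAKS:
            negating = False          # negation ends at breaking punctuation
        elif token in NEGATORS:
            negating = True
        elif token in LEXICON:
            value = LEXICON[token]
            total += negate(value) if negating else value
    return total


print(score("The food was not good"))   # negated mild word
print(score("not horrible"))            # negated extreme word
print(score("not cheap, but good"))     # comma ends the negation
```

Contractions such as "wasn't" are not handled here and would need their own rule.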
Learning Bias
A common feature of online user-supplied reviews is that the positive reviews vastly outnumber the negative ones. Movie reviews at IMDB:
35
More occurrences of “bad” in 10-star reviews than in 2-star ones. Normalize by accounting for relative frequencies.
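Normalizing by relative frequency means dividing a word's count in each rating bucket by the total number of words in that bucket. The counts below are invented purely to show the effect and are not IMDB figures.

```python
def relative_frequency(word_counts, total_words):
    """Occurrences of a word per rating, divided by all words at that rating."""
    return {rating: word_counts[rating] / total_words[rating] for rating in word_counts}


# Made-up numbers: far more text exists in 10-star reviews than in 2-star ones.
bad_counts = {2: 400, 10: 900}
total_words = {2: 100_000, 10: 1_500_000}

bad_rate = relative_frequency(bad_counts, total_words)
print(bad_rate)
```

With these numbers the raw count of "bad" is higher in 10-star reviews, but the normalized rate is higher in 2-star reviews, which is the signal a classifier should learn.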
Sentiment in Social Media
• Emoticons: :-) ;( :/
  – Reliable measure of sentiment
  – Simple regex can cover more than 95% of emoticons on Twitter
  – Ignores complex emotions
• Lengthening
  – This talk is greeeeeat! David is the beeeeeeest! Ahhhhhhhhh!
  – In English, 3 or more of the same char in a row doesn’t exist, except for 7 obscure terms in the unix dict.
  – Can indicate heightened emotion, but actual lengths are probably not meaningful.
  – Useful to normalize because of how common they are (hiiii → hi)
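Both ideas above fit in a couple of regular expressions. This is a sketch under assumptions: the emoticon pattern is a simple illustrative one (not the regex behind the 95% figure), and the mouth-character polarity sets are made up for the example. Lengthening is collapsed to a single character to match the hiiii → hi example.

```python
import re

# Eyes, optional nose, mouth. The lookbehind avoids matching "://" in URLs.
EMOTICON = re.compile(r"(?<!\w)[:;=]-?[)(\]\[DPp/\\|]")
POSITIVE_MOUTHS = set(")]DPp")
NEGATIVE_MOUTHS = set("([/\\|")

# Three or more of the same letter in a row.
LENGTHENED = re.compile(r"([a-zA-Z])\1{2,}")


def emoticon_polarity(text):
    """+1 per positive emoticon, -1 per negative one, judged by the mouth."""
    total = 0
    for emoticon in EMOTICON.findall(text):
        mouth = emoticon[-1]
        total += (mouth in POSITIVE_MOUTHS) - (mouth in NEGATIVE_MOUTHS)
    return total


def normalize_lengthening(text):
    """Collapse runs of 3+ identical letters to one letter (hiiii -> hi)."""
    return LENGTHENED.sub(r"\1", text)


print(EMOTICON.findall(":-) ;( :/"))
print(emoticon_polarity("That was fun :)"))
print(normalize_lengthening("This talk is greeeeeat! David is the beeeeeeest!"))
```

Collapsing to a single letter is a simplification: a word like "coooool" would become "col", so a real normalizer might collapse to two letters and then check a dictionary.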
36
Maybe it’s not so hard?
“We are only interested in aggregate sentiment. A high error rate merely implies the sentiment detector is a noisy measurement instrument. With a fairly large number of measurements, these errors will cancel out relative to the quantity we are interested in estimating…”
37
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
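A small simulation can illustrate the "noisy measurement instrument" argument quoted above. It assumes a detector that flips each message's label with a fixed 30% probability, independent of the message. That symmetric error model, the sample sizes, and the function name are all assumptions of this sketch. Under them the measured share of positive messages is pulled towards 50%, but it still rises and falls with the true share.

```python
import random


def noisy_positive_share(true_share, error_rate, n, rng):
    """Share of n messages that a noisy detector labels positive."""
    positives = 0
    for _ in range(n):
        truly_positive = rng.random() < true_share
        flipped = rng.random() < error_rate
        # The detector reports positive when exactly one of the two holds.
        positives += truly_positive != flipped
    return positives / n


rng = random.Random(2012)          # fixed seed so runs are repeatable
true_shares = [0.3, 0.5, 0.7]      # hypothetical "true" daily sentiment
measured = [noisy_positive_share(p, 0.3, 20000, rng) for p in true_shares]
print(measured)
```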
Splunk Sentiment Analysis App
38
Design Decisions
• Use supervised learning. Why? Doesn’t require interactive feedback. Learners do almost as well as they are ever going to with only a few hundred or perhaps a few thousand documents.
• Use naïve Bayes. Why? Dirt simple and understandable. The difference between the best algorithms and a simple naïve Bayes is generally only a few percent.
39
Design Decision
• Handle lengthening. Greeeat!
• Ignore negation. In the aggregate it won’t matter much.
• Supply multiple trained models:
  – Movie reviews (using IMDB ratings)
  – Tweets (using emoticons to create training sets)
  – Please suggest more
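Using emoticons to create training sets can be sketched as follows: a tweet with only happy emoticons is labeled positive, one with only sad emoticons is labeled negative, and the emoticon itself is stripped so the model has to learn from the words. The patterns and the function name are illustrative assumptions, not the app's actual code.

```python
import re

# Deliberately narrow, hypothetical patterns for happy and sad emoticons.
HAPPY = re.compile(r"(?<!\w)[:;=]-?[)D]")
SAD = re.compile(r"(?<!\w)[:;=]-?\(")


def label_by_emoticon(tweets):
    """Turn raw tweets into (text, label) pairs; skip ambiguous ones."""
    labeled = []
    for tweet in tweets:
        happy, sad = bool(HAPPY.search(tweet)), bool(SAD.search(tweet))
        if happy == sad:
            continue  # no emoticon, or mixed signals
        text = SAD.sub("", HAPPY.sub("", tweet)).strip()
        labeled.append((text, "pos" if happy else "neg"))
    return labeled


training_set = label_by_emoticon([
    "love this talk :)",
    "site is down :(",
    "just landed",
    "mixed feelings :) :(",
])
print(training_set)
```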
40
• Sentiment analysis helps you understand your customers and marketplace.
• True sentiment analysis is hard.
• Aggregate sentiment analysis is easier but still very valuable.
• The simplest algorithms work almost as well as the most complex, given a few thousand training points.
• Splunk has a Sentiment App.
• Download it and give feedback.
• Integrate social data into your existing corporate data.
• Share your trained models with others.
41
Summary
42
Teh End
“Worst talk. Ever.”
“I actually learned something! Not.”
“#splunk #sentiment niiice.”
“splunk now knows when you’ve been naughty or nice #sentiment”
“keep-it-simple sentiment works #conf2012”
Golf clapping at #sentiment_talk
If you’re reading this, start clapping. The talk is over.