customer insights workshop - consumer text analytics conference
DESCRIPTION
Lexalytics VP Seth Redmore talks about the nuts and bolts of text and sentiment analysis to spot and react to consumer feedback online. From the 2014 Terrapin Consumer Text Analytics Conference. Read more here: http://www.lexalytics.com/blog/2014/consumer-text-analytics-conference
TRANSCRIPT
Customer Insights Workshop
No, you can't always get what you want You can't always get what you want You can't always get what you want
And if you try sometime you find You get what you need
Seth Redmore, VP Marketing and Product Management
@sredmore, @lexalytics, http://www.lexalytics.com/blog
©2014 Lexalytics Inc. All rights reserved.
Agenda
• Common Extractions
• “Accuracy” is imprecise
• One low-level view of an NLP engine
• Starbucks example w/real data
• OK great – let’s get everyone on board
Text Mining Key Extractions/Enhancements
• Core Engine processes text to tell you “who”, “where”, “when”, “what” and “how” so that you can figure out “why” (and what you want to do about it).
• Key Features are: Entity Extraction, Facets, Themes, Text Categorization, and Sentiment Analysis
• Core NLP is used for 2 different purposes:
Discovery (“tell me what’s in here!”) and
Tracking (“ok, now I want to follow the trends”)
Key Text Enhancements (diagram)
• Who / Where / When: Entity Extraction
• What: Facets/Themes (Discovery), Text Categorization (Tracking)
• How: Sentiment Analysis
“Accuracy” is imprecise.
• Because sentiment is personal (e.g. over what dataset is sentiment “accurate”?)
• Because you may care more about precision, or you may care more about recall
Sentiment Accuracy is Personal!
• “Wells Fargo lost $200M last month”
• “Kölnisch Wasser smells like my grandmother.”
• “We’re switching to DirecTV.”
• “Microsoft is dropping their prices.”
Precision, Recall, F1
• Precision:
“of the items you coded, what % are correct?”
• Recall:
“of all the possible items that match the code, what % did you retrieve?”
• F1 is the harmonic mean of precision and recall:
F1 = 2 * (precision * recall) / (precision + recall)
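As a minimal sketch of the three definitions, the function below scores a single code (label) against hand-coded truth. The function name, the label names, and the six-document sample are made up for illustration; they are not from the talk or from any Lexalytics API.

```python
def precision_recall_f1(predicted, actual, target):
    """Score one code (label), e.g. "neg", against hand-coded truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == target and a == target)  # coded and correct
    fp = sum(1 for p, a in pairs if p == target and a != target)  # coded but wrong
    fn = sum(1 for p, a in pairs if p != target and a == target)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: note the parentheses around (precision + recall).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for six documents (illustration only).
actual = ["neg", "neg", "neg", "pos", "pos", "neu"]
predicted = ["neg", "neg", "pos", "neg", "pos", "neu"]

p, r, f1 = precision_recall_f1(predicted, actual, "neg")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here one of the three items coded "neg" is wrong and one true "neg" was missed, so precision and recall both come out at 2/3.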
Different apps require different balance
• High precision -> Social media trending
Want to know that what you’re graphing has absolutely no crap
• High recall -> Customer support requests
Really don’t want to miss even a single pissed off customer, even at the cost of having to filter through lots of not-upset customers
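One common way to shift this balance is to move a cutoff on the sentiment score. The sketch below assumes a hypothetical engine that returns a score per message (lower means more negative) and a hand-labeled "upset" flag; the scores, flags, and thresholds are invented for illustration.

```python
def precision_recall_at(threshold, scores, is_upset):
    """Flag a message as 'upset' when its score is at or below the threshold."""
    flagged = [s <= threshold for s in scores]
    tp = sum(1 for f, u in zip(flagged, is_upset) if f and u)
    fp = sum(1 for f, u in zip(flagged, is_upset) if f and not u)
    fn = sum(1 for f, u in zip(flagged, is_upset) if not f and u)
    # Convention assumed here: with nothing flagged (or nothing to find), report 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up sentiment scores and hand labels for eight messages.
scores = [-0.9, -0.7, -0.4, -0.2, -0.1, 0.1, 0.3, 0.6]
is_upset = [True, True, False, True, False, True, False, False]

strict = precision_recall_at(-0.6, scores, is_upset)  # trending-style cutoff
loose = precision_recall_at(0.2, scores, is_upset)    # support-queue-style cutoff
print("strict:", strict)
print("loose: ", loose)
```

On this toy data the strict cutoff flags only truly upset messages but finds half of them, while the loose cutoff finds every upset customer at the cost of two false alarms.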
Sentiment F1 scores (and “accuracy”) are bounded by inter-rater agreement (IRA)
• MPQA Corpus
Wiebe et al., 2005, “Annotating expressions of opinions and emotions in language.” Language Resources and Evaluation, 39:165-210
Grad students, 40 hours of training, 16k sentences, ~80% IRA
• To ponder: If people max out at 80%, how can a machine be scored any better?
• Answer: it can’t.
A machine will do a “poor” job of scoring content that people can’t agree on.
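To see what an agreement number looks like in practice, here is a small sketch that compares two human raters on the same documents. The slides do not say how the ~80% figure was computed, so both measures below (raw percent agreement and Cohen's kappa, which corrects for chance agreement) are assumptions about how one might measure it, and the ratings are made up.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up ratings from two people for the same ten sentences.
rater_1 = ["pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg", "neu"]
rater_2 = ["pos", "neg", "neg", "neg", "pos", "neu", "neg", "neu", "neg", "neu"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

If either rater is treated as the gold standard, the other rater scores only 80% "accuracy" on this sample, which is the ceiling a machine is being measured against.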
So, you want to maximize your own accuracy?
• Get a clear goal on what you’re optimizing for
Precision/recall
What is “sentiment” – does an opinion have to be expressed, or not?
Bounds of neutral
• Score a set of content yourself
• Crowdsource
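The "bounds of neutral" choice can be made concrete with a short sketch. It assumes a hypothetical engine that returns a numeric score per document; the function name, the bounds, and the scores are illustrative, not values from the talk.

```python
def label(score, neutral_low, neutral_high):
    """Map a numeric sentiment score to a label using explicit neutral bounds."""
    if score < neutral_low:
        return "negative"
    if score > neutral_high:
        return "positive"
    return "neutral"


# Made-up scores for five documents.
scores = [-0.60, -0.15, 0.05, 0.20, 0.55]

narrow = [label(s, -0.1, 0.1) for s in scores]  # small neutral zone
wide = [label(s, -0.3, 0.3) for s in scores]    # large neutral zone

for s, n, w in zip(scores, narrow, wide):
    print(f"{s:+.2f}  narrow={n:<8}  wide={w}")
```

Two of the five documents change label when only the bounds move, which is why the bounds need to be settled before anyone scores content by hand or argues about accuracy.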
Key Enhancements
• The NLP Engine recognizes the important subjects of the text while understanding the sentiment used with each subject.
• “How” is the conversation occurring? Is it positive or negative?
1. Entity      Type      Sentiment
   Yahoo       Company   .534
   Twitter     Company   .48
   Facebook    Company   .534
   U.S.        Place     .534
2. Theme                        Score   Sentiment
   Cloud computing technology   4.110   .6
   E-mail service               2.672   .81
   Top users requests           2.669   .62
3. Category                Score   Sentiment
   Software and Internet   .56     0.0
   Social Media            .60     .48
   Technology              .49     .49
   Business                .72     .49
Entities: “Who”, “Where”, “When”
Themes: “What” (Discovery)
Categories: “What” (Tracking)
Unstructured Data Sample
Yahoo wants to make its Web e-mail service a place you never want to— or more importantly—have to leave to get your social fix. The company on Wednesday is releasing an overhauled version of its Yahoo Mail Beta client that it says is twice as fast as the previous version, while managing to tack on new features like an integrated Twitter client, rich media previews and a more full-featured instant messaging client.
Yahoo says this speed boost should be especially noticeable to users outside the U.S. with latency issues, due mostly to the new version making use of the company's cloud computing technology. This means that if you're on a spotty connection, the app can adjust its behavior to keep pages from timing out, or becoming unresponsive.
Besides the speed and performance increase, which Yahoo says were the top users requests, the company has added a very robust Twitter client, which joins the existing social-sharing tools for Facebook and Yahoo.
Salience Processing Overview
Text: The waiter was rude! Our food was cold.
Prepare Text: Tokenizer, POS Tagger, Chunker, Sentence Breaker
Parsed Text: The waiter _NNP was _VP rude _JP . Our food _NP was _VP cold _JP.
GetDocumentSentiment
GetNamed/UserDefined/(Collection)Entities
GetDocument/(Collection)Themes
GetConcept/QueryDefined/(Collection)Topics
Get(Collection)Facets
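To give a feel for what the text preparation stage does, here is a toy version of the four steps run on the example sentence. This is not Salience code and does not use its API: the seven-word lexicon, the tag names, the single chunking rule, and the order of the steps are all assumptions chosen to keep the sketch short.

```python
import re

# Tiny hand-made lexicon, enough for the example text only (hypothetical).
LEXICON = {
    "the": "DET", "our": "DET",
    "waiter": "NOUN", "food": "NOUN",
    "was": "VERB",
    "rude": "ADJ", "cold": "ADJ",
}


def break_sentences(text):
    """Split after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag(tokens):
    """Look each token up in the lexicon; unknown words get the tag X."""
    return [(t, LEXICON.get(t.lower(), "X" if t.isalnum() else "PUNCT")) for t in tokens]


def chunk(tagged):
    """Merge a determiner with the noun that follows it into one noun phrase."""
    chunks, i = [], 0
    while i < len(tagged):
        word, pos = tagged[i]
        if pos == "DET" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            chunks.append((word + " " + tagged[i + 1][0], "NP"))
            i += 2
        else:
            chunks.append((word, pos))
            i += 1
    return chunks


text = "The waiter was rude! Our food was cold."
parsed = [chunk(tag(tokenize(s))) for s in break_sentences(text)]
for sentence in parsed:
    print(sentence)
```

A real engine replaces the lexicon lookup with a trained tagger and a far richer chunker, but the shape of the output is similar: phrases with tags that later steps (sentiment, entities, themes) can work from.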
Iterative Process
• Start with one set of rules
• Run Data
• Inspect Results
• Adjust parameters
• Lather, Rinse, Repeat
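The loop above can be sketched in a few lines. This assumes the parameter being adjusted is the half-width of the neutral band and that a small hand-scored sample is available; both the candidate settings and the sample are made up, and a real project would tune many more parameters than one.

```python
def label(score, band):
    """Scores inside [-band, +band] count as neutral."""
    if score < -band:
        return "neg"
    if score > band:
        return "pos"
    return "neu"


def accuracy(band, scored_docs):
    """Share of documents where the rule agrees with the hand-assigned label."""
    hits = sum(1 for score, truth in scored_docs if label(score, band) == truth)
    return hits / len(scored_docs)


# (engine score, hand-assigned label) pairs; made up for illustration.
sample = [(-0.7, "neg"), (-0.25, "neg"), (-0.1, "neu"), (0.0, "neu"),
          (0.15, "neu"), (0.3, "pos"), (0.8, "pos")]

best_band, best_acc = None, -1.0
for band in [0.05, 0.1, 0.2, 0.3, 0.4]:   # adjust parameters
    acc = accuracy(band, sample)           # run data
    print(f"band={band:.2f} accuracy={acc:.2f}")  # inspect results
    if acc > best_acc:
        best_band, best_acc = band, acc

print("keep band:", best_band)
```

In practice the "inspect results" step means reading the misclassified documents, not only the score, since they usually point at the next rule to change.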
Starbucks Content Example
• 160k Tweets
• Yelp reviews for 2 locations
• All completely public information, all available right now