Customer Insights Workshop - Consumer Text Analytics Conference

Customer Insights Workshop No, you can't always get what you want You can't always get what you want You can't always get what you want And if you try sometime you find You get what you need Seth Redmore, VP Marketing and Product Management @sredmore, @lexalytics, http://www.lexalytics.com/blog

Upload: mekkin-bjarnadottir

Post on 27-Jun-2015


Category:

Data & Analytics



DESCRIPTION

Lexalytics VP Seth Redmore talks about the nuts and bolts of text and sentiment analysis to spot and react to consumer feedback online. From the 2014 Terrapin Consumer Text Analytics Conference. Read more here: http://www.lexalytics.com/blog/2014/consumer-text-analytics-conference

TRANSCRIPT

Page 1: Customer Insights Workshop - Consumer Text Analytics Conference

Customer Insights Workshop

No, you can't always get what you want You can't always get what you want You can't always get what you want

And if you try sometime you find You get what you need

Seth Redmore, VP Marketing and Product Management @sredmore, @lexalytics, http://www.lexalytics.com/blog

Page 2: Customer Insights Workshop - Consumer Text Analytics Conference

©2014 Lexalytics Inc. All rights reserved. 2

Agenda

• Common Extractions

• “Accuracy” is imprecise

• One low-level view of an NLP engine

• Starbucks example w/real data

• OK great – let’s get everyone on board

Page 3: Customer Insights Workshop - Consumer Text Analytics Conference

Text Mining Key Extractions/Enhancements

• Core Engine processes text to tell you “who”, “where”, “when”, “what” and “how” so that you can figure out “why” (and what you want to do about it).

• Key Features are: Entity Extraction, Facets, Themes, Text Categorization, and Sentiment Analysis

• Core NLP is used for 2 different purposes:

Discovery (“tell me what’s in here!”) and

Tracking (“ok, now I want to follow the trends”)

[Diagram: Key Text Enhancements. Who / Where / When: Entity Extraction. What: Facets/Themes (Discovery), Text Categorization (Tracking). How: Sentiment Analysis.]


Page 4: Customer Insights Workshop - Consumer Text Analytics Conference

“Accuracy” is imprecise.


• Because sentiment is personal (e.g. over what dataset is sentiment “accurate”?)

• Because you may care more about precision, or you may care more about recall

Page 5: Customer Insights Workshop - Consumer Text Analytics Conference

Sentiment Accuracy is Personal!

• “Wells Fargo lost $200M last month”

• “Kölnisch Wasser smells like my grandmother.”

• “We’re switching to Direct TV.”

• “Microsoft is dropping their prices.”


Page 6: Customer Insights Workshop - Consumer Text Analytics Conference

Precision, Recall, F1

• Precision: “of the items you coded, what % are correct?”

• Recall: “of all the possible items that match the code, what % did you retrieve?”

• F1 is the harmonic mean of precision and recall:

F1 = 2 * (precision * recall) / (precision + recall)
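The three definitions above can be worked through in a few lines of Python. This is a minimal sketch, not anything from the Lexalytics engine: the function name and the tweet ids are made up for illustration.

```python
def precision_recall_f1(coded, matching):
    """coded: ids the system coded with a label.
    matching: ids that truly match the label (e.g. hand-scored)."""
    true_positives = len(coded & matching)
    precision = true_positives / len(coded) if coded else 0.0
    recall = true_positives / len(matching) if matching else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical data: ids of tweets coded "negative" by a system vs. by hand.
coded_negative = {1, 2, 3, 4}
truly_negative = {2, 3, 4, 5, 6, 7}

p, r, f1 = precision_recall_f1(coded_negative, truly_negative)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Three of the four coded items are correct (precision 0.75), but only three of the six true items were found (recall 0.50), so F1 lands at 0.60, below the simple average of the two.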


Page 7: Customer Insights Workshop - Consumer Text Analytics Conference

Different apps require different balance

• High precision -> Social media trending

Want to know that what you’re graphing has absolutely no crap

• High recall -> Customer support requests

Really don’t want to miss even a single pissed off customer, even at the cost of having to filter through lots of not-upset customers


[Figure labels: HIGH PRECISION, HIGH RECALL]
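One common way to move between the two ends is a score threshold. The sketch below assumes a hypothetical classifier that gives each message an "upset" score between 0 and 1; the scores and labels are invented for illustration.

```python
# (upset_score, really_upset) pairs; hypothetical hand-checked sample.
scored = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.40, True), (0.30, False), (0.10, False),
]


def precision_recall_at(threshold, samples):
    """Flag everything at or above the threshold, then score the flags."""
    flagged = [upset for score, upset in samples if score >= threshold]
    total_upset = sum(1 for _, upset in samples if upset)
    true_positives = sum(flagged)
    # Convention used here: no flags at all counts as precision 0.0.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_upset if total_upset else 0.0
    return precision, recall


# Strict threshold: clean graph, but half the upset customers are missed.
trending = precision_recall_at(0.85, scored)
# Loose threshold: nobody upset is missed, but a third of the flags are noise.
support = precision_recall_at(0.35, scored)
print("trending:", trending)
print("support: ", support)
```

The same scores serve both applications; only the cut-off changes.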

Page 8: Customer Insights Workshop - Consumer Text Analytics Conference

Sentiment F1 scores (and “accuracy”) bounded by inter-rater agreement (IRA)


• MPQA Corpus

Wiebe et al., 2005, “Annotating expressions of opinions and emotions in language,” Language Resources and Evaluation, 39:165-210

Grad students, 40 hours of training, 16k sentences, ~80% IRA

• To ponder: If people max out at 80%, how can a machine be scored any better?

• Answer: it can’t.

A machine will do a “poor” job of scoring content that people can’t agree on.
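To see what an agreement figure like ~80% looks like, here is a sketch that compares two raters on the same ten items. The slide does not say which agreement measure was used, so this shows two common ones (raw percent agreement and Cohen's kappa); the labels are invented.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for what the raters would match on by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two human raters on the same ten sentences.
rater_1 = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg", "pos", "neu"]
rater_2 = ["pos", "neg", "neu", "pos", "neu", "neu", "pos", "neg", "neg", "neu"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

If either rater is treated as the "gold" answer, a machine that matched the other rater exactly would still score only 80% against it.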

Page 9: Customer Insights Workshop - Consumer Text Analytics Conference

So, you want to maximize your own accuracy?


• Get a clear goal on what you’re optimizing for

Precision/recall

What is “sentiment” – does an opinion have to be expressed, or?

Bounds of neutral

• Score a set of content yourself

• Crowdsource
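The "bounds of neutral" point can be made concrete by scoring a small hand-labeled set under two different neutral bands. This is a sketch under assumptions: the score range, band values, and samples are all hypothetical.

```python
def to_label(score, neutral_low, neutral_high):
    """Map a numeric sentiment score to a label using a neutral band."""
    if score < neutral_low:
        return "negative"
    if score > neutral_high:
        return "positive"
    return "neutral"


def agreement_with_hand_labels(samples, neutral_low, neutral_high):
    """Share of samples where the banded label matches your own label."""
    hits = sum(
        to_label(score, neutral_low, neutral_high) == label
        for score, label in samples
    )
    return hits / len(samples)


# Hypothetical (engine_score, your_label) pairs from content you scored yourself.
hand_scored = [
    (0.45, "positive"),
    (0.05, "neutral"),
    (-0.30, "negative"),
    (0.15, "neutral"),
    (-0.08, "negative"),
]

symmetric_band = agreement_with_hand_labels(hand_scored, -0.10, 0.10)
shifted_band = agreement_with_hand_labels(hand_scored, -0.05, 0.20)
print(symmetric_band, shifted_band)
```

Nothing about the engine changed between the two runs; only the definition of neutral did, which is why the goal has to be fixed before "accuracy" is measured.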

Page 10: Customer Insights Workshop - Consumer Text Analytics Conference

Key Enhancements

• The NLP Engine recognizes the important subjects of the text while understanding the sentiment used with each subject.

• “How” is the conversation occurring? Is it positive or negative?

1. Entity      Type      Sentiment
   Yahoo       Company   .534
   Twitter     Company   .48
   Facebook    Company   .534
   U.S.        Place     .534

2. Theme                        Score   Sentiment
   Cloud computing technology   4.110   .6
   E-mail service               2.672   .81
   Top users requests           2.669   .62

3. Category                Score   Sentiment
   Software and Internet   .56     0.0
   Social Media            .60     .48
   Technology              .49     .49
   Business                .72     .49

Entities: “Who”, “Where”, “When”

Themes: “What” (Discovery)

Categories: “What” (Tracking)
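One way to hold results like the entity table in code is a small record type. This is only a sketch: the class and field names are hypothetical and are not the engine's actual output format; the values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class EntityResult:
    name: str
    entity_type: str
    sentiment: float


# Values copied from the entity table on this slide.
entities = [
    EntityResult("Yahoo", "Company", 0.534),
    EntityResult("Twitter", "Company", 0.48),
    EntityResult("Facebook", "Company", 0.534),
    EntityResult("U.S.", "Place", 0.534),
]


def names_of_type(results, entity_type):
    """Answer "who" or "where" by filtering on the entity type."""
    return [r.name for r in results if r.entity_type == entity_type]


companies = names_of_type(entities, "Company")
places = names_of_type(entities, "Place")
least_positive = min(entities, key=lambda r: r.sentiment)
print(companies, places, least_positive.name)
```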

Unstructured Data Sample

Yahoo wants to make its Web e-mail service a place you never want to— or more importantly—have to leave to get your social fix. The company on Wednesday is releasing an overhauled version of its Yahoo Mail Beta client that it says is twice as fast as the previous version, while managing to tack on new features like an integrated Twitter client, rich media previews and a more full-featured instant messaging client.

Yahoo says this speed boost should be especially noticeable to users outside the U.S. with latency issues, due mostly to the new version making use of the company's cloud computing technology. This means that if you're on a spotty connection, the app can adjust its behavior to keep pages from timing out, or becoming unresponsive.

Besides the speed and performance increase, which Yahoo says were the top users requests, the company has added a very robust Twitter client, which joins the existing social-sharing tools for Facebook and Yahoo.


Page 11: Customer Insights Workshop - Consumer Text Analytics Conference

Salience Processing Overview

Text: “The waiter was rude! Our food was cold.”

Prepare Text: Tokenizer, POS Tagger, Chunker, Sentence Breaker

Parsed Text: The waiter _NNP was _VP rude _JP . Our food _NP was _VP cold _JP.

Calls available on the parsed text:

GetDocumentSentiment

GetNamed/UserDefined/(Collection)Entities

GetDocument/(Collection)Themes

GetConcept/QueryDefined/(Collection)Topics

Get(Collection)Facets
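The prepare-then-query flow on this slide can be imitated with a toy pipeline. To be clear, this is not the Salience API: it skips POS tagging and chunking entirely, the function names are invented, and the word scores are a made-up lexicon. It only shows the shape of "break into sentences, tokenize, then ask for document sentiment".

```python
import re

# Hypothetical sentiment lexicon; a real engine uses far richer resources.
LEXICON = {"rude": -0.8, "cold": -0.5, "friendly": 0.8, "delicious": 0.9}


def break_sentences(text):
    """Split after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_sentiment(tokens):
    """Average the lexicon scores of the sentiment-bearing tokens."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def document_sentiment(text):
    """Average the per-sentence scores across the document."""
    per_sentence = [sentence_sentiment(tokenize(s)) for s in break_sentences(text)]
    return sum(per_sentence) / len(per_sentence) if per_sentence else 0.0


text = "The waiter was rude! Our food was cold."
sentences = break_sentences(text)
score = document_sentiment(text)
print(sentences)
print(round(score, 2))
```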


Page 12: Customer Insights Workshop - Consumer Text Analytics Conference


Iterative Process

• Start with one set of rules

• Run Data

• Inspect Results

• Adjust parameters

• Lather, Rinse, Repeat
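The loop above can be sketched as code. This example assumes the "rule" being adjusted is a single score threshold and the inspection step is a precision check against a small hand-labeled sample; both are simplifications chosen for illustration, and the data is invented.

```python
def inspect(threshold, samples):
    """Run the data through the current rule and measure precision."""
    flagged = [correct for score, correct in samples if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def tune(samples, target_precision, start=0.30, step=0.05, max_rounds=20):
    """Start with one rule, run, inspect, adjust, and repeat."""
    threshold = start
    precision = 0.0
    for round_number in range(1, max_rounds + 1):
        precision = inspect(threshold, samples)
        if precision >= target_precision:
            return threshold, precision, round_number
        # Adjust the parameter; rounding avoids float drift across rounds.
        threshold = round(threshold + step, 2)
    return threshold, precision, max_rounds


# Hypothetical (score, label_was_correct) pairs from a hand-checked sample.
labeled = [
    (0.9, True), (0.8, True), (0.7, False),
    (0.6, True), (0.5, False), (0.4, False),
]

threshold, precision, rounds = tune(labeled, target_precision=0.9)
print(threshold, precision, rounds)
```

In practice the inspection step is a person reading results, and the adjustments are to rules and lexicons rather than one number, but the run-inspect-adjust cycle is the same.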

Page 13: Customer Insights Workshop - Consumer Text Analytics Conference


Starbucks Content Example

• 160k Tweets

• Reviews of 2 locations from Yelp

• All completely public information, all available right now

Page 14: Customer Insights Workshop - Consumer Text Analytics Conference