What Business Innovators Need to Know about Sentiment Analysis
Claire Cardie
Department of Computer Science
Chair, Information Science Department
Cornell University
Co-founder
Chief Scientist
Plan for the Talk
Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level
Next steps…
Subjective Language
Subjective text expresses speculations, beliefs, emotions, evaluations, goals, opinions, judgments, …
• Jill said, "I hate Bill."
• John thought about whom to vote for.
• Seth knew his symposium would go well.
Subjectivity vs. Sentiment
Sentiment-bearing text expresses positive and negative speculations, beliefs, emotions, evaluations, goals, opinions, judgments,…
• Jill said, "I hate Bill." (-)
• John thought about whom to vote for. (~)
• Seth knew his symposium would go well. (+)
Sentiment analysis tome: [Pang & Lee, 2008]
A Word on Polarity (tone, valence)
Positive: “I love NY.”
Negative: “I hate NY.”
Neither positive nor negative
– Objective? “I thought about NY.”
– Neutral? “I’m ambivalent about NY.”
– Mixed polarity? “Sometimes I love NY; other times I hate it.”
And What About Intensity?
Strength/intensity
– Low, medium, high, very high, extreme
– Ratings
– Rotten Tomatoes
“I love NY.”
“I absolutely adore NY!”
Plan for the Talk
Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level
Next steps…
Document-level Sentiment Analysis
Is the overall sentiment in the document positive? negative? neutral?
Identifying Tone of a Collection
Sentiment (w.r.t. a topic)
– Example: Tone on “economic stimulus”
Detecting “chatter” or “buzz”
Chatter (w.r.t. a topic)
– Example: Buzz on “economic stimulus”
Keyword-based Approaches
Search the text for the presence of specific terms from a manually created “sentiment lexicon”
– +: “great”, “praise”, “peace”, “superb”, …
– -: “war”, “dull”, “messy”, “criticize”, …
Sentiment is based on the counts
– E.g., if more positive terms than negative terms, then return +, else return -
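The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: the lexicon holds only the example terms from the slide, and the function name `keyword_sentiment` and the simple regular-expression tokenizer are assumptions made for this sketch.

```python
import re

# Tiny illustrative lexicon: only the example terms listed on the slide.
POSITIVE = {"great", "praise", "peace", "superb"}
NEGATIVE = {"war", "dull", "messy", "criticize"}


def keyword_sentiment(text: str) -> str:
    """Return "+" if positive terms outnumber negative terms, else "-"."""
    # Lowercase and split into word tokens (a deliberately naive tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    # Ties fall through to "-", exactly as the rule on the slide is stated.
    return "+" if pos > neg else "-"


print(keyword_sentiment("A superb laptop at a great price."))
print(keyword_sentiment("The plot was dull and the editing messy."))
```

Note that a text with no lexicon terms at all is also labeled "-" under this rule, which is one reason real systems usually add a neutral class.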
Keyword-based Approaches
Complications
– Inherent ambiguities of language…
– This laptop is a great deal.
– A great deal of media attention surrounded the release of the new laptop model.
– If you think this laptop is a great deal, I’ve got a nice bridge for you to buy.
[Examples from Lillian Lee]
[Pang & Lee, 2008]
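To see why these sentences are a problem for keyword matching, the sketch below looks up the single lexicon term “great” in each of them. The helper name `positive_hits` and the one-word lexicon are assumptions for illustration; the three sentences are the ones quoted above.

```python
import re

# One lexicon entry is enough to show the problem.
POSITIVE = {"great"}

SENTENCES = [
    "This laptop is a great deal.",
    "A great deal of media attention surrounded the release of the new laptop model.",
    "If you think this laptop is a great deal, I've got a nice bridge for you to buy.",
]


def positive_hits(text):
    """Return the positive lexicon terms found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token in POSITIVE]


for sentence in SENTENCES:
    # Every sentence produces the same hit, although only the first one
    # actually praises the laptop.
    print(positive_hits(sentence), sentence)
```

All three sentences match “great”, so a pure counting approach cannot tell the genuine praise from the quantity idiom or the sarcasm.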
Machine-learning Approaches
Learn from training data
Are better able to take advantage of context to disambiguate terms
[Diagram: examples → ML algorithm → statistical model (program); (novel) examples → statistical model → class]
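As one possible illustration of learning from training data, the sketch below trains a small naive Bayes classifier over single words and adjacent word pairs, so that a phrase like “deal of” can carry evidence that “great” alone cannot. Naive Bayes is an assumption chosen here for brevity; the talk does not name a specific algorithm. The training sentences are invented toy data, and the function names `features`, `train`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Unigrams plus adjacent-word bigrams, so some context is captured."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def train(examples):
    """Count features per class from (text, label) pairs."""
    feature_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        feature_counts[label].update(features(text))
    return feature_counts, class_counts


def classify(model, text):
    """Pick the class with the highest add-one-smoothed log probability."""
    feature_counts, class_counts = model
    vocab = {f for counts in feature_counts.values() for f in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in feature_counts.items():
        score = math.log(class_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for f in features(text):
            if f in vocab:  # ignore features never seen in training
                score += math.log((counts[f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for illustration only.
TRAINING = [
    ("this laptop is a great deal", "+"),
    ("great deal on a superb phone", "+"),
    ("a great deal of trouble with this laptop", "-"),
    ("a great deal of noise and a dull screen", "-"),
]

model = train(TRAINING)
print(classify(model, "what a great deal"))
print(classify(model, "a great deal of problems"))
```

With only four training sentences the model is fragile, but it shows the idea: the label for a novel example comes from statistics gathered over labeled examples rather than from a fixed word list.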
Measuring Performance
Precision: #correct / #attempted
Recall: #correct / #possible
F-measure: harmonic mean of P and R
1. _______
2. _______
3. _______
4. _______
P = 3 / 4 = .75
P = 3 / 3 = 1.00
R = 3 / 4 = .75
accuracy
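The three measures are simple ratios, sketched below. The function names are assumptions for this example. The two scenarios reuse the numbers shown above, read as: four possible answers with three correct out of four attempted, and three correct out of three attempted; that reading of the slide's arithmetic is itself an assumption.

```python
def precision(correct: int, attempted: int) -> float:
    """Fraction of the system's answers that are right."""
    return correct / attempted


def recall(correct: int, possible: int) -> float:
    """Fraction of the answers in the gold standard that the system found."""
    return correct / possible


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Scenario 1: the system answers all 4 items and gets 3 right.
p1, r1 = precision(3, 4), recall(3, 4)
print(p1, r1, round(f_measure(p1, r1), 3))

# Scenario 2: the system answers only 3 items and gets all 3 right.
p2, r2 = precision(3, 3), recall(3, 4)
print(p2, r2, round(f_measure(p2, r2), 3))
```

Declining to answer a hard item raises precision while recall stays the same, which is why the two are reported together or combined into F.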
Measuring Performance
How well do document-level sentiment analysis systems work?
It depends…
– Product reviews easier than movie reviews, easier than news/editorials
– Shorter documents harder than longer ones
– Messy documents harder than clean ones
~75 F to ~85 F
This is actually quite good…
Comparison is not vs. 100% P/R…
…but vs. human sentiment analysis accuracy
– Cohen’s kappa
Machine-learning methods for sentiment analysis approach human agreement levels
– ~85 F: for positive/negative
– ~75 F: when neutrals are included
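Cohen's kappa measures how much two annotators agree beyond what chance alone would produce: kappa = (observed agreement - expected agreement) / (1 - expected agreement). The sketch below computes it for two annotators; the function name `cohens_kappa` and the two label sequences are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with
    # their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten documents by two people.
annotator_1 = ["+", "+", "-", "-", "+", "-", "+", "-", "+", "-"]
annotator_2 = ["+", "+", "-", "-", "+", "-", "-", "+", "+", "-"]

print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.6
```

Here the annotators agree on 8 of 10 items, but with balanced labels half of that agreement is expected by chance, so kappa is 0.6 rather than 0.8.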
Sentiment Analysis at Passage Level
Passage tone
– Optionally w.r.t. a topic
– E.g., AIG or Geithner
The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace him, despite the president’s insistence to Leno that Geithner is doing "an outstanding job".
Sentiment Analysis at Phrase Level
Fine-grained opinion analysis
Identify who is saying what about what
Fine-Grained Sentiment Extraction
The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace him, despite the president’s insistence to Leno that Geithner is doing "an outstanding job".
Fine-Grained Sentiment Extraction
– Opinion trigger
– Polarity
– Intensity
– Opinion holder
– Target (topic)
…the president insisted to Leno that Geithner is doing "an outstanding job".
Opinion Frame
Polarity: positive
Intensity: high
Opinion Holder: “the president”
Target: “Geithner”
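One way to hold such a frame in code is a small record type. The sketch below is an assumption about representation, not the system described in the talk; the class name `OpinionFrame` and its field names are hypothetical. The trigger field is left unset because the frame above does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpinionFrame:
    """One fine-grained opinion: who feels what, how strongly, about what."""
    polarity: str                  # e.g. "positive", "negative", "neutral"
    intensity: str                 # e.g. "low", "medium", "high"
    holder: str                    # text span naming the opinion holder
    target: str                    # text span naming the topic of the opinion
    trigger: Optional[str] = None  # phrase that signals the opinion, if known


# The frame shown above for the sentence about the president and Geithner.
frame = OpinionFrame(
    polarity="positive",
    intensity="high",
    holder="the president",
    target="Geithner",
)
print(frame)
```

Keeping the holder and target as separate fields is what later allows opinions to be grouped by who holds them and what they are about.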
Example – fine-grained opinions
[Figure: seven opinion frames annotated on the passage below]
The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… the president insisted to Leno that Geithner is doing "an outstanding job".
Example – Opinion Summary
[Figure: opinion summary over the entities AIG, Obama, Menendez, Geithner, Americans]
Example – Opinion Summary
[Figure: opinion summary, AIG]
Summarize thoughts and views across documents
– Critical addition: opinion holder
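A summary of this kind can be sketched as a tally of opinions grouped by holder and target. The triples below are hypothetical readings invented for illustration, not the output of any system, and they assume that mentions such as “the president” have already been resolved to a canonical name. The function name `summarize` is also an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical (holder, target, polarity) triples, as if collected from
# several documents after coreference resolution.
FRAMES = [
    ("Obama", "Geithner", "positive"),
    ("Americans", "AIG", "negative"),
    ("Menendez", "AIG", "negative"),
    ("Obama", "Geithner", "positive"),
]


def summarize(frames):
    """Tally polarities for each (holder, target) pair."""
    summary = defaultdict(Counter)
    for holder, target, polarity in frames:
        summary[(holder, target)][polarity] += 1
    return summary


for (holder, target), tally in summarize(FRAMES).items():
    print(f"{holder} -> {target}: {dict(tally)}")
```

Without the holder field, the two negative opinions about AIG would merge into one undifferentiated count, which is why the holder is called out as the critical addition.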
What makes this hard?
Same issues of ambiguity as before plus…
Need to associate opinion with topic and with opinion holder
Requires different machine learning methods
Requires many language-processing modules
Noun Phrase Coreference Resolution
Ng & Cardie [2002, 2003]; Stoyanov & Cardie [2006, 2008]
The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace Geithner, despite the president’s insistence to Leno that he is doing "an outstanding job".
Performance
opinion extraction
OH extractor
link classifier
79F
69F
82F
Choi, Breck & Cardie [2006, 2007]
– <opinion holder> expresses <opinion>
Plan for the Talk
Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level
Next steps…
Next Steps…
Predicting business outcomes from opinions
– Doable in some settings
Determining the key influencers
Thank you!
Questions?