Dan Jurafsky
Lecture 1: Sentiment Lexicons and Sentiment Classification
Computational Extraction of Social and Interactional Meaning
SSLST, Summer 2011
IP notice: many slides for today from Chris Manning, William Cohen, Chris Potts and Janyce Wiebe, plus some from Marti Hearst and Marta Tatu
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event as significant
  angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific interaction
  friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively coloured beliefs, dispositions towards objects or persons
  liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior tendencies
  nervous, anxious, reckless, morose, hostile, jealous
Extracting social/interactional meaning
Emotion and Mood
  Annoyance in talking to dialog systems
  Uncertainty of students in tutoring
  Detecting trauma or depression
Interpersonal Stance
  Romantic interest, flirtation, friendliness
  Alignment/accommodation/entrainment
Attitudes = Sentiment (positive or negative)
  Movies or products or politics: is a text positive or negative?
  "Twitter mood predicts the stock market."
Personality Traits
  Open, Conscientious, Extroverted, Anxious
Social identity (Democrat, Republican, etc.)
Overview of Course
http://www.stanford.edu/~jurafsky/sslst11/
Outline for Today: Sentiment Analysis (Attitude Detection)
1. Sentiment Tasks and Datasets
2. Sentiment Classification Example: Movie Reviews
3. The Dirty Details: Naïve Bayes Text Classification
4. Sentiment Lexicons: Hand-built
5. Sentiment Lexicons: Automatic
Sentiment Analysis
Extraction of opinions and attitudes from text and speech.
When we say "sentiment analysis" we often mean a binary or an ordinal task:
  like X / dislike X
  one star to 5 stars
1: Sentiment Tasks and Datasets
Example review sites (screenshots; slides from Chris Potts):
  IMDB, Amazon, OpenTable, TripAdvisor
Richer sentiment on the web (not just positive/negative)
  Experience Project: http://www.experienceproject.com/confessions.php?cid=184000
  FMyLife: http://www.fmylife.com/miscellaneous/14613102
  My Life is Average: http://mylifeisaverage.com/
  It Made My Day: http://immd.icanhascheezburger.com/
2: Sentiment Classification Example: Movie Reviews
Pang and Lee's (2004) movie review data from IMDB
Polarity dataset 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data
Pang and Lee IMDB data

Rating: pos
  when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . … when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point . cool . _october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . .]

Rating: neg
  " snake eyes " is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it's not just because this is a brian depalma film , and since he's a great director and one who's films are always greeted with at least some fanfare . and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .
Pang and Lee Algorithm
Classification using different classifiers:
  Naïve Bayes
  MaxEnt
  SVM
Cross-validation:
  Break up data into 10 folds
  For each fold: choose the fold as a temporary "test set"; train on the other 9 folds, compute performance on the test fold
  Report the average performance of the 10 runs (see the sketch below)
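A minimal sketch of that cross-validation loop, assuming generic train(docs, labels) and evaluate(model, docs, labels) callables (hypothetical names, not Pang and Lee's code):

import random

def cross_validate(docs, labels, train, evaluate, k=10, seed=0):
    """Estimate classifier performance by k-fold cross-validation."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k roughly equal-sized folds
    scores = []
    for i in range(k):
        held_out = set(folds[i])
        train_idx = [j for j in idx if j not in held_out]
        model = train([docs[j] for j in train_idx],
                      [labels[j] for j in train_idx])
        scores.append(evaluate(model,
                               [docs[j] for j in folds[i]],
                               [labels[j] for j in folds[i]]))
    return sum(scores) / k                       # average performance over the k runs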
Negation in Sentiment Analysis
They have not succeeded, and will never succeed, in breaking the will of this valiant people.
Slide from Janyce Wiebe
Pang and Lee on Negation
Added the tag NOT_ to every word between a negation word ("not", "isn't", "didn't", etc.) and the first punctuation mark following the negation word:
  didn't like this movie , but I
  didn't NOT_like NOT_this NOT_movie , but I
A rough implementation is sketched below.
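A sketch of that tagging rule over a pre-tokenized review; the negation and punctuation sets here are illustrative assumptions, not Pang and Lee's exact lists:

NEGATIONS = {"not", "no", "never", "isn't", "didn't", "doesn't", "don't", "can't"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def add_not_tags(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    out, in_scope = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_scope = False          # punctuation closes the negation scope
            out.append(tok)
        elif in_scope:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                in_scope = True       # everything up to the next punctuation gets tagged
    return out

print(add_not_tags("didn't like this movie , but I".split()))
# -> ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']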
Pang and Lee: an interesting observation
"Feature presence" (1 if a word occurred in a document, 0 if it didn't) worked better than unigram probability (word frequency). Why might this be?
Other difficulties in movie review classification
What makes movies hard to classify? Sentiment can be subtle:
  Perfume review in "Perfumes: The Guide": "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
  "She runs the gamut of emotions from A to B" (Dorothy Parker on Katharine Hepburn)
Order effects:
  "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
3: Naïve Bayes text classification
Is this spam?
More Applications of Text Classification
Authorship identificationAge/gender identificationLanguage IdentificationAssigning topics such as Yahoo-categories
e.g., "finance," "sports," "news>world>asia>business"Genre-detection
e.g., "editorials" "movie-reviews" "news“Opinion/sentiment analysis on a person/product
e.g., “like”, “hate”, “neutral”Labels may be domain-specific
e.g., “contains adult language” : “doesn’t”
Text Classification: definition
The classifier:
  Input: a document d
  Output: a predicted class c from some fixed set of labels c1, ..., cK
The learner:
  Input: a set of m hand-labeled documents (d1, c1), ..., (dm, cm)
  Output: a learned classifier f: d → c
Slide from William Cohen
Document Classification
Classes: (AI) (Programming) (HCI) ...
Training data (characteristic words per class):
  ML: learning, intelligence, algorithm, reinforcement, network, ...
  Planning: planning, temporal, reasoning, plan, language, ...
  Semantics: programming, semantics, language, proof, ...
  Garb.Coll.: garbage, collection, memory, optimization, region, ...
  (also Multimedia, GUI, ...)
Test data: "planning language proof intelligence"
Slide from Chris Manning
Classification Methods: Hand-coded rules
Some spam/email filters, etc.
E.g., assign a category if the document contains a given boolean combination of words.
Accuracy is often very high if a rule has been carefully refined over time by a subject expert.
But building and maintaining these rules is expensive.
Slide from Chris Manning
Classification Methods: Machine Learning
Supervised machine learning: learn a function from documents (or sentences) to labels.
  Naive Bayes (simple, common method)
  k-Nearest Neighbors (simple, powerful)
  Support-vector machines (new, more powerful)
  ... plus many other methods
No free lunch: requires hand-classified training data.
But data can be built up (and refined) by amateurs.
Slide from Chris Manning
Naïve Bayes Intuition
Representing text for classification
Slide from William Cohen
ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS
BUENOS AIRES, Feb 26
Argentine grain board figures show crop registrations of grains, oilseeds and their products to February 11, in thousands of tonnes, showing those for future shipments month, 1986/87 total and 1985/86 total to February 12, 1986, in brackets:
• Bread wheat prev 1,655.8, Feb 872.0, March 164.6, total 2,692.4 (4,161.0).
• Maize Mar 48.0, total 48.0 (nil).
• Sorghum nil (nil)
• Oilseed export registrations were:
• Sunflowerseed total 15.0 (7.9)
• Soybean May 20.0, total 20.0 (nil)
The board also detailed export registrations for subproducts, as follows....

f(document) = c?  What is the best representation for the document d being classified? The bag of words representation below is the simplest useful one.
Bag of words representation
Slide from William Cohen
[Same Reuters document as above]
Categories: grain, wheat
Bag of words representation
Slide from William Cohen
xxxxxxxxxxxxxxxxxxx GRAIN/OILSEED xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx grain xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx grains,
oilseeds xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx tonnes, xxxxxxxxxxxxxxxxx shipments xxxxxxxxxxxx total xxxxxxxxx total xxxxxxxx xxxxxxxxxxxxxxxxxxxx:
• Xxxxx wheat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, total xxxxxxxxxxxxxxxx
• Maize xxxxxxxxxxxxxxxxx• Sorghum xxxxxxxxxx• Oilseed xxxxxxxxxxxxxxxxxxxxx• Sunflowerseed xxxxxxxxxxxxxx• Soybean xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx....
Categories: grain, wheat
word        freq
grain(s)       3
oilseed(s)     2
total          3
wheat          1
maize          1
soybean        1
tonnes         1
...          ...
Formalizing Naïve Bayes
Bayes' Rule

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

Allows us to swap the conditioning.
Sometimes it's easier to estimate one kind of dependence than the other.
Deriving Bayes' Rule

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

$$P(B \mid A)\,P(A) = P(A \cap B) \qquad P(A \mid B)\,P(B) = P(A \cap B)$$

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Bayes’ Rule Applied to Documents and Classes
Slide from Chris Manning
$$P(C, D) = P(C \mid D)\,P(D) = P(D \mid C)\,P(C)$$

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)}$$
The Text Classification Problem
Using a supervised learning method, we want to learn a classifier (or classification function) γ:
  γ: X → C
We denote the supervised learning method by Γ. The learning method takes the training set D as input and returns the learned classifier:
  Γ(D) = γ
Once we have learned γ, we can apply it to the test set (or test data).
Slide from Chien Chin Chen
Naïve Bayes Text Classification
The Multinomial Naïve Bayes model (NB) is a probabilistic learning method.
In text classification, our goal is to find the "best" class for the document:

$$c_{map} = \operatorname{argmax}_{c \in C} P(c \mid d)$$
  (the probability of a document d being in class c)
$$= \operatorname{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$
  (Bayes' Rule)
$$= \operatorname{argmax}_{c \in C} P(c)\,P(d \mid c)$$
  (we can ignore the denominator)

Slide from Chien Chin Chen
Naive Bayes Classifiers
We represent an instance D based on some attributes: $D = \langle x_1, x_2, \dots, x_n \rangle$
Task: classify a new instance D, based on its tuple of attribute values, into one of the classes $c_j \in C$:

$$c_{MAP} = \operatorname{argmax}_{c_j \in C} P(c_j \mid x_1, x_2, \dots, x_n)$$
$$= \operatorname{argmax}_{c_j \in C} \frac{P(x_1, x_2, \dots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \dots, x_n)}$$
  (Bayes' Rule)
$$= \operatorname{argmax}_{c_j \in C} P(x_1, x_2, \dots, x_n \mid c_j)\,P(c_j)$$
  (we can ignore the denominator)

Slide from Chris Manning
Naïve Bayes Classifier: Naïve Bayes Assumption
P(cj): can be estimated from the frequency of classes in the training examples.
P(x1, x2, …, xn | cj): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.
Naïve Bayes Conditional Independence Assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj).
Slide from Chris Manning
[Graphical model: class node Flu with feature nodes X1 (fever), X2 (sinus), X3 (cough), X4 (runny nose), X5 (muscle-ache)]
The Naïve Bayes Classifier
Conditional Independence Assumption: features are independent of each other given the class:

$$P(X_1, \dots, X_5 \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_5 \mid C)$$

Slide from Chris Manning
Using Multinomial Naive Bayes Classifiers to Classify Text
Attributes are text positions, values are words:

$$c_{NB} = \operatorname{argmax}_{c_j \in C} P(c_j) \prod_i P(x_i \mid c_j)$$
$$= \operatorname{argmax}_{c_j \in C} P(c_j)\,P(x_1 = \text{"our"} \mid c_j) \cdots P(x_n = \text{"text"} \mid c_j)$$

Still too many possibilities. Assume that classification is independent of the positions of the words, i.e., use the same parameters for each position. The result is the bag of words model (over tokens, not types).
Slide from Chris Manning
Learning the Model
Simplest: maximum likelihood estimate; simply use the frequencies in the data:

$$\hat{P}(c_j) = \frac{N(C = c_j)}{N}$$

$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i, C = c_j)}{N(C = c_j)}$$

[Graphical model: class node C with feature nodes X1 … X6]
Slide from Chris Manning
Smoothing to Avoid Overfitting
Laplace:

$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i, C = c_j) + 1}{N(C = c_j) + k}$$

where k = the number of values of Xi.
Slide from Chris Manning
Naïve Bayes: Learning
From the training corpus, extract Vocabulary.
Calculate the required P(cj) and P(wk | cj) terms:
  For each cj in C do
    docsj ← subset of documents for which the target class is cj
    P(cj) ← |docsj| / |total # documents|
    Textj ← single document containing all docsj
    n ← total number of word tokens in Textj
    for each word wk in Vocabulary:
      nk ← number of occurrences of wk in Textj
      P(wk | cj) ← (nk + α) / (n + α·|Vocabulary|)
Slide from Chris Manning
Naïve Bayes: Classifying
positions ← all word positions in the current document which contain tokens found in Vocabulary
Return cNB, where

$$c_{NB} = \operatorname{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$

Slide from Chris Manning
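The learning and classifying procedures above, condensed into a runnable sketch: a multinomial model with add-α smoothing, using log probabilities to avoid underflow. Whitespace tokenization is a simplifying assumption, and the function names are ours:

import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """docs: list of strings; labels: parallel list of class names."""
    vocab = {w for d in docs for w in d.split()}
    log_prior, log_cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        denom = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_cond

def classify_nb(model, doc):
    vocab, log_prior, log_cond = model
    positions = [w for w in doc.split() if w in vocab]   # skip out-of-vocabulary tokens
    return max(log_prior,
               key=lambda c: log_prior[c] + sum(log_cond[c][w] for w in positions))

model = train_nb(["good great fun", "bad awful boring"], ["pos", "neg"])
print(classify_nb(model, "a great fun movie"))           # -> pos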
4: Sentiment Lexicons: Hand-Built
Key task: Vocabulary.
The previous work uses all the words in a document. Can we do better by focusing on a subset of words?
How do we find words, phrases, and patterns that express sentiment or polarity?
4: Sentiment/Affect Lexicons: GenInq
Harvard General Inquirer Database
Contains 3627 negative and positive word-strings:
  http://www.wjh.harvard.edu/~inquirer/
  http://www.wjh.harvard.edu/~inquirer/homecat.htm
Positiv (1915 words) versus Negativ (2291 words)
Also: Strong vs Weak, Active vs Passive, Overstated versus Understated, Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
4: Sentiment/Affect Lexicons: LIWC
LIWC (Linguistic Inquiry and Word Count), Pennebaker, Francis, & Booth, 2001
A dictionary of 2300 words grouped into > 70 classes:
  Affective Processes: negative emotion (bad, weird, hate, problem, tough), positive emotion (love, nice, sweet)
  Cognitive Processes: Tentative (maybe, perhaps, guess), Inhibition (block, constraint, stop)
  Bodily Processes: sexual (sex, horny, love, incest)
  Pronouns: 1st person pronouns (I me mine myself I'd I'll I'm …), 2nd person pronouns
  Negation (no, not, never), Quantifiers (few, many, much)
Sentiment Lexicons and outcomes
Potts, "On the Negativity of Negation": is logical negation associated with negative sentiment?
Potts's experiment:
  Get counts of the words not, n't, no, never, and compounds formed with no, in online reviews, etc.
  Regress against the review rating.
Findings:
  More logical negation in IMDB reviews which have negative sentiment.
  More logical negation in all reviews which have negative sentiment (Amazon, GoodReads, OpenTable, TripAdvisor).
  Voting "no" (after removing the word "no")
5: Sentiment Lexicons: Automatically Extracted
Adjectives
  positive: honest, important, mature, large, patient
    "He is the only honest man in Washington."
    "Her writing is unbelievably mature and is only likely to get better."
    "To humour me my patient father agrees yet again to my choice of film."
  negative: harmful, hypocritical, inefficient, insecure
    "It was a macabre and hypocritical circus."
    "Why are they being so inefficient?"
Slide from Janyce Wiebe
Other parts of speech
Verbs
  positive: praise, love
  negative: blame, criticize
Nouns
  positive: pleasure, enjoyment
  negative: pain, criticism
Slide from Janyce Wiebe
Phrases
Phrases containing adjectives and adverbs:
  positive: high intelligence, low cost
  negative: little variation, many troubles
Slide adapted from Janyce Wiebe

Intuition for identifying polarity words
Assume that contexts are coherent:
  fair and legitimate, corrupt and brutal
  *fair and brutal, *corrupt and legitimate
Slide adapted from Janyce Wiebe
Hatzivassiloglou & McKeown 1997: Predicting the semantic orientation of adjectives
Step 1: From a 21-million-word WSJ corpus, take every adjective with frequency > 20 and label it for polarity.
A total of 1336 adjectives: 657 positive, 679 negative.
Hatzivassiloglou & McKeown 1997
Step 2: Extract all conjoined adjectives:
  nice and comfortable
  nice and scenic
Slide adapted from Janyce Wiebe
Hatzivassiloglou & McKeown 1997
Step 3: A supervised learning algorithm builds a graph of adjectives linked by same or different semantic orientation.
[Graph over: nice, handsome, comfortable, fun, scenic, terrible, painful, expensive]
Slide from Janyce Wiebe
Hatzivassiloglou & McKeown 1997
Step 4: A clustering algorithm partitions the adjectives into two subsets:
  +: nice, handsome, comfortable, fun, scenic
  -: terrible, painful, expensive, slow
Slide from Marta Tatu
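Hatzivassiloglou and McKeown actually use a log-linear model plus a dedicated clustering algorithm; as a toy stand-in for steps 3 and 4, here is a sketch that propagates orientation over a graph whose edges are labeled same-orientation (e.g., induced from "and") or different-orientation (e.g., from "but"). The edge list is invented for illustration:

from collections import deque

# (adj1, adj2, same_orientation?) edges, as might be induced from conjunctions
EDGES = [("nice", "comfortable", True), ("nice", "scenic", True),
         ("nice", "handsome", True), ("comfortable", "fun", True),
         ("fun", "expensive", False), ("expensive", "terrible", True),
         ("terrible", "painful", True), ("painful", "slow", True)]

def partition(edges, seed):
    """BFS label propagation: same-orientation edges copy the sign,
    different-orientation edges flip it."""
    graph = {}
    for a, b, same in edges:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))
    sign = {seed: +1}
    queue = deque([seed])
    while queue:
        a = queue.popleft()
        for b, same in graph[a]:
            if b not in sign:
                sign[b] = sign[a] if same else -sign[a]
                queue.append(b)
    return sign

print(partition(EDGES, "nice"))
# nice, comfortable, scenic, handsome, fun -> +1; expensive, terrible, painful, slow -> -1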
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Input: a review
  1. Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
  2. Estimate the semantic orientation of each phrase
  3. Assign a class to the given review based on the average semantic orientation of its phrases
Output: classification (thumbs up or thumbs down)
Turney Step 1
Extract all two-word phrases including an adjective, using these part-of-speech patterns (a sketch of the matcher follows below):

     First Word          Second Word               Third Word (not extracted)
  1. JJ                  NN or NNS                 anything
  2. RB, RBR, or RBS     JJ                        not NN nor NNS
  3. JJ                  JJ                        not NN nor NNS
  4. NN or NNS           JJ                        not NN nor NNS
  5. RB, RBR, or RBS     VB, VBD, VBN, or VBG      anything

Slide from Marta Tatu
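A sketch of the pattern matcher, assuming the input is already a list of (word, POS-tag) pairs (e.g., from NLTK's pos_tag); the five conditions mirror the rows of the table above:

NOUNS = {"NN", "NNS"}
ADVERBS = {"RB", "RBR", "RBS"}
VERBS = {"VB", "VBD", "VBN", "VBG"}

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs; returns Turney-style two-word phrases."""
    padded = tagged + [("", "END")]               # sentinel so the final bigram is checked
    phrases = []
    for (w1, t1), (w2, t2), (_, t3) in zip(padded, padded[1:], padded[2:]):
        if ((t1 == "JJ" and t2 in NOUNS) or                         # row 1
            (t1 in ADVERBS and t2 == "JJ" and t3 not in NOUNS) or   # row 2
            (t1 == "JJ" and t2 == "JJ" and t3 not in NOUNS) or      # row 3
            (t1 in NOUNS and t2 == "JJ" and t3 not in NOUNS) or     # row 4
            (t1 in ADVERBS and t2 in VERBS)):                       # row 5
            phrases.append(w1 + " " + w2)
    return phrases

print(extract_phrases([("online", "JJ"), ("service", "NN"), ("was", "VBD"),
                       ("inconveniently", "RB"), ("located", "VBN")]))
# -> ['online service', 'inconveniently located']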
Turney Step 2
Estimate the semantic orientation of the extracted phrases using Pointwise Mutual Information.
Slide from Marta Tatu
Pointwise Mutual Information
Mutual information between 2 random variables X and Y:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

Pointwise mutual information: a measure of how often two events x and y occur, compared with what we would expect if they were independent:

$$\text{PMI}(x, y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

PMI between two words: how much more often they occur together than we would expect if they were independent:

$$\text{PMI}(word_1, word_2) = \log_2 \frac{p(word_1, word_2)}{p(word_1)\,p(word_2)}$$
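A tiny helper showing the PMI computation from raw counts, with maximum-likelihood probability estimates (the counts below are invented for illustration):

import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), probabilities estimated by MLE."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# If x and y co-occur exactly as often as independence predicts, PMI is 0:
print(pmi(count_xy=10, count_x=100, count_y=1000, total=10000))   # 0.0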
Turney Step 2 (continued)
The Semantic Orientation of a phrase is defined as:

$$\text{SO}(phrase) = \text{PMI}(phrase, \text{"excellent"}) - \text{PMI}(phrase, \text{"poor"})$$

Estimate PMI by issuing queries to a search engine (AltaVista, ~350 million pages):

$$\text{SO}(phrase) = \log_2 \frac{\text{hits}(phrase\ \text{NEAR}\ \text{"excellent"}) \cdot \text{hits}(\text{"poor"})}{\text{hits}(phrase\ \text{NEAR}\ \text{"poor"}) \cdot \text{hits}(\text{"excellent"})}$$

Slide from Marta Tatu
Turney Step 3
Calculate the average semantic orientation of the phrases in the review: positive if the average SO is positive, negative otherwise.

  Phrase                   POS tags   SO
  direct deposit           JJ NN       1.288
  local branch             JJ NN       0.421
  small part               JJ NN       0.053
  online service           JJ NN       2.780
  well other               RB JJ       0.237
  low fees                 JJ NNS      0.333
  …
  true service             JJ NN      -0.732
  other bank               JJ NN      -0.850
  inconveniently located   RB VBN     -1.541
  Average Semantic Orientation         0.322

Slide adapted from Marta Tatu
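Steps 2 and 3 in code form: a sketch that turns hit counts into SO scores and averages them over a review's phrases. hits() is a hypothetical stand-in for the search-engine query (AltaVista's NEAR operator no longer exists), and the 0.01 smoothing follows Turney's trick for avoiding division by zero:

import math

def semantic_orientation(phrase, hits, eps=0.01):
    """SO(phrase) from the hit-count formula above; eps keeps the log finite when a count is 0."""
    near_excellent = hits(phrase + ' NEAR "excellent"') + eps
    near_poor = hits(phrase + ' NEAR "poor"') + eps
    return math.log2((near_excellent * hits('"poor"')) /
                     (near_poor * hits('"excellent"')))

def classify_review(phrases, hits):
    """Thumbs up if the average SO of the review's phrases is positive."""
    avg = sum(semantic_orientation(p, hits) for p in phrases) / len(phrases)
    return "thumbs up" if avg > 0 else "thumbs down"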
Experiments
410 reviews from Epinions:
  170 (41%) thumbs down
  240 (59%) thumbs up
Average phrases per review: 26
Baseline accuracy (majority class): 59%

  Domain               Accuracy   Correlation
  Automobiles          84.00%     0.4618
  Banks                80.00%     0.6167
  Movies               65.83%     0.3608
  Travel Destinations  70.53%     0.4155
  All                  74.39%     0.5174

Slide from Marta Tatu
Summary on Sentiment
Generally modeled as a classification or regression task: predict a binary or ordinal label.
Function words can be a good cue.
Using all words (in Naïve Bayes) works well for some tasks.
Finding subsets of words may help in other tasks.
Outline: Sentiment Analysis (Attitude Detection)
1. Sentiment Tasks and Datasets
2. Sentiment Classification Example: Movie Reviews
3. The Dirty Details: Naïve Bayes Text Classification
4. Sentiment Lexicons: Hand-built
5. Sentiment Lexicons: Automatic