Dan Jurafsky
Lecture 1: Sentiment Lexicons and Sentiment Classification
Computational Extraction of Social and Interactional Meaning
SSLST, Summer 2011
IP notice: many slides for today from Chris Manning, William Cohen, Chris Potts and Janyce Wiebe, plus some from Marti Hearst and Marta Tatu
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event as significant
  angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific interaction
  friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively coloured beliefs, dispositions towards objects or persons
  liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior tendencies
  nervous, anxious, reckless, morose, hostile, jealous
Extracting social/interactional meaning
Emotion and Mood
  Annoyance in talking to dialog systems
  Uncertainty of students in tutoring
  Detecting trauma or depression
Interpersonal Stance
  Romantic interest, flirtation, friendliness
  Alignment/accommodation/entrainment
Attitudes = Sentiment (positive or negative)
  Movies or products or politics: is a text positive or negative?
  "Twitter mood predicts the stock market."
Personality Traits
  Open, Conscientious, Extroverted, Anxious
Social identity (Democrat, Republican, etc.)
Overview of Course
http://www.stanford.edu/~jurafsky/sslst11/
Outline for Today: Sentiment Analysis (Attitude Detection)
1. Sentiment Tasks and Datasets
2. Sentiment Classification Example: Movie Reviews
3. The Dirty Details: Naïve Bayes Text Classification
4. Sentiment Lexicons: Hand-built
5. Sentiment Lexicons: Automatic
Sentiment Analysis
Extraction of opinions and attitudes from text and speech.
When we say "sentiment analysis" we often mean a binary or an ordinal task:
  like X / dislike X
  one star to 5 stars
1: Sentiment Tasks and Datasets
Example review sites (screenshots; slides from Chris Potts):
  IMDB, Amazon, OpenTable, TripAdvisor
Richer sentiment on the web (not just positive/negative)
  Experience Project: http://www.experienceproject.com/confessions.php?cid=184000
  FMyLife: http://www.fmylife.com/miscellaneous/14613102
  My Life is Average: http://mylifeisaverage.com/
  It Made My Day: http://immd.icanhascheezburger.com/
2: Sentiment Classification Example: Movie Reviews
Pang and Lee's (2004) movie review data from IMDB
Polarity dataset 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data
Pang and Lee IMDB data

Rating: pos
  when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . … when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point . cool . _october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . .]

Rating: neg
  " snake eyes " is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it's not just because this is a brian depalma film , and since he's a great director and one who's films are always greeted with at least some fanfare . and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .
Pang and Lee Algorithm
Classification using different classifiers:
  Naïve Bayes
  MaxEnt
  SVM
Cross-validation:
  Break up data into 10 folds
  For each fold: choose the fold as a temporary "test set"; train on the other 9 folds, compute performance on the test fold
  Report the average performance of the 10 runs (see the sketch below)
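A minimal sketch of that cross-validation loop, assuming generic train(docs, labels) and evaluate(model, docs, labels) callables (hypothetical names, not Pang and Lee's code):

import random

def cross_validate(docs, labels, train, evaluate, k=10, seed=0):
    """Estimate classifier performance by k-fold cross-validation."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k roughly equal-sized folds
    scores = []
    for i in range(k):
        held_out = set(folds[i])
        train_idx = [j for j in idx if j not in held_out]
        model = train([docs[j] for j in train_idx],
                      [labels[j] for j in train_idx])
        scores.append(evaluate(model,
                               [docs[j] for j in folds[i]],
                               [labels[j] for j in folds[i]]))
    return sum(scores) / k                       # average performance over the k runs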
Negation in Sentiment Analysis
They have not succeeded, and will never succeed, in breaking the will of this valiant people.
Slide from Janyce Wiebe
Pang and Lee on Negation
Added the tag NOT_ to every word between a negation word ("not", "isn't", "didn't", etc.) and the first punctuation mark following the negation word:
  didn't like this movie , but I
  didn't NOT_like NOT_this NOT_movie , but I
A rough implementation is sketched below.
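A sketch of that tagging rule over a pre-tokenized review; the negation and punctuation sets here are illustrative assumptions, not Pang and Lee's exact lists:

NEGATIONS = {"not", "no", "never", "isn't", "didn't", "doesn't", "don't", "can't"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def add_not_tags(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    out, in_scope = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_scope = False          # punctuation closes the negation scope
            out.append(tok)
        elif in_scope:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                in_scope = True       # everything up to the next punctuation gets tagged
    return out

print(add_not_tags("didn't like this movie , but I".split()))
# -> ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']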
Pang and Lee: an interesting observation
"Feature presence" (1 if a word occurred in a document, 0 if it didn't) worked better than unigram probability (word frequency). Why might this be?
Other difficulties in movie review classification
What makes movies hard to classify? Sentiment can be subtle:
  Perfume review in "Perfumes: The Guide": "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
  "She runs the gamut of emotions from A to B" (Dorothy Parker on Katharine Hepburn)
Order effects:
  "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
3: Naïve Bayes text classification
Is this spam?
More Applications of Text Classification
Authorship identificationAge/gender identificationLanguage IdentificationAssigning topics such as Yahoo-categories
e.g., "finance," "sports," "news>world>asia>business"Genre-detection
e.g., "editorials" "movie-reviews" "news“Opinion/sentiment analysis on a person/product
e.g., “like”, “hate”, “neutral”Labels may be domain-specific
e.g., “contains adult language” : “doesn’t”
Text Classification: definition
The classifier:
  Input: a document d
  Output: a predicted class c from some fixed set of labels c1, ..., cK
The learner:
  Input: a set of m hand-labeled documents (d1, c1), ..., (dm, cm)
  Output: a learned classifier f: d → c
Slide from William Cohen
Document Classification
Classes: (AI) (Programming) (HCI) ...
Training data (characteristic words per class):
  ML: learning, intelligence, algorithm, reinforcement, network, ...
  Planning: planning, temporal, reasoning, plan, language, ...
  Semantics: programming, semantics, language, proof, ...
  Garb.Coll.: garbage, collection, memory, optimization, region, ...
  (also Multimedia, GUI, ...)
Test data: "planning language proof intelligence"
Slide from Chris Manning
Classification Methods: Hand-coded rules
Some spam/email filters, etc.
E.g., assign a category if the document contains a given boolean combination of words.
Accuracy is often very high if a rule has been carefully refined over time by a subject expert.
But building and maintaining these rules is expensive.
Slide from Chris Manning
Classification Methods: Machine Learning
Supervised machine learning: learn a function from documents (or sentences) to labels.
  Naive Bayes (simple, common method)
  k-Nearest Neighbors (simple, powerful)
  Support-vector machines (new, more powerful)
  ... plus many other methods
No free lunch: requires hand-classified training data.
But data can be built up (and refined) by amateurs.
Slide from Chris Manning
Naïve Bayes Intuition
Representing text for classification
Slide from William Cohen
ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS
BUENOS AIRES, Feb 26
Argentine grain board figures show crop registrations of grains, oilseeds and their products to February 11, in thousands of tonnes, showing those for future shipments month, 1986/87 total and 1985/86 total to February 12, 1986, in brackets:
• Bread wheat prev 1,655.8, Feb 872.0, March 164.6, total 2,692.4 (4,161.0).
• Maize Mar 48.0, total 48.0 (nil).
• Sorghum nil (nil)
• Oilseed export registrations were:
• Sunflowerseed total 15.0 (7.9)
• Soybean May 20.0, total 20.0 (nil)
The board also detailed export registrations for subproducts, as follows....

f(document) = c?  What is the best representation for the document d being classified? The bag of words representation below is the simplest useful one.
Bag of words representation
Slide from William Cohen
[Same Reuters document as above]
Categories: grain, wheat
Bag of words representation
Slide from William Cohen
xxxxxxxxxxxxxxxxxxx GRAIN/OILSEED xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx grain xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx grains,
oilseeds xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx tonnes, xxxxxxxxxxxxxxxxx shipments xxxxxxxxxxxx total xxxxxxxxx total xxxxxxxx xxxxxxxxxxxxxxxxxxxx:
• Xxxxx wheat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, total xxxxxxxxxxxxxxxx
• Maize xxxxxxxxxxxxxxxxx• Sorghum xxxxxxxxxx• Oilseed xxxxxxxxxxxxxxxxxxxxx• Sunflowerseed xxxxxxxxxxxxxx• Soybean xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx....
Categories: grain, wheat
word        freq
grain(s)       3
oilseed(s)     2
total          3
wheat          1
maize          1
soybean        1
tonnes         1
...          ...
Formalizing Naïve Bayes
Bayes' Rule

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

Allows us to swap the conditioning.
Sometimes it's easier to estimate one kind of dependence than the other.
Deriving Bayes' Rule

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

$$P(B \mid A)\,P(A) = P(A \cap B) \qquad P(A \mid B)\,P(B) = P(A \cap B)$$

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Bayes’ Rule Applied to Documents and Classes
Slide from Chris Manning
$$P(C, D) = P(C \mid D)\,P(D) = P(D \mid C)\,P(C)$$

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)}$$
The Text Classification Problem
Using a supervised learning method, we want to learn a classifier (or classification function) γ:
  γ: X → C
We denote the supervised learning method by Γ. The learning method takes the training set D as input and returns the learned classifier:
  Γ(D) = γ
Once we have learned γ, we can apply it to the test set (or test data).
Slide from Chien Chin Chen
Naïve Bayes Text Classification
The Multinomial Naïve Bayes model (NB) is a probabilistic learning method.
In text classification, our goal is to find the "best" class for the document:

$$c_{map} = \operatorname{argmax}_{c \in C} P(c \mid d)$$
  (the probability of a document d being in class c)
$$= \operatorname{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$
  (Bayes' Rule)
$$= \operatorname{argmax}_{c \in C} P(c)\,P(d \mid c)$$
  (we can ignore the denominator)

Slide from Chien Chin Chen
Naive Bayes Classifiers
We represent an instance D based on some attributes: $D = \langle x_1, x_2, \dots, x_n \rangle$
Task: classify a new instance D, based on its tuple of attribute values, into one of the classes $c_j \in C$:

$$c_{MAP} = \operatorname{argmax}_{c_j \in C} P(c_j \mid x_1, x_2, \dots, x_n)$$
$$= \operatorname{argmax}_{c_j \in C} \frac{P(x_1, x_2, \dots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \dots, x_n)}$$
  (Bayes' Rule)
$$= \operatorname{argmax}_{c_j \in C} P(x_1, x_2, \dots, x_n \mid c_j)\,P(c_j)$$
  (we can ignore the denominator)

Slide from Chris Manning
Naïve Bayes Classifier: Naïve Bayes Assumption
P(cj): can be estimated from the frequency of classes in the training examples.
P(x1, x2, …, xn | cj): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.
Naïve Bayes Conditional Independence Assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj).
Slide from Chris Manning
[Graphical model: class node Flu with feature nodes X1 (fever), X2 (sinus), X3 (cough), X4 (runny nose), X5 (muscle-ache)]
The Naïve Bayes Classifier
Conditional Independence Assumption: features are independent of each other given the class:

$$P(X_1, \dots, X_5 \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_5 \mid C)$$

Slide from Chris Manning
Using Multinomial Naive Bayes Classifiers to Classify Text
Attributes are text positions, values are words:

$$c_{NB} = \operatorname{argmax}_{c_j \in C} P(c_j) \prod_i P(x_i \mid c_j)$$
$$= \operatorname{argmax}_{c_j \in C} P(c_j)\,P(x_1 = \text{"our"} \mid c_j) \cdots P(x_n = \text{"text"} \mid c_j)$$

Still too many possibilities. Assume that classification is independent of the positions of the words, i.e., use the same parameters for each position. The result is the bag of words model (over tokens, not types).
Slide from Chris Manning
Learning the Model
Simplest: maximum likelihood estimate; simply use the frequencies in the data:

$$\hat{P}(c_j) = \frac{N(C = c_j)}{N}$$

$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i, C = c_j)}{N(C = c_j)}$$

[Graphical model: class node C with feature nodes X1 … X6]
Slide from Chris Manning
Smoothing to Avoid Overfitting
Laplace:

$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i, C = c_j) + 1}{N(C = c_j) + k}$$

where k = the number of values of Xi.
Slide from Chris Manning
Naïve Bayes: Learning
From the training corpus, extract Vocabulary.
Calculate the required P(cj) and P(wk | cj) terms:
  For each cj in C do
    docsj ← subset of documents for which the target class is cj
    P(cj) ← |docsj| / |total # documents|
    Textj ← single document containing all docsj
    n ← total number of word tokens in Textj
    for each word wk in Vocabulary:
      nk ← number of occurrences of wk in Textj
      P(wk | cj) ← (nk + α) / (n + α·|Vocabulary|)
Slide from Chris Manning
Naïve Bayes: Classifying
positions ← all word positions in the current document which contain tokens found in Vocabulary
Return cNB, where

$$c_{NB} = \operatorname{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$

Slide from Chris Manning
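The learning and classifying procedures above, condensed into a runnable sketch: a multinomial model with add-α smoothing, using log probabilities to avoid underflow. Whitespace tokenization is a simplifying assumption, and the function names are ours:

import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """docs: list of strings; labels: parallel list of class names."""
    vocab = {w for d in docs for w in d.split()}
    log_prior, log_cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        denom = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_cond

def classify_nb(model, doc):
    vocab, log_prior, log_cond = model
    positions = [w for w in doc.split() if w in vocab]   # skip out-of-vocabulary tokens
    return max(log_prior,
               key=lambda c: log_prior[c] + sum(log_cond[c][w] for w in positions))

model = train_nb(["good great fun", "bad awful boring"], ["pos", "neg"])
print(classify_nb(model, "a great fun movie"))           # -> pos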
4: Sentiment Lexicons: Hand-Built
Key task: Vocabulary.
The previous work uses all the words in a document. Can we do better by focusing on a subset of words?
How do we find words, phrases, and patterns that express sentiment or polarity?
4: Sentiment/Affect Lexicons: GenInq
Harvard General Inquirer Database
Contains 3627 negative and positive word-strings:
  http://www.wjh.harvard.edu/~inquirer/
  http://www.wjh.harvard.edu/~inquirer/homecat.htm
Positiv (1915 words) versus Negativ (2291 words)
Also: Strong vs Weak, Active vs Passive, Overstated versus Understated, Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
4: Sentiment/Affect Lexicons: LIWC
LIWC (Linguistic Inquiry and Word Count), Pennebaker, Francis, & Booth, 2001
A dictionary of 2300 words grouped into > 70 classes:
  Affective Processes: negative emotion (bad, weird, hate, problem, tough), positive emotion (love, nice, sweet)
  Cognitive Processes: Tentative (maybe, perhaps, guess), Inhibition (block, constraint, stop)
  Bodily Processes: sexual (sex, horny, love, incest)
  Pronouns: 1st person pronouns (I me mine myself I'd I'll I'm …), 2nd person pronouns
  Negation (no, not, never), Quantifiers (few, many, much)
Sentiment Lexicons and outcomes
Potts, "On the Negativity of Negation": is logical negation associated with negative sentiment?
Potts's experiment:
  Get counts of the words not, n't, no, never, and compounds formed with no, in online reviews, etc.
  Regress against the review rating.
Findings:
  More logical negation in IMDB reviews which have negative sentiment.
  More logical negation in all reviews which have negative sentiment (Amazon, GoodReads, OpenTable, TripAdvisor).
  Voting "no" (after removing the word "no")
5: Sentiment Lexicons: Automatically Extracted
Adjectives
  positive: honest, important, mature, large, patient
    "He is the only honest man in Washington."
    "Her writing is unbelievably mature and is only likely to get better."
    "To humour me my patient father agrees yet again to my choice of film."
  negative: harmful, hypocritical, inefficient, insecure
    "It was a macabre and hypocritical circus."
    "Why are they being so inefficient?"
Slide from Janyce Wiebe
Other parts of speech
Verbs
  positive: praise, love
  negative: blame, criticize
Nouns
  positive: pleasure, enjoyment
  negative: pain, criticism
Slide from Janyce Wiebe
Phrases
Phrases containing adjectives and adverbs:
  positive: high intelligence, low cost
  negative: little variation, many troubles
Slide adapted from Janyce Wiebe

Intuition for identifying polarity words
Assume that contexts are coherent:
  fair and legitimate, corrupt and brutal
  *fair and brutal, *corrupt and legitimate
Slide adapted from Janyce Wiebe
Hatzivassiloglou & McKeown 1997: Predicting the semantic orientation of adjectives
Step 1: From a 21-million-word WSJ corpus, take every adjective with frequency > 20 and label it for polarity.
A total of 1336 adjectives: 657 positive, 679 negative.
Hatzivassiloglou & McKeown 1997
Step 2: Extract all conjoined adjectives:
  nice and comfortable
  nice and scenic
Slide adapted from Janyce Wiebe
Hatzivassiloglou & McKeown 1997
Step 3: A supervised learning algorithm builds a graph of adjectives linked by same or different semantic orientation.
[Graph over: nice, handsome, comfortable, fun, scenic, terrible, painful, expensive]
Slide from Janyce Wiebe
Hatzivassiloglou & McKeown 1997
Step 4: A clustering algorithm partitions the adjectives into two subsets:
  +: nice, handsome, comfortable, fun, scenic
  -: terrible, painful, expensive, slow
Slide from Marta Tatu
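Hatzivassiloglou and McKeown actually use a log-linear model plus a dedicated clustering algorithm; as a toy stand-in for steps 3 and 4, here is a sketch that propagates orientation over a graph whose edges are labeled same-orientation (e.g., induced from "and") or different-orientation (e.g., from "but"). The edge list is invented for illustration:

from collections import deque

# (adj1, adj2, same_orientation?) edges, as might be induced from conjunctions
EDGES = [("nice", "comfortable", True), ("nice", "scenic", True),
         ("nice", "handsome", True), ("comfortable", "fun", True),
         ("fun", "expensive", False), ("expensive", "terrible", True),
         ("terrible", "painful", True), ("painful", "slow", True)]

def partition(edges, seed):
    """BFS label propagation: same-orientation edges copy the sign,
    different-orientation edges flip it."""
    graph = {}
    for a, b, same in edges:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))
    sign = {seed: +1}
    queue = deque([seed])
    while queue:
        a = queue.popleft()
        for b, same in graph[a]:
            if b not in sign:
                sign[b] = sign[a] if same else -sign[a]
                queue.append(b)
    return sign

print(partition(EDGES, "nice"))
# nice, comfortable, scenic, handsome, fun -> +1; expensive, terrible, painful, slow -> -1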
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Input: a review
  1. Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
  2. Estimate the semantic orientation of each phrase
  3. Assign a class to the given review based on the average semantic orientation of its phrases
Output: classification (thumbs up or thumbs down)
Turney Step 1
Extract all two-word phrases including an adjective, using these part-of-speech patterns (a sketch of the matcher follows below):

     First Word          Second Word               Third Word (not extracted)
  1. JJ                  NN or NNS                 anything
  2. RB, RBR, or RBS     JJ                        not NN nor NNS
  3. JJ                  JJ                        not NN nor NNS
  4. NN or NNS           JJ                        not NN nor NNS
  5. RB, RBR, or RBS     VB, VBD, VBN, or VBG      anything

Slide from Marta Tatu
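A sketch of the pattern matcher, assuming the input is already a list of (word, POS-tag) pairs (e.g., from NLTK's pos_tag); the five conditions mirror the rows of the table above:

NOUNS = {"NN", "NNS"}
ADVERBS = {"RB", "RBR", "RBS"}
VERBS = {"VB", "VBD", "VBN", "VBG"}

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs; returns Turney-style two-word phrases."""
    padded = tagged + [("", "END")]               # sentinel so the final bigram is checked
    phrases = []
    for (w1, t1), (w2, t2), (_, t3) in zip(padded, padded[1:], padded[2:]):
        if ((t1 == "JJ" and t2 in NOUNS) or                         # row 1
            (t1 in ADVERBS and t2 == "JJ" and t3 not in NOUNS) or   # row 2
            (t1 == "JJ" and t2 == "JJ" and t3 not in NOUNS) or      # row 3
            (t1 in NOUNS and t2 == "JJ" and t3 not in NOUNS) or     # row 4
            (t1 in ADVERBS and t2 in VERBS)):                       # row 5
            phrases.append(w1 + " " + w2)
    return phrases

print(extract_phrases([("online", "JJ"), ("service", "NN"), ("was", "VBD"),
                       ("inconveniently", "RB"), ("located", "VBN")]))
# -> ['online service', 'inconveniently located']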
Turney Step 2
Estimate the semantic orientation of the extracted phrases using Pointwise Mutual Information.
Slide from Marta Tatu
Pointwise Mutual Information
Mutual information between 2 random variables X and Y:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

Pointwise mutual information: a measure of how often two events x and y occur, compared with what we would expect if they were independent:

$$\text{PMI}(x, y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

PMI between two words: how much more often they occur together than we would expect if they were independent:

$$\text{PMI}(word_1, word_2) = \log_2 \frac{p(word_1, word_2)}{p(word_1)\,p(word_2)}$$
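A tiny helper showing the PMI computation from raw counts, with maximum-likelihood probability estimates (the counts below are invented for illustration):

import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), probabilities estimated by MLE."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# If x and y co-occur exactly as often as independence predicts, PMI is 0:
print(pmi(count_xy=10, count_x=100, count_y=1000, total=10000))   # 0.0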
Turney Step 2 (continued)
The Semantic Orientation of a phrase is defined as:

$$\text{SO}(phrase) = \text{PMI}(phrase, \text{"excellent"}) - \text{PMI}(phrase, \text{"poor"})$$

Estimate PMI by issuing queries to a search engine (AltaVista, ~350 million pages):

$$\text{SO}(phrase) = \log_2 \frac{\text{hits}(phrase\ \text{NEAR}\ \text{"excellent"}) \cdot \text{hits}(\text{"poor"})}{\text{hits}(phrase\ \text{NEAR}\ \text{"poor"}) \cdot \text{hits}(\text{"excellent"})}$$

Slide from Marta Tatu
Turney Step 3
Calculate the average semantic orientation of the phrases in the review: positive if the average SO is positive, negative otherwise.

  Phrase                   POS tags   SO
  direct deposit           JJ NN       1.288
  local branch             JJ NN       0.421
  small part               JJ NN       0.053
  online service           JJ NN       2.780
  well other               RB JJ       0.237
  low fees                 JJ NNS      0.333
  …
  true service             JJ NN      -0.732
  other bank               JJ NN      -0.850
  inconveniently located   RB VBN     -1.541
  Average Semantic Orientation         0.322

Slide adapted from Marta Tatu
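Steps 2 and 3 in code form: a sketch that turns hit counts into SO scores and averages them over a review's phrases. hits() is a hypothetical stand-in for the search-engine query (AltaVista's NEAR operator no longer exists), and the 0.01 smoothing follows Turney's trick for avoiding division by zero:

import math

def semantic_orientation(phrase, hits, eps=0.01):
    """SO(phrase) from the hit-count formula above; eps keeps the log finite when a count is 0."""
    near_excellent = hits(phrase + ' NEAR "excellent"') + eps
    near_poor = hits(phrase + ' NEAR "poor"') + eps
    return math.log2((near_excellent * hits('"poor"')) /
                     (near_poor * hits('"excellent"')))

def classify_review(phrases, hits):
    """Thumbs up if the average SO of the review's phrases is positive."""
    avg = sum(semantic_orientation(p, hits) for p in phrases) / len(phrases)
    return "thumbs up" if avg > 0 else "thumbs down"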
Experiments
410 reviews from Epinions:
  170 (41%) thumbs down
  240 (59%) thumbs up
Average phrases per review: 26
Baseline accuracy (majority class): 59%

  Domain               Accuracy   Correlation
  Automobiles          84.00%     0.4618
  Banks                80.00%     0.6167
  Movies               65.83%     0.3608
  Travel Destinations  70.53%     0.4155
  All                  74.39%     0.5174

Slide from Marta Tatu
Summary on Sentiment
Generally modeled as a classification or regression task: predict a binary or ordinal label.
Function words can be a good cue.
Using all words (in Naïve Bayes) works well for some tasks.
Finding subsets of words may help in other tasks.
Outline: Sentiment Analysis (Attitude Detection)
1. Sentiment Tasks and Datasets
2. Sentiment Classification Example: Movie Reviews
3. The Dirty Details: Naïve Bayes Text Classification
4. Sentiment Lexicons: Hand-built
5. Sentiment Lexicons: Automatic