CS 6120/CS 4120: Natural Language Processing · 2017-11-20


CS6120/CS4120: Natural Language Processing

Instructor: Prof. Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang

Presentation and report

• Problem Description (10 points)
  • What is the task? System input and output
  • Examples will be helpful

• Reference/Related work (20 points)
  • Put your work in context: what has been done before? You need to have references!
  • What's new in your work?

• Methodology: what you have done (30 points)
  • Preprocessing of the data
  • What are your data?
  • Features used? Which are effective, and which are not?
  • What methods do you experiment with? And why do you think they're reasonable and suitable for the task?

• Experiments (40 points)
  • Dataset size; train/test/development splits
  • Evaluation metrics: which are used, and are they appropriate for calibrating system performance?
  • Baselines: what are they?
  • Results, tables, figures, etc.

Sentiment Analysis

Positive or negative movie review?

• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.

Google Product Search

Bing Shopping

Twitter sentiment versus Gallup Poll of Consumer Confidence

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010.

Twitter sentiment:

Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.

Target Sentiment on Twitter

• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision

Sentiment analysis has many other names

• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis

Why sentiment analysis?

• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment

Scherer Typology of Affective States

• Emotion: brief organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous


Sentiment Analysis

• Sentiment analysis is the detection of attitudes: "enduring, affectively colored beliefs, dispositions towards objects or persons"
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
     • From a set of types: like, love, hate, value, desire, etc.
     • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
  4. Text containing the attitude
     • Sentence or entire document

Sentiment Analysis

• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types


Sentiment Classification in Movie Reviews

• Polarity detection: is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.

IMDB data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ " snake eyes " is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.

Baseline Algorithm (adapted from Pang and Lee)

• Tokenization
• Feature Extraction
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM

Sentiment Tokenization Issues

• Deal with HTML and XML markup
• Twitter mark-up (names, hashtags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O'Connor twitter tokenizer

    [<>]?                        # optional hat/brow
    [:;=8]                       # eyes
    [\-o\*\']?                   # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    |                            #### reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    [\-o\*\']?                   # optional nose
    [:;=8]                       # eyes
    [<>]?                        # optional hat/brow

Potts emoticons
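A minimal Python sketch of how the emoticon pattern above might be used. It compiles the pattern with `re.VERBOSE`, which makes the engine ignore whitespace and `#` comments outside character classes, so the pattern keeps its annotations. The `find_emoticons` wrapper is a hypothetical helper for illustration. It is not part of Potts's tokenizer.

```python
import re

# Potts-style emoticon pattern from the slide; re.VERBOSE ignores the
# layout whitespace and the '# ...' comments outside character classes.
EMOTICON_RE = re.compile(r"""
    [<>]?                        # optional hat/brow
    [:;=8]                       # eyes
    [\-o\*\']?                   # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    |                            # reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    [\-o\*\']?                   # optional nose
    [:;=8]                       # eyes
    [<>]?                        # optional hat/brow
""", re.VERBOSE)


def find_emoticons(text):
    """Hypothetical helper: return every emoticon match, left to right."""
    return EMOTICON_RE.findall(text)


print(find_emoticons("great movie :-) but :( and (-:"))
# [':-)', ':(', '(-:']
```

In a full tokenizer, this pattern would typically be tried before generic punctuation splitting, so that ":-)" survives as one token.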

Extracting Features for Sentiment Classification

• How to handle negation
  • I didn’t like this movie
    vs.
  • I really like this movie
• Which words to use?
  • Only adjectives
  • All words
    • All words turns out to work better, at least on this data

Negation

Add NOT_ to every word between the negation and the following punctuation:

didn’t like this movie , but I

becomes

didn’t NOT_like NOT_this NOT_movie , but I

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
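A minimal Python sketch of the NOT_ transformation on already-tokenized text. The negation cue list and the punctuation set are assumptions for this illustration: the cues are not, no, never, and any token ending in n't. The cited papers may use different lists.

```python
import re

# Assumed negation cues: not/no/never, or any token ending in n't (straight or curly apostrophe)
NEGATION_RE = re.compile(r"^(?:not|no|never|.*n['’]t)$", re.IGNORECASE)
# Assumed punctuation tokens that close the negation scope
PUNCT_RE = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation token."""
    out = []
    negating = False
    for tok in tokens:
        if PUNCT_RE.match(tok):
            negating = False          # punctuation ends the scope
            out.append(tok)
        elif NEGATION_RE.match(tok):
            negating = True           # the cue itself is kept unchanged
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out


print(" ".join(mark_negation("didn’t like this movie , but I".split())))
# didn’t NOT_like NOT_this NOT_movie , but I
```

The transformed tokens can then be fed to any bag-of-words classifier. "NOT_like" simply becomes a separate feature from "like".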

Reminder: Naïve Bayes

$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w,c) + 1}{\mathrm{count}(c) + |V|}$$

$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$

Binarized (Boolean feature) Multinomial Naïve Bayes

• Intuition:
  • For sentiment (and probably for other text classification domains), word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1

Boolean Multinomial Naïve Bayes: Learning

• From training corpus, extract Vocabulary
• Calculate P(c_j) terms
  • For each c_j in C do
    docs_j ← all docs with class = c_j
    $$P(c_j) \leftarrow \frac{|\mathrm{docs}_j|}{|\text{total \# documents}|}$$
• Calculate P(w_k | c_j) terms
  • Remove duplicates in each doc:
    • For each word type w in doc_j, retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|\mathrm{Vocabulary}|}$$
    where n is the total number of word tokens in Text_j

Boolean Multinomial Naïve Bayes on a test document d

• First remove all duplicate words from d
• Then compute NB using the same equation:

$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$

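Normal vs. Boolean Multinomial NB

Normal:

|          | Doc | Words                               | Class |
|----------|-----|-------------------------------------|-------|
| Training | 1   | Chinese Beijing Chinese             | c     |
|          | 2   | Chinese Chinese Shanghai            | c     |
|          | 3   | Chinese Macao                       | c     |
|          | 4   | Tokyo Japan Chinese                 | j     |
| Test     | 5   | Chinese Chinese Chinese Tokyo Japan | ?     |

Boolean:

|          | Doc | Words               | Class |
|----------|-----|---------------------|-------|
| Training | 1   | Chinese Beijing     | c     |
|          | 2   | Chinese Shanghai    | c     |
|          | 3   | Chinese Macao       | c     |
|          | 4   | Tokyo Japan Chinese | j     |
| Test     | 5   | Chinese Tokyo Japan | ?     |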
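A compact Python sketch of multinomial NB with add-α smoothing, run on the table above. Setting `binary=True` applies the Boolean variant, which clips each document's counts at 1 in training and also dedups the test document. Some details are assumptions of this sketch: the function names are illustrative, words unseen in training are skipped at test time (the slides do not say how to handle them), and log probabilities are used to avoid underflow.

```python
import math
from collections import Counter


def train_nb(docs, alpha=1.0, binary=False):
    """docs: list of (tokens, label). Returns (log priors, log likelihoods)."""
    vocab = {w for toks, _ in docs for w in toks}
    log_prior, log_lik = {}, {}
    for c in {label for _, label in docs}:
        class_docs = [toks for toks, label in docs if label == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter()
        for toks in class_docs:
            counts.update(set(toks) if binary else toks)  # Boolean: dedup per doc
        denom = sum(counts.values()) + alpha * len(vocab)  # n + alpha * |V|
        log_lik[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, tokens, binary=False):
    log_prior, log_lik = model
    if binary:
        tokens = set(tokens)  # Boolean NB also dedups the test document
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


train = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "j"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

for binary in (False, True):
    model = train_nb(train, binary=binary)
    print("Boolean" if binary else "Normal", predict_nb(model, test_doc, binary=binary))
# Normal c
# Boolean j
```

With α = 1, the normal model gives:
- c: 3/4 · (3/7)^3 · (1/14)^2 ≈ 0.00030
- j: 1/4 · (2/9)^5 ≈ 0.00014

The Boolean model gives:
- c: 3/4 · 1/3 · (1/12)^2 ≈ 0.0017
- j: 1/4 · (2/9)^3 ≈ 0.0027

Clipping the repeated "Chinese" therefore flips the prediction from c to j.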

Binarized (Boolean feature) Multinomial Naïve Bayes

• Binary seems to work better than full word counts
• Other possibility: log(freq(w))

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
J. D. Rennie, L. Shih, J. Teevan. 2003. Tackling the poor assumptions of naive Bayes text classifiers. ICML 2003.

Cross-Validation

• Break up data into 5 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose the fold as a temporary test set
  • Train on the other 4 folds, compute performance on the test fold
• Report average performance of the 5 runs

[Figure: 5-fold cross-validation. In each of iterations 1 to 5, a different fold is the test set and the remaining four folds are used for training.]
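A minimal Python sketch of k-fold cross-validation. It answers the "equal positive and negative" question by stratifying: each class's examples are dealt round-robin across the folds. `train_and_score` is a hypothetical callback standing in for "train a classifier on these examples and return its score on those".

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with (roughly) equal class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def cross_validate(examples, labels, train_and_score, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train = [(examples[i], labels[i]) for i in range(len(labels)) if i not in held_out]
        test = [(examples[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


labels = ["pos"] * 50 + ["neg"] * 50
# Dummy scorer that just reports the training-set size: 80 examples in every run
print(cross_validate(list(range(100)), labels, lambda train, test: len(train)))
# 80.0
```

With 50 positive and 50 negative examples, each of the 5 folds ends up with exactly 10 of each class.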

Other issues in Classification

• MaxEnt and SVM tend to do better than Naïve Bayes

Problems: What makes reviews hard to classify?

• Subtlety:
  • Perfume review in Perfumes: the Guide:
    • "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
  • Dorothy Parker on Katherine Hepburn:
    • "She runs the gamut of emotions from A to B"

Thwarted Expectations and Ordering Effects

• "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

Sentiment Lexicons

The General Inquirer

• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs. Weak, Active vs. Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
• Free for Research Use

Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press

LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX

• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• Not free, though!

MPQA Subjectivity Cues Lexicon

• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

Bing Liu Opinion Lexicon

• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative

Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.

SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010

• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
• [estimable(J,3)] "may be computed or estimated"
  Pos 0 Neg 0 Obj 1
• [estimable(J,1)] "deserving of respect or high regard"
  Pos .75 Neg 0 Obj .25

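Disagreements between polarity lexicons

|                  | Opinion Lexicon | General Inquirer | SentiWordNet    | LIWC          |
|------------------|-----------------|------------------|-----------------|---------------|
| MPQA             | 33/5402 (0.6%)  | 49/2867 (2%)     | 1127/4214 (27%) | 12/363 (3%)   |
| Opinion Lexicon  |                 | 32/2411 (1%)     | 1004/3994 (25%) | 9/403 (2%)    |
| General Inquirer |                 |                  | 520/2306 (23%)  | 1/204 (0.5%)  |
| SentiWordNet     |                 |                  |                 | 174/694 (25%) |

Christopher Potts, Sentiment Tutorial, 2011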
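A minimal Python sketch of how a cell in the table above could be computed. It assumes each ratio means "shared words whose polarity labels differ / words present in both lexicons". That is our reading of the table, not a definition stated on the slide. The toy lexicon entries are hypothetical.

```python
def lexicon_disagreement(lex_a, lex_b):
    """Compare two polarity lexicons (dicts: word -> "positive"/"negative").

    Returns (shared words with different labels, number of shared words, ratio).
    """
    shared = lex_a.keys() & lex_b.keys()          # words present in both lexicons
    n_disagree = sum(lex_a[w] != lex_b[w] for w in shared)
    ratio = n_disagree / len(shared) if shared else 0.0
    return n_disagree, len(shared), ratio


# Toy lexicons: hypothetical entries, not taken from any real resource
lex_a = {"good": "positive", "bad": "negative", "cheap": "negative", "fun": "positive"}
lex_b = {"good": "positive", "bad": "negative", "cheap": "positive", "dull": "negative"}

n, shared, ratio = lexicon_disagreement(lex_a, lex_b)
print(f"{n}/{shared} ({ratio:.0%})")
# 1/3 (33%)
```

Graded resources such as SentiWordNet would first need a rule for turning scores into a label before this comparison applies.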

Analyzing the polarity of each word in IMDB

• How likely is each word to appear in each sentiment class?
• Count("bad") in 1-star, 2-star, 3-star, etc.
• But can't use raw counts, because the rating categories contain very different numbers of words.
• Instead, likelihood:
  $$P(w \mid c) = \frac{f(w,c)}{\sum_{w \in c} f(w,c)}$$
• Make them comparable between words
  • Scaled likelihood:
  $$\frac{P(w \mid c)}{P(w)}$$

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
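A minimal Python sketch of the likelihood and scaled-likelihood formulas above. It assumes P(w) is the word's relative frequency across all classes pooled together; the slide does not spell out how P(w) is estimated. The toy counts for two rating classes are hypothetical.

```python
from collections import Counter


def scaled_likelihood(counts_by_class, word):
    """Return {class: P(word|class) / P(word)}.

    counts_by_class maps each class (e.g. a star rating) to a Counter of f(w, c).
    P(w|c) = f(w, c) / sum over w' of f(w', c). P(w) is the word's relative
    frequency over all classes pooled (an assumption). The word must occur at
    least once overall, otherwise P(w) is zero.
    """
    pooled = Counter()
    for counts in counts_by_class.values():
        pooled.update(counts)
    p_w = pooled[word] / sum(pooled.values())
    return {
        c: (counts[word] / sum(counts.values())) / p_w
        for c, counts in counts_by_class.items()
    }


# Hypothetical toy counts for two rating classes
counts = {
    1: Counter({"bad": 30, "the": 70}),
    10: Counter({"bad": 5, "the": 95}),
}
print(scaled_likelihood(counts, "bad"))
```

Values above 1 mean the word is over-represented in that rating class relative to its overall frequency. Here "bad" scores about 1.71 for rating 1 and about 0.29 for rating 10.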

Analyzing the polarity of each word in IMDB

[Figure: one panel per word, plotting the scaled likelihood P(w|c)/P(w) (axis label Pr(c|w)) against rating 1 to 10. POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

Other sentiment feature: Logical negation

• Is logical negation (no, not) associated with negative sentiment?
• Potts experiment:
  • Count negation (not, n't, no, never) in online reviews
  • Regress against the review rating

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
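A toy Python sketch of the experiment's two steps: count negation cues per review, then fit a line relating the counts to the rating. It uses simple ordinary least squares on three hypothetical reviews. Potts's actual study uses a large review corpus and its own modeling choices, so treat this only as an illustration of the idea.

```python
import re

# Negation cues from the slide: not, n't, no, never
NEGATION_RE = re.compile(r"\b(?:not|no|never)\b|n['’]t\b", re.IGNORECASE)


def negation_count(text):
    """Number of negation cues in a review."""
    return len(NEGATION_RE.findall(text))


def ols_slope(xs, ys):
    """Slope b of the least-squares line ys ~ a + b * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical (rating, review) pairs, for illustration only
reviews = [
    (1, "not good, I did not like it and would never go back"),
    (3, "it was not bad"),
    (5, "loved it, great food"),
]
ratings = [r for r, _ in reviews]
negs = [negation_count(t) for _, t in reviews]
print(negs, ols_slope(ratings, negs))
```

A negative slope (here -0.75) is what "more negation in negative sentiment" would look like.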

Potts 2011 Results: More negation in negative sentiment

[Figure: scaled likelihood P(w|c)/P(w) of negation by review rating.]

Learning Sentiment Lexicons

Semi-supervised learning of lexicons

• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• To bootstrap a lexicon

Hatzivassiloglou and McKeown intuition for identifying word polarity

• Adjectives conjoined by "and" have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by "but" do not
  • fair but brutal

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181

Hatzivassiloglou & McKeown 1997 Step 1

• Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)
  • 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
  • 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…

Hatzivassiloglou & McKeown 1997 Step 2

• Expand seed set to conjoined adjectives
  • nice and helpful
  • nice and classy

Hatzivassiloglou & McKeown 1997 Step 3

• Supervised classifier assigns "polarity similarity" to each word pair, resulting in a graph:

[Figure: graph over classy, nice, helpful, fair, brutal, irrational, corrupt, with edges weighted by polarity similarity.]

Hatzivassiloglou & McKeown 1997 Step 4

• Clustering for partitioning the graph into two

[Figure: the same graph partitioned into a + cluster (classy, nice, helpful, fair) and a - cluster (brutal, irrational, corrupt).]
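A simplified Python sketch of the bootstrapping idea. Starting from a seed word, it spreads polarity over the conjunction graph: an "and" edge keeps the label and a "but" edge flips it. This is a swapped-in stand-in, not Hatzivassiloglou and McKeown's method, which trains a supervised classifier for pairwise polarity similarity and then clusters the graph. Conflicting labels are not resolved here; the first label to reach a word wins. The link "helpful and fair" is hypothetical; the other pairs come from the slides.

```python
from collections import defaultdict, deque


def propagate_polarity(seeds, conjunctions):
    """Spread seed polarities over a graph of conjoined adjectives.

    seeds: dict word -> +1 or -1.
    conjunctions: iterable of (adj1, conjunction, adj2) triples.
    'and' edges keep the polarity; 'but' edges flip it.
    """
    graph = defaultdict(list)
    for a, conj, b in conjunctions:
        sign = 1 if conj == "and" else -1
        graph[a].append((b, sign))
        graph[b].append((a, sign))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:                          # breadth-first search from the seeds
        word = queue.popleft()
        for neighbor, sign in graph[word]:
            if neighbor not in polarity:  # first label to arrive wins
                polarity[neighbor] = polarity[word] * sign
                queue.append(neighbor)
    return polarity


# Conjunctions from the slides, plus one hypothetical link ("helpful and fair")
pairs = [
    ("nice", "and", "helpful"), ("nice", "and", "classy"),
    ("fair", "and", "legitimate"), ("corrupt", "and", "brutal"),
    ("fair", "but", "brutal"), ("helpful", "and", "fair"),
]
print(propagate_polarity({"nice": +1}, pairs))
```

Starting from the single seed "nice", the sketch labels classy, helpful, fair, and legitimate as positive, and brutal and corrupt as negative.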

Output polarity lexicon

• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

Output polarity lexicon

• Not every induced label is right: for example, disturbing ends up in the positive list and pleasant in the negative list.

Turney Algorithm

1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

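Extract two-word phrases with adjectives

| First Word      | Second Word       | Third Word (not extracted) |
|-----------------|-------------------|----------------------------|
| JJ              | NN or NNS         | anything                   |
| RB, RBR, RBS    | JJ                | Not NN nor NNS             |
| JJ              | JJ                | Not NN nor NNS             |
| NN or NNS       | JJ                | Not NN nor NNS             |
| RB, RBR, or RBS | VB, VBD, VBN, VBG | anything                   |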
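A minimal Python sketch of the five extraction patterns in the table, applied to text that has already been POS-tagged. Tagging itself is assumed to happen elsewhere (any Penn Treebank tagger), and the tagged sentence below is hypothetical. The third word is only checked, never included in the phrase.

```python
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def turney_phrases(tagged):
    """tagged: list of (word, POS) pairs. Returns two-word phrases matching the five patterns."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the third word (None at the end of the sentence); checked, not extracted
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if (
            (t1 in ADJ and t2 in NOUN)                          # JJ + NN/NNS
            or (t1 in ADV and t2 in ADJ and t3 not in NOUN)     # RB* + JJ
            or (t1 in ADJ and t2 in ADJ and t3 not in NOUN)     # JJ + JJ
            or (t1 in NOUN and t2 in ADJ and t3 not in NOUN)    # NN/NNS + JJ
            or (t1 in ADV and t2 in VERB)                       # RB* + VB*
        ):
            phrases.append(f"{w1} {w2}")
    return phrases


tagged = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("handy", "JJ"), ("but", "CC"),
          ("inconveniently", "RB"), ("located", "VBN")]
print(turney_phrases(tagged))
# ['online service', 'very handy', 'inconveniently located']
```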

How to measure polarity of a phrase?

• Positive phrases co-occur more with "excellent"
• Negative phrases co-occur more with "poor"
• But how to measure co-occurrence?

Pointwise Mutual Information

• Mutual information between 2 random variables X and Y

$$I(X,Y) = \sum_x \sum_y P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}$$

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X,Y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$$

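Pointwise Mutual Information

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X,Y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$$

• PMI between two words:
  • How much more do two words co-occur than if they were independent?

$$\mathrm{PMI}(\mathit{word}_1,\mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1,\mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$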

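How to Estimate Pointwise Mutual Information

• Query search engine (AltaVista)
• P(word) estimated by hits(word)/N
• P(word1, word2) by hits(word1 NEAR word2)/N
  • (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2), but kN bigrams that are k words apart, but we just use N on the rest of this slide and the next.)

$$\mathrm{PMI}(\mathit{word}_1,\mathit{word}_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1 \text{ NEAR } \mathit{word}_2)}{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1)\,\frac{1}{N}\,\mathrm{hits}(\mathit{word}_2)}$$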

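Does phrase appear more with "poor" or "excellent"?

$$
\begin{aligned}
\mathrm{Polarity}(\mathit{phrase}) &= \mathrm{PMI}(\mathit{phrase}, \text{"excellent"}) - \mathrm{PMI}(\mathit{phrase}, \text{"poor"}) \\
&= \log_2 \frac{\frac{1}{N}\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\frac{1}{N}\mathrm{hits}(\mathit{phrase})\,\frac{1}{N}\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\frac{1}{N}\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})}{\frac{1}{N}\mathrm{hits}(\mathit{phrase})\,\frac{1}{N}\mathrm{hits}(\text{"poor"})} \\
&= \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})} \right) \\
&= \log_2 \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})\,\mathrm{hits}(\text{"excellent"})}
\end{aligned}
$$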
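A minimal Python sketch of steps 2 and 3 of the algorithm. It computes each phrase's polarity from hit counts using the final line of the derivation above, where N and hits(phrase) cancel, and then rates a review by the average over its phrases. The smoothing constant added to the NEAR counts (to avoid division by zero) is an assumption of this sketch. The hit counts in the example are hypothetical.

```python
import math


def phrase_polarity(near_excellent, near_poor, hits_excellent, hits_poor, smoothing=0.01):
    """PMI(phrase, "excellent") - PMI(phrase, "poor") computed from hit counts:

    log2[hits(phrase NEAR excellent) * hits(poor) / (hits(phrase NEAR poor) * hits(excellent))]
    `smoothing` is added to both NEAR counts to avoid division by zero (value assumed).
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def rate_review(phrase_polarities):
    """Step 3: classify a review by the average polarity of its phrases."""
    average = sum(phrase_polarities) / len(phrase_polarities)
    return ("thumbs up" if average > 0 else "thumbs down"), average


# Hypothetical counts: the phrase occurs near "excellent" 4x as often as near "poor"
print(phrase_polarity(8, 2, 1000, 1000, smoothing=0))   # 2.0
print(rate_review([2.0, -1.0, 0.5]))                    # ('thumbs up', 0.5)
```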

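Phrases from a thumbs-up review

| Phrase                 | POS tags | Polarity |
|------------------------|----------|----------|
| online service         | JJ NN    | 2.8      |
| online experience      | JJ NN    | 2.3      |
| direct deposit         | JJ NN    | 1.3      |
| local branch           | JJ NN    | 0.42     |
| …                      |          |          |
| low fees               | JJ NNS   | 0.33     |
| true service           | JJ NN    | -0.73    |
| other bank             | JJ NN    | -0.85    |
| inconveniently located | RB VBN   | -1.5     |
| Average                |          | 0.32     |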

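Phrases from a thumbs-down review

| Phrase              | POS tags | Polarity |
|---------------------|----------|----------|
| direct deposits     | JJ NNS   | 5.8      |
| online web          | JJ NN    | 1.9      |
| very handy          | RB JJ    | 1.4      |
| …                   |          |          |
| virtual monopoly    | JJ NN    | -2.0     |
| lesser evil         | RBR JJ   | -2.3     |
| other problems      | JJ NNS   | -2.8     |
| low funds           | JJ NNS   | -6.8     |
| unethical practices | JJ NNS   | -8.5     |
| Average             |          | -1.2     |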

Results of Turney algorithm

• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information

Using WordNet to learn polarity

• WordNet: online thesaurus (covered in a later lecture)
• Create positive ("good") and negative seed-words ("terrible")
• Find synonyms and antonyms
  • Positive Set: add synonyms of positive words ("well") and antonyms of negative words
  • Negative Set: add synonyms of negative words ("awful") and antonyms of positive words ("evil")
• Repeat, following chains of synonyms
• Filter

S. M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004.
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004.
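A minimal Python sketch of the seed-expansion loop. It uses a tiny hand-made synonym/antonym dictionary as a stand-in for WordNet; the entries are hypothetical, and a real system would query WordNet itself. The "filter" step shown here, dropping words that land in both sets, is just one simple choice and not necessarily the one used in the cited papers.

```python
def expand_seeds(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive/negative word sets by following synonym and antonym links.

    synonyms, antonyms: dict word -> set of words (a toy stand-in for WordNet).
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):  # repeat, following chains of synonyms
        # Positive set: synonyms of positive words + antonyms of negative words
        new_pos = set().union(*(synonyms.get(w, set()) for w in pos),
                              *(antonyms.get(w, set()) for w in neg))
        # Negative set: synonyms of negative words + antonyms of positive words
        new_neg = set().union(*(synonyms.get(w, set()) for w in neg),
                              *(antonyms.get(w, set()) for w in pos))
        pos |= new_pos
        neg |= new_neg
    ambiguous = pos & neg  # one simple filter: drop words that landed in both sets
    return pos - ambiguous, neg - ambiguous


# Hypothetical thesaurus entries, for illustration only
synonyms = {"good": {"well", "fine"}, "terrible": {"awful"}, "awful": {"dreadful"}}
antonyms = {"good": {"evil", "bad"}}
pos, neg = expand_seeds({"good"}, {"terrible"}, synonyms, antonyms)
print(sorted(pos), sorted(neg))
# ['fine', 'good', 'well'] ['awful', 'bad', 'dreadful', 'evil', 'terrible']
```

"dreadful" is reached only in the second round, via the synonym chain terrible → awful → dreadful.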

Summary on Learning Lexicons

• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
• Intuition
  • Start with a seed set of words ('good', 'poor')
  • Find other words that have similar polarity:
    • Using "and" and "but"
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
• Use seeds and semi-supervised learning to induce lexicons

Other Sentiment Tasks

• Important for finding aspects or attributes
  • Target of sentiment
• The food was great but the service was awful

Finding aspect/attribute/target of sentiment

• Frequent phrases + rules
  • Find all highly frequent phrases across reviews ("fish tacos")
  • Filter by rules like "occurs right after sentiment word"
    • "…great fish tacos" means fish tacos is a likely aspect

| Business          | Aspects                                      |
|-------------------|----------------------------------------------|
| Casino            | casino, buffet, pool, resort, beds           |
| Children's Barber | haircut, job, experience, kids               |
| Greek Restaurant  | food, wine, service, appetizer, lamb         |
| Department Store  | selection, department, sales, shop, clothing |

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
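A minimal Python sketch of the "frequent phrases + rules" heuristic. It counts short phrases that occur right after a sentiment word and keeps those seen often enough across reviews. The sentiment-word list, the reviews, and the thresholds are all hypothetical. A real system would also restrict candidates to noun phrases and prefer the longer phrase when both "fish" and "fish tacos" survive.

```python
from collections import Counter


def aspect_candidates(reviews, sentiment_words, max_len=2, min_count=2):
    """Count 1- to max_len-word phrases that occur right after a sentiment word,
    keeping those seen at least min_count times across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = review.lower().split()
        for i, tok in enumerate(tokens):
            if tok not in sentiment_words:
                continue
            for n in range(1, max_len + 1):
                if i + n < len(tokens):  # enough tokens left for an n-word phrase
                    counts[" ".join(tokens[i + 1:i + 1 + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


SENTIMENT_WORDS = {"great", "awful", "friendly"}  # hypothetical list
reviews = [
    "great fish tacos here",
    "the fish tacos were great",
    "awful service but great fish tacos",
    "great fish tacos and friendly staff",
]
print(aspect_candidates(reviews, SENTIMENT_WORDS))
# {'fish': 3, 'fish tacos': 3}
```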

Finding aspect/attribute/target of sentiment

• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification
  • Hand-label a small corpus of restaurant review sentences with aspect
    • food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence
    • "Given this sentence, is the aspect food, décor, service, value, or NONE?"

Putting it all together: Finding sentiment for aspects

[Figure: pipeline Reviews -> Text Extractor -> Sentences & Phrases -> Sentiment Classifier -> Sentences & Phrases -> Aspect Extractor -> Sentences & Phrases -> Aggregator -> Final Summary]

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop

Results of Blair-Goldensohn et al. method

Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure...
(+) We went because of the free room and was pleasantly pleased...
(-) …the worst hotel I had ever stayed at...

Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem...
(+) Every single hotel staff member treated us great and answered every...
(-) The food is cold and the service gives new meaning to SLOW.

Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service...
(+) Offer of free buffet for joining the Play

Summary on Sentiment

• Generally modeled as a classification or regression task
  • predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naïve Bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons

Emotions

Scherer's typology of affective states

• Emotion: relatively brief episode of synchronized response of all or most organismic subsystems in response to the evaluation of an event as being of major significance
  • angry, sad, joyful, fearful, ashamed, proud, desperate
• Mood: diffuse affect state … change in subjective feeling, of low intensity but relatively long duration, often without apparent cause
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange
  • distant, cold, warm, supportive, contemptuous
• Attitudes: relatively enduring, affectively colored beliefs, preferences, predispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person
  • nervous, anxious, reckless, morose, hostile, envious, jealous

Two families of theories of emotion

• Atomic basic emotions
  • A finite list of 6 or 8, from which others are generated
• Dimensions of emotion
  • Valence (positive, negative)
  • Arousal (strong, weak)
  • Control

Ekman's 6 basic emotions: Surprise, happiness, anger, fear, disgust, sadness

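Valence/Arousal Dimensions

• High arousal, low pleasure: anger
• High arousal, high pleasure: excitement
• Low arousal, low pleasure: sadness
• Low arousal, high pleasure: relaxation

[Figure: the four emotions placed in quadrants defined by the axes arousal and valence.]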

Atomic units vs. Dimensions

Distinctive
• Emotions are units.
• Limited number of basic emotions.
• Basic emotions are innate and universal.

Dimensional
• Emotions are dimensions.
• Limited # of labels but unlimited number of emotions.
• Emotions are culturally learned.

Adapted from Julia Braverman

One emotion lexicon from each paradigm!

1. 8 basic emotions:
   • NRC Word-Emotion Association Lexicon (Mohammad and Turney 2011)
2. Dimensions of valence/arousal/dominance
   • Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013)

• Both built using Amazon Mechanical Turk

Plutchik's wheel of emotion

• 8 basic emotions
• in four opposing pairs:
  • joy–sadness
  • anger–fear
  • trust–disgust
  • anticipation–surprise

NRC Word-Emotion Association Lexicon
Mohammad and Turney 2011

• 10,000 words chosen mainly from earlier lexicons
• Labeled by Amazon Mechanical Turk
• 5 Turkers per HIT
• Give Turkers an idea of the relevant sense of the word
• Result:

    amazingly  anger         0
    amazingly  anticipation  0
    amazingly  disgust       0
    amazingly  fear          0
    amazingly  joy           1
    amazingly  sadness       0
    amazingly  surprise      1
    amazingly  trust         0
    amazingly  negative      0
    amazingly  positive      1

The AMT HIT

Lexicon of valence, arousal, and dominance

• Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45, 1191-1207.
• Supplementary data: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
• Ratings for nearly 14,000 words on three emotional dimensions:
  • valence (the pleasantness of the stimulus)
  • arousal (the intensity of emotion provoked by the stimulus)
  • dominance (the degree of control exerted by the stimulus)

Lexicon of valence, arousal, and dominance

• valence (the pleasantness of the stimulus)
  • 9: happy, pleased, satisfied, contented, hopeful
  • 1: unhappy, annoyed, unsatisfied, melancholic, despaired, or bored
• arousal (the intensity of emotion provoked by the stimulus)
  • 9: stimulated, excited, frenzied, jittery, wide-awake, or aroused
  • 1: relaxed, calm, sluggish, dull, sleepy, or unaroused
• dominance (the degree of control exerted by the stimulus)
  • 9: in control, influential, important, dominant, autonomous, or controlling
  • 1: controlled, influenced, cared-for, awed, submissive, or guided
• Again produced by AMT

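Lexicon of valence, arousal, and dominance: Examples

| Word      | Valence | Word     | Arousal | Word       | Dominance |
|-----------|---------|----------|---------|------------|-----------|
| vacation  | 8.53    | rampage  | 7.56    | self       | 7.74      |
| happy     | 8.47    | tornado  | 7.45    | incredible | 7.74      |
| whistle   | 5.7     | zucchini | 4.18    | skillet    | 5.33      |
| conscious | 5.53    | dressy   | 4.15    | concur     | 5.29      |
| torture   | 1.4     | dull     | 1.67    | earthquake | 2.14      |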

Lexicons for detecting document affect: Simplest unsupervised method

• Sentiment:
  • Sum the weights of each positive word in the document
  • Sum the weights of each negative word in the document
  • Choose whichever value (positive or negative) has the higher sum
• Emotion:
  • Do the same for each emotion lexicon
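A minimal Python sketch of the unsupervised method. Each label gets a lexicon of word weights, the weights of the document's words are summed per label, and the label with the highest sum wins. The same function covers sentiment (two lexicons) and emotion (one lexicon per emotion). Returning `None` on a tie is a choice made here; the slide does not say what to do with ties. All weights are hypothetical, except that "amazingly" is associated with joy and surprise, as in the NRC example.

```python
def lexicon_label(tokens, lexicons):
    """Sum each lexicon's word weights over the document and return the top label.

    lexicons: dict label -> dict word -> weight.
    Returns (label, scores); label is None when the top score is tied.
    """
    scores = {label: sum(lex.get(t, 0.0) for t in tokens) for label, lex in lexicons.items()}
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    return (winners[0] if len(winners) == 1 else None), scores


# Sentiment: hypothetical weights, for illustration only
sentiment = {"positive": {"great": 1.0}, "negative": {"awful": 1.0, "slow": 0.5}}
doc = "the food was great but the service was awful and slow".split()
print(lexicon_label(doc, sentiment))
# ('negative', {'positive': 1.0, 'negative': 1.5})

# Emotion: the same procedure with one lexicon per emotion ("unexpected" is hypothetical)
emotions = {"joy": {"amazingly": 1}, "surprise": {"amazingly": 1, "unexpected": 1}}
print(lexicon_label("an amazingly unexpected ending".split(), emotions))
# ('surprise', {'joy': 1.0, 'surprise': 2.0})
```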

Lexicons for detecting document affect: Simplest supervised method

• Build a classifier
  • Predict sentiment (or emotion, or personality) given features
  • Use "counts of lexicon categories" as features
  • Sample features:
    • LIWC category "cognition" had count of 7
    • NRC Emotion category "anticipation" had count of 2
• Baseline
  • Instead use counts of all the words and bigrams in the training set
  • This is hard to beat
  • But only works if the training and test sets are very similar
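A minimal Python sketch of the "counts of lexicon categories" features. It maps each document to a count per category and then to a fixed-order vector that any classifier could consume. The toy category lexicon is hypothetical; real LIWC or NRC category lists would replace it.

```python
from collections import Counter


def category_count_features(tokens, category_lexicon):
    """Map a document to counts of lexicon categories.

    category_lexicon: dict word -> set of category names (LIWC- or NRC-style).
    A word in several categories counts once toward each of them.
    """
    feats = Counter()
    for tok in tokens:
        for cat in category_lexicon.get(tok.lower(), ()):
            feats[cat] += 1
    return feats


def to_vector(feats, categories):
    """Fixed-order feature vector; categories missing from the document count as 0."""
    return [feats[c] for c in categories]


# Hypothetical toy entries, not real LIWC/NRC data
toy_lexicon = {
    "think": {"cognition"},
    "maybe": {"cognition", "tentative"},
    "hope": {"anticipation"},
}
feats = category_count_features("I think maybe we hope".split(), toy_lexicon)
print(to_vector(feats, ["anticipation", "cognition", "tentative"]))
# [1, 2, 1]
```

Keeping the category order fixed across documents is what lets these counts line up as columns of a training matrix.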

Personality


Personality

• The internal structures and propensities that explain a person's characteristic patterns of thought, emotion, and behavior.
• Personality captures what people are like.

McGraw-Hill/Irwin Chapter 9


The Big Five Dimensions of Personality

Extraversion vs. Introversion: sociable, assertive, playful vs. aloof, reserved, shy

Emotional stability vs. Neuroticism: calm, unemotional vs. insecure, anxious

Agreeableness vs. Disagreeable: friendly, cooperative vs. antagonistic, faultfinding

Conscientiousness vs. Unconscientious: self-disciplined, organised vs. inefficient, careless

Openness to experience: intellectual, insightful vs. shallow, unimaginative

Big Five Personality: Agreeableness

• Warm, kind, cooperative, sympathetic, helpful, and courteous
• Strong desire to obtain acceptance in personal relationships as a means of expressing personality.
• Agreeable people focus on "getting along," not necessarily "getting ahead."


Big Five Personality: Extraversion

• Talkative, sociable, passionate, assertive, bold, and dominant
• Easiest to judge immediately on first meeting
• Prioritize the desire to obtain power and influence within a social structure as a means of expressing personality.
• High in positive affectivity: a tendency to experience pleasant, engaging moods such as enthusiasm, excitement, and elation.


Big Five Personality: Neuroticism

• Experience unpleasant moods: hostility, nervousness, and annoyance.
• More likely to appraise day-to-day situations as stressful.
• Less likely to believe they can cope with the stressors that they experience.
• Related to locus of control (whether people attribute the causes of events to themselves or to the external environment)
  • Neurotics hold an external locus of control: they believe that the events that occur around them are driven by luck, chance, or fate.
  • Less neurotic people hold an internal locus of control: they believe that their own behavior dictates events.


External and Internal Locus of Control


Big Five Personality: Openness to Experience

• Curious, imaginative, creative, complex, sophisticated
• Also called "Inquisitiveness" or "Intellectualness"
• High levels of creativity: the capacity to generate novel and useful ideas and solutions.
• Highly open individuals are more likely to migrate into artistic and scientific fields.


Changes in Big Five Dimensions Over the Life Span


Aside: Do Animals Have Personalities?

• Gosling (1998) studied spotted hyenas.
  • 4 human observers rated 44 personality traits of hyenas
  • Ran PCA on the ratings
  • Five dimensions: Assertiveness, Excitability, Human-Directed Agreeableness, Sociability, and Curiosity
• Related to 3 human dimensions: neuroticism (excitability), openness (curiosity), agreeableness (sociability + human-directed agreeableness)

Various text corpora labeled for personality of author

Pennebaker, James W., and Laura A. King. 1999. "Linguistic styles: language use as an individual difference." Journal of Personality and Social Psychology 77, no. 6.

• 2,479 essays from psychology students (1.9 million words), "write whatever comes into your mind" for 20 minutes

Mehl, Matthias R., S. D. Gosling, J. W. Pennebaker. 2006. Personality in its natural habitat: manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology 90 (5), 862.

• Speech from Electronically Activated Recorder (EAR)
• Random snippets of conversation recorded, transcribed
• 96 participants, total of 97,468 words and 15,269 utterances

Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8, no. 9.

• Facebook
• 75,000 volunteers
• 309 million words
• All took a personality test

EAR (speech) corpus (Mehl et al.)

Essays corpus (Pennebaker and King)

Classifiers

• Mairesse, François, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. "Using linguistic cues for the automatic recognition of personality in conversation and text." Journal of Artificial Intelligence Research (2007): 457-500.
  • Various classifiers, lexicon-based and prosodic features
• Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8, no.
  • Regression and SVM, lexicon-based and all-words

Sample LIWC Features

LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX

Facebook study, learned words: Extraversion versus Introversion

Facebook study, learned words: Neuroticism versus Emotional Stability
