CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang


Page 1: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · •Opinion mining •Sentiment mining •Subjectivity analysis. ... •Sentiment analysis is the detection of attitudes

CS 6120/CS4120: Natural Language Processing

Instructor: Prof. Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang


Presentation and report

• Problem Description (10 points)
  • What is the task?
  • System input and output
  • Examples will be helpful
• Reference/Related work (20 points)
  • Put your work in context: what has been done before? You need to have references!
  • What’s new in your work?
• Methodology: What you have done (30 points)
  • Preprocessing of the data
  • What are your data?
  • Features used? Which are effective, and which are not?
  • What methods do you experiment with? And why do you think they’re reasonable and suitable for the task?
• Experiments (40 points)
  • Dataset sizes, train/test/development splits
  • Evaluation metrics: which are used, and are they proper to calibrate system performance?
  • Baselines: what are they?
  • Results, tables, figures, etc.


Sentiment Analysis


Positive or negative movie review?

• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.


Google Product Search

[Figure not recoverable from the transcript]


Bing Shopping

[Figure not recoverable from the transcript]


Twitter sentiment versus Gallup Poll of Consumer Confidence

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010.


Twitter sentiment:

Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.


Target Sentiment on Twitter

• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision


Sentiment analysis has many other names

• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis


Why sentiment analysis?

• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment


Scherer Typology of Affective States

• Emotion: brief organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous



Sentiment Analysis

• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
    • From a set of types
      • Like, love, hate, value, desire, etc.
    • Or (more commonly) simple weighted polarity:
      • positive, negative, neutral, together with strength
  4. Text containing the attitude
    • Sentence or entire document


Sentiment Analysis

• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types



Sentiment Classification in Movie Reviews

• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.


IMDB data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ “snake eyes” is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it’s not just because this is a brian depalma film, and since he’s a great director and one who’s films are always greeted with at least some fanfare. and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.


Baseline Algorithm (adapted from Pang and Lee)

• Tokenization
• Feature Extraction
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM


Sentiment Tokenization Issues

• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O’Connor twitter tokenizer

Potts emoticons:

    [<>]?                        # optional hat/brow
    [:;=8]                       # eyes
    [\-o\*\']?                   # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    |                            #### reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    [\-o\*\']?                   # optional nose
    [:;=8]                       # eyes
    [<>]?                        # optional hat/brow
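A minimal sketch of how the emoticon pattern above might be used in Python. The pattern itself is the one on the slide; the function name `find_emoticons` and the wrapping non-capturing group are assumptions made for this illustration, not part of the Potts tokenizer.

```python
import re

# The slide's pattern, compiled in verbose mode so that whitespace and
# "#" comments outside character classes are ignored.
EMOTICON_RE = re.compile(r"""
    (?:
      [<>]?                       # optional hat/brow
      [:;=8]                      # eyes
      [\-o\*\']?                  # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
      |                           # reverse orientation
      [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
      [\-o\*\']?                  # optional nose
      [:;=8]                      # eyes
      [<>]?                       # optional hat/brow
    )""", re.VERBOSE)

def find_emoticons(text):
    """Return all emoticon-like substrings in text (hypothetical helper)."""
    return EMOTICON_RE.findall(text)

print(find_emoticons("great movie :-) but the ending :( was sad"))
```

Because letters such as d, D, p and P count as mouths, a real tokenizer would normally apply this pattern before word tokenization and only at token boundaries.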


Extracting Features for Sentiment Classification

• How to handle negation
  • I didn’t like this movie
    vs
  • I really like this movie
• Which words to use?
  • Only adjectives
  • All words
    • All words turns out to work better, at least on this data


Negation

Add NOT_ to every word between negation and following punctuation:

didn’t like this movie , but I

didn’t NOT_like NOT_this NOT_movie but I

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
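The NOT_ marking rule can be sketched as follows. This is an illustration under assumptions: the negation word list, the punctuation set, and the function name `mark_negation` are choices made for this sketch, and unlike the slide's example it keeps the punctuation token in the output.

```python
import re

# Assumed negation cues: "not", "no", "never", and any token ending in n't.
NEGATION = re.compile(r"^(?:not|no|never|\w*n['’]t)$", re.IGNORECASE)
# Assumed clause-ending punctuation tokens.
PUNCT = re.compile(r"^[.,:;!?]$")

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    out, negating = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negating = False          # punctuation closes the negation scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok):
                negating = True       # start marking from the next token
    return out

print(mark_negation("didn't like this movie , but I".split()))
```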


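Reminder: Naïve Bayes

$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$$

$$c_{NB} = \underset{c_j \in C}{\operatorname{argmax}} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$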


Binarized (Boolean feature) Multinomial Naïve Bayes

• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more.
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1


Boolean Multinomial Naïve Bayes: Learning

• From training corpus, extract Vocabulary
• Calculate P(c_j) terms
  • For each c_j in C do
    docs_j ← all docs with class = c_j
    $$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$$
• Calculate P(w_k | c_j) terms
  • Remove duplicates in each doc:
    • For each word type w in doc_j
      • Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|}$$
    (n is the total number of word tokens in Text_j)


Boolean Multinomial Naïve Bayes on a test document d

• First remove all duplicate words from d
• Then compute NB using the same equation:

$$c_{NB} = \underset{c_j \in C}{\operatorname{argmax}} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$


Normal vs. Boolean Multinomial NB

Normal     Doc  Words                                 Class
Training   1    Chinese Beijing Chinese               c
           2    Chinese Chinese Shanghai              c
           3    Chinese Macao                         c
           4    Tokyo Japan Chinese                   j
Test       5    Chinese Chinese Chinese Tokyo Japan   ?

Boolean    Doc  Words                 Class
Training   1    Chinese Beijing       c
           2    Chinese Shanghai      c
           3    Chinese Macao         c
           4    Tokyo Japan Chinese   j
Test       5    Chinese Tokyo Japan   ?
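The two variants can be compared on the toy corpus above with a short sketch. The function names `train_nb` and `classify_nb`, the choice α = 1, and the decision to ignore unseen test words are assumptions of this illustration.

```python
import math
from collections import Counter

def train_nb(docs, boolean=False, alpha=1.0):
    """docs: list of (tokens, label). Returns (priors, likelihoods, vocab)."""
    vocab = {w for tokens, _ in docs for w in tokens}
    class_docs = Counter(label for _, label in docs)
    counts = {c: Counter() for c in class_docs}
    for tokens, label in docs:
        # Boolean variant: keep a single instance of each word type per doc.
        counts[label].update(set(tokens) if boolean else tokens)
    priors = {c: class_docs[c] / len(docs) for c in class_docs}
    likelihoods = {}
    for c, ctr in counts.items():
        n = sum(ctr.values())  # total tokens in the concatenated class text
        likelihoods[c] = {w: (ctr[w] + alpha) / (n + alpha * len(vocab))
                          for w in vocab}
    return priors, likelihoods, vocab

def classify_nb(tokens, model, boolean=False):
    """Return (best class, log scores) for a tokenized test document."""
    priors, likelihoods, vocab = model
    if boolean:
        tokens = set(tokens)                    # remove duplicates from d
    tokens = [w for w in tokens if w in vocab]  # assumption: skip unseen words
    scores = {c: math.log(priors[c])
                 + sum(math.log(likelihoods[c][w]) for w in tokens)
              for c in priors}
    return max(scores, key=scores.get), scores

TRAIN = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "j"),
]
TEST = "Chinese Chinese Chinese Tokyo Japan".split()

normal_label, _ = classify_nb(TEST, train_nb(TRAIN))
boolean_label, _ = classify_nb(TEST, train_nb(TRAIN, boolean=True), boolean=True)
print(normal_label, boolean_label)
```

With add-one smoothing on this toy data the two variants disagree: full counts favor class c because Chinese is repeated three times, while the binarized version favors class j.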


Binarized (Boolean feature) Multinomial Naïve Bayes

• Binary seems to work better than full word counts
• Other possibility: log(freq(w))

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
J. D. Rennie, L. Shih, J. Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003.


Cross-Validation

• Break up data into 5 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose the fold as a temporary test set
  • Train on the other 4 folds, compute performance on the test fold
• Report average performance of the 5 runs

[Figure: five iterations; in each iteration a different fold serves as the test set and the remaining four folds are used for training]
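The procedure can be sketched as below. The names `k_fold_indices`, `cross_validate`, `train_fn` and `eval_fn` are hypothetical, and this sketch shuffles without stratifying, so it does not guarantee equal positive and negative examples in each fold.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, train_fn, eval_fn, k=5, seed=0):
    """train_fn(train_items) -> model; eval_fn(model, test_items) -> score."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [data[j] for j in test_idx]
        scores.append(eval_fn(train_fn(train), test))
    return sum(scores) / len(scores), scores
```

A stratified variant would split the positive and negative documents into folds separately and then merge them fold by fold.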


Other issues in Classification

• MaxEnt and SVM tend to do better than Naïve Bayes


Problems: What makes reviews hard to classify?

• Subtlety:
  • Perfume review in Perfumes: the Guide:
    • “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”
  • Dorothy Parker on Katherine Hepburn
    • “She runs the gamut of emotions from A to B”


Thwarted Expectations and Ordering Effects

• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.


Sentiment Lexicons


The General Inquirer

• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs Weak, Active vs Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
• Free for Research Use

Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.


LIWC (Linguistic Inquiry and Word Count)

Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX.

• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• Not free though!


MPQA Subjectivity Cues Lexicon

• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.


Bing Liu Opinion Lexicon

• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative

Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.


SentiWordNet

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010.

• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
• [estimable(J,3)] “may be computed or estimated”
  Pos 0   Neg 0   Obj 1
• [estimable(J,1)] “deserving of respect or high regard”
  Pos .75   Neg 0   Obj .25


Disagreements between polarity lexicons

                   Opinion Lexicon   General Inquirer   SentiWordNet      LIWC
MPQA               33/5402 (0.6%)    49/2867 (2%)       1127/4214 (27%)   12/363 (3%)
Opinion Lexicon                      32/2411 (1%)       1004/3994 (25%)   9/403 (2%)
General Inquirer                                        520/2306 (23%)    1/204 (0.5%)
SentiWordNet                                                              174/694 (25%)
LIWC

Christopher Potts, Sentiment Tutorial, 2011


Analyzing the polarity of each word in IMDB

• How likely is each word to appear in each sentiment class?
  • Count(“bad”) in 1-star, 2-star, 3-star, etc.
• But can’t use raw counts:
• Instead, likelihood:

$$P(w \mid c) = \frac{f(w, c)}{\sum_{w \in c} f(w, c)}$$

• Make them comparable between words
  • Scaled likelihood:

$$\frac{P(w \mid c)}{P(w)}$$

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
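A small sketch of the likelihood and scaled-likelihood computation. The function name `scaled_likelihoods` and the toy counts are invented for illustration and are not IMDB statistics.

```python
from collections import Counter

def scaled_likelihoods(counts):
    """counts: {rating_class: Counter({word: frequency})}.
    Returns {word: {rating_class: P(w|c) / P(w)}}."""
    class_totals = {c: sum(ctr.values()) for c, ctr in counts.items()}
    grand_total = sum(class_totals.values())
    word_totals = Counter()
    for ctr in counts.values():
        word_totals.update(ctr)
    result = {}
    for w, total in word_totals.items():
        p_w = total / grand_total                       # P(w)
        result[w] = {c: (counts[c][w] / class_totals[c]) / p_w  # P(w|c)/P(w)
                     for c in counts}
    return result

# Hypothetical token counts for 1-star and 10-star reviews.
toy = {
    1: Counter(bad=30, good=10, the=60),
    10: Counter(bad=5, good=35, the=60),
}
sl = scaled_likelihoods(toy)
print(sl["bad"], sl["the"])
```

A value near 1 means the word is about as likely in that rating class as in the corpus overall; values above 1 mean it is over-represented there.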


Analyzing the polarity of each word in IMDB

[Figure: per-word plots against Rating (1 to 10); axis labels in the figure are Pr(c|w) and scaled likelihood P(w|c)/P(w). POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.


Other sentiment feature: Logical negation

• Is logical negation (no, not) associated with negative sentiment?
• Potts experiment:
  • Count negation (not, n’t, no, never) in online reviews
  • Regress against the review rating

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.


Potts 2011 Results: More negation in negative sentiment

[Figure: scaled likelihood P(w|c)/P(w); plot not recoverable from the transcript]


Learning Sentiment Lexicons


Semi-supervised learning of lexicons

• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• To bootstrap a lexicon


Hatzivassiloglou and McKeown intuition for identifying word polarity

• Adjectives conjoined by “and” have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  • fair but brutal

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181


Hatzivassiloglou & McKeown 1997 Step 1

• Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)
  • 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving …
  • 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting …


Hatzivassiloglou & McKeown 1997 Step 2

• Expand seed set to conjoined adjectives

nice, helpful

nice, classy


Hatzivassiloglou & McKeown 1997 Step 3

• Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph:

[Figure: graph over the adjectives classy, nice, helpful, fair, brutal, irrational, corrupt]


Hatzivassiloglou & McKeown 1997 Step 4

• Clustering for partitioning the graph into two

[Figure: the graph over classy, nice, helpful, fair, brutal, irrational, corrupt, partitioned into a positive (+) and a negative (-) cluster]
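The conjunction intuition can be illustrated with a much simpler stand-in than the paper's method. The sketch below is not the supervised classifier plus clustering described in the steps above: it only propagates labels from seed words across "and" (same polarity) and "but" (opposite polarity) links. The function name, the link list, and the seed choice are all hypothetical.

```python
from collections import deque

def propagate_polarity(seeds, links):
    """seeds: {adjective: +1 or -1}.
    links: list of (adj1, adj2, conjunction), where conjunction is
    "and" (same polarity) or "but" (opposite polarity)."""
    graph = {}
    for a, b, conj in links:
        sign = 1 if conj == "and" else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph.get(word, []):
            if neighbor not in polarity:   # first label wins in this sketch
                polarity[neighbor] = polarity[word] * sign
                queue.append(neighbor)
    return polarity

# Hypothetical conjunction links, loosely based on the adjectives on the slides.
links = [
    ("nice", "helpful", "and"),
    ("nice", "classy", "and"),
    ("corrupt", "brutal", "and"),
    ("fair", "brutal", "but"),
    ("irrational", "corrupt", "and"),
]
lexicon = propagate_polarity({"nice": 1, "corrupt": -1}, links)
print(lexicon)
```

Unlike clustering over weighted edges, this propagation cannot recover from a single misleading conjunction, which is one reason a scored graph is the more robust design.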


Output polarity lexicon

• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty …
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful …



Turney Algorithm

1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews


Extract two-word phrases with adjectives

First Word          Second Word           Third Word (not extracted)
JJ                  NN or NNS             anything
RB, RBR, or RBS     JJ                    not NN nor NNS
JJ                  JJ                    not NN nor NNS
NN or NNS           JJ                    not NN nor NNS
RB, RBR, or RBS     VB, VBD, VBN, VBG     anything
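The five tag patterns in the table can be applied to a POS-tagged sentence roughly as follows. The sketch assumes the input is already tagged with Penn Treebank tags; the helper names `matches` and `extract_phrases` are made up for this example.

```python
NOUN = {"NN", "NNS"}
ADVERB = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def matches(first, second, third):
    """True if the tag triple fits one of the five patterns in the table."""
    not_noun = third not in NOUN
    return (
        (first == "JJ" and second in NOUN)
        or (first in ADVERB and second == "JJ" and not_noun)
        or (first == "JJ" and second == "JJ" and not_noun)
        or (first in NOUN and second == "JJ" and not_noun)
        or (first in ADVERB and second in VERB)
    )

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs. Returns the two-word phrases."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # The third word is only inspected, never extracted.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if matches(t1, t2, t3):
            phrases.append(f"{w1} {w2}")
    return phrases

tagged = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("handy", "JJ"), (".", ".")]
print(extract_phrases(tagged))
```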


How to measure polarity of a phrase?

• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?


Pointwise Mutual Information

• Mutual information between 2 random variables X and Y

$$I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$


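Pointwise Mutual Information

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words:
  • How much more do two words co-occur than if they were independent?

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$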
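As a quick numeric check of the definition, a sketch with hypothetical helper names `pmi` and `pmi_from_counts`; the probabilities and counts are made-up inputs.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits."""
    return math.log2(p_xy / (p_x * p_y))

def pmi_from_counts(count_xy, count_x, count_y, total):
    """Same quantity with probabilities estimated as count / total."""
    return pmi(count_xy / total, count_x / total, count_y / total)

print(pmi(0.25, 0.5, 0.5))   # independent events
print(pmi(0.5, 0.5, 0.5))    # events that always co-occur
print(pmi(0.125, 0.5, 0.5))  # events that co-occur less than chance
```

PMI is 0 for independent events, positive when they co-occur more often than chance, and negative when they co-occur less often.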


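How to Estimate Pointwise Mutual Information

• Query search engine (Altavista)
  • P(word) estimated by hits(word)/N
  • P(word1, word2) by hits(word1 NEAR word2)/N
  • (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2), but kN bigrams that are k words apart, but we just use N on the rest of this slide and the next.)

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1 \text{ NEAR } \mathit{word}_2)}{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1) \cdot \frac{1}{N}\,\mathrm{hits}(\mathit{word}_2)}$$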


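Does phrase appear more with “poor” or “excellent”?

$$\begin{aligned}
\mathrm{Polarity}(\mathit{phrase}) &= \mathrm{PMI}(\mathit{phrase}, \text{"excellent"}) - \mathrm{PMI}(\mathit{phrase}, \text{"poor"}) \\
&= \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}) \cdot \frac{1}{N}\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}) \cdot \frac{1}{N}\,\mathrm{hits}(\text{"poor"})} \\
&= \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})} \right) \\
&= \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})\,\mathrm{hits}(\text{"excellent"})} \right)
\end{aligned}$$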
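The final form of the polarity formula, together with the review-rating step of the Turney algorithm, can be sketched as below. The hit counts are invented inputs; the function names and the small `smoothing` constant, added to the NEAR counts to avoid division by zero, are assumptions of this sketch.

```python
import math

def polarity(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor,
             smoothing=0.01):
    """log2 of hits(phrase NEAR "excellent") * hits("poor")
    over hits(phrase NEAR "poor") * hits("excellent")."""
    num = (hits_near_excellent + smoothing) * hits_poor
    den = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(num / den)

def rate_review(phrase_polarities):
    """Rate a review by the average polarity of its extracted phrases."""
    avg = sum(phrase_polarities) / len(phrase_polarities)
    return ("thumbs up" if avg > 0 else "thumbs down"), avg

# Hypothetical hit counts: the phrase appears 4 times as often near "excellent".
print(polarity(8, 2, 1000, 1000, smoothing=0))
print(rate_review([1.0, -0.5]))
```

Note that hits(phrase) and N cancel out, so only four counts per phrase are needed.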


Phrases from a thumbs-up review

Phrase                   POS tags   Polarity
online service           JJ NN      2.8
online experience        JJ NN      2.3
direct deposit           JJ NN      1.3
local branch             JJ NN      0.42
…
low fees                 JJ NNS     0.33
true service             JJ NN      -0.73
other bank               JJ NN      -0.85
inconveniently located   RB VBN     -1.5
Average                             0.32


Phrases from a thumbs-down review

Phrase                POS tags   Polarity
direct deposits       JJ NNS     5.8
online web            JJ NN      1.9
very handy            RB JJ      1.4
…
virtual monopoly      JJ NN      -2.0
lesser evil           RBR JJ     -2.3
other problems        JJ NNS     -2.8
low funds             JJ NNS     -6.8
unethical practices   JJ NNS     -8.5
Average                          -1.2


Results of Turney algorithm

• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information


Using WordNet to learn polarity

• WordNet: online thesaurus (covered in later lecture).
• Create positive (“good”) and negative seed-words (“terrible”)
• Find Synonyms and Antonyms
  • Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
  • Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
• Repeat, following chains of synonyms
• Filter

S. M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004.
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
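The expansion loop can be sketched with a toy thesaurus standing in for WordNet. The dictionaries, the function name `expand_lexicon`, and the simplified filter (drop any word that lands in both sets) are all assumptions for illustration; a real implementation would query WordNet synsets instead.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive and negative word sets from seeds via synonym/antonym links."""
    positive, negative = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        new_pos, new_neg = set(positive), set(negative)
        for w in positive:
            new_pos.update(synonyms.get(w, ()))   # synonyms keep polarity
            new_neg.update(antonyms.get(w, ()))   # antonyms flip polarity
        for w in negative:
            new_neg.update(synonyms.get(w, ()))
            new_pos.update(antonyms.get(w, ()))
        positive, negative = new_pos, new_neg
    # Filter step (simplified): discard words that ended up in both sets.
    conflict = positive & negative
    return positive - conflict, negative - conflict

# Hypothetical mini-thesaurus, not actual WordNet content.
synonyms = {"good": ["well", "fine"], "terrible": ["awful"], "awful": ["dreadful"]}
antonyms = {"good": ["evil"], "terrible": ["wonderful"]}

pos, neg = expand_lexicon(["good"], ["terrible"], synonyms, antonyms)
print(sorted(pos), sorted(neg))
```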


Summary on Learning Lexicons

• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
• Intuition
  • Start with a seed set of words (‘good’, ‘poor’)
  • Find other words that have similar polarity:
    • Using “and” and “but”
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
  • Use seeds and semi-supervised learning to induce lexicons


Other Sentiment Tasks


• Important for finding aspects or attributes
  • Target of sentiment

• The food was great but the service was awful


Finding aspect/attribute/target of sentiment

• Frequent phrases + rules
  • Find all highly frequent phrases across reviews (“fish tacos”)
  • Filter by rules like “occurs right after sentiment word”
    • “…great fish tacos” means fish tacos is a likely aspect

Casino: casino, buffet, pool, resort, beds
Children’s Barber: haircut, job, experience, kids
Greek Restaurant: food, wine, service, appetizer, lamb
Department Store: selection, department, sales, shop, clothing

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
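A minimal sketch of the "frequent phrases + rules" idea, under simplifying assumptions: phrases are plain bigrams, tokenization is whitespace splitting, and `SENTIMENT_WORDS`, `REVIEWS`, and the function names are hypothetical toy choices rather than anything from the cited papers.

```python
from collections import Counter

# Hypothetical toy sentiment lexicon and reviews.
SENTIMENT_WORDS = {"great", "awful", "terrible", "delicious"}
REVIEWS = [
    "great fish tacos and friendly staff",
    "the fish tacos were awful",
    "we ordered the fish tacos again",
]


def ngrams(tokens, n):
    """Return all contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_aspects(reviews, n=2, min_count=2):
    """Keep frequent n-grams that occur right after a sentiment word at least once."""
    freq = Counter()
    after_sentiment = set()
    for review in reviews:
        tokens = review.lower().split()
        freq.update(ngrams(tokens, n))            # step 1: phrase frequencies
        for i, token in enumerate(tokens):
            phrase = tokens[i + 1:i + 1 + n]
            if token in SENTIMENT_WORDS and len(phrase) == n:
                after_sentiment.add(" ".join(phrase))   # step 2: rule evidence
    return sorted(p for p, c in freq.items()
                  if c >= min_count and p in after_sentiment)


aspects = candidate_aspects(REVIEWS)
```

On the toy reviews, "the fish" is also frequent but never follows a sentiment word, so the rule filters it out and only "fish tacos" survives.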

Finding aspect/attribute/target of sentiment

• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification
  • Hand-label a small corpus of restaurant review sentences with aspect
    • food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence
    • “Given this sentence, is the aspect food, décor, service, value, or NONE?”
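As one possible illustration of the supervised setup, here is a small multinomial naive Bayes classifier with add-one smoothing, trained on a handful of invented sentences. The slides do not say which classifier was used for aspect assignment; the training data, function names, and the choice of naive Bayes are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled sentences (aspect labels follow the slide).
TRAIN = [
    ("the pasta was delicious", "food"),
    ("great pizza and salad", "food"),
    ("our waiter was rude", "service"),
    ("the staff was friendly and fast", "service"),
    ("beautiful lighting and cozy decor", "décor"),
    ("the prices are fair", "value"),
    ("we went on a tuesday", "NONE"),
]


def train_naive_bayes(labeled_sentences):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in labeled_sentences:
        label_counts[label] += 1
        for word in sentence.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_aspect(sentence, model):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)                       # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in sentence.lower().split():
            if word in vocab:                                  # skip unseen words
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
```

Note that a sentence such as "the pizza was delicious" never names its aspect; the classifier infers "food" from the words it has seen with that label.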

Putting it all together: Finding sentiment for aspects

[Figure: pipeline. Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Sentences & Phrases → Aspect Extractor → Sentences & Phrases → Aggregator → Final Summary]

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.

Results of Blair-Goldensohn et al. method

Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) … the worst hotel I had ever stayed at ...

Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.

Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play

Summary on Sentiment

• Generally modeled as classification or regression task
  • predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naïve Bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons

Emotions

Scherer’s typology of affective states

Emotion: relatively brief episode of synchronized response of all or most organismic subsystems in response to the evaluation of an event as being of major significance
  angry, sad, joyful, fearful, ashamed, proud, desperate

Mood: diffuse affect state … change in subjective feeling, of low intensity but relatively long duration, often without apparent cause
  cheerful, gloomy, irritable, listless, depressed, buoyant

Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange
  distant, cold, warm, supportive, contemptuous

Attitudes: relatively enduring, affectively colored beliefs, preferences, predispositions towards objects or persons
  liking, loving, hating, valuing, desiring

Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person
  nervous, anxious, reckless, morose, hostile, envious, jealous

Two families of theories of emotion

• Atomic basic emotions
  • A finite list of 6 or 8, from which others are generated
• Dimensions of emotion
  • Valence (positive, negative)
  • Arousal (strong, weak)
  • Control

Ekman’s 6 basic emotions: Surprise, happiness, anger, fear, disgust, sadness

Valence/Arousal Dimensions

[Figure: four quadrants on the axes arousal and valence]
• High arousal, low pleasure: anger
• High arousal, high pleasure: excitement
• Low arousal, low pleasure: sadness
• Low arousal, high pleasure: relaxation

Atomic units vs. Dimensions

Distinctive
• Emotions are units.
• Limited number of basic emotions.
• Basic emotions are innate and universal

Dimensional
• Emotions are dimensions.
• Limited # of labels but unlimited number of emotions.
• Emotions are culturally learned.

Adapted from Julia Braverman

One emotion lexicon from each paradigm!

1. 8 basic emotions:
  • NRC Word-Emotion Association Lexicon (Mohammad and Turney 2011)
2. Dimensions of valence/arousal/dominance
  • Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013)

• Both built using Amazon Mechanical Turk

Plutchik’s wheel of emotion

• 8 basic emotions
• in four opposing pairs:
  • joy–sadness
  • anger–fear
  • trust–disgust
  • anticipation–surprise

NRC Word-Emotion Association Lexicon
Mohammad and Turney 2011

• 10,000 words chosen mainly from earlier lexicons
• Labeled by Amazon Mechanical Turk
• 5 Turkers per hit
• Give Turkers an idea of the relevant sense of the word
• Result:

amazingly anger 0
amazingly anticipation 0
amazingly disgust 0
amazingly fear 0
amazingly joy 1
amazingly sadness 0
amazingly surprise 1
amazingly trust 0
amazingly negative 0
amazingly positive 1
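Entries in this "word category flag" layout are easy to load into a dictionary. The sketch below parses the sample rows shown on the slide; the function name is hypothetical, and splitting on whitespace is an assumption about the file layout rather than a statement about the distributed lexicon file.

```python
# The ten sample rows from the slide, one "word category flag" triple per line.
SAMPLE = """\
amazingly anger 0
amazingly anticipation 0
amazingly disgust 0
amazingly fear 0
amazingly joy 1
amazingly sadness 0
amazingly surprise 1
amazingly trust 0
amazingly negative 0
amazingly positive 1
"""


def parse_emotion_lexicon(text):
    """Map each word to the set of categories whose association flag is 1."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        word, category, flag = parts
        categories = lexicon.setdefault(word, set())
        if flag == "1":
            categories.add(category)
    return lexicon


lexicon = parse_emotion_lexicon(SAMPLE)
```

For the sample rows, "amazingly" ends up associated with joy, surprise, and the positive polarity label.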

The AMT Hit

Lexicon of valence, arousal, and dominance

• Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45, 1191-1207.
• Supplementary data: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
• Ratings for 14,000 words for emotional dimensions:
  • valence (the pleasantness of the stimulus)
  • arousal (the intensity of emotion provoked by the stimulus)
  • dominance (the degree of control exerted by the stimulus)

Lexicon of valence, arousal, and dominance

• valence (the pleasantness of the stimulus)
  9: happy, pleased, satisfied, contented, hopeful
  1: unhappy, annoyed, unsatisfied, melancholic, despaired, or bored
• arousal (the intensity of emotion provoked by the stimulus)
  9: stimulated, excited, frenzied, jittery, wide-awake, or aroused
  1: relaxed, calm, sluggish, dull, sleepy, or unaroused
• dominance (the degree of control exerted by the stimulus)
  9: in control, influential, important, dominant, autonomous, or controlling
  1: controlled, influenced, cared-for, awed, submissive, or guided
• Again produced by AMT

Lexicon of valence, arousal, and dominance: Examples

Valence            Arousal            Dominance
vacation   8.53    rampage   7.56     self        7.74
happy      8.47    tornado   7.45     incredible  7.74
whistle    5.7     zucchini  4.18     skillet     5.33
conscious  5.53    dressy    4.15     concur      5.29
torture    1.4     dull      1.67     earthquake  2.14

Lexicons for detecting document affect: Simplest unsupervised method

• Sentiment:
  • Sum the weights of each positive word in the document
  • Sum the weights of each negative word in the document
  • Choose whichever value (positive or negative) has higher sum
• Emotion:
  • Do the same for each emotion lexicon
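The unsupervised method amounts to one weighted sum per lexicon followed by an argmax. A minimal sketch, assuming whitespace tokenization and hypothetical toy lexicons with made-up weights; ties go to whichever category is listed first, which is an arbitrary choice for this illustration.

```python
def strongest_category(tokens, lexicons):
    """Sum lexicon weights per category and return (best_category, all_scores).

    `lexicons` maps a category name to a {word: weight} dict.
    Ties are resolved in favor of the first category in the dict.
    """
    scores = {
        name: sum(weights.get(token, 0.0) for token in tokens)
        for name, weights in lexicons.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy lexicons with made-up weights.
SENTIMENT = {
    "positive": {"great": 1.0, "good": 0.5},
    "negative": {"awful": 1.0, "slow": 0.5},
}
EMOTION = {
    "joy": {"great": 1.0, "delighted": 1.0},
    "sadness": {"awful": 0.5},
    "anger": {"awful": 1.0, "slow": 1.0},
}

tokens = "the food was great but the service was awful and slow".split()
sentiment_label, sentiment_scores = strongest_category(tokens, SENTIMENT)
emotion_label, emotion_scores = strongest_category(tokens, EMOTION)
```

The same function handles both cases: sentiment uses two lexicons (positive and negative), while emotion uses one lexicon per emotion.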

Lexicons for detecting document affect: Simplest supervised method

• Build a classifier
  • Predict sentiment (or emotion, or personality) given features
  • Use “counts of lexicon categories” as features
  • Sample features:
    • LIWC category “cognition” had count of 7
    • NRC Emotion category “anticipation” had count of 2
• Baseline
  • Instead use counts of all the words and bigrams in the training set
  • This is hard to beat
  • But only works if the training and test sets are very similar
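The two feature sets can be contrasted in a short sketch. The category word lists below are hypothetical toys (real LIWC and NRC categories are much larger, and LIWC is a licensed resource), and the function names are invented for illustration; either feature dictionary could then be fed to any standard classifier.

```python
from collections import Counter

# Hypothetical toy category word lists, not the real LIWC or NRC contents.
CATEGORY_WORDS = {
    "liwc_cognition": {"think", "know", "because"},
    "nrc_anticipation": {"expect", "hope", "soon"},
}


def lexicon_count_features(tokens, category_words=CATEGORY_WORDS):
    """One feature per lexicon category: how many tokens fall in that category."""
    counts = Counter()
    for token in tokens:
        for category, words in category_words.items():
            if token in words:
                counts[category] += 1
    return {category: counts[category] for category in category_words}


def bag_of_words_features(tokens):
    """Baseline: counts of every unigram and bigram in the document."""
    unigrams = Counter(tokens)
    bigrams = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams + bigrams


tokens = "i think we know because i hope".split()
lexicon_features = lexicon_count_features(tokens)
baseline_features = bag_of_words_features(tokens)
```

The lexicon features stay at a fixed, small dimensionality regardless of the corpus, whereas the baseline grows with the training vocabulary, which is one way to see why it depends on training and test sets being similar.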

Personality

Scherer’s typology of affective states

Emotion: relatively brief episode of synchronized response of all or most organismic subsystems in response to the evaluation of an event as being of major significance
  angry, sad, joyful, fearful, ashamed, proud, desperate

Mood: diffuse affect state … change in subjective feeling, of low intensity but relatively long duration, often without apparent cause
  cheerful, gloomy, irritable, listless, depressed, buoyant

Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange
  distant, cold, warm, supportive, contemptuous

Attitudes: relatively enduring, affectively colored beliefs, preferences, predispositions towards objects or persons
  liking, loving, hating, valuing, desiring

Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person
  nervous, anxious, reckless, morose, hostile, envious, jealous

Personality

• The internal structures and propensities that explain a person’s characteristic patterns of thought, emotion, and behavior.
• Personality captures what people are like.

McGraw-Hill/Irwin Chapter 9

The Big Five Dimensions of Personality

Extraversion vs. Introversion
  sociable, assertive, playful vs. aloof, reserved, shy

Emotional stability vs. Neuroticism
  calm, unemotional vs. insecure, anxious

Agreeableness vs. Disagreeable
  friendly, cooperative vs. antagonistic, faultfinding

Conscientiousness vs. Unconscientious
  self-disciplined, organised vs. inefficient, careless

Openness to experience
  intellectual, insightful vs. shallow, unimaginative

Big Five Personality: Agreeableness

warm, kind, cooperative, sympathetic, helpful, and courteous.
• Strong desire to obtain acceptance in personal relationships as a means of expressing personality.
• Agreeable people focus on “getting along,” not necessarily “getting ahead.”

McGraw-Hill/Irwin Chapter 9

Big Five Personality: Extraversion

talkative, sociable, passionate, assertive, bold, and dominant
• Easiest to judge immediately on first meeting
• Prioritize desire to obtain power and influence within a social structure as a means of expressing personality.
• High in positive affectivity: a tendency to experience pleasant, engaging moods such as enthusiasm, excitement, and elation.

McGraw-Hill/Irwin Chapter 9

Big Five Personality: Neuroticism

• experience unpleasant moods: hostility, nervousness, and annoyance.
• more likely to appraise day-to-day situations as stressful.
• less likely to believe they can cope with the stressors that they experience.
• related to locus of control (attribute causes of events to themselves or to the external environment)
  • Neurotics: external locus of control: believe that the events that occur around them are driven by luck, chance, or fate.
  • less neurotic people hold internal locus of control: believe that their own behavior dictates events.

McGraw-Hill/Irwin Chapter 9

External and Internal Locus of Control

McGraw-Hill/Irwin Chapter 9

Big Five Personality: Openness to Experience

curious, imaginative, creative, complex, sophisticated
• Also called “Inquisitiveness” or “Intellectualness”
• high levels of creativity, the capacity to generate novel and useful ideas and solutions.
• Highly open individuals are more likely to migrate into artistic and scientific fields.

McGraw-Hill/Irwin Chapter 9

Changes in Big Five Dimensions Over the Life Span

McGraw-Hill/Irwin Chapter 9

Aside: Do Animals Have Personalities?

• Gosling (1998) studied spotted hyenas.
  • 4 human observers rated 44 personality traits of hyenas
  • Ran PCA on the ratings
  • Five dimensions: Assertiveness, Excitability, Human-Directed Agreeableness, Sociability, and Curiosity
• Related to 3 human dimensions: neuroticism (excitability), openness (curiosity), agreeableness (sociability + agreeableness)

Various text corpora labeled for personality of author

Pennebaker, James W., and Laura A. King. 1999. "Linguistic styles: language use as an individual difference." Journal of Personality and Social Psychology 77, no. 6.
• 2,479 essays from psychology students (1.9 million words), “write whatever comes into your mind” for 20 minutes

Mehl, Matthias R., S. D. Gosling, J. W. Pennebaker. 2006. Personality in its natural habitat: manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology 90(5), 862
• Speech from Electronically Activated Recorder (EAR)
• Random snippets of conversation recorded, transcribed
• 96 participants, total of 97,468 words and 15,269 utterances

Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8, no. 9
• Facebook
• 75,000 volunteers
• 309 million words
• All took a personality test

Ears (speech) corpus (Mehl et al.)

Essays corpus (Pennebaker and King)

Classifiers

• Mairesse, François, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. "Using linguistic cues for the automatic recognition of personality in conversation and text." Journal of Artificial Intelligence Research (2007): 457-500.
  • Various classifiers, lexicon-based and prosodic features

• Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8, no. 9
  • regression and SVM, lexicon-based and all-words

Sample LIWC Features

LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX

Facebook study, Learned words, Extraversion versus Introversion

Facebook study, Learned words, Neuroticism versus Emotional Stability