sentiment analysis - stanford university...dan(jurafsky(howtoesmate, pointwise,mutual,informa;on,...

22
Sentiment Analysis Learning Sen*ment Lexicons

Upload: others

Post on 17-Jul-2020

12 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Sentiment Analysis

Learning  Sen*ment  Lexicons  

Page 2: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Semi-­‐supervised  learning  of  lexicons  

•  Use  a  small  amount  of  informa*on  •  A  few  labeled  examples  •  A  few  hand-­‐built  pa@erns  

•  To  bootstrap  a  lexicon  

2  

Page 3: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Hatzivassiloglou  and  McKeown  intui;on  for  iden;fying  word  polarity  

•  Adjec*ves  conjoined  by  “and”  have  same  polarity  •  Fair  and  legi*mate,  corrupt  and  brutal  •  *fair  and  brutal,  *corrupt  and  legi*mate  

•  Adjec*ves  conjoined  by  “but”  do  not  •  fair  but  brutal  

3  

Vasileios  Hatzivassiloglou  and  Kathleen  R.  McKeown.  1997.  Predic*ng  the  Seman*c  Orienta*on  of  Adjec*ves.  ACL,  174–181  

Page 4: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Hatzivassiloglou  &  McKeown  1997  Step  1  

•  Label  seed  set  of  1336  adjec*ves  (all  >20  in  21  million  word  WSJ  corpus)  •  657  posi*ve  •  adequate  central  clever  famous  intelligent  remarkable  reputed  sensi*ve  slender  thriving…  

•  679  nega*ve  •  contagious  drunken  ignorant  lanky  listless  primi*ve  strident  troublesome  unresolved  unsuspec*ng…  

4  

Page 5: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Hatzivassiloglou  &  McKeown  1997  Step  2  

•  Expand  seed  set  to  conjoined  adjec*ves  

5  

nice, helpful

nice, classy

Page 6: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Hatzivassiloglou  &  McKeown  1997  Step  3  

•  Supervised  classifier  assigns  “polarity  similarity”  to  each  word  pair,  resul*ng  in  graph:  

6  

classy

nice

helpful

fair

brutal

irrational corrupt

Page 7: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Hatzivassiloglou  &  McKeown  1997  Step  4  

•  Clustering  for  par**oning  the  graph  into  two  

7  

classy

nice

helpful

fair

brutal

irrational corrupt

+   -­‐  

Page 8: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Output  polarity  lexicon  

•  Posi*ve  •  bold  decisive  disturbing  generous  good  honest  important  large  mature  pa*ent  peaceful  posi*ve  proud  sound  s*mula*ng  straighhorward  strange  talented  vigorous  wi@y…  

•  Nega*ve  •  ambiguous  cau*ous  cynical  evasive  harmful  hypocri*cal  inefficient  insecure  irra*onal  irresponsible  minor  outspoken  pleasant  reckless  risky  selfish  tedious  unsupported  vulnerable  wasteful…  

8  

Page 9: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Output  polarity  lexicon  

•  Posi*ve  •  bold  decisive  disturbing  generous  good  honest  important  large  mature  pa*ent  peaceful  posi*ve  proud  sound  s*mula*ng  straighhorward  strange  talented  vigorous  wi@y…  

•  Nega*ve  •  ambiguous  cau;ous  cynical  evasive  harmful  hypocri*cal  inefficient  insecure  irra*onal  irresponsible  minor  outspoken  pleasant  reckless  risky  selfish  tedious  unsupported  vulnerable  wasteful…  

9  

Page 10: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Turney  Algorithm  

1.  Extract  a  phrasal  lexicon  from  reviews  2.  Learn  polarity  of  each  phrase  3.  Rate  a  review  by  the  average  polarity  of  its  phrases  

10  

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews  

Page 11: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Extract  two-­‐word  phrases  with  adjec;ves  

First  Word   Second  Word   Third  Word    (not  extracted)  

JJ   NN  or  NNS   anything  RB,  RBR,  RBS   JJ   Not  NN  nor  NNS  JJ   JJ   Not  NN  or  NNS  NN  or  NNS   JJ   Nor  NN  nor  NNS  RB,  RBR,  or  RBS   VB,  VBD,  VBN,  VBG   anything  11  

Page 12: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

How  to  measure  polarity  of  a  phrase?  

•  Posi*ve  phrases  co-­‐occur  more  with  “excellent”  •  Nega*ve  phrases  co-­‐occur  more  with  “poor”  •  But  how  to  measure  co-­‐occurrence?  

12  

Page 13: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Pointwise  Mutual  Informa;on  

•  Mutual  informa;on  between  2  random  variables  X  and  Y      •  Pointwise  mutual  informa;on:    

•  How  much  more  do  events  x  and  y  co-­‐occur  than  if  they  were  independent?  

   

I(X,Y ) = P(x, y)y∑

x∑ log2

P(x,y)P(x)P(y)

PMI(X,Y ) = log2P(x,y)P(x)P(y)

Page 14: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Pointwise  Mutual  Informa;on  

•  Pointwise  mutual  informa;on:    •  How  much  more  do  events  x  and  y  co-­‐occur  than  if  they  were  independent?  

•  PMI  between  two  words:  •   How  much  more  do  two  words  co-­‐occur  than  if  they  were  independent?  

PMI(word1,word2 ) = log2P(word1,word2)P(word1)P(word2)

PMI(X,Y ) = log2P(x,y)P(x)P(y)

Page 15: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

How  to  Es;mate  Pointwise  Mutual  Informa;on  

• Query  search  engine    (Altavista)  • P(word)  es*mated  by        hits(word)/N!• P(word1,word2)  by      hits(word1 NEAR word2)/N!•  (More  correctly  the  bigram  denominator  should  be  kN,  because  there  are  a  total  of  N  consecu*ve  bigrams  (word1,word2),  but  kN  bigrams  that  are  k  words  apart,  but  we  just  use  N  on  the  rest  of  this  slide  and  the  next.)  

 

  PMI(word1,word2 ) = log2

1Nhits(word1 NEAR word2)

1Nhits(word1) 1

Nhits(word2)

Page 16: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Does  phrase  appear  more  with  “poor”  or  “excellent”?  

16  

Polarity(phrase) = PMI(phrase,"excellent")−PMI(phrase,"poor")

= log2hits(phrase NEAR "excellent")hits("poor")hits(phrase NEAR "poor")hits("excellent")!

"#

$

%&

= log2hits(phrase NEAR "excellent")

hits(phrase)hits("excellent")hits(phrase)hits("poor")

hits(phrase NEAR "poor")

= log2

1N hits(phrase NEAR "excellent")1N hits(phrase) 1

N hits("excellent")− log2

1N hits(phrase NEAR "poor")1N hits(phrase) 1

N hits("poor")

Page 17: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Phrases  from  a  thumbs-­‐up  review  

17  

Phrase   POS  tags   Polarity  

online  service   JJ  NN   2.8!

online  experience   JJ  NN   2.3!

direct  deposit   JJ  NN   1.3!

local  branch   JJ  NN   0.42!…  

low  fees   JJ  NNS   0.33!

true  service   JJ  NN   -0.73!

other  bank   JJ  NN   -0.85!

inconveniently  located   JJ  NN   -1.5!

Average   0.32!

Page 18: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Phrases  from  a  thumbs-­‐down  review  

18  

Phrase   POS  tags   Polarity  

direct  deposits   JJ  NNS   5.8!

online  web   JJ  NN   1.9!

very  handy   RB  JJ   1.4!…  

virtual  monopoly   JJ  NN   -2.0!

lesser  evil   RBR  JJ   -2.3!

other  problems   JJ  NNS   -2.8!

low  funds   JJ  NNS   -6.8!

unethical  prac*ces   JJ  NNS   -8.5!

Average   -1.2!

Page 19: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Results  of  Turney  algorithm  

•  410  reviews  from  Epinions  •  170  (41%)  nega*ve  •  240  (59%)  posi*ve  

•  Majority  class  baseline:  59%  •  Turney  algorithm:  74%  

•  Phrases  rather  than  words  •  Learns  domain-­‐specific  informa*on  19  

Page 20: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Using  WordNet  to  learn  polarity  

•  WordNet:  online  thesaurus  (covered  in  later  lecture).  •  Create  posi*ve  (“good”)  and  nega*ve  seed-­‐words  (“terrible”)  •  Find  Synonyms  and  Antonyms  

•  Posi*ve  Set:    Add    synonyms  of  posi*ve  words  (“well”)  and  antonyms  of  nega*ve  words    

•  Nega*ve  Set:  Add  synonyms  of  nega*ve  words  (“awful”)    and  antonyms  of  posi*ve  words  (”evil”)  

•  Repeat,  following  chains  of  synonyms  •  Filter  20  

 S.M.  Kim  and  E.  Hovy.  2004.  Determining  the  sen*ment  of  opinions.  COLING  2004  M.  Hu  and  B.  Liu.  Mining  and  summarizing  customer  reviews.  In  Proceedings  of  KDD,  2004  

Page 21: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Dan  Jurafsky  

Summary  on  Learning  Lexicons  

•  Advantages:  •  Can  be  domain-­‐specific  •  Can  be  more  robust  (more  words)  

•  Intui*on  •  Start  with  a  seed  set  of  words  (‘good’,  ‘poor’)  •  Find  other  words  that  have  similar  polarity:  •  Using  “and”  and  “but”  •  Using  words  that  occur  nearby  in  the  same  document  •  Using  WordNet  synonyms  and  antonyms  

•  Use  seeds  and  semi-­‐supervised  learning  to  induce  lexicons  

Page 22: Sentiment Analysis - Stanford University...Dan(Jurafsky(HowtoEsmate, Pointwise,Mutual,Informa;on, • Query(search(engine(((Altavista) • P(word)(es*mated(by((((hits(word)/N!• P(word

Sentiment Analysis

Learning  Sen*ment  Lexicons