
Page 1:

Overview General practical tips SST sst.py Methods Feature representation RNNs TreeNNs

Supervised sentiment analysis

Christopher Potts

Stanford Linguistics

CS 224U: Natural language understanding
April 15

1 / 57

Page 2:

Overview

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks

Page 3:

Associated materials

1. Code
   a. sst.py
   b. sst_01_overview.ipynb
   c. sst_02_hand_build_features.ipynb
   d. sst_03_neural_networks.ipynb

2. Core reading: Socher et al. 2013

3. Auxiliary readings: Pang and Lee 2008; Goldberg 2015

Page 4:

Conceptual challenges

Which of the following sentences express sentiment? What is their sentiment polarity (pos/neg), if any?

1. There was an earthquake in California.
2. The team failed to complete the physical challenge. (We win/lose!)
3. They said it would be great.
4. They said it would be great, and they were right.
5. They said it would be great, and they were wrong.
6. The party fat-cats are sipping their expensive imported wines.
7. Oh, you’re terrible!
8. Here’s to ya, ya bastard!
9. Of 2001, “Many consider the masterpiece bewildering, boring, slow-moving or annoying, . . . ”
10. long-suffering fans, bittersweet memories, hilariously embarrassing moments, . . .

Page 5:

Affective dimensions, relations, and transitions

(Sudhof et al. 2014)

Page 6:

Lots of applications, but what’s the real goal?

Many business leaders think they want this:

[Two sentiment distributions: Positive 70 / Negative 30 and Positive 65 / Negative 35]

When they see it, they realize that it does not help them with decision-making. The distributions (assuming they are accurately measured) are hiding the phenomena that are actually relevant.

Page 7:

Related tasks in affective computing

With selected papers that make excellent entry points because of their positioning and/or associated public data:

• Subjectivity (Pang and Lee 2008)
• Bias (Recasens et al. 2013; Pryzant et al. 2020)
• Stance (Anand et al. 2011)
• Hate-speech (Nobata et al. 2016)
• Microaggressions (Breitfeller et al. 2019)
• Condescension (Wang and Potts 2019)
• Sarcasm (Khodak et al. 2017)
• Deception and betrayal (Niculae et al. 2015)
• Online trolls (Cheng et al. 2017)
• Polarization (Gentzkow et al. 2019)
• Politeness (Danescu-Niculescu-Mizil et al. 2013)
• Linguistic alignment (Doyle et al. 2016)

Page 8:

General practical tips

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks

Page 9:

Selected sentiment datasets

There are too many to try to list, so I picked some with noteworthy properties, limiting to the core task of sentiment analysis:

• IMDb movie reviews (50K) (Maas et al. 2011):
  http://ai.stanford.edu/~amaas/data/sentiment/index.html

• Datasets from Lillian Lee’s group:
  http://www.cs.cornell.edu/home/llee/data/

• Datasets from Bing Liu’s group:
  https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

• RateBeer (McAuley et al. 2012; McAuley and Leskovec 2013):
  http://snap.stanford.edu/data/web-RateBeer.html

• Amazon Customer Review data:
  https://s3.amazonaws.com/amazon-reviews-pds/readme.html

• Amazon Product Data (McAuley et al. 2015; He and McAuley 2016):
  http://jmcauley.ucsd.edu/data/amazon/

• Sentiment and social networks together (West et al. 2014):
  http://infolab.stanford.edu/~west1/TACL2014/

• Stanford Sentiment Treebank (SST; Socher et al. 2013):
  https://nlp.stanford.edu/sentiment/

Page 10:

Lexica

• Bing Liu’s Opinion Lexicon: nltk.corpus.opinion_lexicon

• SentiWordNet: nltk.corpus.sentiwordnet

• MPQA subjectivity lexicon: http://mpqa.cs.pitt.edu

• Harvard General Inquirer
  – Download: http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm
  – Documentation: http://www.wjh.harvard.edu/~inquirer/homecat.htm

• Linguistic Inquiry and Word Counts (LIWC): https://liwc.wpengine.com

• Hamilton et al. (2016): SocialSent
  https://nlp.stanford.edu/projects/socialsent/

• Brysbaert et al. (2014): Norms of valence, arousal, and dominance for 13,915 English lemmas

Page 11:

Relationships between sentiment lexica

                 MPQA  Opinion Lexicon  Inquirer      SentiWordNet     LIWC
MPQA             —     33/5402 (0.6%)   49/2867 (2%)  1127/4214 (27%)  12/363 (3%)
Opinion Lexicon        —                32/2411 (1%)  1004/3994 (25%)  9/403 (2%)
Inquirer                                —             520/2306 (23%)   1/204 (0.5%)
SentiWordNet                                          —                174/694 (25%)
LIWC                                                                   —

Table: Disagreement levels for the sentiment lexicons.

• Where a lexicon had POS tags, I removed them and selected the most sentiment-rich sense available for the resulting string.

• For SentiWordNet, I counted a word as positive if its positive score was larger than its negative score; negative if its negative score was larger than its positive score; else neutral, which means that words with equal non-0 positive and negative scores are neutral.
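The SentiWordNet decision rule just described can be written out directly. A minimal sketch (the function name and example scores are mine, for illustration):

```python
def swn_polarity(pos_score, neg_score):
    """Collapse a SentiWordNet (pos, neg) score pair to one label:
    positive if pos > neg, negative if neg > pos, else neutral --
    so equal non-0 positive and negative scores come out neutral."""
    if pos_score > neg_score:
        return "positive"
    if neg_score > pos_score:
        return "negative"
    return "neutral"

print(swn_polarity(0.625, 0.125))  # → positive
print(swn_polarity(0.25, 0.25))    # → neutral
```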

Page 12:

Tokenizing

Raw text
@NLUers: can't wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

A good start: nltk.tokenize.casual.TweetTokenizer

Page 13:

Tokenizing

Isolate mark-up, and replace HTML entities.
@NLUers: can’t wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

A good start: nltk.tokenize.casual.TweetTokenizer

Page 14:

Tokenizing

Isolate mark-up, and replace HTML entities.
@NLUers: can’t wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

Whitespace tokenizer
@NLUers: | can’t | wait | for | the | Jun | 9 | #projects! | YAAAAAAY!!! | >:-D | http://stanford.edu/class/cs224u/.

A good start: nltk.tokenize.casual.TweetTokenizer

Page 15:

Tokenizing

Isolate mark-up, and replace HTML entities.
@NLUers: can’t wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

Treebank tokenizer
@NLUers: | can’t | wait | for | the | Jun | 9 | #projects | ! | YAAAAAAY!!! | >:-D | http://stanford.edu/class/cs224u/.

A good start: nltk.tokenize.casual.TweetTokenizer

Page 16:

Tokenizing

Isolate mark-up, and replace HTML entities.
@NLUers: can’t wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

Elements of a sentiment-aware tokenizer
• Isolates emoticons
• Respects Twitter and other domain-specific markup
• Uses the underlying mark-up (e.g., <strong> tags)
• Captures those #$%ing masked curses!
• Preserves capitalization where it seems meaningful
• Regularizes lengthening (e.g., YAAAAAAY ⇒ YAAAY)
• Captures significant multiword expressions (e.g., out of this world)
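A few of these elements can be sketched in pure Python. This toy function (regexes and name are mine, not the course's tokenizer) regularizes lengthening, isolates emoticons, and keeps #hashtags and @handles intact:

```python
import re

# Rough emoticon pattern: optional ">", eyes, optional nose, mouth.
EMOTICON = r"[<>]?[:;=8][\-o*']?[()\]\[dDpP/\\}{@|]"

def toy_sentiment_tokenize(text):
    """Toy sentiment-aware tokenizer sketch."""
    # Regularize lengthening: 4+ repeats of a character -> 3 (YAAAAAAY -> YAAAY).
    text = re.sub(r"(.)\1{3,}", r"\1\1\1", text)
    # Try emoticons first, then @handles/#hashtags, then words, then punctuation.
    pattern = f"(?:{EMOTICON})|(?:[@#]\\w+)|(?:\\w+(?:'\\w+)?)|(?:[^\\s\\w])"
    return re.findall(pattern, text)

print(toy_sentiment_tokenize("YAAAAAAY!!! >:-D"))
# → ['YAAAY', '!', '!', '!', '>:-D']
```

Note how the emoticon survives as one token while ordinary punctuation is split off; a production tokenizer would add case handling, markup, and multiword expressions on top of this.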

A good start: nltk.tokenize.casual.TweetTokenizer

Page 17:

Tokenizing

Isolate mark-up, and replace HTML entities.
@NLUers: can’t wait for the Jun 9 #projects! YAAAAAAY!!! >:-D http://stanford.edu/class/cs224u/.

Sentiment-aware tokenizer
@nluers: | can’t | wait | for | the | Jun_9 | #projects | ! | YAAAY!!! | >:-D | http://stanford.edu/class/cs224u/.

A good start: nltk.tokenize.casual.TweetTokenizer

Page 18:

The impact of sentiment-aware tokenizing

Softmax classifier. Training on 12,000 OpenTable reviews (6000 positive/4-5 stars; 6000 negative/1-2 stars).

Page 19:

The impact of sentiment-aware tokenizing

Softmax classifier. Training on 12,000 OpenTable reviews (6000 positive/4-5 stars; 6000 negative/1-2 stars).

Page 20:

The dangers of stemming

• Stemming collapses distinct word forms.

• Three common stemming algorithms in the context of sentiment:
  – the Porter stemmer
  – the Lancaster stemmer
  – the WordNet stemmer

• Porter and Lancaster destroy too many sentiment distinctions.

• The WordNet stemmer does not have this problem nearly so severely, but it generally doesn’t do enough collapsing to be worth the resources necessary to run it.
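The collapsing problem is easy to reproduce with even a crude Porter-style suffix stripper. This toy function is mine (the real Porter algorithm is more careful, but fails the same way on pairs like those in the tables below):

```python
def naive_stem(word):
    """Toy suffix stripper in the spirit of Porter/Lancaster stemming."""
    suffixes = ("iveness", "ousness", "ation", "ative",
                "ance", "ence", "ive", "ous", "ant", "ent", "e")
    for suf in suffixes:
        # Strip the first matching suffix, keeping a stem of length >= 3.
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

# Positiv "defense" and Negativ "defensive" collapse to the same stem:
print(naive_stem("defense"), naive_stem("defensive"))  # → defens defens
```

The Inquirer's Positiv/Negativ distinction is gone once both forms map to "defens" — exactly the failure mode documented for Porter and Lancaster.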

Page 21:

The dangers of stemming

The Porter stemmer heuristically identifies word suffixes (endings) and strips them off, with some regularization of the endings.

Positiv        Negativ      Porter stemmed
defense        defensive    defens
extravagance   extravagant  extravag
affection      affectation  affect
competence     compete      compet
impetus        impetuous    impetu
objective      objection    object
temperance     temper       temper
tolerant       tolerable    toler

Table: Sample of instances in which the Porter stemmer destroys aHarvard Inquirer Positiv/Negativ distinction.

Page 22:

The dangers of stemming

The Lancaster stemmer uses the same strategy as the Porter stemmer.

Positiv        Negativ     Lancaster stemmed
call           callous     cal
compliment     complicate  comply
dependability  dependent   depend
famous         famished    fam
fill           filth       fil
flourish       floor       flo
notoriety      notorious   not
passionate     passe       pass
savings        savage      sav
truth          truant      tru

Table: Sample of instances in which the Lancaster stemmerdestroys a Harvard Inquirer Positiv/Negativ distinction.

Page 23:

The dangers of stemming

The WordNet stemmer (NLTK) is high-precision. It requires word–POS pairs. Its only general issue for sentiment is that it removes comparative morphology.

Positiv           WordNet stemmed
(exclaims, v)     exclaim
(exclaimed, v)    exclaim
(exclaiming, v)   exclaim
(exclamation, n)  exclamation
(proved, v)       prove
(proven, v)       prove
(proven, a)       proven
(happy, a)        happy
(happier, a)      happy
(happiest, a)     happy

Table: Representative examples of what WordNet stemming doesand doesn’t do.

Page 24:

The impact of stemming

Softmax classifier. Training on 12,000 OpenTable reviews (6000 positive/4-5 stars; 6000 negative/1-2 stars).

Page 25:

Part-of-speech (POS) tagging

Word    Tag1  Val1     Tag2  Val2
arrest  jj    Positiv  vb    Negativ
even    jj    Positiv  vb    Negativ
even    rb    Positiv  vb    Negativ
fine    jj    Positiv  nn    Negativ
fine    jj    Positiv  vb    Negativ
fine    nn    Negativ  rb    Positiv
fine    rb    Positiv  vb    Negativ
help    jj    Positiv  vbn   Negativ
help    nn    Positiv  vbn   Negativ
help    vb    Positiv  vbn   Negativ
hit     jj    Negativ  vb    Positiv
mind    nn    Positiv  vb    Negativ
order   jj    Positiv  vb    Negativ
order   nn    Positiv  vb    Negativ
pass    nn    Negativ  vb    Positiv

Table: Harvard Inquirer POS contrasts.

Page 26:

The dangers of POS tagging

1,424 cases where a (word, tag) pair is consistent with pos. and neg. lemma-level sentiment

Word            Tag  ScoreDiff
mean            s    1.75
abject          s    1.625
benign          a    1.625
modest          s    1.625
positive        s    1.625
smart           s    1.625
solid           s    1.625
sweet           s    1.625
artful          a    1.5
clean           s    1.5
evil            n    1.5
firm            s    1.5
gross           s    1.5
iniquity        n    1.5
marvellous      s    1.5
marvelous       s    1.5
plain           s    1.5
rank            s    1.5
serious         s    1.5
sheer           s    1.5
sorry           s    1.5
stunning        s    1.5
wickedness      n    1.5
[. . . ]
unexpectedly    r    0.25
velvet          s    0.25
vibration       n    0.25
weather-beaten  s    0.25
well-known      s    0.25
whine           v    0.25
wizard          n    0.25
wonderland      n    0.25
yawn            v    0.25

Page 27:

Simple negation marking

The phenomenon

1. I didn’t enjoy it.
2. I never enjoy it.
3. No one enjoys it.
4. I have yet to enjoy it.
5. I don’t think I will enjoy it.

Page 28:

Simple negation marking

The phenomenon

1. I didn’t enjoy it.
2. I never enjoy it.
3. No one enjoys it.
4. I have yet to enjoy it.
5. I don’t think I will enjoy it.

The method (Das and Chen 2001; Pang et al. 2002)
Append a _NEG suffix to every word appearing between a negation and a clause-level punctuation mark.

Page 29:

Simple negation marking

No one enjoys it.
⇒ no one_NEG enjoys_NEG it_NEG .

I don’t think I will enjoy it, but I might.
⇒ i don’t think_NEG i_NEG will_NEG enjoy_NEG it_NEG , but i might .
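The method is only a few lines of code over pre-tokenized input. A sketch (the negator and punctuation trigger sets below are my simplifications; Das and Chen's and Pang et al.'s exact trigger lists differ):

```python
import re

# Assumed trigger sets, not the published ones:
NEGATORS = re.compile(r"^(?:no|not|never|none|nothing|nowhere|cannot)$|n't")
CLAUSE_PUNCT = re.compile(r"^[.,:;!?]$")

def negation_mark(tokens):
    """Append _NEG to every token between a negator and the next
    clause-level punctuation mark."""
    out, in_scope = [], False
    for tok in tokens:
        if CLAUSE_PUNCT.match(tok):
            in_scope = False          # punctuation closes the negation scope
            out.append(tok)
        elif in_scope:
            out.append(tok + "_NEG")  # inside a negation scope
        else:
            out.append(tok)
            if NEGATORS.search(tok.lower()):
                in_scope = True       # negator opens a scope (itself unmarked)
    return out

print(negation_mark("no one enjoys it .".split()))
# → ['no', 'one_NEG', 'enjoys_NEG', 'it_NEG', '.']
```

The effect is that "enjoy" and "enjoy_NEG" become distinct features, so a linear classifier can give them opposite weights.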

Page 30:

The impact of negation marking

Softmax classifier. Training on 12,000 OpenTable reviews (6000 positive/4-5 stars; 6000 negative/1-2 stars).

Page 31:

The impact of negation marking

Softmax classifier. Training on 12,000 OpenTable reviews (6000 positive/4-5 stars; 6000 negative/1-2 stars).

Page 32:

SST

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks

Page 33:

SST project overview

1. Socher et al. (2013)

2. Full code and data release:https://nlp.stanford.edu/sentiment/

3. Sentence-level corpus (10,662 sentences)

4. Original data from Rotten Tomatoes (Pang and Lee 2005)

5. Fully-labeled trees (crowdsourced labels)

6. The 5-way labels were extracted from workers’ slider responses.

Page 34:

Fully labeled trees

(4 (2 NLU) (4 (2 is) (4 enlightening)))

These are novel examples, and the labels are actual output from

https://nlp.stanford.edu/sentiment/
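For a feel for the bracketed format, here is a tiny stand-in for nltk.tree.Tree.fromstring (illustrative only; the course code uses NLTK itself):

```python
import re

def parse_sst(s):
    """Parse an SST-style bracketed string into (label, children) tuples,
    with leaf tokens as plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                                   # consume "("
        label = tokens[pos]; pos += 1              # node label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())            # subtree
            else:
                children.append(tokens[pos]); pos += 1  # leaf token
        pos += 1                                   # consume ")"
        return (label, children)

    return node()

def leaves(tree):
    """Collect the leaf tokens of a parsed tree, left to right."""
    label, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out

tree = parse_sst("(4 (2 NLU) (4 (2 is) (4 enlightening)))")
print(tree[0], leaves(tree))  # → 4 ['NLU', 'is', 'enlightening']
```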

Page 35:

Fully labeled trees

(2 (2 They) (3 (2 said) (3 (2 it) (2 (2 would) (3 (2 be) (4 great))))))

These are novel examples, and the labels are actual output from

https://nlp.stanford.edu/sentiment/

Page 36:

Fully labeled trees

(1 (2 (2 They) (3 (2 said) (3 (2 it) (2 (2 would) (3 (2 be) (4 great)))))) (2 ;) (1 (2 they) (1 (2 were) (1 wrong))))

These are novel examples, and the labels are actual output from

https://nlp.stanford.edu/sentiment/

Page 37:

Fully labeled trees

(2 (2 (2 They) (3 (3 said) (3 (3 it) (2 (3 would) (3 (2 be) (4 great)))))) (2 ;) (2 (2 they) (2 (2 were) (3 right))))

These are novel examples, and the labels are actual output from

https://nlp.stanford.edu/sentiment/

Page 38:

Root-level tasks

Five-way problem

Label  Meaning        Train  Dev
0      very negative  1,092  139
1      negative       2,218  289
2      neutral        1,624  229
3      positive       2,322  279
4      very positive  1,288  165
                      8,544  1,101

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Page 39:

Root-level tasks

Five-way problem

Label  Meaning        Train  Dev
0      very negative  1,092  139
1      negative       2,218  289
2      neutral        1,624  229
3      positive       2,322  279
4      very positive  1,288  165
                      8,544  1,101

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Ternary problem

Label  Meaning   Train  Dev
0, 1   negative  3,310  428
2      neutral   1,624  229
3, 4   positive  3,610  444
                 8,544  1,101

Page 40:

Root-level tasks

Five-way problem

Label  Meaning        Train  Dev
0      very negative  1,092  139
1      negative       2,218  289
2      neutral        1,624  229
3      positive       2,322  279
4      very positive  1,288  165
                      8,544  1,101

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Binary problem (neutral data simply excluded)

Label  Meaning   Train  Dev
0, 1   negative  3,310  428
3, 4   positive  3,610  444
                 6,920  872

Page 41:

All-nodes tasks

Five-way problem

Label  Meaning        Train    Dev
0      very negative  40,774   5,217
1      negative       82,854   10,757
2      neutral        58,398   8,227
3      positive       89,308   11,001
4      very positive  47,248   6,245
                      318,582  41,447

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Page 42:

All-nodes tasks

Five-way problem

Label  Meaning        Train    Dev
0      very negative  40,774   5,217
1      negative       82,854   10,757
2      neutral        58,398   8,227
3      positive       89,308   11,001
4      very positive  47,248   6,245
                      318,582  41,447

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Ternary problem

Label  Meaning   Train    Dev
0, 1   negative  123,628  15,974
2      neutral   58,398   8,227
3, 4   positive  136,556  17,246
                 318,582  41,447

Page 43:

All-nodes tasks

Five-way problem

Label  Meaning        Train    Dev
0      very negative  40,774   5,217
1      negative       82,854   10,757
2      neutral        58,398   8,227
3      positive       89,308   11,001
4      very positive  47,248   6,245
                      318,582  41,447

Note: 4 > 3 (more positive) but 0 > 1 (more negative)

Binary problem (neutral data simply excluded)

Label  Meaning   Train    Dev
0, 1   negative  123,628  15,974
3, 4   positive  136,556  17,246
                 260,184  33,220

Page 44:

sst.py

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks

Page 45:

Readers

sst_code_01_solved
March 15, 2020

[1]: from nltk.tree import Tree
     import os
     import sst

[2]: SST_HOME = os.path.join('data', 'trees')

[3]: # All SST readers are generators that yield (tree, score) pairs.
     train_reader = sst.train_reader(SST_HOME)

[4]: tree, score = next(train_reader)

[5]: sst.train_reader(SST_HOME, class_func=sst.ternary_class_func)

[6]: sst.train_reader(SST_HOME, class_func=sst.binary_class_func)

[7]: sst.dev_reader(SST_HOME)

[8]: sst.dev_reader(SST_HOME, class_func=sst.ternary_class_func)

[9]: sst.dev_reader(SST_HOME, class_func=sst.binary_class_func)

Page 46:

nltk.tree.Tree

[10]: tree = Tree.fromstring("""(4 (2 NLU) (4 (2 is) (4 amazing)))""")

[11]: tree

[12]: tree.label()
[12]: '4'

[13]: tree[0]

[14]: tree[1]

[15]: for subtree in tree.subtrees():
          print(subtree)

      (4 (2 NLU) (4 (2 is) (4 amazing)))
      (2 NLU)
      (4 (2 is) (4 amazing))
      (2 is)
      (4 amazing)

[16]: from IPython.display import display, Image
      import pandas as pd

[17]: for class_func in (None, sst.ternary_class_func, sst.binary_class_func):
          for reader in (sst.train_reader, sst.dev_reader):
              print("=" * 70)
              print(reader, class_func)
              labels = [y for tree, y in reader(SST_HOME, class_func=class_func)]
              print(pd.Series(labels).value_counts().sort_index())
              print("Examples: {:,}".format(len(labels)))

      ======================================================================
      <function train_reader at 0x1a39595158> None
      0    1092
      1    2218
      2    1624
      3    2322
      4    1288
      dtype: int64
      Examples: 8,544
      ======================================================================
      <function dev_reader at 0x1a395951e0> None
      0    139
      1    289
      2    229
      3    279
      4    165
      dtype: int64
      Examples: 1,101
      ======================================================================
      <function train_reader at 0x1a39595158> <function ternary_class_func at 0x1a395950d0>
      negative    3310
      neutral     1624
      positive    3610
      dtype: int64
      Examples: 8,544
      ======================================================================
      <function dev_reader at 0x1a395951e0> <function ternary_class_func at

Page 47:

Feature functions

sst_code_02_solved
March 15, 2020

[ ]: import sys
     sys.path.append("/Users/cgpotts/Documents/teaching/2019-2020/xcs224u/cs224u/")

[1]: from collections import Counter
     from nltk.tree import Tree
     import sst

[2]: def unigrams_phi(tree):
         """The basis for a unigrams feature function.

         Parameters
         ----------
         tree : nltk.tree
             The tree to represent.

         Returns
         -------
         Counter
             A map from strings to their counts in `tree`.

         """
         return Counter(tree.leaves())

[3]: tree = Tree.fromstring("""(4 (2 NLU) (4 (2 is) (4 amazing)))""")

[4]: unigrams_phi(tree)
[4]: Counter({'NLU': 1, 'is': 1, 'amazing': 1})

Page 48:

Model wrappers

( ),

( ),

( ),

[5]: from sklearn.linear_model import LogisticRegression

[6]: def fit_softmax_classifier(X, y):
         """Wrapper for `sklearn.linear_model.LogisticRegression`. This is
         also called a Maximum Entropy (MaxEnt) Classifier, which is more
         fitting for the multiclass case.

         Parameters
         ----------
         X : 2d np.array
             The matrix of features, one example per row.
         y : list
             The list of labels for rows in `X`.

         Returns
         -------
         sklearn.linear_model.LogisticRegression
             A trained `LogisticRegression` instance.

         """
         mod = LogisticRegression(
             fit_intercept=True, solver='liblinear', multi_class='auto')
         mod.fit(X, y)
         return mod


sst.experiment


[7]: import os
     import utils

[8]: SST_HOME = os.path.join('data', 'trees')

[9]: unigrams_softmax_experiment = sst.experiment(
         SST_HOME,
         unigrams_phi,
         fit_softmax_classifier,
         train_reader=sst.train_reader,        # The default
         assess_reader=None,                   # The default
         train_size=0.7,                       # The default
         class_func=sst.ternary_class_func,    # The default
         score_func=utils.safe_macro_f1,       # The default
         vectorize=True,                       # The default
         verbose=True)                         # The default

              precision  recall  f1-score  support

    negative      0.617   0.693     0.653      979
     neutral      0.293   0.137     0.187      495
    positive      0.666   0.754     0.707     1090

    accuracy                        0.612     2564
   macro avg      0.526   0.528     0.516     2564
weighted avg      0.576   0.612     0.586     2564

[10]: list(unigrams_softmax_experiment.keys())

[10]: ['model', 'phi', 'train_dataset', 'assess_dataset', 'predictions', 'metric', 'score']


Our default metric for almost all our work: it gives equal weight to all classes regardless of size, while balancing precision and recall.



sst.experiment

The return value of sst.experiment is a dict packaging up the objects and info needed to test this model in new settings and conduct deep error analysis:


[10]: list(unigrams_softmax_experiment.keys())

[10]: ['model', 'phi', 'train_dataset', 'assess_dataset', 'predictions', 'metric', 'score']

[11]: list(unigrams_softmax_experiment['train_dataset'].keys())

[11]: ['X', 'y', 'vectorizer', 'raw_examples']


[12]: from collections import Counter
      from nltk.tree import Tree
      import os
      from sklearn.linear_model import LogisticRegression
      import sst

[13]: SST_HOME = os.path.join('data', 'trees')

[14]: def phi(tree):
          # Tree to Counter:
          return Counter(tree.leaves())

[15]: def fit_model(X, y):
          # X, y to a fitted model with a predict method:
          mod = LogisticRegression(
              fit_intercept=True, solver='liblinear', multi_class='auto')
          mod.fit(X, y)
          return mod

[16]: experiment = sst.experiment(SST_HOME, phi, fit_model)

              precision  recall  f1-score  support

    negative      0.631   0.680     0.655     1008
     neutral      0.277   0.107     0.155      504
    positive      0.629   0.767     0.691     1052

    accuracy                        0.603     2564
   macro avg      0.512   0.518     0.500     2564


Bringing it all together

sst_code_03_solved (March 15, 2020)

[ ]: import sys
     sys.path.append("/Users/cgpotts/Documents/teaching/2019-2020/xcs224u/cs224u/")

[1]: from collections import Counter
     import os
     from sklearn.linear_model import LogisticRegression
     import sst

[2]: SST_HOME = os.path.join('data', 'trees')

[3]: def phi(tree):
         # Tree to Counter:
         return Counter(tree.leaves())

[4]: def fit_model(X, y):
         # X, y to a fitted model with a predict method:
         mod = LogisticRegression(
             fit_intercept=True, solver='liblinear', multi_class='auto')
         mod.fit(X, y)
         return mod

[5]: experiment = sst.experiment(SST_HOME, phi, fit_model)

              precision  recall  f1-score  support

    negative      0.640   0.670     0.655     1022
     neutral      0.301   0.156     0.205      475
    positive      0.646   0.755     0.696     1067

    accuracy                        0.610     2564
   macro avg      0.529   0.527     0.519     2564
weighted avg      0.580   0.610     0.589     2564


sklearn.feature_extraction.DictVectorizer

sst_code_04_solved (March 15, 2020)

[1]: import pandas as pd
     from sklearn.feature_extraction import DictVectorizer

[2]: train_feats = [
         {'a': 1, 'b': 1},
         {'b': 1, 'c': 2}]

[3]: vec = DictVectorizer(sparse=False)  # Use `sparse=True` for real problems!

[4]: X_train = vec.fit_transform(train_feats)

[5]: pd.DataFrame(X_train, columns=vec.get_feature_names())

[5]:      a    b    c
     0  1.0  1.0  0.0
     1  0.0  1.0  2.0

[6]: test_feats = [
         {'a': 2},
         {'a': 4, 'b': 2, 'd': 1}]

[7]: X_test = vec.transform(test_feats)  # Not `fit_transform`!

[8]: pd.DataFrame(X_test, columns=vec.get_feature_names())

[8]:      a    b    c
     0  2.0  0.0  0.0
     1  4.0  2.0  0.0


Methods

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks


Hyperparameter search: Rationale

1. The parameters of a model are those whose values are learned as part of optimizing the model itself.

2. The hyperparameters of a model are any settings that are set outside of this optimization. Examples:
   a. GloVe or LSA dimensionality
   b. GloVe xmax and α
   c. Regularization terms, hidden dimensionalities, learning rates, activation functions
   d. Optimization methods

3. Hyperparameter optimization is crucial to building a persuasive argument: every model must be put in its best light!

4. Otherwise, one could appear to have evidence that one model is better than another simply by strategically picking hyperparameters that favored the outcome.
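One standard way to act on these points is an exhaustive grid search with cross-validation. A minimal sketch with scikit-learn's GridSearchCV, where the synthetic data and the grid values are illustrative assumptions rather than the course's actual setup:

```python
# A minimal hyperparameter-search sketch with scikit-learn's GridSearchCV.
# The data and grid here are made-up stand-ins for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(200, 10)                       # 200 fake examples, 10 features
y = (X[:, 0] + 0.1 * rng.randn(200) > 0).astype(int)

param_grid = {
    'fit_intercept': [True, False],
    'C': [0.4, 0.6, 0.8, 1.0, 2.0, 3.0],     # inverse regularization strength
    'penalty': ['l1', 'l2']}

search = GridSearchCV(
    LogisticRegression(solver='liblinear'),  # liblinear supports l1 and l2
    param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Every setting in param_grid is assessed by cross-validation on the training data only, so a later comparison on a held-out assessment set remains fair to every model.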


Hyperparameter search in sst.py

sst_code_05_solved (March 15, 2020)

[1]: from collections import Counter
     import os
     from sklearn.linear_model import LogisticRegression
     import sst
     import utils

[2]: SST_HOME = os.path.join('data', 'trees')

[3]: def phi(tree):
         return Counter(tree.leaves())

[4]: def fit_softmax_with_crossvalidation(X, y):
         basemod = LogisticRegression(solver='liblinear', multi_class='auto')
         cv = 5
         param_grid = {'fit_intercept': [True, False],
                       'C': [0.4, 0.6, 0.8, 1.0, 2.0, 3.0],
                       'penalty': ['l1', 'l2']}
         best_mod = utils.fit_classifier_with_crossvalidation(
             X, y, basemod, cv, param_grid)
         return best_mod

[5]: experiment = sst.experiment(SST_HOME, phi, fit_softmax_with_crossvalidation)

Best params: {'C': 3.0, 'fit_intercept': False, 'penalty': 'l2'}
Best score: 0.518

              precision  recall  f1-score  support

    negative      0.619   0.661     0.639      987
     neutral      0.303   0.185     0.230      486
    positive      0.655   0.729     0.690     1091

   micro avg      0.599   0.599     0.599     2564
   macro avg      0.526   0.525     0.520     2564
weighted avg      0.574   0.599     0.583     2564

[6]: experiment['model']


Classifier comparison: Rationale

1. Suppose you’ve assessed a baseline model B and your favored model M, and your chosen assessment metric favors M. Is M really better?

2. If the difference between B and M is clearly of practical significance, then you might not need to do anything beyond presenting the numbers. Still, is there variation in how B or M performs?

3. Demšar (2006) advises the Wilcoxon signed-rank test for situations in which you can afford to repeatedly assess B and M on different train/test splits. We’ll talk later in the term about the rationale for this.

4. For situations where you can’t repeatedly assess B and M, McNemar’s test is a reasonable alternative. It operates on the confusion matrices produced by the two models, testing the null hypothesis that the two models have the same error rate.
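Both tests can also be run outside of sst.py. A self-contained sketch, using synthetic count data and an arbitrary model pair purely for illustration: macro-F1 scores over repeated random splits feed scipy.stats.wilcoxon, and a McNemar statistic is computed directly from the two models' per-example correctness on one split.

```python
# Illustrative sketch (not the sst.py API): compare two classifiers with
# the Wilcoxon signed-rank test over random splits, plus McNemar's test.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(400, 20))    # count features suit MultinomialNB
y = ((X[:, :5].sum(axis=1) + rng.randn(400)) > 5).astype(int)

scores1, scores2 = [], []
for trial in range(10):                 # repeated random train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.7, random_state=trial)
    m1 = LogisticRegression(solver='liblinear').fit(X_tr, y_tr)
    m2 = MultinomialNB().fit(X_tr, y_tr)
    scores1.append(f1_score(y_te, m1.predict(X_te), average='macro'))
    scores2.append(f1_score(y_te, m2.predict(X_te), average='macro'))

stat, p = stats.wilcoxon(scores1, scores2)

# McNemar: on one split, count examples where exactly one model is right.
right1 = m1.predict(X_te) == y_te
right2 = m2.predict(X_te) == y_te
n01 = int((right1 & ~right2).sum())     # only model 1 correct
n10 = int((~right1 & right2).sum())     # only model 2 correct
mcnemar_stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10) if n01 + n10 else 0.0
```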


Classifier comparison in sst.py

sst_code_06_solved (March 15, 2020)

[1]: from collections import Counter
     import os
     import scipy.stats
     from sklearn.linear_model import LogisticRegression
     from sklearn.naive_bayes import MultinomialNB
     import sst
     import utils

[2]: SST_HOME = os.path.join('data', 'trees')

[3]: def phi(tree):
         return Counter(tree.leaves())

[4]: def fit_softmax(X, y):
         mod = LogisticRegression(
             fit_intercept=True,
             solver='liblinear',
             multi_class='auto')
         mod.fit(X, y)
         return mod

[5]: def fit_naivebayes(X, y):
         mod = MultinomialNB(fit_prior=True)
         mod.fit(X, y)
         return mod


Classifier comparison in sst.py: Wilcoxon signed-rank test and McNemar’s test


[6]: mod1_scores, mod2_scores, p = sst.compare_models(
         SST_HOME,
         phi1=phi,
         phi2=None,                            # Defaults to `phi1`
         train_func1=fit_softmax,
         train_func2=fit_naivebayes,           # Defaults to `train_func1`
         stats_test=scipy.stats.wilcoxon,      # Default
         trials=10,                            # Default
         reader=sst.train_reader,              # Default
         train_size=0.7,                       # Default
         class_func=sst.ternary_class_func,    # Default
         score_func=utils.safe_macro_f1)       # Default

Model 1 mean: 0.510
Model 2 mean: 0.492
p = 0.005

[7]: softmax_experiment = sst.experiment(SST_HOME, phi, fit_softmax)

[8]: naivebayes_experiment = sst.experiment(SST_HOME, phi, fit_naivebayes)

[9]: stat, p = utils.mcnemar(
         softmax_experiment['assess_dataset']['y'],
         naivebayes_experiment['predictions'],
         softmax_experiment['predictions'])



Feature representation

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks


Hand-built features: Bags of subparts

[1]: from collections import Counter
     from nltk.tree import Tree

[2]: tree = Tree.fromstring("""(4 (2 NLU) (4 (2 is) (4 amazing)))""")
     tree

[2]: Tree('4', [Tree('2', ['NLU']), Tree('4', [Tree('2', ['is']), Tree('4', ['amazing'])])])

[3]: def phi_bigrams(tree):
         toks = ["<s>"] + tree.leaves() + ["</s>"]
         bigrams = [(w1, w2) for w1, w2 in zip(toks[:-1], toks[1:])]
         return Counter(bigrams)

[4]: phi_bigrams(tree)

[4]: Counter({('<s>', 'NLU'): 1,
              ('NLU', 'is'): 1,
              ('is', 'amazing'): 1,
              ('amazing', '</s>'): 1})

[5]: def phi_phrases(tree):
         phrases = []
         for subtree in tree.subtrees():
             if subtree.height() <= 3:
                 phrases.append(tuple(subtree.leaves()))
         return Counter(phrases)

[6]: phi_phrases(tree)

[6]: Counter({('NLU',): 1, ('is', 'amazing'): 1, ('is',): 1, ('amazing',): 1})


Hand-built feature: Negation

Simple negation marking:
The dialogue was n’t very_NEG good_NEG but_NEG the_NEG acting_NEG was_NEG amazing_NEG ._NEG

Negation marking based on structure:

(S
  (S (NP (Det The) (N dialogue))
     (VP (V was)
         (AP (NEG n’t)
             (AP (Adv very_NEG) (A good_NEG)))))
  (S but the acting was amazing))
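The simple (non-structural) marking can be implemented in a few lines. The trigger set below is an illustrative assumption; this version marks every token from a negation trigger to the end of the sentence, reproducing the example above exactly:

```python
# A minimal sketch of simple negation marking: append _NEG to every
# token after a negation trigger. The trigger set is an assumption.
NEGATORS = {"not", "n't", "never", "no"}

def mark_negation(tokens):
    marked, in_scope = [], False
    for tok in tokens:
        if tok.lower() in NEGATORS:
            marked.append(tok)      # the trigger itself is left unmarked
            in_scope = True
        elif in_scope:
            marked.append(tok + "_NEG")
        else:
            marked.append(tok)
    return marked

toks = "The dialogue was n't very good but the acting was amazing .".split()
print(" ".join(mark_negation(toks)))
# The dialogue was n't very_NEG good_NEG but_NEG the_NEG acting_NEG was_NEG amazing_NEG ._NEG
```

A feature function could then count these marked tokens, so that "good" and "good_NEG" become distinct features.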


Extension to other kinds of scope-taking

(S (NP They)
   (VP (V said)
       (S (NP it)
          (VP (V was) (AP great)))))


(S (NP It)
   (VP (V might)
       (VP (V be) (AP successful))))


Other ideas for hand-built feature functions

• Lexicon-derived features

• Modal adverbs:
  - “It is quite possibly a masterpiece.”
  - “It is totally amazing.”

• Thwarted expectations:
  - “Many consider the movie bewildering, boring, slow-moving or annoying.”
  - “It was hailed as a brilliant, unprecedented artistic achievement worthy of multiple Oscars.”

• Non-literal language:
  - “Not exactly a masterpiece.”
  - “Like 50 hours long.”
  - “The best movie in the history of the universe.”


Assessing individual feature functions

1. sklearn.feature_selection offers functions to assess how much information your feature functions contain with respect to your labels.

2. Take care when assessing feature functions individually; correlations between them will make these assessments hard to interpret:

   X1  X2  X3 | y
    1   1   0 | T
    1   0   1 | T
    1   0   0 | T
    0   1   1 | T
    0   1   0 | F
    0   0   1 | F
    0   0   1 | F
    0   0   1 | F

   chi2(X1, y) = 3
   chi2(X2, y) = 0.33
   chi2(X3, y) = 0.2

   What do the scores tell us about the best model? In truth, a linear model performs best with just X1, and including X2 hurts.

3. Consider more holistic assessment methods: systematically removing or disrupting features in the context of a full model and comparing performance before and after.
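The chi-squared scores in the toy table can be reproduced directly with sklearn.feature_selection.chi2:

```python
# Reproducing the chi2 scores from the toy table above.
import numpy as np
from sklearn.feature_selection import chi2

X = np.array([                  # columns are X1, X2, X3
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # T = 1, F = 0

scores, pvals = chi2(X, y)
print(scores)                   # approximately [3.0, 0.33, 0.2]
```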


Distributed representations as features


sst_code_09_solved (March 15, 2020)

[1]: import numpy as np
     import os
     from sklearn.linear_model import LogisticRegression
     import sst
     import utils

[2]: GLOVE_HOME = os.path.join('data', 'glove.6B')
     SST_HOME = os.path.join('data', 'trees')

[3]: glove_lookup = utils.glove2dict(
         os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

[4]: def vsm_leaves_phi(tree, lookup, np_func=np.sum):
         allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup])
         if len(allvecs) == 0:
             dim = len(next(iter(lookup.values())))
             feats = np.zeros(dim)
         else:
             feats = np_func(allvecs, axis=0)
         return feats

[5]: def glove_leaves_phi(tree, np_func=np.sum):
         return vsm_leaves_phi(tree, glove_lookup, np_func=np_func)

[6]: def fit_softmax(X, y):
         mod = LogisticRegression(
             fit_intercept=True, solver='liblinear', multi_class='auto')
         mod.fit(X, y)
         return mod

[7]: glove_sum_experiment = sst.experiment(
         SST_HOME,
         glove_leaves_phi,
         fit_softmax,
         vectorize=False)  # Tell `experiment` it needn't use a DictVectorizer.

              precision  recall  f1-score  support


RNN classifiers

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks


Model overview

For complete details, see the reference implementation np_rnn_classifier.py


Standard RNN dataset preparation

Examples
    [a, b, a]
    [b, c]
        ⇓
Indices
    [1, 2, 1]
    [2, 3]
        ⇓
Vectors
    ⟨[−0.42 0.10 0.12], [−0.16 −0.21 0.29], [−0.42 0.10 0.12]⟩
    ⟨[−0.16 −0.21 0.29], [−0.26 0.31 0.37]⟩

Embedding
    1  −0.42   0.10  0.12
    2  −0.16  −0.21  0.29
    3  −0.26   0.31  0.37
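The pipeline above is just a dictionary lookup followed by matrix row indexing. A numpy sketch using the same toy values (the reserved index-0 row is an assumption for illustration, e.g. for padding):

```python
# Tokens -> indices -> embedding rows, mirroring the toy example above.
import numpy as np

vocab = {"a": 1, "b": 2, "c": 3}        # index 0 reserved (assumption)
embedding = np.array([
    [0.00,  0.00,  0.00],               # 0: reserved row (assumption)
    [-0.42, 0.10,  0.12],               # 1: a
    [-0.16, -0.21, 0.29],               # 2: b
    [-0.26, 0.31,  0.37]])              # 3: c

examples = [["a", "b", "a"], ["b", "c"]]
indices = [[vocab[w] for w in ex] for ex in examples]
vectors = [embedding[ix] for ix in indices]   # fancy indexing looks up rows
```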


A note on LSTMs

1. Plain RNNs tend to perform poorly with very long sequences; as information flows back through the network, it is lost or distorted.

2. LSTM cells are a prominent response to this problem: they introduce mechanisms that control the flow of information.

3. We won’t review all the mechanisms for this here. I instead recommend these excellent blog posts, which include intuitive diagrams and discuss the motivations for the various pieces in detail:
   - Towards Data Science: Illustrated Guide to LSTM’s and GRU’s: A step by step explanation
   - colah’s blog: Understanding LSTM Networks
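For a concrete sense of the gating, here is one LSTM cell step in plain numpy. The stacked-gate weight layout and the shapes are one common convention, chosen here for illustration, not the course's reference implementation:

```python
# One LSTM cell step: input, forget, and output gates control how the
# memory cell c and hidden state h are updated. Shapes are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # all four gate pre-activations, (4d,)
    i = sigmoid(z[:d])               # input gate
    f = sigmoid(z[d:2 * d])          # forget gate
    o = sigmoid(z[2 * d:3 * d])      # output gate
    g = np.tanh(z[3 * d:])           # candidate update
    c = f * c_prev + i * g           # gated memory update
    h = o * np.tanh(c)               # gated hidden state
    return h, c

rng = np.random.RandomState(0)
d_in, d_hid = 3, 4
W = 0.1 * rng.randn(4 * d_hid, d_in)
U = 0.1 * rng.randn(4 * d_hid, d_hid)
b = np.zeros(4 * d_hid)
h, c = lstm_step(rng.randn(d_in), np.zeros(d_hid), np.zeros(d_hid), W, U, b)
```

The forget gate f lets the cell carry information across long spans largely unchanged, which is what mitigates the loss and distortion noted in point 1.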


Code snippets

sst_code_10_solved (March 15, 2020)

[1]: import os
     import sst
     from torch_rnn_classifier import TorchRNNClassifier
     import torch.nn as nn
     import utils

[2]: GLOVE_HOME = os.path.join('data', 'glove.6B')
     SST_HOME = os.path.join('data', 'trees')

[3]: GLOVE_LOOKUP = utils.glove2dict(
         os.path.join(GLOVE_HOME, 'glove.6B.50d.txt'))

[4]: def rnn_phi(tree):
         return tree.leaves()

[5]: def fit_rnn(X, y):
         sst_train_vocab = utils.get_vocab(X, n_words=10000)
         glove_embedding, sst_glove_vocab = utils.create_pretrained_embedding(
             GLOVE_LOOKUP, sst_train_vocab)
         mod = TorchRNNClassifier(
             sst_glove_vocab,
             eta=0.05,
             embedding=glove_embedding,
             batch_size=1000,
             hidden_dim=50,
             max_iter=50,
             l2_strength=0.001,
             bidirectional=True,
             hidden_activation=nn.ReLU())
         mod.fit(X, y)
         return mod

[6]: rnn_experiment = sst.experiment(SST_HOME, rnn_phi, fit_rnn, vectorize=False)


Tree-structured networks

1. Sentiment as a deep and important NLU problem
2. General practical tips for sentiment analysis
3. The Stanford Sentiment Treebank (SST)
4. sst.py
5. Methods: hyperparameters and classifier comparison
6. Feature representation
7. RNN classifiers
8. Tree-structured networks


Model overview

For complete details, see the reference implementation np_tree_nn.py


Some alternative composition functions

Basic, as in the previous diagram (Pollack 1990):
    h = f([a; c]W + b)

Matrix–Vector (Socher et al. 2012):
    All nodes are represented by both vectors and matrices, and the combination function creates a lot of multiplicative interactions between them.

Tensor (Socher et al. 2013):
    An extension of our basic model with a 3d tensor that allows for multiplicative interactions between the child vectors.

LSTM (Tai et al. 2015):
    Each parent node combines separately-gated memory and hidden states of its children.
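The basic composition function is a one-liner in numpy. A sketch with illustrative dimensions and tanh standing in for the nonlinearity f:

```python
# h = f([a; c] W + b): concatenate the child vectors, apply an affine
# map back to the child dimensionality, then a nonlinearity.
import numpy as np

def combine(a, c, W, b, f=np.tanh):
    return f(np.concatenate([a, c]) @ W + b)

d = 4                                 # illustrative dimensionality
rng = np.random.RandomState(0)
W = 0.1 * rng.randn(2 * d, d)         # maps the concatenation (2d) to d
b = np.zeros(d)
a, c = rng.randn(d), rng.randn(d)     # child representations
h = combine(a, c, W, b)               # parent representation, shape (d,)
```

Because the output has the same dimensionality as each child, the same W and b can be applied recursively at every node in the tree.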


Subtree supervision


Code snippets

sst_code_11_solved (March 15, 2020)

[ ]: import sys
     sys.path.append("/Users/cgpotts/Documents/teaching/2019-2020/xcs224u/cs224u/")

[1]: from collections import Counter
     import os
     import sst
     from torch_tree_nn import TorchTreeNN
     import utils

[2]: SST_HOME = os.path.join('data', 'trees')

[3]: def get_tree_vocab(X, n_words=None):
         wc = Counter([w for ex in X for w in ex.leaves()])
         wc = wc.most_common(n_words) if n_words else wc.items()
         vocab = {w for w, c in wc}
         vocab.add("$UNK")
         return sorted(vocab)

[4]: def tree_phi(tree):
         return tree

[5]: def fit_tree(X, y):
         sst_train_vocab = get_tree_vocab(X, n_words=10000)
         mod = TorchTreeNN(
             sst_train_vocab,
             embedding=None,
             embed_dim=50,
             max_iter=10,
             eta=0.05)
         mod.fit(X, y)
         return mod

[6]: tree_experiment = sst.experiment(SST_HOME, tree_phi, fit_tree, vectorize=False)

Finished epoch 10 of 10; error is 2.543691211810779

              precision  recall  f1-score  support


References

Pranav Anand, Marilyn Walker, Rob Abbott, Jean E. Fox Tree, Robeson Bowmani, and Michael Minor. 2011. Cats rule and dogs drool!: Classifying stance in online debate. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 1–9, Portland, Oregon. Association for Computational Linguistics.

Luke Breitfeller, Emily Ahn, Aldrian Obaja Muis, David Jurgens, and Yulia Tsvetkov. 2019. Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics.

Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3):904–911.

Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2017. Anyone can become a troll: Causes of trolling behavior in online discussions. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, pages 1217–1230, New York, NY, USA. ACM.

Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of the 2013 Annual Conference of the Association for Computational Linguistics, pages 250–259, Stroudsburg, PA. Association for Computational Linguistics.

Sanjiv Das and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the 8th Asia Pacific Finance Association Annual Conference.

Janez Demšar. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30.

Gabriel Doyle, Dan Yurovsky, and Michael C. Frank. 2016. A robust framework for estimating linguistic alignment in Twitter conversations. In Proceedings of the 25th International World Wide Web Conference, WWW ’16, pages 637–648, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Matthew Gentzkow, Jesse M. Shapiro, and Matt Taddy. 2019. Measuring group differences in high-dimensional choices: Method and application to congressional speech. Ms., Stanford University, Brown University, and Amazon.

Yoav Goldberg. 2015. A primer on neural network models for natural language processing. Ms., Bar Ilan University.

William L. Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. 2016. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 595–605, Austin, Texas. Association for Computational Linguistics.

Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, WWW ’16, pages 507–517, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Mikhail Khodak, Nikunj Saunshi, and Kiran Vodrahalli. 2017. A large self-annotated corpus for sarcasm. arXiv preprint arXiv:1704.05579.


References IIAndrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word

vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for ComputationalLinguistics, pages 142–150, Portland, Oregon. Association for Computational Linguistics.

Julian McAuley and Jure Leskovec. 2013. From amateurs to connoisseurs: Modeling the evolution of user expertise throughonline reviews. In Proceedings of the 22nd International World Wide Web Conference, pages 897–907, New York. ACM.

Julian McAuley, Jure Leskovec, and Dan Jurafsky. 2012. Learning attitudes and attributes from multi-aspect reviews. In12th International Conference on Data Mining, pages 1020–1025, Washington, D.C. IEEE Computer Society.

Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. 2015. Image-based recommendations onstyles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Developmentin Information Retrieval, SIGIR ’15, pages 43–52, New York, NY, USA. ACM.

Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers ofbetrayal: A case study on an online strategy game. In Proceedings of the 53rd Annual Meeting of the Association forComputational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: LongPapers), pages 1650–1659, Beijing, China. Association for Computational Linguistics.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in onlineuser content. In Proceedings of the 25th International Conference on World Wide Web, WWW ’16, pages 145–153,Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 115–124, Ann Arbor, MI. Association for Computational Linguistics.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1):1–135.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86, Philadelphia. Association for Computational Linguistics.

Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.

Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, and Diyi Yang. 2020. Automatically neutralizing subjective bias in text. In Proceedings of AAAI.

Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1650–1659, Sofia, Bulgaria. Association for Computational Linguistics.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211, Stroudsburg, PA. Association for Computational Linguistics.

56 / 57


References III

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA. Association for Computational Linguistics.

Moritz Sudhof, Andrés Gómez Emilsson, Andrew L. Maas, and Christopher Potts. 2014. Sentiment expression conditioned by affective transitions and social forces. In Proceedings of the 20th Conference on Knowledge Discovery and Data Mining, pages 1136–1145, New York. ACM.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured Long Short-Term Memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566, Beijing, China. Association for Computational Linguistics.

Zijian Wang and Christopher Potts. 2019. TalkDown: A corpus for condescension detection in context. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3702–3710, Stroudsburg, PA. Association for Computational Linguistics.

Robert West, Hristo S. Paskov, Jure Leskovec, and Christopher Potts. 2014. Exploiting social network structure for person-to-person sentiment analysis. Transactions of the Association for Computational Linguistics, 2(2):297–310.

57 / 57