![Page 1: Deep Learning for Natural Language Sentiment and Affectkt.ijs.si/dlsa/2018-09-14-ECML-DLSA-tutorial.pdf · • Product reviews: this dataset consists of a few million Amazon customer](https://reader033.vdocuments.mx/reader033/viewer/2022060514/5f8698241e19a65f4c4b77f7/html5/thumbnails/1.jpg)
Deep Learning for Natural Language
Sentiment and Affect
Muhammad Abdul-Mageed
The University of British Columbia
(Abdul-Mageed & Kralj Novak, 2018)
Petra Kralj Novak
Jožef Stefan Institute
Outline
• Introduction
• Classical Methods
• Deep Learning Methods – on separate slides
• Multilingual Approaches
• Resources
• Ethics
Introduction
How far can we go with machines?
Information Overload
https://beta.techcrunch.com/2017/06/27/facebook-2-billion-users/
Sentiment analysis (broad definition)
• Sentiment analysis and opinion mining is the field of study that analyzes people’s
• opinions,
• sentiments,
• evaluations,
• attitudes, and
• emotions
from written language.
Sentiment analysis (narrow definition)
• Sentiment Analysis is the process of computationally determining whether a piece of text is positive, neutral or negative.
• Sentiment polarity & subjectivity
Examples
• Opposite orientations in different application domains
  • “This camera sucks.”
  • “This vacuum cleaner really sucks.”
• Sarcasm:
  • “What a great car! It stopped working in two days.”
• Opinions without sentiment words
  • “This washer uses a lot of water.”
• Ambiguous
  • “It is my birthday today.”
• Language specific
  • “Na ECML konferenčni večerji smo se zabavali ob čudoviti glasbi in plesu.” (Slovenian: “At the ECML conference dinner we had fun with wonderful music and dancing.”)
Granularity level
• Word
• Sentence, paragraph
• Document
Classical methods
Sentiment lexicons
• Good, wonderful, amazing
• Bad, poor, terrible
• Cost someone an arm and a leg
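A minimal sketch of how such a lexicon could be applied, assuming simple hit counting. The word lists are the ones on the slide; the function names and the rule that the idiom counts as negative are illustrative assumptions, not part of the tutorial.

```python
import re

# Toy lexicon: the words come from the slide, the scoring rule is an assumption.
POSITIVE = {"good", "wonderful", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}
# Multi-word entries are matched on the raw text, before tokenization.
NEGATIVE_PHRASES = ["an arm and a leg"]


def lexicon_score(text):
    """Return (number of positive hits) minus (number of negative hits)."""
    lowered = text.lower()
    score = 0
    for phrase in NEGATIVE_PHRASES:
        score -= lowered.count(phrase)
    tokens = re.findall(r"[a-z']+", lowered)
    score += sum(tok in POSITIVE for tok in tokens)
    score -= sum(tok in NEGATIVE for tok in tokens)
    return score


def polarity(text):
    """Map the raw score to one of three labels."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A text without any lexicon hit, such as “This washer uses a lot of water.”, comes out as neutral, which is exactly the weakness of purely lexical scoring for opinions without sentiment words.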
Lexical sentiment analysis: Loughran and McDonald Sentiment Word Lists
Lexical sentiment analysis of mainstream news: Bitcoin
http://newstream.ijs.si/
Lexical vs. machine learning methods
[Figure: side-by-side comparison with two columns, Lexical and Machine learning]
Maite Taboada, Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347
Stance analysis
• Stance detection is the task of automatically determining whether the author of a text is in favor of, neutral towards, or against a target
• Example:
  • Target: legalization of abortion
  • Tweet: “A fetus has rights too! Make your voice heard.”
Slovenian presidential elections 2012
• Stance analysis on manually annotated Twitter data:
  • Each tweet is annotated as in favor of, neutral towards, or against each of the candidates
• Linear kernel SVM model
[Credit: https://www.youtube.com/watch?v=Ixkp0T3-1YE]
Emotion in Public Discourse
Source: https://www.theatlantic.com/health/archive/2015/02/hard-feelings-sciences-struggle-to-define-emotions/385711
Hard Feelings: Science’s Struggle to Define Emotions
What is emotion?
• “[E]veryone knows what an emotion is, until asked to give a definition. Then, it seems, no one knows” (Fehr & Russell, 1984)
• Definitions vary as a function of:
  • discipline or approach
  • time or culture
• ~ 100 definitions of emotion (Kleinginna & Kleinginna, 1981)
Models of emotion
• Categorical models of basic emotion (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)
• Bidimensional models (e.g., Russell, 2009)
• Appraisal models (e.g., Arnold, 1950; 1960; Lazarus, 1991; Scherer et al., 2001)
• Other…
Basic emotion models
• Categorical models (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)
  • anger, disgust, fear, joy, sadness, surprise
Bidimensional models
[Figure: two-dimensional emotion space. Valence axis: frustrated to pleased. Arousal axis: sleepy to aroused.]
Plutchik Wheel of Emotions
• 3 circles of arousal: core, primary, and secondary (p1, p2, p3)
• 8 dimensions
Arousal
2 Dimensions of Valence
Learning emotion
• Multiclass classification task
• Similar to learning sentiment (text classification)
The sentiment analysis pipeline
[Figure: pipeline with five numbered steps, leading from millions of documents to thousands of documents, to a classifier, and back to millions of documents]
Data acquisition and labeling
• Acquisition: Relevant data
• Annotation:
  • Representative sample
  • Sample size: 20 – 100K
  • Duplicates
• Annotators
  • Clear instructions with examples
  • Annotator self-agreement
  • Inter-annotator agreement
Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.
Size of training dataset: saturation point
Monitor classifier performance while feeding increasingly larger training sets
[Figure legend: inter-annotator agreement, classifier performance. Panel captions: saturation point not reached at 90,000 tweets; saturation point at 70,000 tweets.]
Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
The role of human sentiment annotators
Comparison of annotator self-agreement, inter-annotator agreement, and an automated sentiment classifier in terms of Krippendorff’s Alpha.
Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
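For readers who want to compute such an agreement score themselves, here is a hedged sketch of one common formulation of Krippendorff’s Alpha, 1 - D_o / D_e, built from a coincidence matrix. The function names are hypothetical, and the choice of distance function (nominal or squared difference) is an assumption, not necessarily the variant used in the cited study.

```python
from collections import Counter
from itertools import permutations


def nominal(c, k):
    """Distance for unordered labels: any disagreement counts as 1."""
    return 0.0 if c == k else 1.0


def interval(c, k):
    """Distance for numeric labels such as -1, 0, +1: squared difference."""
    return float((c - k) ** 2)


def krippendorff_alpha(units, distance):
    """units: one list of labels per item, e.g. [label_a, label_b].

    Items with fewer than two labels cannot be paired and are skipped.
    Raises ZeroDivisionError if only one label value occurs in the data.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels given to the same item is one coincidence.
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w * distance(c, k) for (c, k), w in coincidences.items())
    expected = sum(
        totals[c] * totals[k] * distance(c, k) for c in totals for k in totals
    )
    return 1.0 - (n - 1) * observed / expected
```

The same function covers self-agreement (one annotator labeling duplicated items twice), inter-annotator agreement, and classifier-versus-annotator comparison: only the source of the paired labels changes.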
Distant supervision
To build a dataset
• Emoticon/emoji
• #tags
• Seed words (good, bad)
Remove the hints while training
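The idea can be sketched as follows, assuming emoticons and hashtags as the hints. The hint lists and the function name are hypothetical choices for illustration; a real setup would use a much larger, curated set.

```python
# Hypothetical hint lists; matching is by plain substring, which is crude.
POSITIVE_HINTS = [":)", ":-)", ":D", "#happy"]
NEGATIVE_HINTS = [":(", ":-(", "#sad"]


def distant_label(text):
    """Return (label, text_without_hints), or None if hints are absent or conflicting."""
    has_pos = any(hint in text for hint in POSITIVE_HINTS)
    has_neg = any(hint in text for hint in NEGATIVE_HINTS)
    if has_pos == has_neg:
        # Either no hint at all, or both polarities present: skip the text.
        return None
    cleaned = text
    # Remove every hint so the classifier cannot simply memorize the label source.
    for hint in POSITIVE_HINTS + NEGATIVE_HINTS:
        cleaned = cleaned.replace(hint, " ")
    cleaned = " ".join(cleaned.split())
    return ("positive" if has_pos else "negative", cleaned)
```

Texts with no hint or with conflicting hints are dropped rather than guessed, so the resulting dataset is noisy but at least internally consistent.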
2, 3 or more class problem?
• 2-class problem
  • Whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed
• 3-class problem
  • Whether the sentiment of the text is positive, neutral or negative
• More-class problem
  • Emotion detection
Exercise: confusion matrix of a classifier
• Accuracy = 80% in both cases
• The errors in the first matrix are more severe than in the second
Problem formulation: Ordinal regression
• Three class problem: negative, neutral, positive
• An error from positive to negative is bigger than an error from positive to neutral
• Measures of quality:
  • Accuracy, Accuracy@1
  • F1
  • MAE, MSE
  • Cohen’s Kappa
  • Krippendorff’s Alpha
Exercise: confusion matrix of a classifier
• First matrix: Accuracy = 80%, F1 = 0.71
• Second matrix: Accuracy = 80%, F1 = 0.83
Classifier
Traditional approaches: SVM, Naïve Bayes
Neural networks
Data representation 1: BOW
• Each word is one dimension
• Each document is one point on a hypersphere
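A small sketch of this representation, assuming raw term counts that are L2-normalized so that every document vector has length 1 (which is what places it on the unit hypersphere). The helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_vocabulary(documents):
    """Assign one dimension (index) to every distinct word."""
    words = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def bow_vector(text, vocabulary):
    """Term-count vector scaled to unit length; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vector = [0.0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = float(count)
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def dot(u, v):
    """For unit vectors this is the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


DOCS = ["This camera sucks.", "This vacuum cleaner really sucks."]
VOCAB = build_vocabulary(DOCS)
VECTORS = [bow_vector(doc, VOCAB) for doc in DOCS]
```

Word order is discarded entirely: the two example sentences are similar in this space only because they share “this” and “sucks”, regardless of what is being described.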
Social media specific sentiment features
Data representation: Additional features
• BOW (bag of words) + additional features
  • Word N-grams: (Justin Bieber, video games, not happy)
  • Punctuation:
  • Emoticons and emoji:
  • Preprocessing: baaaaaaad → baaad
  • Capitalization: SCREAMING
• Language specific
  • Lists of positive and negative words: SentiWordNet
  • Spellings of swearing: f**k
  • Language (keyboard) specific emoticons: ಠ_ಠ , ƸӜƷ
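A few of these features can be sketched with the standard library alone. The feature names and the rule of squeezing any run of three or more repeated characters down to exactly three are assumptions chosen to reproduce the baaaaaaad → baaad example.

```python
import re


def squeeze_elongation(text):
    """Collapse runs of 3 or more identical characters to exactly 3."""
    return re.sub(r"(.)\1{2,}", r"\1\1\1", text)


def word_ngrams(tokens, n):
    """Consecutive word n-grams, e.g. n=2 keeps 'not happy' together."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extra_features(text):
    squeezed = squeeze_elongation(text)
    tokens = re.findall(r"\w+", squeezed)
    return {
        "bigrams": word_ngrams([tok.lower() for tok in tokens], 2),
        # Single letters such as "I" are not counted as screaming.
        "all_caps_tokens": sum(1 for tok in tokens if len(tok) > 1 and tok.isupper()),
        # Counted on the raw text, before squeezing.
        "exclamations": text.count("!"),
        "was_elongated": squeezed != text,
    }
```

Keeping three characters rather than one preserves the signal that the word was elongated while still mapping “baaaad” and “baaaaaaad” to the same token.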
Precision-recall tuning
• Precision & Recall should be similar for both the positive and the negative class
Deep learning methods
Multilingual sentiment analysis
• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.
• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.
NLP != English LP
[Figure panels: Languages in the world; Languages on Twitter]
Image from https://fledu.uz
Multilingual sentiment analysis approaches
A. Translation-based sentiment analysis
B. Corpus based
C. Lexicon-based sentiment analysis
D. Machine learning approaches
E. Language independent approaches
Translation based sentiment analysis (1)
Original document → machine translation → English document → apply English sentiment analysis → Sentiment classification
Translation based sentiment analysis (2)
Sentiment labeled corpus (English) → machine translate to target language → Corpus in target language → build a ML model → Sentiment model for target language
Original document → Sentiment model for target language → Sentiment classification
Corpus based
Parallel corpora → apply “English” sentiment model → transfer labels → build sentiment model for target language → Sentiment model for target language
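The label-transfer step of this flow can be sketched as below. Everything here is a stand-in: `english_sentiment` is a toy lexicon counter in place of a real English model, and the two English-German sentence pairs are hypothetical examples of a parallel corpus.

```python
import re

POSITIVE = {"good", "wonderful", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}


def english_sentiment(sentence):
    """Stand-in for an existing English sentiment model (toy lexicon count)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def project_labels(parallel_pairs):
    """Label the English side, then attach each label to the aligned target sentence."""
    return [(target, english_sentiment(english)) for english, target in parallel_pairs]


# Hypothetical (English, German) sentence pairs standing in for a parallel corpus.
PAIRS = [
    ("The food was wonderful.", "Das Essen war wunderbar."),
    ("The service was bad.", "Der Service war schlecht."),
]
TRAINING_DATA = project_labels(PAIRS)
```

`TRAINING_DATA` is then an automatically labeled corpus in the target language, on which a target-language model can be trained in the usual way.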
Lexicon-based sentiment analysis
• Build a sentiment lexicon for target language
  • Translation of lexica (+ check 10,000 most frequent words)
  • WordNet (words and semantic relations) + seed words
Machine learning approaches
1. Labeled dataset
  • Manual annotation
  • Distant supervision
    • Emoji/emoticon
    • Positive and negative #tags
    • Seed words
2. Build a machine learning model
Languages of rich morphology
(Abdul-Mageed, 2018)
Arabic
Segmentation
POS Tagging
ASMA: Segmentation & Morphosyntactic Disambiguation
ASMA: A Real-World Example
Modeling in lexical space
Modeling in morphosyntactic space
(Abdul-Mageed, 2015. Dissertation)
Resources & Venues
Sentiment Resources
• Lexicons
• Models & libraries
• Annotated sentiment data
Lexicons
• AFINN
• Bing Liu's Opinion Lexicon
• MPQA Subjectivity Lexicon
• Harvard General Inquirer
• SentiWordNet
• Loughran-McDonald Sentiment Word Lists
• Sentiment Lexicons for 81 Languages
• Emoji sentiment ranking
• Emoticon Sentiment Lexicon
• Sifat (Arabic adjectives)
AFINN
• A list of English words rated for valence
• Scale [-5,5]
• 2477 words and phrases
• Licence: Open Database License (ODbL) v1.0
• An evaluation of the word list is available in: Finn Årup Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs", Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, CEUR Workshop Proceedings 718, pp. 93-98, May 2011. http://arxiv.org/abs/1103.2903
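A graded valence list of this kind can be used by summing the ratings of the words found in a text. The sketch below assumes a file with one tab-separated `term<TAB>score` entry per line; both that layout and the three sample entries with their scores are illustrative assumptions, not copied from the actual list. Multi-word phrases are ignored for brevity.

```python
import io
import re

# Made-up entries in an assumed "term<TAB>integer score" layout, scores in [-5, 5].
SAMPLE_LIST = "amazing\t4\nbad\t-3\nterrible\t-3\n"


def load_valence_list(lines):
    """Parse an iterable of 'term<TAB>score' lines into a dict."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        lexicon[term] = int(score)
    return lexicon


def valence(text, lexicon):
    """Sum of the ratings of all single-word entries found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


LEXICON = load_valence_list(io.StringIO(SAMPLE_LIST))
```

Unlike a plain positive/negative word list, the graded scale lets a strongly rated word outweigh a mildly rated one in the same sentence.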
Emoji sentiment ranking
• Sentiment of 751 (most common) emojis
• Constructed from 75,000 manually sentiment-labeled tweets containing emojis, in 13 European languages
• Similar format to SentiWordNet
• Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. https://doi.org/10.1371/journal.pone.0144296
• http://kt.ijs.si/data/Emoji_sentiment_ranking/
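As a rough illustration of how per-emoji label counts can be turned into a single score, the sketch below averages the labels -1, 0, and +1 weighted by how often each was observed. It assumes that each entry carries counts of negative, neutral, and positive occurrences; the function name and the counts used are made up, and any smoothing applied in the published ranking is not reproduced here.

```python
def sentiment_score(negative, neutral, positive):
    """Mean of the labels -1, 0, +1, weighted by their observed counts."""
    total = negative + neutral + positive
    if total == 0:
        raise ValueError("an emoji needs at least one labeled occurrence")
    return (positive - negative) / total


# Hypothetical counts for one emoji: 10 negative, 30 neutral, 60 positive tweets.
print(sentiment_score(10, 30, 60))
```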
74
Models
• TextBlob
• PatternAnalyzer: based on a lexicon of adjectives
• NaiveBayesAnalyzer: an NLTK classifier trained on a movie reviews corpus
• (Python) https://textblob.readthedocs.io/en/dev/
• Ipublia sentiment analysis
• English, German, French and Italian.
• (Python, REST) https://github.com/ipublia/sentiment-analysis
75
Annotated sentiment data
• Twitter sentiment for 15 European languages (1,643,735 manually annotated tweets)
• SemEval competition data
• Bing Liu’s customer reviews and other datasets
• Product reviews: this dataset consists of a few million Amazon customer reviews with star ratings, useful for training a sentiment analysis model.
• Restaurant reviews: this dataset consists of 5.2 million Yelp reviews with star ratings.
• Movie reviews: this dataset consists of 1,000 positive and 1,000 negative processed reviews. It also provides 5,331 positive and 5,331 negative processed sentences / snippets.
• Fine food reviews: this dataset consists of ~500,000 food reviews from Amazon. It includes product and user information, ratings, and a plain text version of every review.
• Twitter airline sentiment on Kaggle: this dataset consists of ~15,000 labeled tweets (positive, neutral, and negative) about airlines.
• First GOP Debate Twitter Sentiment: this dataset consists of ~14,000 labeled tweets (positive, neutral, and negative) about the first GOP debate of the 2016 presidential race.
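Review datasets with star ratings can serve as training data because the rating acts as a ready-made label. Below is a minimal sketch of one possible mapping; the thresholds (1-2 negative, 3 neutral, 4-5 positive), the `stars_to_label` helper, and the sample reviews are assumptions for illustration and are not prescribed by any of the datasets listed above.

```python
def stars_to_label(stars):
    """Map a 1-5 star rating to a coarse sentiment label (assumed thresholds)."""
    if not 1 <= stars <= 5:
        raise ValueError("star rating must be between 1 and 5")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Made-up reviews in (text, stars) form.
reviews = [
    ("Broke after two days.", 1),
    ("Does the job.", 3),
    ("Love it!", 5),
]

labeled = [(text, stars_to_label(stars)) for text, stars in reviews]
print(labeled)
```

Some practitioners drop 3-star reviews altogether to obtain a cleaner binary task; which choice is better depends on whether a neutral class is needed.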
76
Emotion Resources
• Lexicons
• NRC emotion lexicon
• UBC emotion lexicon (ongoing work)
• Data
• SemEval 2007; 2018; 2019
• Aman and Szpakowicz (2007)
• Abdul-Mageed and Ungar (2017)
• Alhuzali, Abdul-Mageed, and Ungar (2018) (Arabic)
77
Biases & Ethics
78
Biases: Social media data is not representative
• Demographic differences between social media users and “target population”
• Behaviour biases
• Linking biases
• Temporal variations
79
Ethics
• Types of social media research
• Users publishing content might not have anticipated a particular use
                  Aware           Not aware
Manipulated       Lab studies     A/B testing
Not manipulated   Opt-in study    Observational studies (sentiment analysis)
80
Ethics
• Private or public?
• PRIVATE: a password protected ‘private’ Facebook group
• PUBLIC: an open discussion on Twitter in which people broadcast their opinions using a #tag (in order to associate their thoughts on a subject with others’ thoughts on the same subject)
• Public != Non-sensitive
Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.
81
Ethics: Case Study - Marijuana
• Twitter: #cannabis, #legalize, #ismokeit
• Concerns:
• Sensitive: illegal activity
• Users may be under the age of 18
• Solution:
• Present results from aggregate data
• Avoid compromising anonymity: paraphrase quotes and remove ID handles
• Direct quotes may be used with informed consent from the platform user (over 18).
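The two mechanical parts of this solution, removing ID handles and reporting only aggregates, can be sketched as follows. This is an illustrative sketch under stated assumptions: the `@user` placeholder, the helper names, and the sample tweet are hypothetical, and the pattern assumes handles consist of an `@` followed by letters, digits, or underscores.

```python
import re
from collections import Counter

# Assumed handle pattern: "@" followed by letters, digits or underscores.
HANDLE = re.compile(r"@\w+")


def strip_handles(text):
    """Replace every @handle with a neutral placeholder."""
    return HANDLE.sub("@user", text)


def aggregate(labels):
    """Report the share of each sentiment label instead of individual posts."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


print(strip_handles("@alice I agree with @bob_42 #legalize"))
print(aggregate(["positive", "positive", "negative", "neutral"]))
```

Stripping handles alone does not make a quote anonymous, since the remaining text can often be found again with a search; that is why the slide also calls for paraphrasing.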
82
Take-home messages
• On real data, human annotators disagree → hard problem
• The best classifier cannot outperform the inter-annotator agreement
• Data representation
• BOW + Social media specific features: punctuation, emojis, …
• Embedding + deep learning: need lots of data (unlabeled, distant supervision)
• NLP != English LP
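To make the point about annotator disagreement concrete, the sketch below compares two annotators on the same items using raw agreement and Cohen's kappa. The slides do not say which agreement measure is meant, so kappa is used here only as one common choice; the label sequences and function names are made up for illustration.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_o = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two human annotators on the same six tweets.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neu", "neg", "neu", "neg", "neg"]

print(observed_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

The same functions can be applied to a classifier's predictions against one annotator, which puts model performance and human agreement on the same scale.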
83
References (incomplete)
• Liu B. Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press, 2015.
• Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. arXiv preprint arXiv:1801.07883.
• Mohammad, S. M. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis (pp. 61-83). Springer, Cham, 2017.
• Taboada, M. Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347
• Abdul-Mageed, M. and Ungar, L., 2017. Emonet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 718-728).
• Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
• Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.
• Zollo, F., Sluban, B., Mozetič, I. and Quattrociocchi, W., 2017, November. Toward a Better Understanding of Emotional Dynamics on Facebook. In International Workshop on Complex Networks and their Applications (pp. 365-377). Springer, Cham.
• Kralj Novak, P., Smailović, J., Sluban, B., & Mozetič, I. (2015). Sentiment of emojis. PloS one, 10(12), e0144296.
Multilingual:
• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.
• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.
• Abdul-Mageed, M., Diab, M. and Kübler, S., 2014. SAMAR: Subjectivity and sentiment analysis for Arabic social media. Computer Speech & Language, 28(1), pp.20-37.
Ethics:
• Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.
84
Muhammad Abdul-Mageed
Natural Language Processing Lab
School of Information
The University of British Columbia
Vancouver, Canada
(Abdul-Mageed & Kralj Novak, 2018)
Petra Kralj Novak
Department of Knowledge Technologies
Jožef Stefan Institute
Ljubljana, Slovenia
@PetraKraljNovak