Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new dataset, the STS-Gold
Hassan Saif, Miriam Fernandez, Yulan He and Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
1st Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and perspectives from AI
Outline
• Definition & Background
• Evaluation Datasets for Twitter Sentiment Analysis
• STS-Gold
• Comparative Study
• Conclusion
Sentiment Analysis – Definition
“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”
• Positive: The main dish was delicious
• Neutral: It is a Syrian dish
• Negative: The main dish was salty and horrible
Twitter Sentiment Analysis (Background)
• Sentiment Approaches
– Supervised
– Unsupervised
– Hybrid
• Sentiment Levels
– Tweet-level
– Phrase-level
– Entity-level
• Sentiment Tasks
– Subjectivity
– Polarity
– Sentiment Strength
– Emotion/Mood
Evaluation Datasets for Twitter Sentiment Analysis
Dataset characteristics:
• SA Task
• SA Level
• No. of Tweets
• Vocabulary Size
• Class Distribution
• Sparsity
• Construction & Annotation
Evaluation Datasets – Overview

| Dataset | SA Level | SA Task | Annotation/Agreement |
|---|---|---|---|
| Stanford Twitter Corpus (STS) | Tweet | Subjectivity | Manual/UD |
| Health Care Reform (HCR) | Tweet/Target | Subjectivity | Manual/UD |
| Obama-McCain Debate (OMD) | Tweet | Polarity* | Manual/α=0.655 |
| Sentiment Strength Twitter Dataset (SS-Tweet) | Tweet | Strength/Subjectivity** | Manual/α≈0.56 |
| Sanders Twitter Dataset | Tweet | Subjectivity | Manual/UD |
| Dialogue Earth Twitter Corpus (WAB, GASP) | Tweet/Target | Subjectivity | Manual/UD |
| SemEval-2013 Dataset | Tweet/Expression | Subjectivity | Manual/UD |
What is Missing?
• Details about the annotation methodology (STS, HCR, Sanders)
• Entity-level sentiment evaluation:
– Most works focus on assessing the performance of sentiment classifiers at the tweet level (STS, OMD, SS-Tweet, Sanders)
– Datasets that allow for sentiment evaluation at the entity level assign similar sentiment labels to the tweet and the entities within it (HCR, WAB, GASP)
STS-Gold
• Enables the evaluation at both the entity and tweet levels
• Tweets and entities are annotated independently
• Contains 58 Entities & 3000 Tweets
STS-Gold: Data Collection
• STS Corpus: select 180K tweets
• Entity extraction (Alchemy API)
• Identify frequent concepts
• Top & mid frequent entities: 28 entities
• Select 100 tweets per entity: 2800 tweets
• +200 tweets: 3000 tweets
• Entity extraction: 147 entities
STS-Gold: Concepts and Entities
• Person: Taylor Swift, Oprah, LeBron, Obama
• Country: US, Brazil, England
• Company: YouTube, Starbucks, McDonalds
• City: Vegas, Sydney, Seattle, London
• Organization: Cavs, NASA, UN, Lakers
• Health Condition: Flu, Cancer, Fever, Headache
• Technology: iPod, Xbox, PSP, iPhone
STS-Gold: Data Annotation
• Input: 3000 tweets, 147 entities
• Annotation tool: Tweenator.com
• Sentiment classes: Positive, Negative, Neutral, Mixed, Other
• Inter-annotator agreement:
– Tweet: α=0.765
– Entity: α1=0.416, α2=0.964
• After filtering: 2205 tweets, 58 entities
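The slide reports agreement as α without naming the coefficient. The sketch below assumes it is Krippendorff's alpha for nominal labels and shows how such a score could be computed. The function name and the toy labels are illustrative only and are not taken from the STS-Gold annotation data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (assumed agreement measure).

    `units` is a list of lists; each inner list holds the labels that
    different annotators assigned to one item (a tweet or an entity).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # item by different annotators counts 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1.0 - observed / expected


# Hypothetical labels from two annotators for five tweets.
toy_units = [
    ["positive", "positive"],
    ["negative", "negative"],
    ["neutral", "positive"],
    ["mixed", "mixed"],
    ["negative", "negative"],
]
print(round(krippendorff_alpha_nominal(toy_units), 3))
```

Values close to 1 indicate strong agreement; values near 0 indicate agreement no better than chance.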
Comparative Study
• Vocabulary Size
• Number of Tweets
• Data Sparsity
• Classification Performance
– Polarity Classification
– Naïve Bayes & Maximum Entropy
Comparative Study.1
Vocabulary Size vs. No. of Tweets
- There is a high correlation between the vocabulary size and the number of tweets (ρ = 0.95)
- However, increasing the number of tweets does not always increase the vocabulary size (OMD)
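A minimal sketch of how such a correlation could be computed across datasets. The slide does not say which coefficient ρ denotes, so Pearson's correlation is assumed here. Whitespace tokenization and the toy datasets are also assumptions and do not reproduce the study's data.

```python
import math


def vocabulary_size(tweets):
    """Number of distinct lowercase whitespace tokens (assumed tokenization)."""
    return len({token for tweet in tweets for token in tweet.lower().split()})


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mini-datasets, one list of tweets per dataset.
toy_datasets = {
    "A": ["good phone", "bad phone"],
    "B": ["love this song", "hate this song", "this song again"],
    "C": ["great game", "great game tonight", "bad game", "what a game", "no game"],
}

tweet_counts = [len(tweets) for tweets in toy_datasets.values()]
vocab_sizes = [vocabulary_size(tweets) for tweets in toy_datasets.values()]
print(tweet_counts, vocab_sizes, round(pearson(tweet_counts, vocab_sizes), 2))
```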
Comparative Study.2
Data Sparsity
- Twitter datasets are generally very sparse
- Increasing either the number of tweets or the vocabulary size increases the sparsity degree of the dataset:
  - ρ_no_of_tweets = 0.71
  - ρ_vocabulary_size = 0.77
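The slides do not define the sparsity degree. The sketch below assumes a common reading: the share of zero cells in the tweet-by-term occurrence matrix. The function name, the tokenization, and the sample tweets are illustrative assumptions.

```python
def sparsity_degree(tweets):
    """Share of empty cells in the tweet-by-term matrix (assumed definition).

    Computed as 1 - (sum of distinct terms per tweet) / (tweets * vocabulary).
    """
    tokenized = [set(tweet.lower().split()) for tweet in tweets]
    vocabulary = set().union(*tokenized)
    if not tokenized or not vocabulary:
        return 0.0  # nothing to measure on an empty dataset
    nonzero_cells = sum(len(tokens) for tokens in tokenized)
    return 1.0 - nonzero_cells / (len(tokenized) * len(vocabulary))


# Hypothetical tweets: few shared words, so most matrix cells are empty.
toy_tweets = [
    "the main dish was delicious",
    "it is a syrian dish",
    "the main dish was salty and horrible",
]
print(round(sparsity_degree(toy_tweets), 3))
```

Under this definition, adding tweets that introduce new words grows the matrix faster than it fills, which is consistent with the positive correlations reported above.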
Comparative Study.3
Classification Performance vs. Dataset Sparsity (1)
According to Makrehchi et al. (2008) and Saif et al. (2012), the classification performance and the sparsity degree of a given dataset are negatively correlated, i.e., increasing the dataset sparsity hinders the classification performance.
Comparative Study.3
Classification Performance vs. Dataset Sparsity (2)
- No correlation between the classification performance and the sparsity degree across the datasets. (ρacc = −0.06, ρf1 = 0.23)
- The sparsity-performance correlation is intrinsic, meaning that it might exist within a dataset itself, but not necessarily across datasets.
Conclusion
• Current datasets to evaluate Twitter sentiment classifiers:
– Focus on the tweet level.
– Assign similar sentiment labels to the tweets and the entities within them.
• STS-Gold allows for sentiment evaluation at both the tweet and the entity levels.
• A correlation between the vocabulary size and the number of tweets does not always exist.
• The sparsity-performance correlation is intrinsic, i.e., it only exists within the dataset itself, but not across the different datasets.