SentiStrength: Sentiment Strength Detection in MySpace and Twitter
![Page 1: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/1.jpg)
SentiStrength: Sentiment Strength Detection in MySpace and Twitter
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
Virtual Knowledge Studio (VKS)
Information Studies
![Page 2: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/2.jpg)
SentiStrength Objective
1. Detect positive and negative sentiment strength in short informal text
   1. Develop workarounds for lack of standard grammar and spelling
   2. Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
   3. Classify simultaneously as positive 1-5 AND negative 1-5 sentiment
2. Apply to MySpace comments and social issues
![Page 3: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/3.jpg)
SentiStrength Algorithm - Core
A list of 890 positive and negative sentiment terms with strengths (1 to 5), e.g. ache = -2, dislike = -3, hate = -4, excruciating = -5, encourage = 2, coolest = 3, lover = 4.
The sentiment strength of a sentence is that of its strongest term; for a text with multiple sentences, it is that of its strongest sentence.
![Page 4: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/4.jpg)
Examples
Scores are given as positive, negative.
My legs ache. (ache = -2) Result: 1, -2
You are the coolest. (coolest = 3) Result: 3, -1
I hate Paul but encourage him. (hate = -4, encourage = 2) Result: 2, -4
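The core rule can be sketched in a few lines of Python. This is a minimal illustration rather than the SentiStrength implementation: the dictionary holds only the seven terms quoted on the slides, and the function names and the simple regular-expression tokenisation are assumptions made for the example.

```python
import re

# Excerpt of the term list, using only the strengths quoted on the slides.
TERM_STRENGTHS = {
    "ache": -2, "dislike": -3, "hate": -4, "excruciating": -5,
    "encourage": 2, "coolest": 3, "lover": 4,
}


def score_sentence(sentence):
    """Return (positive, negative); 1 and -1 mean no sentiment detected."""
    positive, negative = 1, -1
    for word in re.findall(r"[a-z']+", sentence.lower()):
        strength = TERM_STRENGTHS.get(word, 0)
        positive = max(positive, strength)  # strongest positive term wins
        negative = min(negative, strength)  # strongest negative term wins
    return positive, negative


def score_text(text):
    """Score each sentence, then keep the strongest sentence on each scale."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [score_sentence(s) for s in sentences] or [(1, -1)]
    return max(p for p, _ in scores), min(n for _, n in scores)


for example in ("My legs ache.", "You are the coolest.",
                "I hate Paul but encourage him."):
    print(example, score_text(example))
```

Positive and negative strengths are tracked separately, so a text such as the third example can score on both scales at once.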
![Page 5: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/5.jpg)
Term Strength Optimisation
Term strengths (e.g., ache = -2) initially fixed by human coder
Term strengths optimised on training set with 10-fold cross-validation
Adjust term strengths to give best training set results then evaluate on test set
E.g., training set: “My legs ache”: coder sentiment = 1,-3 => adjust sentiment of “ache” from -2 to -3.
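One way to picture the adjustment step is a simple hill climb: nudge one term strength by one point and keep the change only if training accuracy improves. The slides do not give the exact search procedure, so the loop below is an assumed stand-in, the function names are hypothetical, and the 10-fold cross-validation is left out for brevity.

```python
def predict(words, strengths):
    """Strongest positive and negative term; (1, -1) if none is found."""
    values = [strengths.get(w, 0) for w in words]
    return max([1] + values), min([-1] + values)


def accuracy(training_set, strengths):
    """Fraction of texts whose predicted pair equals the coder's pair."""
    hits = sum(predict(words, strengths) == coded
               for words, coded in training_set)
    return hits / len(training_set)


def optimise(training_set, strengths):
    """Assumed hill climb: accept a +/-1 change only if accuracy rises."""
    strengths = dict(strengths)
    improved = True
    while improved:
        improved = False
        for term in list(strengths):
            for delta in (-1, 1):
                candidate = strengths[term] + delta
                # Keep the term's polarity and stay within 1..5 in magnitude.
                if candidate * strengths[term] <= 0 or abs(candidate) > 5:
                    continue
                trial = {**strengths, term: candidate}
                if accuracy(training_set, trial) > accuracy(training_set, strengths):
                    strengths = trial
                    improved = True
    return strengths


# The slide's example: the coder gave "My legs ache" a score of 1, -3.
training = [(["my", "legs", "ache"], (1, -3))]
print(optimise(training, {"ache": -2}))
```

Because a change is accepted only on a strict improvement, the loop always terminates; with the single training example above, "ache" moves from -2 to -3 as on the slide.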
![Page 6: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/6.jpg)
SentiStrength Algorithm - Extra
Spelling correction for repeated letters: Helllllo -> Hello (emphasis: llll)
Tagging approach used (see next slide)
Extra heuristics:
1. Emphasis acts to enhance + or - emotion
2. Emotion words ignored in questions
3. Take strongest positive or negative expression in whole comment
4. Booster words (e.g., very, some)
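A rough sketch of how three of these heuristics could combine is given below. Several details are assumptions for illustration only: the strength of "happy", the size of each booster adjustment (+1 for "very", -1 for "some"), and detecting a question by a trailing question mark. None of these values is stated on the slides.

```python
import re

TERMS = {"hate": -4, "happy": 3}      # "happy" = 3 is an assumed strength
BOOSTERS = {"very": 1, "some": -1}    # assumed adjustment sizes


def score_comment(comment):
    """Return the strongest (positive, negative) pair in the whole comment."""
    positive, negative = 1, -1
    for sentence in re.findall(r"[^.!?]+[.!?]*", comment):
        if sentence.rstrip().endswith("?"):
            continue  # emotion words in questions are ignored
        boost = 0
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in BOOSTERS:
                boost = BOOSTERS[word]  # applies to the next word only
                continue
            strength = TERMS.get(word, 0)
            if strength:
                magnitude = min(5, max(1, abs(strength) + boost))
                strength = magnitude if strength > 0 else -magnitude
            boost = 0
            positive = max(positive, strength)
            negative = min(negative, strength)
    return positive, negative


print(score_comment("I am very happy"))
print(score_comment("Are you happy? I hate it."))
```

The booster is reset after every non-booster word, so in this sketch it only affects a sentiment term that follows it directly.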
![Page 7: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/7.jpg)
Tagging
Input: HIIIIII MY MATE!!!!!!!!
Tagged: <w equiv="HI" em="IIIII">HIIIIII</w><w>MY</w><w>MATE</w><p equiv="!" em="!!!!!!!">!!!!!!!!</p>
Corrected: HI MY MATE!
Scores shown: 2 3 (mate = 2)
Overall: 3, -1
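The tagging step can be approximated as follows. This is a sketch under stated assumptions: the four-word dictionary is a placeholder, runs of three or more identical letters are first shortened to two and then to one until a dictionary word is found, and the single-letter form is used as a fallback. Only the `w`/`p` elements and the `equiv`/`em` attributes come from the slide.

```python
import re
from xml.sax.saxutils import escape, quoteattr

DICTIONARY = {"hi", "hello", "my", "mate"}  # placeholder word list


def collapse(word, keep):
    """Shorten runs of 3+ identical characters to `keep`; collect the rest."""
    removed = []

    def shrink(match):
        run = match.group(0)
        removed.append(run[keep:])
        return run[:keep]

    return re.sub(r"(.)\1{2,}", shrink, word), "".join(removed)


def normalise(word):
    """Return (corrected word, emphasis letters that were removed)."""
    if word.lower() in DICTIONARY:
        return word, ""
    for keep in (2, 1):
        candidate, emphasis = collapse(word, keep)
        if candidate.lower() in DICTIONARY:
            break
    return candidate, emphasis  # falls back to the single-letter form


def element(name, token, equiv, emphasis):
    if not emphasis:
        return f"<{name}>{escape(token)}</{name}>"
    return (f"<{name} equiv={quoteattr(equiv)} em={quoteattr(emphasis)}>"
            f"{escape(token)}</{name}>")


def tag(text):
    """Wrap words in <w> and punctuation runs in <p>, recording emphasis."""
    parts = []
    for token in re.findall(r"\w+|[^\w\s]+", text):
        if token[0].isalnum():
            parts.append(element("w", token, *normalise(token)))
        else:
            parts.append(element("p", token, token[0], token[1:]))
    return "".join(parts)


print(tag("HIIIIII MY MATE!!!!!!!!"))
```

Keeping the removed letters in `em` lets later steps treat the emphasis as a signal in its own right instead of discarding it during spelling correction.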
![Page 8: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/8.jpg)
Experiments
Development data = 2600 MySpace comments coded by 1 coder
Test data = 1041 MySpace comments coded by 3 independent coders
Comparison against a range of standard machine learning algorithms
![Page 9: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/9.jpg)
Inter-coder agreement

| Comparison | +ve agreement | -ve agreement |
| --- | --- | --- |
| Coder 1 vs. 2 | 51.0% | 67.3% |
| Coder 1 vs. 3 | 55.7% | 76.3% |
| Coder 2 vs. 3 | 61.4% | 68.2% |

Krippendorff’s inter-coder weighted alpha = 0.5743 for positive and 0.5634 for negative sentiment.
Only moderate agreement between coders, but it is a hard 5-category task.
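Pairwise agreement figures of this kind can be computed as the share of comments on which two coders chose exactly the same class. The sketch below assumes that definition; the ratings are made-up illustrative values, not the study data, and Krippendorff's weighted alpha is not reproduced here.

```python
from itertools import combinations


def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two coders gave the same class."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both coders must rate the same items")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * matches / len(ratings_a)


# Illustrative positive-strength ratings (1-5) for five comments.
coders = {
    "Coder 1": [1, 2, 3, 1, 5],
    "Coder 2": [1, 2, 2, 1, 4],
    "Coder 3": [1, 3, 3, 1, 4],
}

for (name_a, a), (name_b, b) in combinations(coders.items(), 2):
    print(f"{name_a} vs. {name_b}: {percent_agreement(a, b):.1f}%")
```

Exact-match agreement treats a 4-versus-5 disagreement the same as a 1-versus-5 one, which is one reason a weighted measure is reported alongside it.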
![Page 10: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/10.jpg)
Machine learning methods +ve
![Page 11: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/11.jpg)
Machine learning methods -ve
![Page 12: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/12.jpg)
Results: +ve sentiment strength

| Algorithm | Opt. feat. | Accuracy | Acc. +/- 1 class | Corr. | Mean % abs. error |
| --- | --- | --- | --- | --- | --- |
| SentiStrength | - | 60.6% | 96.9% | .599 | 22.0% |
| Simple logistic regression | 700 | 58.5% | 96.1% | .557 | 23.2% |
| SVM (SMO) | 800 | 57.6% | 95.4% | .538 | 24.4% |
| J48 classification tree | 700 | 55.2% | 95.9% | .548 | 24.7% |
| JRip rule-based classifier | 700 | 54.3% | 96.4% | .476 | 28.2% |
| SVM regression (SMO) | 100 | 54.1% | 97.3% | .469 | 28.2% |
| AdaBoost | 100 | 53.3% | 97.5% | .464 | 28.5% |
| Decision table | 200 | 53.3% | 96.7% | .431 | 28.2% |
| Multilayer Perceptron | 100 | 50.0% | 94.1% | .422 | 30.2% |
| Naïve Bayes | 100 | 49.1% | 91.4% | .567 | 27.5% |
| Baseline | - | 47.3% | 94.0% | - | 31.2% |
| Random | - | 19.8% | 56.9% | .016 | 82.5% |
![Page 13: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/13.jpg)
Results: -ve sentiment strength

| Algorithm | Opt. feat. | Accuracy | Acc. +/- 1 class | Corr. | Mean % abs. error |
| --- | --- | --- | --- | --- | --- |
| SVM (SMO) | 100 | 73.5% | 92.7% | .421 | 16.5% |
| SVM regression (SMO) | 300 | 73.2% | 91.9% | .363 | 17.6% |
| Simple logistic regression | 800 | 72.9% | 92.2% | .364 | 17.3% |
| SentiStrength | - | 72.8% | 95.1% | .564 | 18.3% |
| Decision table | 100 | 72.7% | 92.1% | .346 | 17.0% |
| JRip rule-based classifier | 500 | 72.2% | 91.5% | .309 | 17.3% |
| J48 classification tree | 400 | 71.1% | 91.6% | .235 | 18.8% |
| Multilayer Perceptron | 100 | 70.1% | 92.5% | .346 | 20.0% |
| AdaBoost | 100 | 69.9% | 90.6% | - | 16.8% |
| Baseline | - | 69.9% | 90.6% | - | 16.8% |
| Naïve Bayes | 200 | 68.0% | 89.8% | .311 | 27.3% |
| Random | - | 20.5% | 46.0% | .010 | 157.7% |
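The four measures reported in the results tables can be computed along the following lines. The definitions are assumptions, since the slides do not spell them out: correlation is taken to be Pearson's r, and mean % absolute error is taken to be the absolute error divided by the coder's value, averaged over all texts. Strengths are passed as magnitudes from 1 to 5, and the sample values are illustrative only.

```python
from math import sqrt


def evaluate(predicted, coded):
    """Return accuracy, accuracy within one class, Pearson r, mean % abs. error."""
    n = len(coded)
    pairs = list(zip(predicted, coded))
    accuracy = sum(p == c for p, c in pairs) / n
    within_one = sum(abs(p - c) <= 1 for p, c in pairs) / n

    mean_p, mean_c = sum(predicted) / n, sum(coded) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in pairs)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_c = sum((c - mean_c) ** 2 for c in coded)
    # Undefined when either series is constant, e.g. a single-class baseline.
    corr = cov / sqrt(var_p * var_c) if var_p and var_c else None

    # Assumed definition: error relative to the human-coded strength.
    mean_pct_error = 100 * sum(abs(p - c) / abs(c) for p, c in pairs) / n
    return accuracy, within_one, corr, mean_pct_error


print(evaluate([1, 2, 3, 5], [1, 2, 4, 3]))
print(evaluate([1, 1, 1, 1], [1, 2, 4, 3]))
```

A classifier that always predicts the same class has no defined correlation, which is consistent with the "-" shown in that column for the baseline rows.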
![Page 14: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/14.jpg)
SentiStrength Components

| Type | % |
| --- | --- |
| Consecutive +ve words not used as boosters | 61.2 |
| Emoticons ignored | 61.2 |
| Negating words not used to switch sentiment (e.g., not happy) | 61.0 |
| SentiStrength standard configuration | 60.9 |
| Booster words ignored (e.g., very) | 60.7 |
| Automatic spelling correction disabled | 60.6 |
| Exclamation marks not given a strength of 2 | 60.6 |
| Extra multiple letters not used as boosters | 60.4 |
| Neutral words with emphasis not counted as +ve | 60.1 |
| SentiStrength with all the above changes | 57.5 |
![Page 15: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/15.jpg)
Example differences/errors
THINK 4 THE ADD
Computer (1,-1), Human (2,-1)
0MG 0MG 0MG 0MG 0MG 0MG 0MG 0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!!
Computer (2,-1), Human (5,-1)
![Page 16: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/16.jpg)
Selected variations tested

| Modification (for positive sentiment) | Accuracy | Acc. +/- 1 class | Corr. | Mean abs. % err. |
| --- | --- | --- | --- | --- |
| Negating words not used to switch following sentiment (e.g., not happy) | 60.87% | 97.50% | .6206 | 21.28% |
| SentiStrength standard algorithm | 60.64% | 96.90% | .5986 | 21.96% |
| Exclamation marks not given a strength of 2 | 60.51% | 96.62% | .6035 | 21.47% |
| Automatic spelling correction disabled | 60.39% | 96.88% | .5961 | 22.05% |
| Extra multiple letters not used as emotion boosters | 60.21% | 96.81% | .5952 | 22.16% |
| Neutral words with emphasis not counted as positive emotion | 60.13% | 96.79% | .5966 | 21.90% |
| SentiStrength with no extras | 57.44% | 96.07% | .6073 | 21.91% |
![Page 17: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/17.jpg)
Application - Evidence of emotion homophily in MySpace
Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends
Correlation of 0.227 for +ve emotion strength and 0.254 for -ve
People tend to use similar but not identical levels of emotion to their friends in messages
![Page 18: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/18.jpg)
CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs
Collective Emotions in Cyberspace
SentiStrength
![Page 19: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/19.jpg)
Application – sentiment in Twitter events
1. Analysis of a corpus of 1 month of English Twitter posts
2. Automatic detection of spikes (events)
3. Sentiment strength classification of all posts
4. Assessment of whether sentiment strength increases during important events

Result: negative sentiment normally increases, positive sentiment might tend to increase
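The slides do not say how spikes were detected. Purely to make the idea concrete, the sketch below uses a hypothetical rule: an hour counts as a spike when its post count exceeds a multiple of the average over the preceding hours. The function name, the `window` and `factor` parameters, and the sample counts are all assumptions.

```python
def find_spikes(hourly_counts, window=3, factor=2.0):
    """Return indexes of hours whose count exceeds factor x the recent mean."""
    spikes = []
    for i in range(window, len(hourly_counts)):
        baseline = sum(hourly_counts[i - window:i]) / window
        if baseline > 0 and hourly_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Illustrative hourly counts of posts mentioning one keyword.
counts = [10, 11, 9, 10, 40, 12, 11]
print(find_spikes(counts))
```

Once spike hours are known, the average sentiment strength of posts inside a spike can be compared with the average before it, which is the comparison the assessment step describes.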
![Page 20: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/20.jpg)
Automatically-identified Twitter spikes
![Page 21: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/21.jpg)
Chile
![Page 22: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/22.jpg)
Hawaii
![Page 23: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/23.jpg)
#oscars
![Page 24: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/24.jpg)
Tiger Woods
![Page 25: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/25.jpg)
![Page 26: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/26.jpg)
Conclusion
Automatic classification of emotion on a 5 point positive and negative scale seems possible for MySpace…
And other similar short computer text messages?
Hard to get accuracy much over 60%?
Next = analyse emotion in online debates
![Page 27: SentiStrength: Sentiment Strength Detection in MySpace and Twitter](https://reader035.vdocuments.mx/reader035/viewer/2022062422/5681402c550346895dab8ce3/html5/thumbnails/27.jpg)
Publication
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (in press). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology.
Thelwall, M., Wilkinson, D., & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace. Journal of the American Society for Information Science and Technology, 61(1), 190-199.