Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter
Post on 26-Jan-2015
Description: Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words' sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.
1. Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter
   Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani
   Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
   1st Workshop on Semantic Sentiment Analysis, Crete, Greece, 2014
2. Outline
   - Sentiment Analysis
   - Sentiment Analysis Approaches
   - Sentiment Lexicons on Twitter
   - Sentiment Lexicon Adaptation Approach
   - Evaluation
   - Conclusion
3. Sentiment Analysis
   Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text.
   - Opinion: "yes, It is sunny, but also very humid :("
   - Opinion: "The weather is great today :)"
   - Fact: "I think its almost 30 degrees today"
4. Sentiment Analysis: The Lexicon-based Approach
   [Diagram: the tweet "I had nightmares all night long last night :(" is passed to a text processing algorithm that uses a sentiment lexicon (great, sad, down, wrong, horrible, love) and is labelled Negative.]
5. Sentiment Lexicons
   - Lists of opinionated:
     - Words and phrases (MPQA, SentiWordNet, etc.)
     - Common-sense concepts (SenticNet)
   - Built:
     - Manually
     - Dictionary-based approach
     - Corpus-based approach
   - Applied to conventional text:
     - Movie reviews, news, blogs, open forums, etc.
6. Sentiment Lexicons on Twitter
   - Twitter data:
     - Language variations
     - New words
     - Noisy nature
     - e.g. lol, gr8, :), :P
   - Traditional lexicons:
     - Not tailored to noisy Twitter data
     - Fixed number of words
7. Twitter-specific Sentiment Lexicons
   - Such as: Thelwall-Lexicon
   - Built specifically to work on social data
   - Contain lists of emoticons, slang, abbreviations, etc.
   - Coupled with a rule-based method, SentiStrength
   - Apply a text pre-processing routine to tweets
8. Twitter-specific Sentiment Lexicons ... and Traditional Lexicons
   Offer context-insensitive prior sentiment orientations and strengths of words.
   [Diagram: "Great" in "Great Problem" and in "Great Smile" is looked up in the sentiment lexicon (great, sad, down, wrong, horrible, love) and comes out Positive.]
9. Lexicon Adaptation Approaches
   - Supervised: require training from labeled corpora
   - Unsupervised: use general textual corpora (e.g., the Web) or static lexical knowledge sources (e.g., WordNet)
10. Contextual Semantic Adaptation Approach
   - Unsupervised approach
   - Captures the contextual semantics of words
   - To assign contextual sentiment
11. 
Contextual Semantics of Words
   "Words that occur in similar context tend to have similar meaning" (Wittgenstein, 1953)
   [Diagram words: Great, Problem, Look, Smile, Concert, Song, Weather, Loss, Game, Taylor Swift, Amazing, Great]
12. Capturing Contextual Semantics
   [Diagram, steps (1) to (3): a term (m) and its context terms C1, C2, ..., Cn form a context-term vector; each context term carries a degree of correlation with m and a prior sentiment taken from the sentiment lexicon; these feed the SentiCircles model. Example words: Great, Smile, Look.]
   - Contextual sentiment strength: [-1 (very negative), +1 (very positive)]
   - Contextual sentiment orientation: Positive, Negative, Neutral
13. Capturing Contextual Semantics: SentiCircles Model
   Example: term (m) = Great, context term C1 = Smile.
   Each context term $C_i$ is placed in the circle at
   $x_i = r_i \cos\theta_i$, $y_i = r_i \sin\theta_i$,
   where $r_i = \mathrm{TDOC}(C_i)$ (degree of correlation) and $\theta_i = \mathrm{Prior\_Sentiment}(C_i) \cdot \pi$.
   [Diagram: unit circle with both axes running from -1 to +1; quadrants labelled Positive, Very Positive, Very Negative and Negative, plus a Neutral Region.]
14. SentiCircles (Example)
15. Overall Contextual Sentiment
   [Diagram: the SentiCircle of term m with points $C_i$ at $(x_i, y_i)$, radius $r_i$ and angle $\theta_i$; quadrants Positive, Very Positive, Very Negative, Negative and a Neutral Region.]
   Paper excerpt shown on the slide (partially cropped): "... in which each term is used. To compute the new sentiment of ... SentiCircle we use the Senti-Median metric. We now have the ... which is composed by the set of (x, y) Cartesian coordinates of ... where the y value represents the sentiment and the x value ... strength. An effective way to approximate the overall sentiment ... by calculating the geometric median of all its points. Formally, ... $(p_1, p_2, ..., p_n)$ in a SentiCircle, the 2D geometric median ..."
   $g = \arg\min_{g \in \mathbb{R}^2} \sum_{i=1}^{n} \lVert p_i - g \rVert_2$  (5)
   Senti-Median of SentiCircle; Sentiment Function
16. Lexicon Adaptation Method
   - A set of antecedent-consequent rules
   - Decides on the new sentiment of a term based on:
     - How weak/strong its prior sentiment is
     - How weak/strong its contextual sentiment is (based on the position of the term's SentiMedian)
17. 
Thelwall-Lexicon Case Study
   - Consists of 2546 terms
   - Coupled with prior sentiment strength between |1| and |5|:
     - [-2, -5] negative term
     - [2, 5] positive term
     - [-1, 1] neutral term
   - Sample entries shown: fiery -2, vex* -3, witch -1, inspir* 3, trite* -3, intelligent* 2, joll* 3, suffers -4, loved 4, insidious* -3, despis* -4, hehe* 2
   [Bar chart, number of terms per class: Positive 398, Negative 1919, Neutral 229]
18. Adaptation Rules on Thelwall-Lexicon
   Example (Rule 10), term "Revolution":
   - Prior Sentiment < -3 (weak negative)
   - Contextual Sentiment = Neutral
   - Result: change to Neutral
19. Experiments
   - Sentiment lexicon: Thelwall-Lexicon
   - Settings:
     - Update setting
     - Expand setting
     - Update + Expand setting
   - Datasets
   - Binary sentiment classification: SentiStrength, a lexicon-based method that works on Thelwall-Lexicon
20. Results: Adaptation Impact on Thelwall-Lexicon
21. Results: Cross comparison results of the original and the adapted lexicons
22. Adapted Lexicons on HCR
   [Chart, Performance: Precision, Recall and F1 for positive sentiment detection with the Original, Updated and Updated+Expanded lexicons]
   [Chart, Sentiment Class Distribution: positive-to-negative ratio for OMD, HCR and STS-Gold]
   [Chart, Impact on Thelwall-Lexicon: new words added to Thelwall-Lexicon for OMD, HCR and STS-Gold]
23. Conclusion
   - We proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data.
   - It updates words' prior sentiment orientations and/or strengths based on their contextual semantics in tweets.
   - The evaluation was done on Thelwall-Lexicon using three Twitter datasets.
   - Results showed that lexicons adapted by our approach improved the sentiment classification performance in both accuracy and F1 in two out of three datasets.
24. Thank You
   Email: email@example.com
   Twitter: hrsaif
   Website: tweenator.com
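The SentiCircle placement from slide 13 and the Senti-Median from slide 15 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. It assumes that the symbol lost after `*` on slide 13 is π, that prior sentiment is already normalised to [-1, 1], and that the TDOC value of each context term is supplied as input rather than computed. The function names `senticircle_points` and `senti_median` are hypothetical, and the geometric median is approximated with Weiszfeld's iteration, a standard technique that the slides do not name.

```python
import math


def senticircle_points(context_terms):
    """Map (tdoc, prior_sentiment) pairs to (x, y) SentiCircle points.

    Assumption: prior_sentiment is normalised to [-1, 1], so the angle
    theta = prior_sentiment * pi lies in [-pi, pi].
    """
    points = []
    for tdoc, prior in context_terms:
        theta = prior * math.pi  # angle derived from the prior sentiment
        points.append((tdoc * math.cos(theta), tdoc * math.sin(theta)))
    return points


def senti_median(points, iterations=200, eps=1e-9):
    """Approximate the 2D geometric median of the points.

    Uses Weiszfeld's iteration (our choice, not stated in the slides),
    starting from the centroid.
    """
    if not points:
        raise ValueError("need at least one point")
    gx = sum(p[0] for p in points) / len(points)
    gy = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Clamp the distance so a point that coincides with the
            # current estimate does not cause a division by zero.
            weight = 1.0 / max(math.hypot(px - gx, py - gy), eps)
            num_x += weight * px
            num_y += weight * py
            denom += weight
        new_gx, new_gy = num_x / denom, num_y / denom
        if math.hypot(new_gx - gx, new_gy - gy) < eps:
            return (new_gx, new_gy)
        gx, gy = new_gx, new_gy
    return (gx, gy)
```

For example, `senti_median(senticircle_points([(0.8, 0.25), (0.5, 0.5), (0.3, 0.75)]))` returns a point whose y coordinate is positive, which, following slide 15, would be read as a positive contextual sentiment. The boundaries of the Neutral Region and the adaptation rules are not given in the slides, so they are left out of the sketch.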