[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis in Twitter a Study on the Saudi Community Online talk by: Dr. Nora Altwairesh Date: 11 Dec, 8:00-9:30pm

Upload: asagroup

Post on 23-Jan-2017


TRANSCRIPT

Page 1: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis in Twitter a Study on the Saudi Community

Online talk by: Dr. Nora Altwairesh

Date: 11 Dec, 8:00-9:30pm

Page 2: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

www.asa.imamu.edu.sa

Outline

• ASA
• ASA Research Group?
• Housekeeping
• The talk

Page 3: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis

• Keyword: iPhone
• Tweets: total tweets' sentiments (Pos, Neg, Neut)

Example tweets:

• "iPhone is great!" (positive)
• "iPhone connection sucks!" (negative)
• "I bought an iPhone yesterday" (neutral)
• "Yeah iPhone has long battery life, it's even longer than my life :@" (Challenge!)

Page 4: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Outline

• ASA
• ASA Research Group?
• Housekeeping
• The talk

Page 5: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Arabic Sentiment Analysis Research Group

www.asa.imamu.edu.sa @asa__iu

Page 6: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Group Members

Name: Role

• Dr. Sarah alHumoud: Principal Investigator
• Dr. Areeb alOwisheq: Co-Investigator
• Dr. Nora alTwairesh: Senior Investigator
• Ms. Afnan alMoammar
• Ms. AlHanouf alSwilim
• Ms. Mawaheb alTowijri
• Ms. Tarfa alBuhairi
• Ms. Wejdan alOhaideb

Page 7: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Arabic Sentiment Analysis Group

• Create Arabic corpora
• Develop a Sentiment Analyzer web service
• Disseminate aims, findings and developed resources:
  • Website
  • Workshops
  • Scientific articles
• ASA Survey (collection, classification, analysis)
• Analyze and compare the performance of different SA methodologies
• Develop an SA classifier with discourse relations

Page 8: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Side Projects

• Annotation
  • 11 annotators
  • 142,434 tweets
• Tools demo
  • ASA
  • Spam detection

Page 9: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Coming events

• Sentiment Analysis in Social Media session in HCII2017
• Publications in Lecture Notes in Computer Science (LNCS)
• Deadline: 17 Dec 2016

Page 10: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Outline

• ASA
• ASA Research Group?
• Housekeeping
• The talk

Page 11: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Ask and talk?

• For textual questions: use Q&A
  • If your question is answered, it will be public
• To speak: raise your hand

Page 12: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Attendees' Countries

[Chart: attendees by country (Saudi Arabia, United Arab Emirates, Oman, Other)]

Page 13: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Attendees' Majors

[Bar chart: number of attendees by major (CS, IS, IT, Other, IM, DS, SE); axis from 0 to 60]

Page 14: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Outline

• ASA
• ASA Research Group?
• Housekeeping
• The talk

Page 15: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis in Twitter a Study on the Saudi Community

Online talk by: Dr. Nora Altwairesh

Date: 11 Dec, 8:00-9:30pm

Page 16: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

The Speaker: Nora Al-Twairesh, Ph.D.

• Assistant Professor
• Information Technology Department
• College of Computer and Information Sciences
• King Saud University
• Riyadh, Saudi Arabia
• Website: http://fac.ksu.edu.sa/twairesh
• Research Groups:
  • http://iwan.ksu.edu.sa
  • https://asa.imamu.edu.sa
• Research Interests:
  • Arabic Sentiment Analysis of Social Media text
  • Arabic Natural Language Processing
  • Web and Data Mining

Page 17: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Contents

• Introduction
• What is Sentiment Analysis?
• Why is it Important?
• Sentiment Analysis of Arabic
• Twitter
• Research Motivation
• Research Contributions
• Results
• Conclusion and Future Work

Page 18: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

What is Sentiment Analysis?

• Sentiment analysis is "the field of study that analyzes people's opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in written text" (Liu, 2012).

• Different names: sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis.

• Sentiment analysis classifies text polarity (positive, negative, neutral and mixed).

Page 19: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

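What is Sentiment Analysis?

Example tweets and their sentiment:

• مطار الملك خالد تغير إيجابي ملحوظ ("King Khalid Airport: a noticeable positive change"). Sentiment: Positive
• للأسف أثبت أنه مذيع فاشل جدا ("Unfortunately, he proved to be a very bad presenter"). Sentiment: Negative
• ممكن ترشح لي برنامج قارئ باركود ممتاز ("Could you recommend an excellent barcode reader app for me?"). Sentiment: Neutral
• قارئ جرير رائع لكن الأسعار غالية ("The Jarir reader is great, but the prices are high"). Sentiment: Mixed
• انصحك بالجهاز ممتاز جدا و لكن عيبه ثقيل ("I recommend the device, it is excellent, but its drawback is that it is heavy"). Sentiment: Mixed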

Page 20: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Why is it Important?

• The proliferation of social media websites has led to the production of vast amounts of unstructured text on the Web.

• Aggregating and evaluating these opinions manually is a tedious task and could be nearly impossible.

• These opinions are important for organizations (government, business) and for individuals

Page 21: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis Methods

• Lexicon-based: rule-based method that utilizes sentiment lexicons.

• Corpus-based: supervised learning that utilizes machine learning classifiers.

Page 22: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Research Motivation

• Hot research field
• Challenges of Arabic language
• Challenges of Twitter data

Page 23: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Arabic Language

• Morphologically rich language
• Extremely challenging to process due to rich morphology and complex word order
• Diglossic situation with a multitude of dialects
  • Modern Standard Arabic (MSA): formal language
  • Dialects: informal language

Page 24: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Challenges of SA of Arabic Tweets

• Use of Dialectal Arabic (DA)
• Lack of Arabic corpora and datasets
• Lack of Arabic sentiment lexicons

Page 25: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Twitter

• Why Twitter?

Page 26: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Twitter

• Why Twitter?
  • Mubarak, H., and Darwish, K. "Using Twitter to collect a multi-dialectal corpus of Arabic." ANLP 2014 (2014): 1.
  • 175 M Arabic tweets during March 2014
  • 6.5 M tweets

Page 27: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Characteristics of Twitter Data

• Language is informal
• Short: 140 characters or less
• Abbreviations and shortenings
• Wide array of topics and large vocabulary
• Spelling mistakes and creative spellings
• Special strings: hashtags, emoticons, conjoined words

Page 28: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Research Contributions

• Collecting a large dataset of Arabic tweets (2.2M).
• AraSenti-Tweet Corpus: a corpus of Saudi tweets was constructed from the dataset of tweets.
• AraSenti Lexicon: a sentiment lexicon of Arabic words was extracted from the dataset of tweets.
• Constructing an extensive list of Arabic contextual valence shifters (negators, intensifiers, diminishers, modal words and contrast words).
• Lexicon-based method.
• Corpus-based method.
• Hybrid method.

Page 29: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Data Collection

• EMO-TWEET Dataset:
  • Distant supervision: using emoticons as noisy labels (a positive emoticon marks a tweet as positive, a negative emoticon marks it as negative).

• KEY-TWEET Dataset:
  • Sentiment words as search keywords, e.g. أعجبني ("I liked it") and سيء ("bad").

• Saudi-Tweet Dataset:
  • Tweet or user location set to a Saudi location.
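The distant-supervision step for the EMO-TWEET dataset can be sketched roughly as follows. This is only an illustration: the emoticon lists are hypothetical (the emoticons on the original slide were lost in extraction), and discarding tweets that contain both kinds of emoticon is an assumption, not something the talk states.

```python
from typing import Optional

# Hypothetical emoticon lists; the talk's actual lists are not given.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def noisy_label(tweet: str) -> Optional[str]:
    """Return a noisy sentiment label derived from emoticons, or None.

    A tweet is labelled only when it contains emoticons of exactly one
    polarity; tweets with none, or with both, are left unlabelled.
    """
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


tweets = ["الجهاز ممتاز :)", "الخدمة سيئة :(", "اشتريت جهاز أمس"]
labelled = [(t, noisy_label(t)) for t in tweets if noisy_label(t) is not None]
```

The labels are "noisy" because an emoticon does not always match the sentiment of the text, for example in sarcastic tweets.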

Page 30: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Data Collection and Preprocessing

Page 31: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Data Collection and Preprocessing

Page 32: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

AraSenti-Tweet Corpus

• A set of ~13,000 tweets was selected from the Saudi dataset.

• Most of the annotated tweets in the first stage were positive or negative, so the dataset had to be augmented with more neutral tweets: 4,000 tweets were collected from two Saudi news accounts.

• More tweets (~2,000) were collected to set up the test set.

Page 33: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

AraSenti-Tweet Corpus

Class      No. of Tweets   No. of Tokens
Positive   4,957           93,601
Negative   6,155           127,182
Neutral    4,639           71,492
Mixed      1,822           39,883
Total      17,573          332,158

Page 34: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

AraSenti Lexicon

• AraSenti-Trans: Using MADAMIRA, the English glosses of the words extracted from the tweets were compared to English sentiment lexicons using certain heuristics. Then a manual correction was performed.

• AraSenti-PMI: The second lexicon was generated by calculating the pointwise mutual information (PMI) measure for all words in the positive and negative datasets of tweets.

• Sentiment Score(w) = PMI(w, pos) - PMI(w, neg)
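A minimal sketch of the PMI-based score, assuming the usual definition PMI(w, c) = log2(freq(w, c) * N / (freq(w) * freq(c))), under which the difference simplifies to log2(freq(w, pos) * freq(neg) / (freq(w, neg) * freq(pos))). The add-one smoothing and the function name `pmi_scores` are assumptions made here to avoid log(0); the talk does not say how unseen words were handled.

```python
import math
from collections import Counter
from typing import Dict, List


def pmi_scores(pos_tweets: List[List[str]],
               neg_tweets: List[List[str]],
               smoothing: float = 1.0) -> Dict[str, float]:
    """Sentiment score per word: PMI(w, pos) - PMI(w, neg).

    freq(w) and the total token count N cancel in the difference,
    so only per-class counts are needed. `smoothing` (an assumption)
    is added to the joint counts so words seen in one class only
    still get a finite score.
    """
    pos_counts = Counter(tok for tweet in pos_tweets for tok in tweet)
    neg_counts = Counter(tok for tweet in neg_tweets for tok in tweet)
    n_pos = sum(pos_counts.values())  # tokens in the positive set
    n_neg = sum(neg_counts.values())  # tokens in the negative set
    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        f_pos = pos_counts[word] + smoothing
        f_neg = neg_counts[word] + smoothing
        scores[word] = math.log2((f_pos * n_neg) / (f_neg * n_pos))
    return scores


# Toy data (hypothetical tweets, already tokenized).
scores = pmi_scores(
    pos_tweets=[["ممتاز", "الجهاز"], ["ممتاز", "رائع"]],
    neg_tweets=[["سيء", "الجهاز"], ["سيء", "فاشل"]],
)
```

A positive score marks a word as more associated with positive tweets, a negative score with negative tweets, and the magnitude gives the intensity.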

Page 35: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Significance of AraSenti Lexicon

• Captures the idiosyncratic nature of social media text.
• Provides sentiment intensity of words, not only the sentiment orientation.
• Covers both MSA and DA.
• High coverage: 200K words.

Page 36: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Arabic Valence Shifters

• Extensive list of Arabic valence shifters extracted from the datasets through similarity measures.

• Negation words, intensifiers, diminishers, modal words, presuppositional and contrast words.

• Different hypotheses were evaluated for negation handling.

Page 37: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

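Arabic Valence Shifters

Example, sentiment and valence shifter:

• هذا الكتاب ممتع. ("This book is enjoyable.") Sentiment: Positive. Valence shifter: None
• هذا الكتاب غير ممتع. ("This book is not enjoyable.") Sentiment: Negative. Valence shifter: Negation
• هذا الرجل سيء الأخلاق لأنه أهان العامل. ("This man is ill-mannered because he insulted the worker.") Sentiment: Negative. Valence shifter: None
• لو كان الرجل سيء الأخلاق لأهان العامل. ("If the man were ill-mannered, he would have insulted the worker.") Sentiment: Neutral. Valence shifter: Modal
• هذا الكتاب جيد. ("This book is good.") Sentiment: Positive. Valence shifter: None
• ظنيت إنه بيكون كتاب جيد. ("I thought it would be a good book.") Sentiment: Neutral or Negative. Valence shifter: Presuppositional
• استطاع أن ينجح. ("He managed to succeed.") Sentiment: Positive. Valence shifter: None
• بالكاد استطاع أن ينجح. ("He barely managed to succeed.") Sentiment: Negative. Valence shifter: Presuppositional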

Page 38: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Sentiment Analysis Methods

• Three sentiment analysis methods:
  • Lexicon-based
  • Corpus-based
  • Hybrid

• Three classification models:
  • Two-way classification (positive, negative)
  • Three-way classification (positive, negative, neutral)
  • Four-way classification (positive, negative, neutral, mixed)

Page 39: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Lexicon-based Method

• Rule-based method that utilizes the AraSenti-lexicon and performs context-aware sentiment analysis by special handling of negation and contextual valence shifters.

• Calculates sentiment score which represents sentiment intensity in addition to polarity.
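To illustrate the lexicon-based idea, here is a small sketch that sums lexicon scores over a tweet and flips the sign of words that follow a negator. The lexicon entries, the negator list, the two-token negation window and the zero threshold are all hypothetical; the talk evaluated several negation-handling hypotheses and does not say which rule was adopted.

```python
from typing import Dict, Iterable, List


def tweet_score(tokens: List[str],
                lexicon: Dict[str, float],
                negators: Iterable[str],
                window: int = 2) -> float:
    """Sum lexicon scores, negating words inside a negation window.

    After a negator, the next `window` tokens have their score sign
    flipped (one simple negation hypothesis, assumed here).
    """
    negator_set = set(negators)
    score = 0.0
    in_scope = 0  # how many upcoming tokens are still negated
    for tok in tokens:
        if tok in negator_set:
            in_scope = window
            continue
        value = lexicon.get(tok, 0.0)
        if in_scope > 0:
            value = -value
            in_scope -= 1
        score += value
    return score


def polarity(score: float, threshold: float = 0.0) -> str:
    """Map a numeric tweet score to a three-way polarity label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


LEXICON = {"ممتع": 1.0, "سيء": -1.0}   # hypothetical entries
NEGATORS = {"غير", "ليس"}               # hypothetical negator list

plain = tweet_score(["هذا", "الكتاب", "ممتع"], LEXICON, NEGATORS)
negated = tweet_score(["هذا", "الكتاب", "غير", "ممتع"], LEXICON, NEGATORS)
```

The numeric score carries intensity as well as polarity, which is what allows it to be reused as a feature elsewhere.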

Page 40: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Corpus-based Method

• Supervised learning method that utilizes ML classifiers using the AraSenti-Tweet corpus.
• Used an SVM with a linear kernel.
• Features engineered: syntactic, semantic, and Twitter specific.
• Semantic features include the AraSenti-lexicon.
• Performed backward feature selection to reach the best set of features.
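As a rough illustration of a linear-kernel SVM for two-way classification, the sketch below trains one by batch subgradient descent on the regularised hinge loss, using plain bag-of-words counts. It is a stand-in written in pure Python: the talk does not name the toolkit or hyperparameters used, the engineered syntactic, semantic and Twitter-specific features are not reproduced, and the toy training tweets are hypothetical.

```python
from typing import Dict, List, Tuple


def vectorize(tokens: List[str], vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # example violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return +1 (positive) or -1 (negative)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy training set: +1 = positive, -1 = negative.
train = [
    (["الجهاز", "ممتاز"], 1),
    (["الكتاب", "ممتاز"], 1),
    (["الجهاز", "سيء"], -1),
    (["الكتاب", "سيء"], -1),
]
vocab = {tok: i for i, tok in
         enumerate(sorted({t for toks, _ in train for t in toks}))}
X = [vectorize(toks, vocab) for toks, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)
```

Three-way and four-way classification would need a multi-class scheme such as one-vs-rest on top of this binary classifier; that choice is not described in the talk.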

Page 41: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Hybrid Method

• The approach was to incorporate the knowledge extracted from the rule-based method as features into the statistical method.

• The tweet score that is calculated in the lexicon-based method was added to the features used in the corpus-based method.

• The hybrid method exhibited significant increases in performance for two-way and three-way classification.

• However, in four-way classification the performance of the hybrid and corpus-based method was almost the same.
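The feature-level combination described on this slide can be sketched as appending the lexicon-based tweet score to the feature vector used by the classifier. The bag-of-words base features, the plain sum used as the tweet score, and the names below are assumptions for illustration; the talk's actual feature set and scoring rules are richer.

```python
from typing import Dict, List


def hybrid_features(tokens: List[str],
                    vocab: Dict[str, int],
                    lexicon: Dict[str, float]) -> List[float]:
    """Bag-of-words counts with the lexicon tweet score appended last."""
    bow = [0.0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            bow[idx] += 1.0
    # Simplified tweet score: sum of lexicon scores (no valence shifters).
    lexicon_score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return bow + [lexicon_score]


VOCAB = {"الجهاز": 0, "ممتاز": 1}          # hypothetical vocabulary
LEXICON = {"ممتاز": 1.5, "ثقيل": -0.5}     # hypothetical lexicon scores

features = hybrid_features(["الجهاز", "ممتاز", "ثقيل"], VOCAB, LEXICON)
```

The classifier then learns how much weight to give the rule-based score relative to the other features, instead of the score deciding the label on its own.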

Page 42: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Results

Classification   Lexicon-based   Corpus-based   Hybrid
Two-way          67.08           65.7           69.9
Three-way        45.69           59.85          61.63
Four-way         34.8            55.38          55.07

Page 43: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Conclusion and Future Work

• Twitter ANLP tool: Arabic language needs enabling technologies for preprocessing Twitter data.

• Other statistical methods for generating the lexicon: Chi-Square and Information Gain.

• A sentiment treebank that allows for a complete analysis of the compositional effects of sentiment in Arabic language would enable better classification.

Page 44: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Conclusion and Future Work

• Better handling of negation and valence shifters through constructing a specialized corpus that contains these valence shifters and annotating them with regard to their impact on sentiment.

• Sarcasm detection in tweets is a vital research direction.

• Future solutions should be domain specific, dialect specific and periodically updated to keep up with the shift over time in the language used on Twitter.

Page 45: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

Conclusion and Future Work

• Major and novel contributions to the field can be accomplished through collaboration among computer scientists, linguists and social scientists.

• Hence, interdisciplinary research is necessary for the field to flourish and advance.

Page 46: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community

• Thank you.
• Questions?