Subjectivity and Sentiment Analysis of Arabic: Trends and Challenges
Nora Al-Twairesh

Post on 23-Jan-2017


Email: [email protected], Web: asa.imamu.edu.sa, Twitter: @asa_iu

Contents

• Introduction
• What is Subjectivity and Sentiment Analysis?
• Why is it Important?
• Sentiment Analysis Applications
• Subjectivity and Sentiment Analysis of Arabic
• The Literature
• Challenges
• Conclusion and Future Research Directions

Introduction

What do other people think?!

Which smartphone? Which laptop? Which hotel? Which policy? Which place?

Picture courtesy: http://www.creativeagentsolutions.com/real-estate-virtual-assistant-services/social-media-management/
Picture courtesy: http://www.socialmediaexaminer.com/18-social-media-marketing-tips/

What is Subjectivity and Sentiment Analysis?

• Subjectivity analysis classifies content as objective (facts) or subjective (opinions).
• Sentiment analysis classifies text polarity (positive, negative, neutral).

Subjectivity Analysis    Sentiment Analysis
Subjective               Positive, Negative
Objective                Neutral
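
The two-stage view on this slide can be sketched in a few lines of Python. This is only an illustration: the word lists and the function names (`is_subjective`, `polarity`, `analyze`) are invented for the sketch, and real systems replace both stages with trained classifiers or proper lexicons.

```python
# Toy word lists, invented for this sketch only.
OPINION_WORDS = {"love", "great", "hate", "terrible"}
POSITIVE_WORDS = {"love", "great"}
NEGATIVE_WORDS = OPINION_WORDS - POSITIVE_WORDS


def is_subjective(text: str) -> bool:
    """Stage 1: does the text contain any opinion-bearing word?"""
    return any(tok in OPINION_WORDS for tok in text.lower().split())


def polarity(text: str) -> str:
    """Stage 2: compare counts of positive and negative words."""
    tokens = text.lower().split()
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def analyze(text: str) -> tuple:
    """Objective text is treated as neutral; subjective text gets a polarity."""
    if not is_subjective(text):
        return ("objective", "neutral")
    return ("subjective", polarity(text))
```

For example, `analyze("The phone weighs 150 grams")` is labeled objective and neutral, while `analyze("I love this phone")` is labeled subjective and positive.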

What is Subjectivity and Sentiment Analysis?

• Different names: sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis.
• Multidisciplinary by nature, sentiment analysis draws on natural language processing, text mining, and artificial intelligence.

Why is it Important?

• The proliferation of social media websites has led to the production of vast amounts of unstructured text on the Web.

• Aggregating and evaluating these opinions manually is tedious and, at this volume, nearly impossible.

Why is it Important?

• These opinions are important for organizations (government, business) and for individuals.

• “Sentiment Analysis is now right at the center of the social media research.” (Liu, B.)

Applications

• Businesses and organizations: product and service benchmarking; market intelligence. Businesses spend huge amounts of money on consultants, surveys, focus groups, etc. to find consumer sentiments and opinions.

• Individuals: interested in others' opinions when purchasing a product or using a service, or when looking for opinions on political topics.

• Opinion retrieval/search: providing general search for opinions.

Subjectivity and Sentiment Analysis of Arabic

• Morphologically rich language
• Formal written language: Modern Standard Arabic (MSA)
• Everyday spoken language: informal, colloquial, or dialectal Arabic
• Earlier research on SSA of Arabic was limited to MSA, but researchers have recently started addressing Dialectal Arabic (DA).

Research on SSA of Arabic

• Previous survey: Korayem et al. (2012)
• 30 papers were found that were published between Korayem's survey and July 2014.
• Search methodology:
  • A search was performed using the keywords 'Arabic subjectivity and sentiment analysis', 'Arabic opinion mining', 'Comparative opinions Arabic', and 'Opinion spam Arabic' in the following databases: Google Scholar, Springer, IEEE Xplore, ACM Digital Library, and ScienceDirect.

[Figure: bar chart of the number of references per year, 2006 to 2014; vertical axis from 0 to 12.]

SSA methods

• Supervised learning: corpus-based
• Unsupervised learning: lexicon-based
• Hybrid

Lexicon-Based

• Sentiment lexicons can be built:
  • Manually
  • Automatically
• Words are sometimes given scores for positive, negative, and neutral.
• Example: SentiWordNet
• How to calculate a sentence's sentiment?
• SentiStrength: takes care of negation, intensifiers, and diminishers.
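
One simple way to turn word scores into a sentence score is sketched below. It is not SentiStrength's actual algorithm, only a minimal illustration of how negation, intensifiers, and diminishers can modify lexicon scores. The lexicon entries, weights, and function names are all assumptions made up for the example.

```python
# Toy English lexicon and modifier weights, invented for this sketch.
LEXICON = {"good": 2, "great": 3, "bad": -2, "terrible": -3}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.5}


def sentence_score(sentence: str) -> float:
    """Sum lexicon scores, letting modifiers act on the next sentiment word."""
    total = 0.0
    multiplier = 1.0
    negate = False
    for token in sentence.lower().split():
        if token in NEGATORS:
            negate = True
        elif token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in DIMINISHERS:
            multiplier *= DIMINISHERS[token]
        elif token in LEXICON:
            score = LEXICON[token] * multiplier
            total += -score if negate else score
            multiplier, negate = 1.0, False
        else:
            # Any other word ends the scope of pending modifiers.
            multiplier, negate = 1.0, False
    return total


def sentence_polarity(sentence: str) -> str:
    score = sentence_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these toy values, "the camera is very good" scores 3.0, "not good" scores -2.0, and "slightly bad" scores -1.0. The rule that modifiers apply only to the immediately following sentiment word is a design shortcut; real tools use more careful scope handling.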

Corpus-Based

• Using machine learning classifiers and a training corpus

1 S. Mohammad, EMNLP 2014 Sentiment Analysis tutorial: http://emnlp2014.org/tutorials/7_notes.pdf
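
As a rough illustration of the corpus-based approach, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on a tiny labeled corpus. The four training sentences and the names `train_nb` and `predict_nb` are invented for this example; a real system would use a large annotated corpus and richer features.

```python
import math
from collections import Counter, defaultdict

# Tiny invented training corpus of (text, label) pairs.
CORPUS = [
    ("great phone love it", "positive"),
    ("excellent camera great screen", "positive"),
    ("terrible battery hate it", "negative"),
    ("bad screen terrible sound", "negative"),
]


def train_nb(corpus):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in corpus:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for label in class_docs:
        logp = math.log(class_docs[label] / n_docs)
        total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            logp += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_nb(CORPUS)
```

On this toy corpus, `predict_nb(model, "great camera")` returns "positive" and `predict_nb(model, "terrible battery")` returns "negative".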

SSA of Arabic

• The best-performing machine learning techniques were support vector machines (SVM) and Naive Bayes (NB).

• Stemming led to better accuracy in most studies.

• The usefulness of n-grams varied: in some studies unigrams and bigrams were useful, while in others a combination of bigrams and trigrams was.
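
For readers unfamiliar with n-gram features, here is a minimal sketch of how unigrams, bigrams, and trigrams can be extracted from whitespace-tokenized text. The function names are assumptions for illustration; the surveyed studies may have tokenized and combined n-grams differently.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, orders=(1, 2)):
    """Concatenate the n-grams of every requested order into one feature list."""
    tokens = text.split()
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features
```

For instance, `ngram_features("camera is not good")` contains the bigram "not good", which a unigram-only model would split into two unrelated features. Passing `orders=(2, 3)` gives the bigram plus trigram combination mentioned above.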

SSA of Arabic

• The studies spanned different genres (news comments, movie reviews, product reviews, chat turns, Tweets, Facebook posts, forum posts) and different domains.

• The conclusion of most studies is that a different solution is needed for each genre and in each domain.

Challenges

• Use of Dialectal Arabic (DA)
  • Dialects differ from MSA phonologically, morphologically, and syntactically, and do not have standard orthographies. This makes building morphological analyzers and POS taggers for dialects a big challenge.
  • Recent efforts to build these tools for DA suffer from low accuracy and are tailored to specific dialects. The availability of such tools is essential for SSA.
  • Concepts have different lexical choices in different DAs, which makes building lexicons that cover multiple dialects very challenging. Ex: تشكيل
  • Negation and stop words can also be expressed in different ways and vary among DAs.
  • It has even been argued that the Arabic dialects can be considered different languages in their own right.
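
To make the lexical-variation problem concrete, the sketch below maps a few dialectal words for "good" onto one MSA lexicon entry and treats several dialectal negators alike. The word choices (كويس, منيح, زين for "good"; مش and مو as negators) are common examples picked for this illustration, not a lexicon from the surveyed papers, and the coverage is deliberately tiny.

```python
# Illustrative mapping from dialectal variants to a canonical MSA entry.
CANONICAL = {
    "كويس": "جيد",  # commonly Egyptian
    "منيح": "جيد",  # commonly Levantine
    "زين": "جيد",   # commonly Gulf
}
# Negation particles differ across MSA and dialects.
NEGATORS = {"ليس", "مش", "مو"}
# Single-entry sentiment lexicon keyed by the canonical form.
LEXICON = {"جيد": 1}


def score(text: str) -> int:
    """Score a sentence after normalizing dialectal variants."""
    total, negate = 0, False
    for token in text.split():
        if token in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(CANONICAL.get(token, token), 0)
        total += -value if negate else value
        negate = False
    return total
```

Every new dialect adds rows to both tables, which is why a lexicon covering many dialects is costly to build by hand.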

Challenges

• Lack of Corpora and Datasets
  • The accuracy of any SSA system depends on the availability of large annotated corpora, which are still a scarce resource for Arabic.
  • The datasets available now are very small compared to those available for English and usually come from the news domain or movie reviews.
  • This also hinders comparing new SSA systems with previous ones to determine their accuracy.

Challenges

• Lack of Sentiment Lexicons
  • No publicly available DA sentiment lexicon exists.
  • MSA lexicons are small compared to those built for English.
  • A large-scale, multi-genre, multi-dialect Arabic sentiment lexicon has recently been proposed by Abdul-Mageed and Diab (2014). However, it covers only two dialects, Egyptian and Levantine, and has not yet been fully applied to SSA tasks.

Challenges

• The need for Named Entity Recognition (NER)
  • Although NER is not generally required to detect sentiment, it is needed for Arabic because many Arabic names are derived from adjectives that can be mistaken for sentiment words, for example the name Jamila (جميلة), which means beautiful.
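
The Jamila example can be sketched as follows. The code assumes that some NER step (not shown, and hypothetical here) has already marked which token positions are named entities; those positions are then skipped during lexicon lookup. The one-word lexicon and the function name are invented for the illustration.

```python
# One-entry toy lexicon: the adjective "beautiful".
LEXICON = {"جميلة": 1}


def score_tokens(tokens, entity_positions=frozenset()):
    """Sum lexicon scores, skipping positions flagged as named entities."""
    return sum(
        LEXICON.get(token, 0)
        for i, token in enumerate(tokens)
        if i not in entity_positions
    )


# "Jamila bought a camera": the first token is a person's name.
name_sentence = ["جميلة", "اشترت", "كاميرا"]
# "The camera is beautiful": the same word is a real sentiment adjective.
adjective_sentence = ["الكاميرا", "جميلة"]
```

Without entity information the first sentence is wrongly scored as positive; passing `entity_positions={0}` removes the false hit while leaving the second sentence unaffected.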

Challenges

• Handling compound phrases and idioms
  • Arabic speakers tend to use popular compound phrases and idioms to express their opinions. Some examples include:
    • لا يا شيخ: used to express disbelief in what someone says,
    • حسبي الله ونعم الوكيل: conveys a negative sentiment and carries an implicit prayer to Allah to take revenge.
  • These compound phrases and idioms tend to differ across DAs and Arabic cultures.
  • Moreover, phrases and words used to express sentiment are subject to usage trends, with new phrases evolving every day.
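
One common way to handle such idioms, sketched here under simplifying assumptions, is to look up multi-word phrases before falling back to single words, so that an idiom is scored as a unit. The phrase table holds only the second idiom from this slide, and its score of -2 is an arbitrary value chosen for the example.

```python
# Multi-word idioms scored as a unit; the score is an assumed toy value.
PHRASES = {("حسبي", "الله", "ونعم", "الوكيل"): -2}
# Single-word fallback lexicon.
WORDS = {"جميلة": 1}


def score(tokens):
    """Greedy longest-match: try phrases first, then single words."""
    total, i = 0, 0
    max_len = max(len(phrase) for phrase in PHRASES)
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                total += PHRASES[key]
                i += n
                break
        else:
            # No phrase matched here: fall back to the word lexicon.
            total += WORDS.get(tokens[i], 0)
            i += 1
    return total
```

Because idioms differ across dialects and change over time, the phrase table is the part that needs constant maintenance.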

Challenges

• Use of Arabizi
  • A new trend in social media is the use of Latin characters to represent Arabic words.
  • Arabic users of social media also tend to code-switch between Arabic and English in their writing, making it difficult to tell whether a word written in Latin characters is Arabizi or English.
  • The literature on Arabic SSA has not dealt with this problem yet.
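
A first step toward handling Arabizi is telling scripts apart at the token level. The sketch below checks for characters in the Unicode Arabic block (U+0600 to U+06FF) and uses one rough heuristic: Arabizi often uses digits such as 3 or 7 in place of Arabic letters, so a Latin token containing such a digit is flagged as likely Arabizi. The digit set and label names are assumptions for this sketch, and the heuristic cannot resolve Latin tokens without digits, which is exactly the ambiguity described above.

```python
# Digits assumed (for this sketch) to stand in for Arabic letters in Arabizi.
ARABIZI_DIGITS = set("235679")


def classify_token(token: str) -> str:
    """Label a token by script, flagging digit-bearing Latin tokens."""
    has_arabic = any("\u0600" <= ch <= "\u06ff" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_arabic:
        return "arabic"
    if has_latin:
        if any(ch in ARABIZI_DIGITS for ch in token):
            return "likely_arabizi"
        return "latin_ambiguous"  # could be English or Arabizi
    return "other"
```

The heuristic also produces false positives for English tokens such as "mp3", so it is at best a pre-filter for a proper language identification step.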

Challenges

• Sarcasm Detection
  • Sarcasm is a form of speech act in which a person says something positive while really meaning something negative, or vice versa.
  • Sarcasm is very hard to detect. In English, there are only a few studies on sarcasm detection, using supervised and semi-supervised learning approaches.
  • In Arabic SSA, no study was found that addresses sarcasm detection.

Challenges

• Comparative Opinions
  • An opinion can be expressed as a comparison between two entities.
  • Comparative opinions differ from regular opinions in that they have different semantic meanings and different syntactic forms.
  • Mining comparative opinions is considered a challenging task in English.
  • Arabic would be no exception; however, only one study on mining comparative opinions in Arabic has been found, by El-Halees (2012).

Challenges

• Opinion Spam Detection
  • As with web pages, opinions can also suffer from spam, although opinion spam differs from web spam and therefore needs different detection approaches.
  • Arabic opinion spam detection is still under-researched. Only two studies were found.

Challenges

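• Co-reference Resolution
  • Co-reference resolution is a challenging problem in most NLP applications, and it arises in SSA as well. An example illustrates the problem:
    • "اشتريت كاميرة كانون جديدة والتقطت بها بعض الصور كانت جميلة جدا" ("I bought a new Canon camera and took some pictures with it. It was / they were very beautiful.") It is unclear whether "very beautiful" describes the camera or the pictures.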

Challenges

• Opinion Target and Opinion Holder Extraction
  • Although the main task of SSA is to determine the polarity of the sentence, it is also essential to extract the opinion target and, for some applications, the opinion holder.

Conclusion

• Research on SSA of Arabic is still in its early stages, although it is attracting considerable interest from the research community.

• The challenges identified can all be considered future research directions.

• Thank you.
• [email protected]