Sentiment Analysis
An Introduction to Opinion Mining and its Applications
Ana Valdivia
Granada, 17/11/2016
About me
Ana Valdivia
Degree in Mathematics (UPC)
MSc in Data Science (UGR)
Paper about museums: Martínez‐de‐Albéniz, V. and Valdivia, A.; “Measuring and Exploiting the Impact of Exhibitions Scheduling on Museum Attendance”.
Master's Thesis about Sentiment Analysis
Organizer of @DataBeersGRX
Ana Valdivia ©
ROADMAP
1. Introduction
2. The Sentiment Analysis Problem
3. The Sentiment Analysis Process
4. My Master’s Thesis
1. INTRODUCTION
What is SA?
Sentiment Analysis (SA) is the field of knowledge that analyses people's opinions, reviews or thoughts about products, companies or experiences, identifying their sentiment.
Also referred to as Opinion Mining.
“Alhambra with General Life parks and gardens, the tower and Nazrid palaces is absolutely amazing. If you are in Granada you must not miss it.”
“DO NOT EVEN TRY TO VISIT - A total waste of time!!!. Spent 5 hours in the ticket queue in the broiling sun 35 degrees. An officious staff member told us when we reached the head of the queue that there were no more tickets and to buy online…”
“Most visited monument in Spain. There are no words to descibe this place - beaty awaits around every corner. THe mixture of two cultures in one place makes it very special…”
Where does it come from?
NLP:
‐ Sentiment Analysis
‐ Parsing
‐ Discourse analysis
‐ Named entity recognition (NER)
‐ Part-of-speech tagging (POS)
‐ Topic segmentation
‐ Machine translation
‐ Automatic summarization
Why is SA becoming popular?
Social Networks
Web 2.0
Customer satisfaction
Social media sentiment is the #nofilter voice of the people.
Source: http://www.slideshare.net/robin_allfamous/sentiment-analysis-and-applications-in-the-news-and-media-industry
2. THE SENTIMENT ANALYSIS PROBLEM
What's an opinion?
“If we cannot structure a problem, we probably do not understand the problem”. B. Liu
Liu's proposal:
BOOK REMARK: B. Liu, Sentiment analysis and opinion mining
Polarity
One example is worth a thousand words…
“We were very tired after a loong walk. We stopped her for a rest, the first nice thing here, is the view, and the fruit juices were excellent. We felt much better after drunk it. Also the desert were very good. Thank you.”
Liu's proposal:
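In his book, Liu models an opinion as a quintuple: entity, aspect, sentiment, holder and time. Assuming that this is the proposal the slide refers to, the sketch below encodes the review above by hand. The class name `Opinion`, the entity name and the holder/time placeholders are illustrative assumptions, not the output of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    entity: str     # what the opinion is about
    aspect: str     # which aspect of the entity is judged
    sentiment: str  # polarity of the judgement
    holder: str     # who expresses the opinion
    time: str       # when it was expressed


# Hand-labelled quintuples for the example review (values are assumptions).
review_opinions = [
    Opinion("place", "view", "positive", "reviewer", "unknown"),
    Opinion("place", "fruit juices", "positive", "reviewer", "unknown"),
    Opinion("place", "dessert", "positive", "reviewer", "unknown"),
]


def aspects_with(sentiment, opinions):
    """Return the aspects that carry the given sentiment."""
    return [o.aspect for o in opinions if o.sentiment == sentiment]


print(aspects_with("positive", review_opinions))
```

Structuring the review this way turns free text into records that can be counted, filtered and compared.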
Different analytic levels
‐ Document level
‐ Sentence level
‐ Aspect or entity level
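The three levels can be illustrated with a toy lexicon-based scorer. This is only a sketch: the word list `LEXICON`, the function names and the sample review are assumptions made for illustration, and real systems use far richer models.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"excellent": 1, "good": 1, "nice": 1, "better": 1, "tired": -1, "waste": -1}


def score(text):
    """Sum the lexicon values of the words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)


def label(value):
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"


def document_level(text):
    """One label for the whole review."""
    return label(score(text))


def sentence_level(text):
    """One label per sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(s, label(score(s))) for s in sentences]


def aspect_level(text, aspects):
    """Label of the sentence that mentions each aspect (last mention wins)."""
    result = {}
    for sentence, sentence_label in sentence_level(text):
        for aspect in aspects:
            if aspect in sentence.lower():
                result[aspect] = sentence_label
    return result


review = "We were very tired after a long walk. The view is nice and the fruit juices were excellent."
print(document_level(review))
print(sentence_level(review))
print(aspect_level(review, ["view", "fruit juices"]))
```

The same review is positive as a document, mixed at sentence level, and positive for both aspects, which is why the chosen level matters.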
Main concerns
‐ Different types of opinions: direct/indirect, comparative, explicit/implicit, …
‐ Dealing with text mining: grammar mistakes, emoticons, …
‐ Irony and sarcasm
‐ Fake or spam opinions
3. THE SENTIMENT ANALYSIS PROCESS
Step by step
Sentiment identification
‐ Expert or user
‐ Sentiment extraction algorithms: Stanford CoreNLP, MeaningCloud, Microsoft Azure, …
Feature Selection
‐ Bag of Words
‐ Term‐Document Matrix
‐ tf‐idf
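A minimal sketch of these three representations in plain Python follows. The three sample documents are made up, and the tf-idf shown is one common variant (relative term frequency times the log of the inverse document frequency); the slides do not say which weighting was used.

```python
import math
import re
from collections import Counter

# Made-up sample documents.
docs = [
    "the alhambra is amazing",
    "the queue is a waste of time",
    "amazing gardens amazing palaces",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


# Bag of words: word counts per document, word order is discarded.
bags = [Counter(tokenize(d)) for d in docs]

# Term-document matrix: one row per term, one column per document.
vocabulary = sorted(set().union(*bags))
tdm = [[bag[term] for bag in bags] for term in vocabulary]


def tf_idf(term, doc_index):
    """Relative term frequency weighted by log inverse document frequency."""
    df = sum(1 for bag in bags if term in bag)
    if df == 0:
        return 0.0  # term never appears in the corpus
    tf = bags[doc_index][term] / sum(bags[doc_index].values())
    return tf * math.log(len(bags) / df)


print(tdm[vocabulary.index("amazing")])
print(tf_idf("alhambra", 0), tf_idf("the", 0))
```

A word such as "the", which appears in most documents, gets a lower weight than a rarer word such as "alhambra", even when both occur once in the same document.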
Feature Selection: Text Preprocessing
‐ Parsing
‐ Stemming: {nightmare, nighttime, nocturnal, nightlife...} → night
‐ Remove stop words
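The preprocessing steps can be sketched as a small pipeline. Everything here is a simplifying assumption: tokenization stands in for parsing, `STOP_WORDS` is a tiny hypothetical list, and `stem` is a naive suffix stripper rather than an established stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "is", "of", "and", "we", "were", "in", "to"}
# Naive suffix list; real stemmers apply much more careful rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("We visited the gardens and walked to the palaces"))
```

A suffix stripper like this one would not reduce "nocturnal" to "night"; grouping such words needs a more aggressive stemmer or a lemmatizer with a dictionary.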
Feature Selection
N‐grams
More sophisticated…
‐ Aspect‐Based Sentiment Analysis
‐ ASUM
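N-grams are contiguous sequences of n tokens. A minimal sketch, with a made-up sentence and a hypothetical helper name `ngrams`:

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the fruit juices were excellent".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

print(unigrams)
print(bigrams)
```

Bigrams keep some local word order that single words lose; for instance, "not good" stays together as one feature instead of splitting into "not" and "good".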
Step by step
Medhat, Walaa, Ahmed Hassan, and Hoda Korashy. "Sentiment analysis algorithms and applications: A survey." Ain Shams Engineering Journal 5.4 (2014): 1093‐1113.
4. MY MASTER’S THESIS
Objectives
1. Study the correlation between human and machine sentiment
2. Classify opinions
3. Discover interesting patterns in negative opinions
Studying correlation between different sentiment labels
SentimentCoreNLP, SentimentValue
53.08 % of coincidence
93.49 % of coincidence
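Assuming that "coincidence" means the share of reviews on which the two labels agree, it can be computed as below. The label lists are made up for illustration and do not reproduce the percentages reported on the slides.

```python
def coincidence(labels_a, labels_b):
    """Percentage of positions where both label lists agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels: user-assigned value versus the CoreNLP label.
sentiment_value = ["positive", "positive", "negative", "neutral", "positive"]
sentiment_corenlp = ["positive", "negative", "negative", "negative", "positive"]

print(coincidence(sentiment_value, sentiment_corenlp))
```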
Classification problem
UFSM, BFSM
Classes: positive, negative
‐ TripAdvisor Alhambra data set
‐ Document‐Term Matrix: use UFSM and BFSM
‐ Split it in three sets depending on sentiment class label
‐ Preprocessing: if it is very unbalanced, apply oversampling techniques
‐ Split it up: split the complete set into a 75% training set and a 25% testing set
‐ Classification algorithms: apply different machine learning algorithms to the training set with 5‐fold cross‐validation (5cv)
‐ Evaluate results: check measure values and discuss the best model
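The data-handling steps above can be sketched in plain Python. The function names, the synthetic data and the use of random oversampling are assumptions, since the slides do not name the oversampling technique. In this sketch the oversampling is applied to the training part only, a design choice that keeps duplicated reviews out of the test set.

```python
import random
from collections import Counter


def train_test_split(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def imbalance_ratio(items):
    """IR: size of the largest class divided by the size of the smallest."""
    counts = Counter(label for _, label in items)
    return max(counts.values()) / min(counts.values())


def oversample(items, seed=0):
    """Random oversampling: duplicate minority items until classes are equal."""
    rng = random.Random(seed)
    by_label = {}
    for item in items:
        by_label.setdefault(item[1], []).append(item)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def k_folds(n_items, k=5):
    """Yield (training_indices, validation_indices) for k-fold cross-validation."""
    indices = list(range(n_items))
    for fold in range(k):
        validation = indices[fold::k]
        held_out = set(validation)
        yield [i for i in indices if i not in held_out], validation


# Synthetic (text, label) pairs: 90 positive and 10 negative reviews.
data = [(f"review {i}", "negative" if i % 10 == 0 else "positive") for i in range(100)]
train, test = train_test_split(data)
balanced_train = oversample(train)
print(len(train), len(test), imbalance_ratio(data), imbalance_ratio(balanced_train))
```

Each pair produced by `k_folds(len(balanced_train))` would then be used to fit a classifier on the training indices and score it on the validation indices.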
XGBoost
IR = 1
unigrams
Subgroup Discovery
negative
SD‐Map algorithm
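Subgroup discovery looks for descriptions whose covered examples contain an unusually high share of a target class, here negative reviews. The sketch below is a brute-force illustration over single words, not the SD-Map algorithm. It scores each word with weighted relative accuracy (WRAcc), a commonly used quality measure; the slides do not state which measure was used, and the sample reviews are made up.

```python
def wracc(covered, target):
    """Weighted relative accuracy for two parallel lists of booleans."""
    n = len(target)
    n_covered = sum(covered)
    if n == 0 or n_covered == 0:
        return 0.0
    p_target = sum(target) / n
    p_target_in_subgroup = sum(c and t for c, t in zip(covered, target)) / n_covered
    return (n_covered / n) * (p_target_in_subgroup - p_target)


# Made-up reviews as (set of words, label) pairs.
reviews = [
    ({"queue", "tickets", "waste"}, "negative"),
    ({"queue", "sun", "staff"}, "negative"),
    ({"gardens", "amazing"}, "positive"),
    ({"palaces", "amazing", "queue"}, "positive"),
]
is_negative = [label == "negative" for _, label in reviews]


def rank_words(reviews, target):
    """Score every single-word subgroup and sort from best to worst."""
    vocabulary = sorted(set().union(*(words for words, _ in reviews)))
    scored = [(wracc([w in words for words, _ in reviews], target), w) for w in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


for quality, word in rank_words(reviews, is_negative):
    print(f"{word}: {quality:.3f}")
```

Words with a positive score are over-represented in negative reviews; words with a negative score, such as "amazing" here, are under-represented.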
SUMMARY
SA is a very challenging problem
Lots of applications
New research line