
Sentiment Analysis
An Introduction to Opinion Mining and its Applications

Ana Valdivia
Granada, 17/11/2016

About me

Ana Valdivia
Degree in Mathematics (UPC)
MSc in Data Science (UGR)

Paper about museums: Martínez‐de‐Albéniz, V. and Valdivia, A.; “Measuring and Exploiting the Impact of Exhibitions Scheduling on Museum Attendance”.

Master's Thesis about Sentiment Analysis

Organizer of @DataBeersGRX

Ana Valdivia ©

ROADMAP

1. Introduction

2. The Sentiment Analysis Problem

3. The Sentiment Analysis Process

4. My Master’s Thesis

1. INTRODUCTION

What is SA?

Sentiment Analysis (SA) is the field of knowledge that analyses people’s opinions, reviews or thoughts about products, companies or experiences, identifying their sentiment.

Also referred to as Opinion Mining.

“Alhambra with General Life parks and gardens, the tower and Nazrid palaces is absolutely amazing. If you are in Granada you must not miss it.”

“DO NOT EVEN TRY TO VISIT - A total waste of time!!!. Spent 5 hours in the ticket queue in the broiling sun 35 degrees. An officious staff member told us when we reached the head of the queue that there were no more tickets and to buy online…”

“Most visited monument in Spain. There are no words to descibe this place - beaty awaits around every corner. THe mixture of two cultures in one place makes it very special…”

Where does it come from?

Sentiment Analysis is one of the tasks of Natural Language Processing (NLP), together with:

‐ Parsing

‐ Discourse analysis

‐ Named entity recognition (NER)

‐ Part-of-speech tagging (POS)

‐ Topic segmentation

‐ Machine translation

‐ Automatic summarization

Why is SA becoming popular?

Social Networks

Web 2.0

Customer satisfaction

“Social media sentiment is the #nofilter voice of the people.”

Source: http://www.slideshare.net/robin_allfamous/sentiment‐analysis‐and‐applications‐in‐the‐news‐and‐media‐industry


2. THE SENTIMENT ANALYSIS PROBLEM

What’s an opinion?

“If we cannot structure a problem, we probably do not understand the problem.”
B. Liu

[Figure: Liu’s proposal]

BOOK REMARK: B. Liu, Sentiment Analysis and Opinion Mining

Polarity

One example is worth a thousand words…

“We were very tired after a loong walk. We stopped her for a rest, the first nice thing here, is the view, and the fruit juices were excellent. We felt much better after drunk it. Also the desert were very good. Thank you.”

[Figure: Liu’s proposal applied to this review]
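Liu's book defines an opinion as a quintuple (entity, aspect, sentiment, holder, time). The following is a minimal Python sketch of that structure, filled in by hand for the review above. The class name `Opinion`, the field names, the entity value "place" and the three annotations are illustrative assumptions and were not produced by any tool. The holder and time are left empty because the slide does not give them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion as a quintuple (entity, aspect, sentiment, holder, time)."""
    entity: str
    aspect: str
    sentiment: str                  # "positive", "negative" or "neutral"
    holder: Optional[str] = None    # not given on the slide
    time: Optional[str] = None      # not given on the slide


# Hand-made annotation of the example review (an assumption, not tool output).
review_opinions = [
    Opinion(entity="place", aspect="view", sentiment="positive"),
    Opinion(entity="place", aspect="fruit juices", sentiment="positive"),
    Opinion(entity="place", aspect="dessert", sentiment="positive"),
]


def aspects_with(sentiment, opinions):
    """Return the aspects that carry the given sentiment label."""
    return [o.aspect for o in opinions if o.sentiment == sentiment]


print(aspects_with("positive", review_opinions))
```

Structuring the review this way turns free text into records that can be counted, filtered and compared across many reviews.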


Different analytic levels

‐ Document level

‐ Sentence level

‐ Aspect or entity level
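The difference between the document level and the sentence level can be shown with a toy scorer. This is only a sketch: the tiny word lists `POSITIVE` and `NEGATIVE` are made up for illustration, and counting lexicon words is a stand-in technique, not a method taken from the talk.

```python
import re

# Tiny hand-made lexicon (an assumption for illustration only).
POSITIVE = {"nice", "excellent", "good", "better"}
NEGATIVE = {"tired", "bad", "waste"}


def polarity_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def document_level(text):
    """One label for the whole document."""
    return label(polarity_score(text))


def sentence_level(text):
    """One label per sentence, splitting after '.', '!' or '?'."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, label(polarity_score(s))) for s in sentences]


review = ("We were very tired after a long walk. "
          "The view is nice and the fruit juices were excellent.")

print(document_level(review))
for sentence, polarity in sentence_level(review):
    print(polarity, "|", sentence)
```

The document as a whole comes out positive, while the sentence-level pass keeps the negative first sentence visible. That lost detail is what the finer levels are meant to recover.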


Main concerns

‐ Different types of opinions: direct/indirect, comparative, explicit/implicit, …

‐ Dealing with text mining: grammar mistakes, emoticons, …

‐ Irony and sarcasm

‐ Fake or spam opinions


3. THE SENTIMENT ANALYSIS PROCESS

Step by step

Sentiment identification

‐ Expert or user

‐ Sentiment extraction algorithms: Stanford CoreNLP, MeaningCloud, Microsoft Azure, …


Feature Selection

‐ Bag of Words

‐ Term‐Document Matrix

‐ tf‐idf
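The three representations can be sketched in plain Python. The three short documents are made up for illustration. The tf‐idf formula used here (term frequency times log(N / document frequency)) is one common variant; the slides do not say which variant the talk used.

```python
import math
from collections import Counter

# Made-up mini corpus (illustrative only).
docs = [
    "the view is amazing",
    "the queue is a waste of time",
    "amazing gardens and amazing palaces",
]


def bag_of_words(doc):
    """Bag of words: word counts, ignoring word order."""
    return Counter(doc.lower().split())


bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))

# Term-document matrix: one row per term, one column per document.
tdm = [[bag[term] for bag in bags] for term in vocabulary]


def tf_idf(term, bag, bags):
    """Term frequency in one document times log inverse document frequency."""
    df = sum(1 for b in bags if term in b)
    if df == 0:
        return 0.0
    tf = bag[term] / sum(bag.values())
    return tf * math.log(len(bags) / df)


for term, row in zip(vocabulary, tdm):
    print(f"{term:>8}", row)
print(tf_idf("gardens", bags[2], bags))
```

Raw counts treat every word alike, whereas tf‐idf down-weights words that occur in many documents, so a term found in a single review scores higher than one spread over the corpus.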


Feature Selection: Text Preprocessing

‐ Parsing

‐ Stemming: {nightmare, nighttime, nocturnal, nightlife...} → night

‐ Remove STOP Words
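A minimal sketch of these preprocessing steps, under stated assumptions. Tokenisation with a regular expression stands in for the parsing step, and the stop-word list `STOP_WORDS` and the suffix list `SUFFIXES` are short hand-made examples. The suffix stripper is deliberately crude and is not a real stemming algorithm; it would not, for example, group "nocturnal" under "night".

```python
import re

# Short hand-made lists (assumptions for illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were",
              "and", "of", "in", "to", "it", "we", "for"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("We stopped for a rest and the fruit juices were excellent"))
```

The stems need not be real words ("stopp", "juic"). What matters for feature selection is that variants of the same word collapse onto one feature.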


Feature Selection

‐ N‐grams

‐ More sophisticated: Aspect‐Based Sentiment Analysis, ASUM
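N‐grams are contiguous runs of n tokens. A short sketch follows; the function name `ngrams` and the example sentence are illustrative choices.

```python
def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the fruit juices were excellent".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

print(unigrams)
print(bigrams)
```

Bigrams keep some local word order that single words lose: "not good" becomes a feature of its own instead of being split into "not" and "good".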

Source: Medhat, Walaa, Ahmed Hassan, and Hoda Korashy. “Sentiment analysis algorithms and applications: A survey.” Ain Shams Engineering Journal 5.4 (2014): 1093‐1113.


4. MY MASTER’S THESIS


Objectives

1. Study correlation between human and machine sentiment

2. Classify opinions

3. Discover interesting patterns in negative opinions


Studying correlation between different sentiment labels

SentimentCoreNLP, SentimentValue

53.08 % coincidence

93.49 % coincidence
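Assuming "coincidence" means the share of reviews on which two sentiment labels match, the figure can be computed as below. The function name `percent_agreement` and the five label pairs are made up for illustration and are not thesis data.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of positions where the two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Made-up labels for five reviews (illustrative only).
user_labels = ["positive", "positive", "negative", "neutral", "positive"]
machine_labels = ["positive", "negative", "negative", "positive", "positive"]

print(percent_agreement(user_labels, machine_labels))
```

Raw agreement is easy to read but depends on the label set: merging or dropping a class, for example the neutral one, changes the value considerably.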

Classification problem

[Figure: positive vs. negative classes, UFSM and BFSM]

‐ TripAdvisor Alhambra data set

‐ Document‐Term Matrix: use UFSM and BFSM

‐ Split it into three sets depending on the sentiment class label

‐ Split it up: split the complete set into a 75% training set and a 25% testing set

‐ Preprocessing: if it is very unbalanced, apply oversampling techniques

‐ Classification algorithms: apply different machine learning algorithms to the training data set with 5‐fold cross‐validation (5cv)

‐ Evaluate results: check measure values and discuss the best model
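The splitting and balancing steps can be sketched with the Python standard library. Several things here are assumptions: the toy rows, the helper names, the use of plain random oversampling as the oversampling technique, and the choice to oversample only the training part. The slides do not state these details, and no classifier is trained in this sketch.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut off the test fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_oversample(rows, seed=0):
    """Duplicate random rows of smaller classes until all classes are equal."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (training_indices, validation_indices) for k shuffled folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Toy data: 16 positive and 4 negative rows (made up, not the thesis data).
rows = ([([i], "positive") for i in range(16)]
        + [([i], "negative") for i in range(16, 20)])

train, test = train_test_split(rows)          # 75% / 25%
train_balanced = random_oversample(train)     # equal class counts
folds = list(k_fold_indices(len(train_balanced), k=5))

print(len(train), len(test))
print(Counter(label for _, label in train_balanced))
print([len(validation) for _, validation in folds])
```

Oversampling after the split keeps duplicated rows out of the test set, so the final evaluation is made on untouched data.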


XGBoost

IR = 1

unigrams


Subgroup Discovery

negative

SD‐Map algorithm
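Subgroup discovery looks for descriptions (here, the presence of a term) whose covered reviews have an unusually high share of the target class. The sketch below does not implement SD‐Map. It scores single-term subgroups exhaustively with weighted relative accuracy (WRAcc), a commonly used subgroup quality measure. Whether the thesis used WRAcc is not stated in the slides, and the six rows are made up for illustration.

```python
def wracc(rows, condition, target="negative"):
    """Weighted relative accuracy: coverage * (p_subgroup - p_overall)."""
    n = len(rows)
    covered = [label for terms, label in rows if condition(terms)]
    if n == 0 or not covered:
        return 0.0
    p_subgroup = sum(label == target for label in covered) / len(covered)
    p_overall = sum(label == target for _, label in rows) / n
    return (len(covered) / n) * (p_subgroup - p_overall)


def best_single_term(rows, target="negative"):
    """Score every 'term is present' subgroup and return (score, term)."""
    terms = sorted(set().union(*(t for t, _ in rows)))
    scored = [
        (wracc(rows, lambda ts, term=term: term in ts, target), term)
        for term in terms
    ]
    return max(scored)


# Made-up reviews as term sets with a sentiment label (illustrative only).
rows = [
    ({"queue", "tickets"}, "negative"),
    ({"queue", "sun"}, "negative"),
    ({"queue", "staff"}, "negative"),
    ({"gardens", "amazing"}, "positive"),
    ({"palaces", "amazing"}, "positive"),
    ({"palaces", "queue"}, "positive"),
]

print(best_single_term(rows))
```

WRAcc rewards subgroups that are both large and strongly skewed towards the target, so a term seen once in a negative review scores lower than a frequent term that is mostly negative.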


SUMMARY

SA is a very challenging problem

Lots of applications

New research line


THANKS!
Any questions?

avaldivia@ugr.es

@ana_valdi