Sentiment Analysis & Opinion Mining Lecture Two: March 3, 2011 Aditya M Joshi M Tech3, CSE IIT Bombay {[email protected]}


Page 1: Sentiment Analysis & Opinion Mining


Page 2: Sentiment Analysis & Opinion Mining

Sentiment analysis (SA)

Task of tagging text with the orientation of its opinion

Subjective: “This is a good movie.” / “This is a bad movie.”

Objective: “The movie is set in Australia.”

RECAP

Page 3: Sentiment Analysis & Opinion Mining

Challenges of SA

• Domain dependent: the sentiment of a word is w.r.t. the domain
– Example: ‘unpredictable’ is negative for the steering of a car, but positive for a movie review
• Sarcasm: words of one polarity are used to represent another polarity
– Example: “The perfume is so amazing that I suggest you wear it with your windows shut”
• Thwarted expressions: the sentences/words that contradict the overall sentiment of the set are in the majority
– Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
• Negation
– “I did not like the movie.”
– “Not only is the movie boring, it is also the biggest waste of the producer’s money.”
– “Notwithstanding the pressure of the public, let me admit that I have loved the movie.”
• Implicit polarity
– “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
• Time-bounded
– “This phone allows me to send SMS.”
– “This phone has a touch-screen.”

RECAP

Page 4: Sentiment Analysis & Opinion Mining

How much opinion?

Chart created using: www.technorati.com/chart/

RECAP

Page 5: Sentiment Analysis & Opinion Mining

Using ML for NLP

• Documents are represented as feature vectors for classifiers
– Features: unigrams, etc.
– Models: SVM, NB, etc.

Example: “The movie is set in Australia. The movie is good.”

The: 2, movie: 2, is: 2, set: 1, in: 1, Australia: 1, good: 1

RECAP
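The unigram counts above can be reproduced in a few lines. This is a minimal sketch assuming tokens are case-sensitive runs of letters (so punctuation is dropped); the helper names `unigram_counts` and `to_vector` and the three-word vocabulary are hypothetical.

```python
import re
from collections import Counter


def unigram_counts(text: str) -> Counter:
    """Count case-sensitive alphabetic tokens; punctuation is discarded."""
    return Counter(re.findall(r"[A-Za-z]+", text))


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Project the counts onto a fixed vocabulary to get a feature vector."""
    return [counts[word] for word in vocabulary]


doc = "The movie is set in Australia. The movie is good."
counts = unigram_counts(doc)
print(dict(counts))
# {'The': 2, 'movie': 2, 'is': 2, 'set': 1, 'in': 1, 'Australia': 1, 'good': 1}

vocabulary = ["good", "bad", "movie"]  # hypothetical feature space
print(to_vector(counts, vocabulary))  # [1, 0, 2]
```

Fixing the vocabulary first means every document lands in the same feature space, which is what classifiers such as SVM or NB expect.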

Page 6: Sentiment Analysis & Opinion Mining

Support vector machines

• Basic idea: a “maximum separating-margin classifier”

[Figure: separating hyperplane, margin and support vectors]

RECAP
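To make the “maximum separating-margin” idea concrete, here is a minimal linear SVM trained by sub-gradient descent on the hinge-loss objective (a Pegasos-style update). This solver is swapped in to keep the sketch dependency-free; it is not necessarily what the experiments used. The toy 2-D points, `lam` and `epochs` are assumptions, and points are visited in a fixed order rather than sampled at random.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Linear SVM via Pegasos-style sub-gradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).

    Each point gets a constant 1 appended, so the last weight acts as a bias
    (which this simple sketch also regularises). Points are visited in a
    fixed cyclic order instead of being sampled at random."""
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: the regulariser
            if margin < 1.0:  # inside the margin or misclassified: hinge active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Side of the separating hyperplane on which x falls (+1 or -1)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```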

Page 7: Sentiment Analysis & Opinion Mining

Results

Compared to list-based classifiers (58–69%)

RECAP

Page 8: Sentiment Analysis & Opinion Mining

Outline

Lecture 1
• Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
• Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA

Lecture 2
• Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
• Applications

Page 9: Sentiment Analysis & Opinion Mining

Resources for SA

• SentiWordNet
– WordNet synsets marked with three types of scores: positive, negative, objective

Example: “I am feeling happy.”

Page 10: Sentiment Analysis & Opinion Mining

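Seed-set expansion in SWN

[Diagram: seed words L_p (positive) and L_n (negative), expanded through the WordNet relations also-see and antonymy]

• The sets at the end of the k-th step are called Tr(k,p) and Tr(k,n)
• Tr(k,o) is the set of synsets present in neither Tr(k,p) nor Tr(k,n)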
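The expansion can be sketched as a breadth-first walk over WordNet relations. This minimal sketch assumes that ‘also-see’ links preserve polarity while ‘antonymy’ links reverse it, and that a word reached as both positive and negative in the same step is dropped. It uses a small hypothetical word graph in place of real WordNet synsets; `expand_seeds` is an illustrative name.

```python
def expand_seeds(lp, ln, relations, k):
    """Expand positive/negative seed sets for k steps over labelled relations.

    relations maps a word to a list of (relation, neighbour) pairs.
    Returns (Tr(k,p), Tr(k,n))."""
    pos, neg = set(lp), set(ln)
    frontier_pos, frontier_neg = set(lp), set(ln)
    for _ in range(k):
        new_pos, new_neg = set(), set()
        for frontier, same, opposite in ((frontier_pos, new_pos, new_neg),
                                         (frontier_neg, new_neg, new_pos)):
            for word in frontier:
                for relation, neighbour in relations.get(word, []):
                    if relation == "also-see":    # polarity preserved
                        same.add(neighbour)
                    elif relation == "antonymy":  # polarity reversed
                        opposite.add(neighbour)
        known = pos | neg
        conflicts = new_pos & new_neg  # reached with both polarities: drop
        new_pos -= known | conflicts
        new_neg -= known | conflicts
        pos |= new_pos
        neg |= new_neg
        frontier_pos, frontier_neg = new_pos, new_neg
    return pos, neg


# Hypothetical toy graph; each string stands in for a synset.
relations = {
    "good": [("also-see", "nice"), ("antonymy", "bad")],
    "bad": [("also-see", "awful"), ("antonymy", "good")],
    "nice": [("antonymy", "nasty")],
}
all_words = {"good", "bad", "nice", "awful", "nasty", "wooden"}
tr_p, tr_n = expand_seeds({"good"}, {"bad"}, relations, k=2)
tr_o = all_words - tr_p - tr_n  # Tr(k,o): in neither expanded set
print(sorted(tr_p), sorted(tr_n), sorted(tr_o))
# ['good', 'nice'] ['awful', 'bad', 'nasty'] ['wooden']
```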

Page 11: Sentiment Analysis & Opinion Mining

Building SentiWordNet
• Classifier alternatives used: Rocchio (Bow package) and SVM (LibSVM)
• Different training data based on the expansion
• POS–NOPOS and NEG–NONEG classification
• A total of eight classifiers
– For different combinations of k and classifier
• Synsets not in the expanded seed set are used as test synsets
– The score is the average of the scores returned by the classifiers
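The slide states only that the final score is the average over the eight classifiers. One minimal reading is sketched below, under assumptions. Each committee member makes a POS-vs-NOPOS and a NEG-vs-NONEG decision, and the two decisions are combined into one ternary label. A synset’s score for each label is then the fraction of members that voted for it. `ternary_label`, `synset_scores` and the eight decisions are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")


def ternary_label(is_pos: bool, is_neg: bool) -> str:
    """Combine a POS-vs-NOPOS and a NEG-vs-NONEG decision into one label."""
    if is_pos and not is_neg:
        return "positive"
    if is_neg and not is_pos:
        return "negative"
    return "objective"  # both or neither fired


def synset_scores(decisions):
    """decisions: one (is_pos, is_neg) pair per committee member.

    Returns the fraction of members voting for each label, so the three
    scores always sum to 1."""
    votes = Counter(ternary_label(p, n) for p, n in decisions)
    return {label: votes[label] / len(decisions) for label in LABELS}


# Eight hypothetical committee outputs for one test synset.
decisions = [(True, False)] * 5 + [(False, True)] + [(False, False), (True, True)]
print(synset_scores(decisions))
# {'positive': 0.625, 'negative': 0.125, 'objective': 0.25}
```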

Page 13: Sentiment Analysis & Opinion Mining

Subjectivity detection

• Aim: To extract subjective portions of text
• Algorithm used: Minimum cut algorithm

Page 14: Sentiment Analysis & Opinion Mining

Constructing the graph

• Why graphs? To model item-specific and pairwise information independently.
• Nodes and edges?
– Nodes: the sentences of the document, plus a source and a sink
– The source and the sink represent the two classes of sentences
– Edges: weighted with either of the two scores
• Individual scores: Ind_sub(s_i), a prediction of whether sentence s_i is subjective or not
• Association scores: assoc(s_i, s_j), a prediction of whether two sentences should have the same subjectivity level
– T: threshold, the maximum distance up to which sentences may be considered proximal
– f: the decaying function
– i, j: position numbers

Page 15: Sentiment Analysis & Opinion Mining

Constructing the graph

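• Build an undirected graph G with vertices {v1, v2, …, s, t} (the sentences, plus s and t)

• Add edges (s, v_i), each with weight ind1(x_i)

• Add edges (t, v_i), each with weight ind2(x_i)

• Add edges (v_i, v_k) with weight assoc(v_i, v_k)

• Partition cost, for a cut that puts s in C_1 and t in C_2:

$$\mathrm{cost}(C_1, C_2) = \sum_{v_i \in C_1} \mathrm{ind}_2(x_i) + \sum_{v_i \in C_2} \mathrm{ind}_1(x_i) + \sum_{v_i \in C_1,\; v_k \in C_2} \mathrm{assoc}(v_i, v_k)$$

The minimum cut of G is the partition with the lowest cost.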
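The graph construction and cut can be sketched end to end. This minimal sketch assumes ind2(x_i) = 1 − ind1(x_i). It also assumes an association score c · f(d) with f(d) = 1/d² for sentences at most T positions apart, since the slides name T and a decaying function f but do not fix their form. The minimum cut is found with a small Edmonds-Karp max-flow routine. The function name, `c` and the scores in the usage lines are hypothetical.

```python
from collections import deque


def min_cut_partition(ind_sub, T, c):
    """Split sentences into subjective/objective with an s-t minimum cut.

    ind_sub[i]: individual subjectivity score of sentence i (edge s-i).
    Assumed: ind2 = 1 - ind_sub[i] (edge t-i), and assoc(i, j) =
    c / (j - i)**2 for 0 < j - i <= T, 0 otherwise (edge i-j).
    Returns the sentence indices on the source (subjective) side."""
    s, t = "s", "t"
    cap = {s: {}, t: {}}  # residual capacities: cap[u][v]

    def add_edge(u, v, w):
        # Undirected edge: capacity w in both directions.
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)
        cap[u][v] += w
        cap[v][u] += w

    for i, p in enumerate(ind_sub):
        add_edge(s, i, p)
        add_edge(t, i, 1.0 - p)
    for i in range(len(ind_sub)):
        for j in range(i + 1, min(i + T + 1, len(ind_sub))):
            add_edge(i, j, c / (j - i) ** 2)

    # Edmonds-Karp max-flow: augment along shortest residual paths.
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck

    # After the last search, `parent` holds every vertex reachable from s,
    # which is the source side of a minimum cut.
    return {v for v in parent if v != s}


print(min_cut_partition([0.9, 0.9, 0.1, 0.1], T=1, c=0.1))  # {0, 1}
print(min_cut_partition([0.9, 0.45, 0.9], T=1, c=0.5))      # {0, 1, 2}
```

In the second call the middle sentence scores slightly below 0.5 on its own, but its strong associations with two subjective neighbours pull it to the subjective side. This is the pairwise information the graph is meant to capture.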

Page 16: Sentiment Analysis & Opinion Mining

Example

Sample cuts:

Page 17: Sentiment Analysis & Opinion Mining

Results (1/2)

• Naïve Bayes, no extraction: 82.8%
• Naïve Bayes, subjective extraction: 86.4%
• Naïve Bayes, ‘flipped experiment’: 71%

[Diagram: Document → Subjectivity detector → subjective and objective sentences; the subjective extract is passed to the POLARITY CLASSIFIER]

Page 18: Sentiment Analysis & Opinion Mining

Results (2/2)

Page 20: Sentiment Analysis & Opinion Mining

Adjectives for SA

• Many adjectives have high sentiment value
– A ‘beautiful’ bag
– A ‘wooden’ bench
– An ‘embarrassing’ performance
– A ‘nice wooden’ bench
– A ‘wooden nice’ bench

• One idea is to add this polarity information to the adjectives in WordNet

Page 21: Sentiment Analysis & Opinion Mining

Setup

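• Two anchor words (the extremes of the polarity spectrum) were chosen: ‘excellent’ and ‘poor’

• The PMI of each adjective with respect to these anchor words is calculated

$$\mathrm{PolarityScore}(W) = \mathrm{PMI}(W, \text{excellent}) - \mathrm{PMI}(W, \text{poor})$$

[Diagram: word W linked by its PMI scores to ‘excellent’ and ‘poor’]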
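The polarity score can be computed from corpus counts. This is a minimal sketch assuming PMI is estimated in base 2 from raw co-occurrence and word frequencies; the counts are hypothetical and the helper names are illustrative.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2 P(x,y) / (P(x) P(y)), from raw counts.

    Zero co-occurrence counts would need smoothing, which this sketch omits."""
    p_xy = count_xy / total
    return math.log2(p_xy / ((count_x / total) * (count_y / total)))


def polarity_score(word, cooc, freq, total):
    """PolarityScore(W) = PMI(W, excellent) - PMI(W, poor)."""
    return (pmi(cooc[(word, "excellent")], freq[word], freq["excellent"], total)
            - pmi(cooc[(word, "poor")], freq[word], freq["poor"], total))


# Hypothetical corpus statistics (all numbers invented for illustration).
total = 1000
freq = {"superb": 20, "excellent": 50, "poor": 50}
cooc = {("superb", "excellent"): 10, ("superb", "poor"): 1}
print(round(polarity_score("superb", cooc, freq, total), 2))  # 3.32
```

A positive score means the word keeps company with ‘excellent’ more than with ‘poor’; here ‘superb’ co-occurs ten times as often with ‘excellent’, giving log2(10).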

Page 22: Sentiment Analysis & Opinion Mining

Experimentation

• The K-means clustering algorithm is applied to the polarity scores

• The resulting clusters contain words with similar polarities

• These words can be linked using an ‘isopolarity link’ in WordNet
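Clustering on a single score is one-dimensional K-means. This minimal sketch assumes k = 3, matching the three clusters the results slide reports, and spreads the initial centroids evenly over the sorted scores. The adjectives and their scores are hypothetical.

```python
def kmeans_1d(scores, k=3, max_iter=100):
    """Cluster words by one numeric score with Lloyd's K-means (k >= 2)."""
    words = sorted(scores, key=scores.get)
    n = len(words)
    # Initial centroids: k scores spread evenly over the sorted list.
    centroids = [scores[words[i * (n - 1) // (k - 1)]] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each word joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for word in words:
            nearest = min(range(k), key=lambda c: abs(scores[word] - centroids[c]))
            clusters[nearest].append(word)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(scores[w] for w in cluster) / len(cluster) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters


# Hypothetical polarity scores.
scores = {"excellent": 5.0, "superb": 4.5, "good": 3.0, "wooden": 0.1,
          "square": -0.2, "embarrassing": -3.8, "awful": -4.2, "poor": -5.0}
print(kmeans_1d(scores))
# [['poor', 'awful', 'embarrassing'], ['square', 'wooden'], ['good', 'superb', 'excellent']]
```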

Page 23: Sentiment Analysis & Opinion Mining

Results

• Three clusters were observed
• Most words had negative polarity scores
• Obscure words (the ones that are not very common) were removed by selecting adjectives with a familiarity count of 3

• The work also reports an improvement when the scores are used as feature values

Page 25: Sentiment Analysis & Opinion Mining

Subject-based SA

The horse bolted.

The movie lacks a good story.

Page 26: Sentiment Analysis & Opinion Mining

Lexicon

b VB bolt subj
– ‘bolt’ passes bad (b) sentiment to its subject, as in “The horse bolted.”

b VB lack obj ~subj
– The argument that sends the sentiment is the object (obj.); the argument that receives it is the subject (subj.), with ‘~’ reversing the polarity, as in “The movie lacks a good story.”

Page 27: Sentiment Analysis & Opinion Mining

Lexicon

• Also allows ‘\S+’ characters
– Similar to regular expressions
– E.g. “to put \S+ to risk”: the favorability of the subject depends on the favorability of ‘\S+’

Page 28: Sentiment Analysis & Opinion Mining

Example

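Sentence: “The movie lacks a good story.”

Lexicon:
G JJ good obj
B VB lack obj ~subj

Steps:
1) Consider a context window of up to five words
2) Shallow parse the sentence
3) Calculate the sentiment value step by step, based on the lexicon, replacing each analysed phrase with ‘\S+’ at each step: “a good story” is good (G), which gives “The movie lacks \S+.”; the reversal in the ‘lack’ entry then makes the subject, “the movie”, bad (B)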
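The lexicon lookup and ‘\S+’ substitution can be sketched as a tiny rule interpreter. This minimal sketch assumes shallow parsing has already been done: the verb lemma and the object’s tokens are supplied by hand, and the five-word context window is not modelled. The mini-lexicon mirrors the slide entries; `phrase_polarity` and `subject_sentiment` are hypothetical names.

```python
# Hypothetical mini-lexicon modelled on the slide entries:
# word -> (polarity g/b, POS tag, sending argument, receiving argument)
LEXICON = {
    "good": ("g", "JJ", None, None),
    "bolt": ("b", "VB", None, "subj"),
    "lack": ("b", "VB", "obj", "~subj"),
}
FLIP = {"g": "b", "b": "g"}


def phrase_polarity(tokens):
    r"""Collapse a noun phrase into one '\S+' slot carrying the polarity of
    the first lexicon adjective it contains (None if there is none)."""
    for token in tokens:
        entry = LEXICON.get(token.lower())
        if entry and entry[1] == "JJ":
            return entry[0]
    return None


def subject_sentiment(verb, obj_tokens=()):
    """Polarity the subject receives from a pre-parsed clause, or None."""
    entry = LEXICON.get(verb)
    if entry is None or entry[1] != "VB":
        return None
    polarity, _, sender, receiver = entry
    if sender is None:
        # e.g. 'bolt': the verb's own polarity goes straight to the subject
        return polarity if receiver == "subj" else None
    # e.g. 'lack': the object's polarity is transferred to the subject
    sent = phrase_polarity(obj_tokens) if sender == "obj" else None
    if sent is None:
        return None
    if receiver == "~subj":
        return FLIP[sent]  # '~' reverses the transferred polarity
    return sent if receiver == "subj" else None


print(subject_sentiment("lack", ["a", "good", "story"]))  # b  (The movie lacks a good story.)
print(subject_sentiment("bolt"))                           # b  (The horse bolted.)
```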

Page 29: Sentiment Analysis & Opinion Mining

Results

Corpus | Description | Precision | Recall
Benchmark corpus | Mixed statements | 94.3% | 28%
Open test corpus | Reviews of a camera | 94% | 24%

Page 30: Sentiment Analysis & Opinion Mining

Outline

Lecture 1
• Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
• Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA

Lecture 2
• Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
• Applications
– Cross-lingual SA
– Cross-domain SA
– Opinion spam
– SA for tweets

Page 31: Sentiment Analysis & Opinion Mining

Cross-lingual SA

[Diagram: Hindi document → Sentiment Analysis System → sentiment label; English document → Sentiment Analysis System]

• Multilingual content on the internet is growing

• How can the sentiment it carries be identified?

• Can we take the help of its ‘rich cousin’, English?

Page 32: Sentiment Analysis & Opinion Mining

Alternatives to Cross-lingual SA

Strategies for SA for a target language:
• Use a corpus in the target language
• Translate to a ‘rich’ source language
• Develop resources for the target language

Page 34: Sentiment Analysis & Opinion Mining

Domain-dependence of words

• ‘deadly’
– “It was one deadly match!”
– “There are some deadly poisonous snakes in the jungles of the Amazon.”

Page 35: Sentiment Analysis & Opinion Mining

General Approach

• Retain the ‘common-to-all-domains’ words
• Learn only the ‘special domain’ words

• Domain differences can be substantial
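A minimal sketch of this general approach rests on three assumptions. First, the ‘common-to-all-domains’ words are given as a general lexicon whose polarities are retained unchanged. Second, only the ‘special domain’ words get a polarity learned from a few labelled documents in the new domain. Third, that polarity is the sign of a smoothed log-odds of appearing in positive versus negative documents. The scoring rule, the helper `adapt_lexicon` and the toy sports reviews are all hypothetical.

```python
import math
from collections import Counter


def adapt_lexicon(general, target_docs, smoothing=1.0):
    """Retain `general` word polarities; learn polarities only for new words.

    general: word -> +1/-1, assumed valid in every domain.
    target_docs: (tokens, label) pairs from the new domain, label +1 or -1.
    A new word's polarity is the sign of its smoothed log-odds of appearing
    in positive rather than negative target documents; ties are left out."""
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in target_docs:
        (pos_df if label > 0 else neg_df).update(set(tokens))
    adapted = dict(general)
    for word in set(pos_df) | set(neg_df):
        if word in general:
            continue  # common-to-all-domains word: keep its polarity
        log_odds = math.log((pos_df[word] + smoothing) / (neg_df[word] + smoothing))
        if log_odds != 0:
            adapted[word] = 1 if log_odds > 0 else -1
    return adapted


general = {"good": 1, "bad": -1}
sports_docs = [
    (["deadly", "match", "good"], 1),
    (["deadly", "finish"], 1),
    (["boring", "match"], -1),
]
sports = adapt_lexicon(general, sports_docs)
print(sports["deadly"], sports["boring"], "match" in sports)  # 1 -1 False
```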

Page 37: Sentiment Analysis & Opinion Mining

Opinion spam: a side-effect of user-generated content (UGC)

• Reviews contain rich user opinions on products and services

• Anyone can write anything on the Web
– No quality control

• Result: low-quality reviews, review spam / opinion spam

• Incentives: a positive opinion brings financial gain for the organization

Page 38: Sentiment Analysis & Opinion Mining

Different types of spam reviews
• Type 1 (untruthful opinions): giving undeserving reviews to some target objects in order to promote/demote the object
– Hype spam: undeserving positive reviews
– Defaming spam: malicious negative reviews
– Duplicates
• Type 2 (reviews on brands only): no comment on the product; comments on brands, manufacturers or sellers of the product
• Type 3 (non-reviews)
– Advertisements
– Other irrelevant reviews containing no opinions, e.g. questions, answers and random text

Examples:
– “Although you should not expect prompt shipping. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right - http://www.pricegrabber.com”
– “It’s from nikon, what more you want..”

Reference: [Jindal et al., 2008]
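Duplicate reviews can be flagged by comparing their word content. This minimal sketch represents each review as a set of word bigrams (‘shingles’) and treats two reviews as duplicates when the Jaccard similarity of their shingle sets reaches a threshold. The bigram size, the 0.9 threshold and the sample reviews are assumptions, not details taken from the slide.

```python
import re


def shingles(text, n=2):
    """Set of lower-cased word n-grams ('shingles') of a review."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(reviews, threshold=0.9):
    """Index pairs of reviews whose shingle sets are at least `threshold` similar."""
    sets = [shingles(r) for r in reviews]
    return [(i, j)
            for i in range(len(reviews))
            for j in range(i + 1, len(reviews))
            if jaccard(sets[i], sets[j]) >= threshold]


# Hypothetical reviews.
reviews = [
    "Great camera, great price!",
    "great camera great price",
    "Battery died after two days.",
]
print(near_duplicates(reviews))  # [(0, 1)]
```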

Page 40: Sentiment Analysis & Opinion Mining

Challenges with tweets

• Ill-formed
– Spelling mistakes
– Informal words/emoticons
– Extensions of words (‘happppyyyyy’)

• Vague topics

www.clia.iitb.ac.in:8080/TwitterApp/index.jap
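Word extensions such as ‘happppyyyyy’ can be normalized before dictionary lookup. This minimal sketch assumes that each run of three or more identical characters stands for one or two copies of that character, and that a vocabulary of valid words is available. The function name and the vocabulary are hypothetical. The elongation flag is returned so that the extension itself can still be used as a feature if wanted.

```python
from itertools import groupby, product


def normalize_elongated(token, vocabulary):
    """Map an elongated token such as 'happppyyyyy' to a word in `vocabulary`.

    Each run of 3+ identical characters may stand for one or two copies of
    that character; every combination is tried against the vocabulary.
    Returns (normalized_token, was_elongated)."""
    options = []
    for char, run in groupby(token.lower()):
        length = len(list(run))
        options.append([char, char * 2] if length >= 3 else [char * length])
    was_elongated = any(len(opts) == 2 for opts in options)
    for candidate in ("".join(parts) for parts in product(*options)):
        if candidate in vocabulary:
            return candidate, was_elongated
    # No dictionary match: fall back to keeping two copies of each long run.
    return "".join(opts[-1] for opts in options), was_elongated


vocabulary = {"happy", "good", "cool"}
print(normalize_elongated("happppyyyyy", vocabulary))  # ('happy', True)
print(normalize_elongated("good", vocabulary))         # ('good', False)
print(normalize_elongated("yessss", vocabulary))       # ('yess', True)
```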

Page 41: Sentiment Analysis & Opinion Mining

Mood analysis

• Real-time updating of moods with respect to a topic

Snapshot: MoodViews

SOME ACTUAL APPLICATIONS

Page 42: Sentiment Analysis & Opinion Mining

Semantic search

• Sentiment search API by Evri
• Claims to allow deeper answers, such as “who” and “why”

Page 43: Sentiment Analysis & Opinion Mining

A zeitgeist

• Understanding the ‘climate’

Snapshot: Twitscoop

Page 44: Sentiment Analysis & Opinion Mining

… and many more