Sentiment Analysis & Opinion Mining
Lecture Two: March 3, 2011
Aditya M Joshi
M Tech 3, CSE, IIT Bombay
Sentiment analysis (SA)
The task of tagging text with the orientation of the opinion it expresses:
This is a good movie. (subjective)
This is a bad movie. (subjective)
The movie is set in Australia. (objective)
RECAP
Challenges of SA
• Domain dependent
• Sarcasm
• Thwarted expressions
• Negation
• Implicit polarity
• Time-bounded
Thwarted expressions: the sentences/words that contradict the overall sentiment of the text are in the majority.
Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
Sarcasm uses words of one polarity to represent another polarity.
Example: “The perfume is so amazing that I suggest you wear it with your windows shut.”
Domain dependence: the sentiment of a word is relative to the domain.
Example: ‘unpredictable’ is negative for the steering of a car, but positive in a movie review (an ‘unpredictable plot’).
Negation:
“I did not like the movie.”
“Not only is the movie boring, it is also the biggest waste of the producer’s money.”
“Notwithstanding the pressure of the public, let me admit that I have loved the movie.”
Implicit polarity: “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
Time-bounded polarity:
“This phone allows me to send SMS.”
“This phone has a touch-screen.”
(praise when these features were new; unremarkable for a phone of today)
RECAP
How much opinion?
Chart created using: www.technorati.com/chart/
Using ML for NLP
• Documents represented as feature vectors for classifiers
  – Features: unigrams, etc.
  – Models: SVM, NB, etc.
The movie is set in Australia. The movie is good.
The: 2, movie: 2, is: 2, set: 1, in: 1, Australia: 1, good: 1
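The counts above can be reproduced with a minimal bag-of-words sketch (pure Python; whitespace tokenisation and stripping only the final full stop are simplifications):

```python
from collections import Counter

def bag_of_words(sentences):
    """Unigram-count feature vector over a small document set."""
    counts = Counter()
    for s in sentences:
        counts.update(s.rstrip(".").split())
    return counts

counts = bag_of_words(["The movie is set in Australia.",
                       "The movie is good."])
print(counts["movie"], counts["Australia"])  # 2 1
```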
Support vector machines
• Basic idea
Separating hyperplane
Margin
Support vectors
“Maximum separating-margin classifier”
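The idea can be sketched with a tiny primal linear SVM trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch in pure Python, not the dual/kernel formulation typically used in the papers; the two-feature vectors are toy bag-of-words counts):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimise the L2-regularised hinge loss by stochastic
    sub-gradient descent; labels y must be in {-1, +1}."""
    random.seed(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)                      # step size
            margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]     # L2 shrinkage
            if margin < 1:                             # inside the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1

# toy features: [count('good'), count('bad')]
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [1, 0]), predict(w, [0, 1]))  # 1 -1
```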
RECAP
Results
Compared to list-based classifiers (58-69%)
Outline

Lecture 1:
• Motivation & Introduction
  – Challenges of SA: Why SA is non-trivial
  – Variants of SA: What forms does it exist in?
  – Opinion on the web: Is doing SA really worth it?
• Classifiers for SA
  – Fundamentals of supervised approaches
  – Standard ML techniques
  – Comparing different classifiers for SA

Lecture 2:
• Approaches to SA
  – Resources for SA: SentiWordNet
  – Subjectivity detection: Separating the opinion from facts
  – Adjectives for SA: Adjectives are great!
  – Subject-based SA: Who defeated whom?
• Applications
Resources for SA
SentiWordNet – WordNet synsets marked with three types of scores: positive, negative, objective
Example: “I am feeling happy.”
Seed sets Lp (positive) and Ln (negative) are expanded through the WordNet relations also-see (same polarity) and antonymy (opposite polarity).
Seed-set expansion in SWN
The sets at the end of the kth step are called Tr(k,p) and Tr(k,n); Tr(k,o) is the set of synsets present in neither Tr(k,p) nor Tr(k,n).
Seed words
Building SentiWordNet
• Classifier alternatives used: Rocchio (BowPackage) & SVM (LibSVM)
• Different training data based on expansion
• POS–NOPOS and NEG–NONEG classification
• Total: eight classifiers, for different combinations of k and classifier
• Synsets not in the expanded seed set are used as test synsets; the score is the average of the scores returned by the classifiers
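The seed-set expansion can be sketched over toy relation maps standing in for WordNet's also-see and antonymy links (a simplification: real SentiWordNet expands synsets, not words, and then trains classifiers on the expanded sets):

```python
def expand_seeds(pos_seeds, neg_seeds, also_see, antonym, k):
    """k expansion steps: the also-see relation preserves polarity,
    antonymy flips it. Returns Tr(k,p) and Tr(k,n)."""
    Tp, Tn = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_p = {v for u in Tp for v in also_see.get(u, ())} | \
                {v for u in Tn for v in antonym.get(u, ())}
        new_n = {v for u in Tn for v in also_see.get(u, ())} | \
                {v for u in Tp for v in antonym.get(u, ())}
        Tp, Tn = Tp | new_p, Tn | new_n
    return Tp, Tn

# invented relation maps for illustration
also_see = {"good": ["nice"], "bad": ["awful"], "nice": ["pleasant"]}
antonym = {"good": ["bad"], "bad": ["good"]}
Tp, Tn = expand_seeds({"good"}, {"bad"}, also_see, antonym, k=2)
print(sorted(Tp))  # ['good', 'nice', 'pleasant']
```

Tr(k,o) would then simply be every synset left in neither expanded set.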
Subjectivity detection
• Aim: to extract the subjective portions of text
• Algorithm used: the minimum-cut algorithm
Constructing the graph
To model item-specific and pairwise information independently.
Nodes: the sentences of the document, plus source & sink
(source & sink represent the two classes of sentences)
Edges: weighted with either of the two scores
Individual score Ind_sub(s_i): the prediction of whether the sentence s_i is subjective or not.
• Why graphs?
• Nodes and edges?
• Individual scores
• Association scores
Association score: the prediction of whether two sentences should have the same subjectivity level.
T: threshold – the maximum distance up to which sentences may be considered proximal
f: the decaying function
i, j: position numbers
Constructing the graph
• Build an undirected graph G with vertices {v1, v2, …, s, t} (the sentences plus source s and sink t)
• Add edges (s, vi), each with weight ind1(xi)
• Add edges (t, vi), each with weight ind2(xi)
• Add edges (vi, vk) with weight assoc(vi, vk)
• Partition cost: cost(C1, C2) = Σ_{x ∈ C1} ind2(x) + Σ_{x ∈ C2} ind1(x) + Σ_{xi ∈ C1, xk ∈ C2} assoc(xi, xk)
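The objective being minimised can be sketched by brute force (exhaustive enumeration stands in for the actual max-flow/min-cut solver, so it is only feasible for a handful of sentences; the scores below are invented):

```python
from itertools import combinations

def best_partition(ind_sub, ind_obj, assoc):
    """Minimise  cost(S) = sum_{i in S} ind_obj[i]
                         + sum_{j not in S} ind_sub[j]
                         + sum_{i in S, j not in S} assoc[(i, j)]
    where S is the set of sentences labelled subjective."""
    n = len(ind_sub)
    best_cost, best_S = float("inf"), set()
    for r in range(n + 1):
        for S in map(set, combinations(range(n), r)):
            cost = (sum(ind_obj[i] for i in S)
                    + sum(ind_sub[j] for j in range(n) if j not in S)
                    + sum(assoc.get((min(i, j), max(i, j)), 0.0)
                          for i in S for j in range(n) if j not in S))
            if cost < best_cost:
                best_cost, best_S = cost, S
    return best_S, best_cost

# two sentences: s0 looks subjective, s1 objective; weak association
S, cost = best_partition(ind_sub=[0.9, 0.2], ind_obj=[0.1, 0.8],
                         assoc={(0, 1): 0.1})
print(S, round(cost, 2))  # {0} 0.4
```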
Example
(Figure: sample cuts of the document graph, partitioning the sentences into subjective and objective sets.)
Results (1/2)
• Naïve Bayes, no extraction: 82.8%
• Naïve Bayes, subjective extraction: 86.4%
• Naïve Bayes, ‘flipped experiment’: 71%
(Pipeline: Document → Subjectivity detector, which splits the text into subjective and objective portions; the extract is passed to the polarity classifier.)
Results (2/2)
Adjectives for SA
• Many adjectives have high sentiment value
  – A ‘beautiful’ bag
  – A ‘wooden’ bench
  – An ‘embarrassing’ performance
  – A ‘nice wooden’ bench
  – A ‘wooden nice’ bench
• An idea would be to augment adjectives in the WordNet with this polarity information
Setup
• Two anchor words (extremes of the polarity spectrum) were chosen
• The PMI of each adjective with respect to these anchor words is calculated
Polarity Score (W)= PMI(W,excellent) – PMI (W, poor)
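Plugging hit counts into the formula above (Turney-style estimation from co-occurrence counts; the counts here are invented for illustration):

```python
from math import log2

def polarity_score(near_exc, near_poor, hits_exc, hits_poor):
    """PMI(w, 'excellent') - PMI(w, 'poor').  The word's own frequency
    and the corpus size cancel out of the difference, leaving a ratio
    of co-occurrence counts scaled by the anchors' frequencies."""
    return log2((near_exc * hits_poor) / (near_poor * hits_exc))

# a word seen near 'excellent' 80 times and near 'poor' 20 times,
# with equally frequent anchors, scores log2(4) = 2
score = polarity_score(80, 20, 1000, 1000)
print(score)  # 2.0
```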
Experimentation
• The K-means clustering algorithm is applied to the polarity scores
• The clusters contain words with similar polarities
• These words can be linked using an ‘isopolarity link’ in WordNet
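On scalar polarity scores, K-means reduces to Lloyd's algorithm in one dimension. A sketch with fixed initial centroids for determinism (real runs would use random initialisation and restarts; the scores are made up):

```python
def kmeans_1d(scores, centroids, iters=20):
    """Lloyd's algorithm: assign each score to the nearest centroid,
    then move each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for s in scores:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(s - centroids[c]))
            clusters[nearest].append(s)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

scores = [-2.1, -1.9, -0.2, 0.1, 2.0, 2.2]
centroids, clusters = kmeans_1d(scores, centroids=[-2.0, 0.0, 2.0])
print(clusters)  # [[-2.1, -1.9], [-0.2, 0.1], [2.0, 2.2]]
```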
Results
• Three clusters were seen
• Most words had negative polarity scores
• The obscure words – the ones that are not very common – were removed by keeping only adjectives with a familiarity count of 3
• The paper also reports an improvement when the polarity scores are used as feature values
Subject-based SA
The horse bolted.
The movie lacks a good story.
Lexicon entries:
  b VB bolt subj       (subj. bolt)
  b VB lack obj ~subj  (subj. lack obj.)
Each entry marks the argument that sends the sentiment and the argument that receives it (subj./obj.).
Lexicon
• Also allows ‘\S+’ characters, similar to regular expressions
• E.g. “to put \S+ to risk” – the favorability of the subject depends on the favorability of ‘\S+’
Example
The movie lacks a good story.
G JJ good obj.
The movie lacks \S+.
B VB lack obj ~subj.
Lexicon steps:
1) Consider a context window of up to five words
2) Shallow-parse the sentence
3) Calculate the sentiment value step by step, based on the lexicon, adding ‘\S+’ wildcards at each step
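A much-simplified sketch of the sentiment-transfer idea (the verb table, adjective polarities and the regular expression are all invented for illustration; the actual system uses a full sentiment lexicon plus shallow parsing rather than one regex):

```python
import re

# toy transfer lexicon: how a verb maps its object's polarity onto
# its subject ('lack' inverts, like the 'lack obj ~subj' entry)
VERBS = {"lack": -1, "contain": +1}
ADJECTIVES = {"good": +1, "brilliant": +1, "bad": -1}

def subject_sentiment(sentence):
    """Handle only the shape 'The <subj> <verb>s ... <adj> <noun>.'"""
    m = re.match(r"The (\S+) (\S+?)s .*?(\S+) \S+\.$", sentence)
    if not m:
        return 0
    _subj, verb, adj = m.groups()
    return VERBS.get(verb, 0) * ADJECTIVES.get(adj, 0)

print(subject_sentiment("The movie lacks a good story."))     # -1
print(subject_sentiment("The movie contains a good story."))  # 1
```

The first call reproduces the worked example: the positive ‘good’ reaches the verb ‘lack’, which inverts it, so the subject ‘movie’ ends up with negative sentiment.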
Results

Description                              Precision   Recall
Benchmark corpus (mixed statements)      94.3%       28%
Open test corpus (reviews of a camera)   94%         24%
Applications:
• Cross-lingual SA
• Cross-domain SA
• Opinion Spam
• SA for tweets
Cross-lingual SA
(Diagram: an English document passes through a sentiment analysis system to get a sentiment label; can a Hindi document be given a sentiment label by the same means?)
• Multilingual content on the internet is growing
• How can the sentiment it carries be identified?
• Can we take the help of the ‘rich cousin’, English?
Alternatives to Cross-lingual SA
Strategies for SA for a target language:
• Use a corpus in the target language
• Translate to a ‘rich’ source language
• Develop resources for the target language
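The translation strategy can be sketched end to end with toy stand-ins (both components are invented: translate() mimics a machine translation system with a one-entry dictionary, classify_en() mimics a resource-rich English classifier with word lists):

```python
def translate(text):
    # one-entry toy dictionary standing in for real machine translation
    toy_mt = {"yah film achchhi hai": "this movie is good"}
    return toy_mt.get(text, text)

POSITIVE, NEGATIVE = {"good", "great"}, {"bad", "boring"}

def classify_en(text):
    words = set(text.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neutral"

def cross_lingual_sentiment(target_text):
    """Strategy: translate to the 'rich cousin' language, then reuse
    its existing sentiment analysis system."""
    return classify_en(translate(target_text))

print(cross_lingual_sentiment("yah film achchhi hai"))  # positive
```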
Domain-dependence of words
• ‘deadly’
  – It was one deadly match!
  – There are some deadly poisonous snakes in the jungles of the Amazon.
General Approach
• Retain the ‘common-to-all-domains’ words
• Learn only the ‘special domain’ words
• Domain differences can be substantial
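One crude way to separate the two word classes (a sketch, not any published method: per-word polarity is estimated as the average label of documents containing the word, and words whose polarity disagrees across domains are flagged as domain-specific; the corpora are invented):

```python
from collections import defaultdict

def word_polarity(docs):
    """Average document label (+1/-1) over documents containing the word."""
    tally = defaultdict(list)
    for text, label in docs:
        for w in set(text.lower().split()):
            tally[w].append(label)
    return {w: sum(v) / len(v) for w, v in tally.items()}

def split_lexicon(domain_a, domain_b, agree=0.5):
    """Words whose polarity agrees strongly across both domains are
    'common-to-all-domains'; the rest must be relearned per domain."""
    pa, pb = word_polarity(domain_a), word_polarity(domain_b)
    common, specific = set(), set()
    for w in pa.keys() & pb.keys():
        if pa[w] * pb[w] > 0 and min(abs(pa[w]), abs(pb[w])) >= agree:
            common.add(w)
        else:
            specific.add(w)
    return common, specific

sports = [("a deadly match", +1), ("a great match", +1), ("a dull match", -1)]
nature = [("deadly poisonous snakes everywhere", -1), ("great scenery", +1)]
common, specific = split_lexicon(sports, nature)
print("deadly" in specific, "great" in common)  # True True
```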
Opinion spam: A side-effect of UGC
• Reviews contain rich user opinions on products and services
• Anyone can write anything on the Web – no quality control
• Result: low-quality reviews, review spam / opinion spam
• Incentive: a positive opinion -> financial gain for the organization
Different types of spam reviews:
• Type 1 (untruthful opinions): undeserved reviews given to some target objects in order to promote/demote the object
  – hyper spam: undeserving positive reviews
  – defaming spam: malicious negative reviews
  – duplicates are a strong marker of this type
• Type 2 (reviews on brands only): no comment on the product itself; comments on the brands, manufacturer or sellers of the product
• Type 3 (non-reviews): advertisements and other irrelevant reviews containing no opinions, e.g. questions, answers and random text

Example reviews:
“Although you should not expect prompt shipping. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right - http://www.pricegrabber.com”
“It’s from nikon, what more you want..”
Reference: [Jindal et al., 2008]
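Since duplicate and near-duplicate reviews mark likely Type 1 spam, a minimal near-duplicate detector can be sketched with word shingles and Jaccard similarity (the reviews, the shingle size and the threshold are all illustrative choices, not the paper's exact setup):

```python
def shingles(text, k=3):
    """Set of word k-grams of a review."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate_pairs(reviews, threshold=0.7):
    """Flag pairs of reviews sharing most of their shingles."""
    sets = [shingles(r) for r in reviews]
    return [(i, j)
            for i in range(len(reviews))
            for j in range(i + 1, len(reviews))
            if jaccard(sets[i], sets[j]) >= threshold]

reviews = [
    "great camera and the price was right would buy again",
    "great camera and the price was right would buy again today",
    "arrived broken and support never replied",
]
print(near_duplicate_pairs(reviews))  # [(0, 1)]
```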
Challenges with tweets
• Ill-formed
  – Spelling mistakes
  – Informal words/emoticons
  – Extensions of words (‘happppyyyyy’)
• Vague topics
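For elongated words, one common normalisation trick is to collapse character runs (a sketch: since English words rarely repeat a letter more than twice, runs of three or more are squeezed to two, and a dictionary lookup could then resolve the remainder):

```python
import re

def squeeze_elongations(text):
    """Collapse any run of 3+ repeats of a character to exactly two,
    so every elongated variant maps to one canonical form."""
    return re.sub(r"(\w)\1{2,}", r"\1\1", text)

print(squeeze_elongations("happppyyyyy"))        # happyy
print(squeeze_elongations("sooooo coooool !!"))  # soo cool !!
```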
www.clia.iitb.ac.in:8080/TwitterApp/index.jap
Mood analysis
• Real-time updating of moods w.r.t. a topic
Snapshot: MoodViews
SOME ACTUAL APPLICATIONS
Semantic search
• Sentiment search API by Evri
• Claims to allow deeper answers like “who”, “why”
A zeitgeist
• Understanding the ‘climate’
Snapshot: Twitscoop
… and many more