Tutorial on Opinion Mining and Sentiment Analysis


Post on 13-Apr-2017







by Rezvaneh Rezapour (rezapou2) and Yun Hao (yunhao2)

Prepared as an assignment for CS410: Text Information Systems in Spring 2016

Introduction
How do you choose a movie to watch?
How do you pick a restaurant or hotel?
How do you decide which camera to buy?

Fig 1, Fig 2, Fig 3 and Fig 4, Fig 5

Introduction
People like to share their experiences and opinions about a place, event, or product with others.
Numerous websites contain consumer opinions; Amazon and IMDb, for example, are valuable sources of reviews for finding others' opinions.
This online word-of-mouth behavior represents a new and measurable source of information. [10]
But it is far too much to read!

Motivation
What do we need?
  Study and extract useful information from individuals' reviews.

Why is it helpful?
  Saves time
  Helps to find good and bad features
  Helps to find positive and negative points

Opinion Mining
Sentiment Analysis


Opinion Mining


Opinion Mining (cont.)
Some terms are often used interchangeably for opinion mining.

Fig 6 Synonyms of Opinion Mining [8]


Components of Opinion Mining Model
Question: What do we want to extract from a review?
  Positive and negative opinions
  Target of the opinions: the entity
  Related set of components: aspects
  Related attributes: aspects
  Sometimes the opinion holder: the opinion source

Example: iPhone (entity), with battery and voice quality (aspects)


Components of Opinion Mining Model (cont.)
  Object
  Features
  Opinion Holder
  Opinion Passage on a Feature

Opinions
Regular: usually referred to simply as an opinion
  Positive or negative sentiment, attitude, or appraisal about an entity or aspect
Comparative:
  Relation of similarities or differences between two or more entities
  Preference of the opinion holder based on shared aspects
  Usually expressed with comparative or superlative adjectives or adverbs
  Need to first identify the objects being compared, the features being compared, and the preferences of the comparison [8]

Subjectivity and Emotion
  Objective sentences present factual information
  Subjective sentences present feelings and beliefs
  Emotions are subjective feelings and thoughts
  Some sentences express no emotion or opinion

Opinion Summary
Aspect Based:
  Highlight important parts of the reviews
  Produce a short text summary
Phone 1:
  Aspect: General   Positive: 5    Negative: 3
  Aspect: Battery   Positive: 20   Negative: 5
Pros: Easy to read and understand
Cons: Very qualitative
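The aspect-based summary above can be sketched as a simple count of positive and negative opinions per aspect. This is a minimal illustration, assuming the opinions have already been extracted as (aspect, polarity) pairs; the function names and the sample data are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def summarize_by_aspect(opinions):
    """Count positive and negative opinions for each aspect.

    opinions: iterable of (aspect, polarity) pairs, where polarity is
    the string "positive" or "negative" (an assumed input format).
    """
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, polarity in opinions:
        summary[aspect][polarity] += 1
    return dict(summary)


def format_summary(name, summary):
    """Render the counts in the same layout as the slide's example."""
    lines = [f"{name}:"]
    for aspect, counts in summary.items():
        lines.append(
            f"  Aspect: {aspect}  Positive: {counts['positive']}"
            f"  Negative: {counts['negative']}"
        )
    return "\n".join(lines)


# Hypothetical extracted opinions for one product.
opinions = [
    ("General", "positive"),
    ("Battery", "negative"),
    ("Battery", "positive"),
    ("Battery", "positive"),
]
summary = summarize_by_aspect(opinions)
print(format_summary("Phone 1", summary))
```

Counting keeps the summary quantitative, which is what makes it easy to compare products aspect by aspect.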

Challenges and Issues
Challenges
  Relevant objects vs. irrelevant ones
  Same feature expressed in different wordings
  Words that could be positive or negative in different contexts
  Long text that could contain both positive and negative opinions
  Detecting opinion-oriented sentences
  Integrating the tasks above
Some other issues
  Identifying comparison words
  Dealing with the different writing styles of different people
  Tracking changing opinions
  Measuring the strength of opinions
  Tackling sarcastic statements and mixed views
  Spam opinions

Sentiment Classification
Unit of analysis:
  Sentence
  Document
Methods:
  Supervised
  Semi-supervised
  Unsupervised

Getting Entity and Opinion
Create a structured text from reviews
  Extract object features and opinions
  Determine all sentiment polarities for opinions
  Determine relevant opinions for each object feature
Method:
  Use Conditional Random Fields (linear CRFs); compute the MAP
  Leverage conjunction structure and syntactic tree structure, and integrate both.

Getting Entity and Opinion (cont.)
What features?
  Token, lemma, part of speech
  Expand each word by getting synonyms and antonyms from WordNet
  Use SentiWordNet to get the prior polarity
Create your baseline
  Rule-based methods
  Lexicon-based methods
Finding of the related paper [2]:
  The proposed framework outperformed many state-of-the-art methods.
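The per-token features listed above (token, lemma, part of speech, prior polarity) are commonly passed to a sequence labeler as one feature dictionary per token. The sketch below assumes that representation. The lookup tables TOY_LEMMAS, TOY_POS, and TOY_PRIOR_POLARITY are hypothetical stand-ins for a real lemmatizer, a POS tagger, and SentiWordNet, and the neighbor-token features are an added convention, not something the slides specify.

```python
# Hypothetical toy resources. A real system would use a lemmatizer,
# a POS tagger, and SentiWordNet instead of these hand-written tables.
TOY_LEMMAS = {"batteries": "battery", "are": "be"}
TOY_POS = {"the": "DET", "battery": "NOUN", "be": "VERB", "great": "ADJ"}
TOY_PRIOR_POLARITY = {"great": 1.0, "poor": -1.0}


def token_features(tokens, i):
    """Build the feature dictionary for the token at position i."""
    token = tokens[i].lower()
    lemma = TOY_LEMMAS.get(token, token)
    return {
        "token": token,
        "lemma": lemma,
        "pos": TOY_POS.get(lemma, "UNK"),
        "prior_polarity": TOY_PRIOR_POLARITY.get(lemma, 0.0),
        # Neighboring tokens give the labeler some local context.
        "prev_token": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_token": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(sentence):
    """Whitespace-tokenize a sentence and featurize every token."""
    tokens = sentence.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features("The batteries are great")
for f in features:
    print(f)
```

A list of such dictionaries per sentence is the usual input shape for CRF toolkits, though the exact format depends on the toolkit chosen.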

Using Ontology to Identify Feature
How?
  Use a seed set from the reviews
  Use ontology construction to:
    Select relevant sentences that include concepts
    Extract the concepts from those sentences
  Sentences should contain conjunctions and at least one concept seed.

Using Ontology to Identify Feature (cont.)
Feature identification:
  Use ontology terminology to extract features
  Identify related sentences that contain ontology terms
Polarity identification:
  Use SentiWordNet
  Calculate a score for positive, negative, and neutral words
  Generate an adjective lexicon with prior polarities
Sentiment analysis:
  Calculate the overall opinion
  Consider negation words and conjunctions
Finding of the related paper [3]:
  Accuracy of feature detection: 76.9%
  Accuracy of polarity analysis: positive 88.3%, negative 81.7%
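One way to picture the last step above, calculating an overall opinion while accounting for negation words and conjunctions, is the rule-based sketch below. It is not the method of the cited paper: the lexicon PRIOR, the word lists, the two-token negation window, and the rule that a contrastive conjunction halves the score accumulated so far are all assumptions made for illustration.

```python
# Hypothetical prior-polarity lexicon and word lists (not from SentiWordNet).
PRIOR = {"good": 1.0, "great": 1.0, "bad": -1.0, "poor": -1.0, "terrible": -1.0}
NEGATIONS = {"not", "no", "never"}
CONTRAST = {"but", "however"}


def overall_opinion(sentence, window=2):
    """Label a sentence positive, negative, or neutral.

    A negation word flips the polarity of the next `window` tokens.
    A contrastive conjunction halves the score accumulated so far, so
    the clause that follows it weighs more (an assumed heuristic).
    """
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    score = 0.0
    negate_left = 0
    for tok in tokens:
        if tok in CONTRAST:
            score *= 0.5
            negate_left = 0
        elif tok in NEGATIONS:
            negate_left = window
        else:
            polarity = PRIOR.get(tok, 0.0)
            if negate_left > 0:
                polarity = -polarity
                negate_left -= 1
            score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(overall_opinion("The camera is great"))
print(overall_opinion("The screen is not good"))
print(overall_opinion("The screen is good, but the battery is terrible"))
```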

Making Use of Other Features
Hypothesis 1:
  Users prefer reviews that satisfy their information need, that are credible, and that express the mainstream opinion. [6]

Features indicating
  Information need: whether the review satisfies the user's information need
  Credibility: whether the review is credible enough
  Bias: whether the review is one of the mainstream ones

How to quantify these features?

Making Use of Other Features (cont.)
How to quantify these features? [6]
Information need
  Capture rate: the ratio of words for product attributes and functions that are mentioned in the content of the review
Credibility
  According to psychological theory, reliable writers often use past and perfect tenses in their writing.
  Measured by the percentage of volitive auxiliaries in a review and the percentage of past and perfect tenses in a review.
Bias
  The most frequent star rating among the reviews for a product is considered the mainstream opinion (based on data from Amazon), and reviews that give the product the same number of stars are considered to carry the mainstream opinion.
  The divergence of a review's rating from the mainstream opinion is calculated.
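Two of the quantities above lend themselves to a short sketch: the capture rate and the divergence from the mainstream rating. The slide's definition of capture rate is ambiguous, so the code assumes it means the share of a product's attribute words that a review mentions; it likewise assumes the divergence is the absolute difference in stars. The tense-based credibility measures are left out because they need a POS tagger. All names and sample data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def capture_rate(review, attribute_words):
    """Share of the product's attribute words mentioned in the review.

    Assumed reading of "capture rate"; the cited paper may define it
    differently.
    """
    if not attribute_words:
        return 0.0
    mentioned = set(tokenize(review)) & set(attribute_words)
    return len(mentioned) / len(attribute_words)


def mainstream_rating(star_ratings):
    """The most frequent star rating among all reviews of a product."""
    return Counter(star_ratings).most_common(1)[0][0]


def rating_divergence(review_rating, star_ratings):
    """Absolute distance of one review's rating from the mainstream rating."""
    return abs(review_rating - mainstream_rating(star_ratings))


attributes = {"battery", "screen", "camera", "zoom"}
review = "The battery life is great and the screen is sharp"
all_ratings = [5, 5, 4, 1, 5]

print(capture_rate(review, attributes))
print(mainstream_rating(all_ratings))
print(rating_divergence(2, all_ratings))
```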

Making Use of Other Features (cont.)
Hypothesis 2:
  Reviews of reasonable length and lacking spelling and grammar errors are easy to read and thus more helpful. [7]

Features indicating
  The average level of subjectivity and the range and mix of subjectivity and objectivity
  Content readability

How to quantify these features?

Making Use of Other Features (cont.)
How to quantify these features? [7]
The average level of subjectivity and the range and mix of subjectivity and objectivity
  The average probability of a review being subjective (objective information is information that also appears in the product description; everything else is considered subjective)
Content readability
  Number of spelling mistakes within each review
  Number of sentences, words, and characters in a review
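The readability counts above are straightforward to compute. The sketch below covers only those counts, not the subjectivity estimate. It assumes a spelling mistake is any word missing from a given vocabulary set, which is a crude stand-in for a real spell checker; the vocabulary and review text are made up.

```python
import re


def readability_features(review, vocabulary):
    """Count sentences, words, characters, and out-of-vocabulary words.

    vocabulary: a set of correctly spelled lowercase words (assumed to be
    supplied by the caller, for example from a dictionary file).
    """
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = re.findall(r"[A-Za-z']+", review)
    misspelled = [w for w in words if w.lower() not in vocabulary]
    return {
        "num_sentences": len(sentences),
        "num_words": len(words),
        "num_characters": len(review),
        "num_spelling_mistakes": len(misspelled),
    }


vocabulary = {"great", "camera", "the", "battery", "is", "weak"}
review = "Great camera. The battrey is weak!"
stats = readability_features(review, vocabulary)
print(stats)
```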

Making Use of Other Features (cont.)
Hypothesis 3: Customer opinions depend heavily on the features of the product being reviewed. [9]
How to learn useful features from the reviews? [9]
  Identify the features that are relevant to consumers for a certain type of product, as well as their salience (the relative importance of the features)
    Translate text into WordNet concepts and construct a graph with concepts as vertices and is-a relations as edges
    Use semantic similarity to add new edges between similar vertices
  Locate all the related mentions of the identified features in the reviews
  Quantify the opinions mined from the reviews and create a corresponding numeric vector for each review

What If Opinions Are Hidden?
Going beyond the overall rating to find users' opinions about different aspects
How?
  Use Latent Aspect Rating Analysis (LARA)
Approach:
  Identify the major aspects and segment reviews
    How? A bootstrapping-based algorithm guided by a few seed words describing the aspects
  Infer aspect ratings and weights for each individual review based on the content and overall rating
    How? A generative Latent Rating Regression (LRR) model
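The seed-guided segmentation step can be illustrated roughly as follows: each sentence goes to the aspect whose seed words it overlaps most, and the seed sets then grow with words that occur often in the assigned sentences. This is a simplified sketch, not the algorithm from the cited work; in particular, expanding seeds by raw frequency is an assumption, and the stopword list, seed words, and sentences are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "is", "was", "were", "and", "had", "very"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def assign_aspects(sentences, seeds):
    """Assign each sentence to the aspect with the largest seed overlap."""
    assignments = []
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        overlaps = {aspect: len(tokens & words) for aspect, words in seeds.items()}
        best = max(overlaps, key=overlaps.get)
        assignments.append(best if overlaps[best] > 0 else None)
    return assignments


def expand_seeds(sentences, assignments, seeds, top_k=1):
    """Add the top_k most frequent new words of each aspect to its seeds."""
    new_seeds = {aspect: set(words) for aspect, words in seeds.items()}
    for aspect, words in seeds.items():
        counts = Counter()
        for sentence, assigned in zip(sentences, assignments):
            if assigned == aspect:
                counts.update(
                    t for t in tokenize(sentence)
                    if t not in STOPWORDS and t not in words
                )
        new_seeds[aspect].update(w for w, _ in counts.most_common(top_k))
    return new_seeds


sentences = [
    "The room was clean and the bed was comfortable",
    "The staff were friendly",
    "The bed was huge",
    "The room had a soft bed",
]
seeds = {"room": {"room"}, "service": {"staff"}}

first_pass = assign_aspects(sentences, seeds)
grown_seeds = expand_seeds(sentences, first_pass, seeds)
second_pass = assign_aspects(sentences, grown_seeds)
print(first_pass)
print(second_pass)
```

In this toy run, the third sentence is unassigned at first and is picked up in the second pass once "bed" joins the room seeds, which is the effect bootstrapping is meant to have.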

What If Opinions Are Hidden? (cont.)
LRR?
  The overall rating is assumed to be generated from the individual aspects in the review, which can be captured and weighted using a regression model.
  After inferring the aspects and their weights, a maximum likelihood estimator (fitted with the EM algorithm) finds the parameter values that maximize the probability of observing the overall ratings.
Finding from the related paper [5]:
  LRR measured aspect ratings better than the baseline algorithms.
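The regression idea behind LRR can be sketched as a forward computation: each aspect rating is a weighted sum of word frequencies, and the overall rating is a weighted sum of aspect ratings. The code below assumes, for illustration, that the observed overall rating is Gaussian around that prediction, and it only compares the likelihood of two candidate weight vectors; it does not implement EM. All numbers, names, and word weights are hypothetical.

```python
import math


def aspect_rating(term_freqs, word_weights):
    """Aspect rating as a weighted sum of normalized word frequencies."""
    return sum(word_weights.get(w, 0.0) * f for w, f in term_freqs.items())


def overall_rating(aspect_scores, aspect_weights):
    """Overall rating as a weighted sum of aspect ratings."""
    return sum(aspect_weights[a] * s for a, s in aspect_scores.items())


def log_likelihood(observed, predicted, variance=1.0):
    """Log density of the observed rating under a Gaussian (an assumption)."""
    return (-0.5 * math.log(2 * math.pi * variance)
            - (observed - predicted) ** 2 / (2 * variance))


# Hypothetical per-aspect word frequencies and word sentiment weights.
scores = {
    "room": aspect_rating({"clean": 0.5, "comfortable": 0.5},
                          {"clean": 3.0, "comfortable": 5.0}),
    "service": aspect_rating({"slow": 1.0}, {"slow": 2.0}),
}

observed = 3.5
equal_weights = {"room": 0.5, "service": 0.5}
room_heavy = {"room": 0.75, "service": 0.25}

ll_equal = log_likelihood(observed, overall_rating(scores, equal_weights))
ll_room = log_likelihood(observed, overall_rating(scores, room_heavy))
print(scores, ll_equal, ll_room)
```

Here the room-heavy weights explain the observed rating better, so a maximum likelihood procedure would prefer them for this review.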

Can Social Context Help in Review Mining?
What is social context?
  The history of the reviewers and their social network interactions.
  This information is specific to some social network websites, not all.
  Using textual and social context information can help in evaluating the quality of individual reviewers and reviews.
How?
  Construct a baseline using labeled reviews, where each review is paired with a quality label (quality and helpfulness) obtained from manual labeling.
  Improve on the above by adding social context.
  Use labeled data, unlabeled data, and their social context information to create a semi-supervised model.

Can Social Context Help in Review Mining? (cont.)
Features:
  Text statistics, e.g. length of the review, average length of sentences
  Syntactic features: number of POS tags
  Conformity features: comparison of the review with other reviews using KL-divergence
  Sentiment features: positive and negative words in the reviews
Extract features and constraints from social context and add them to the model as regularization terms.
Finding of the related paper [4]:
  Regularization based on social context improved prediction accuracy when working with small training data.
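The conformity feature can be sketched as the KL-divergence between the unigram distribution of one review and that of the other reviews for the same product. The direction of the divergence (review relative to the others), the add-one smoothing, and the tokenizer are assumptions made for this illustration; the sample reviews are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def unigram_distribution(texts, vocabulary, smoothing=1.0):
    """Smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(t for text in texts for t in tokenize(text))
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def conformity(review, other_reviews):
    """Lower values mean the review's wording is closer to the others."""
    vocabulary = set(tokenize(review))
    for other in other_reviews:
        vocabulary.update(tokenize(other))
    p = unigram_distribution([review], vocabulary)
    q = unigram_distribution(other_reviews, vocabulary)
    return kl_divergence(p, q)


others = ["great phone great battery", "great screen"]
similar_score = conformity("great phone", others)
unusual_score = conformity("terrible refund scam", others)
print(similar_score, unusual_score)
```

Smoothing keeps every probability above zero, so the logarithm is always defined even for words that appear in only one of the two texts.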

References
Images:
Fig1: ht

