Sentiment Analysis & Opinion Mining (NLP)

Upload: sridhar-manohar

Post on 12-Apr-2017

Page 1: Sentiment Analysis & Opinion Mining (NLP)

Natural Algorithm - Sentiment Analysis & Opinion Mining

Drafted and Implemented at Sculatics

Important Points

# This document contains an algorithmic representation of a Sentiment Analysis & Opinion Mining system (using Natural Language Processing) implemented at Sculatics.

# The algorithm is written in a natural language format.

# This document expects its readers to have a basic understanding of Sentiment Analysis, Natural Language Processing and Artificial Intelligence.

# The purpose of this document is to help anybody design their own large-scale Sentiment Analysis & Opinion Mining system. We have no objection if you develop the system for enterprise or commercial purposes.

# DO NOT expect any visual treats in this document. This is a purely technical document explained in natural language format.

# The program (developed at Sculatics) using this algo analyses sentiments and mines opinions on Schools across the country on the basis of all reviews available on the web. It analyses thousands of reviews of each School at high speed.

Algorithm Overview

# This algo reads each review and splits it into sentences.

# Each sentence is analysed to deduce the sentiment it portrays, and is assigned a sentiment score based on that analysis.

# The algo further mines the aspects for which the opinions are expressed, and assigns sentiment scores to each of these aspects as well.

# The algo mainly works with lexicons and uses an unsupervised learning technique for Sentiment Analysis and Opinion Mining.

# The algo described below also contains some variable names; these are there to help readers follow the flow.

# The program implemented through this algo is named 'SeeThrough' ver 1.0 (stable) and is currently in use at Sculatics.

# The program is written in PHP. Sentiment Analysis charts are rendered through the Google Charts API.
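The first overview step (splitting a review into sentences) could be sketched as follows. This is a minimal illustration in Python rather than the PHP of the original program; the function name and the rule of splitting after '.', '!' or '?' are assumptions, not the SeeThrough implementation.

```python
import re

def split_into_sentences(review):
    """Split a review after '.', '!' or '?' when followed by whitespace.

    Assumed rule for illustration; abbreviations such as "Dr." would be
    split incorrectly by this simple pattern.
    """
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [part for part in parts if part]

sentences = split_into_sentences(
    "The teachers are great. Fees are too high! Would I recommend it? Yes."
)
```

Each resulting sentence would then be passed on to the blocks of the algo one at a time.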

Primary Data Sources (for the algo):

# Lexicons - Positive and Negative.

# AspectSource - List of all aspects per your requirement.

# Negation List - negation words that matter to your requirement.

# Negation Accompanying Positive Lexicon - negation terms that might accompany a positive lexicon.

# Same Opinion Conjunctions - List of same opinion conjunctions.

# Different Opinion Conjunctions - List of different opinion conjunctions.

# Punctuation Conjunctions - List of punctuation conjunctions.
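One possible way to hold these data sources in memory is sketched below in Python (the original program is in PHP). The class name, field names and every sample entry are hypothetical placeholders; the actual Sculatics lists are not given in this document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resources:
    """Container for the primary data sources (all field names assumed)."""
    positive_lexicon: frozenset
    negative_lexicon: frozenset
    aspects: frozenset
    negations: frozenset
    negations_with_positive: frozenset
    same_opinion_conjunctions: frozenset
    diff_opinion_conjunctions: frozenset
    punctuation_conjunctions: frozenset

# Hypothetical sample entries for a school-review domain.
RESOURCES = Resources(
    positive_lexicon=frozenset({"good", "excellent", "caring"}),
    negative_lexicon=frozenset({"bad", "poor", "rude"}),
    aspects=frozenset({"teachers", "fees", "infrastructure"}),
    negations=frozenset({"never", "nothing"}),
    negations_with_positive=frozenset({"not", "no", "hardly"}),
    same_opinion_conjunctions=frozenset({"and", "as well as"}),
    diff_opinion_conjunctions=frozenset({"but", "however", "although"}),
    punctuation_conjunctions=frozenset({",", ";"}),
)
```

Keeping the lists small and in memory is in line with the domain-specific approach described in the Note that follows.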

Note

# Lexicons, Aspects and Conjunctions used for the implementation of this program are domain specific. Since we implemented this system for a specific purpose, we did not see the need to read the lexicons from SentiWordNet, which contains a wide range of lexicons, most of which would have been unnecessary for us.

# Having our own lexicon implementation is one of the reasons for the exceptional processing speed of the program.

# We have implemented the whole system with fewer than 200 lexicons.

# Ours is an n-gram implementation.
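The n-gram remark is not elaborated in this document. One reading, assumed here, is that lexicon entries may be multi-word phrases, so a sentence is compared against the lexicon as 1-grams, 2-grams and so on. The Python sketch below illustrates that reading only; the function names, the maximum n of 3 and the sample lexicon are hypothetical.

```python
import re

def ngrams(tokens, n):
    """Return all contiguous n-word phrases from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def find_lexicon_matches(sentence, lexicon, max_n=3):
    """Collect every 1..max_n word phrase of the sentence found in the lexicon."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    matches = []
    for n in range(1, max_n + 1):
        matches.extend(gram for gram in ngrams(tokens, n) if gram in lexicon)
    return matches

# Hypothetical lexicon with one single-word and one two-word entry.
matches = find_lexicon_matches(
    "The campus is well maintained and the staff is friendly.",
    {"well maintained", "friendly"},
)
```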

Algorithm

The algo is written in natural language as a series of blocks (smallest possible unit of implementation), with each block containing a series of steps.

Algo begins from next slide...

BLOCK 1: First Iteration - processing unprocessed reviews

Steps:

1. Process one sentence at a time

2. Run each sentence against AspectSource until a match is found or until AspectSource exhaustion (whichever happens first)

3. Store matched Aspects in an array ($matched_aspects)

4. Store Sentences with Aspects in an array ($sentence_with_aspects)

5. Store Sentences with No Aspects in an array ($sentence_with_no_aspects)
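The steps of Block 1 can be sketched roughly as below, in Python rather than the original PHP. The aspect list is a hypothetical sample. The sketch collects every matching aspect per sentence instead of stopping at the first match, which is an assumption made so that one-aspect and two-aspect sentences can be told apart afterwards.

```python
import re

# Hypothetical aspect list; the real AspectSource is domain specific.
ASPECT_SOURCE = ["teachers", "fees", "infrastructure"]

def find_aspects(sentence, aspect_source):
    """Return the aspects that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return [
        aspect for aspect in aspect_source
        if re.search(r"\b" + re.escape(aspect) + r"\b", lowered)
    ]

def first_iteration(sentences, aspect_source):
    """Partition sentences into those with and without aspects."""
    matched_aspects = []
    sentence_with_aspects = []
    sentence_with_no_aspects = []
    for sentence in sentences:
        aspects = find_aspects(sentence, aspect_source)
        if aspects:
            matched_aspects.append(aspects)
            sentence_with_aspects.append(sentence)
        else:
            sentence_with_no_aspects.append(sentence)
    return matched_aspects, sentence_with_aspects, sentence_with_no_aspects

matched, with_aspects, without_aspects = first_iteration(
    [
        "The teachers are caring.",
        "Fees are high but the infrastructure is good.",
        "We love it.",
    ],
    ASPECT_SOURCE,
)
```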

BLOCK 2: Process Sentences with No Aspects

Steps:

1. Process Sentences with No Aspects

2. Run each sentence against a negation source until a match is found or until source exhaustion, whichever happens first

3. For a Sentence with a negation match:

   a. Assign a score of -1 and store the sentence and its score in an array ($sentence_with_scores)

4. For a Sentence with no negation match:

   a. Run the sentence against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScore and negativeLexiconScore accordingly

   b. Run the sentence against a negative lexicon source

      i. For every match, decrement negativeLexiconScore by 1

   c. Store each sentence with its score (sum of positiveLexiconScore and negativeLexiconScore) in an array ($sentence_with_scores)
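A compact Python sketch of the Block 2 scoring flow follows (the original is in PHP). All word lists and the striking distance of 3 are hypothetical, and the treatment of a negated positive lexicon as a decrement of negativeLexiconScore is one reading of "update accordingly".

```python
import re

# Hypothetical sample lists; the real ones are domain specific.
NEGATIONS = {"never", "nothing"}             # sentence-level negation list
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "poor"}
NEGATIONS_WITH_POSITIVE = {"not", "no"}      # may accompany a positive lexicon
STRIKING_DISTANCE = 3                        # assumed window size in words

def score_sentence(sentence):
    """Score a sentence that contains no aspect."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Step 3: a sentence-level negation match gives a flat score of -1.
    if any(token in NEGATIONS for token in tokens):
        return -1
    positive_score = 0
    negative_score = 0
    for index, token in enumerate(tokens):
        if token in POSITIVE:
            # Step 4a: look for a negation just before the positive lexicon.
            window = tokens[max(0, index - STRIKING_DISTANCE):index]
            if any(word in NEGATIONS_WITH_POSITIVE for word in window):
                negative_score -= 1
            else:
                positive_score += 1
        elif token in NEGATIVE:
            # Step 4b: every negative lexicon match lowers the score by 1.
            negative_score -= 1
    # Step 4c: the sentence score is the sum of both counters.
    return positive_score + negative_score

sentence_with_scores = [
    (text, score_sentence(text))
    for text in [
        "I would never go back.",
        "A good place with great people.",
        "The food is not good and the rooms are bad.",
    ]
]
```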

BLOCK 3: Process Sentences with One Aspect

Steps:

1. Process Sentences with One Aspect

2. Run each sentence against a negation source until a match is found or until source exhaustion, whichever happens first

3. For a Sentence with a negation match:

   a. Assign a score of -1 and store the sentence and its score in an array ($sentence_with_scores), and

   b. Give the Aspect a score of -1, store the sentence, aspect and its score in an array ($aspect_score)

4. For a Sentence with no negation match:

   a. Run the sentence against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScore and negativeLexiconScore accordingly

   b. Run the sentence against a negative lexicon source

      i. For every match, decrement negativeLexiconScore by 1

   c. Store each sentence with its score (sum of positiveLexiconScore and negativeLexiconScore) in an array ($sentence_with_scores)
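The record keeping of Block 3 might look like the Python sketch below (the original is in PHP). Here score_text is a deliberately compact stand-in for the negation and lexicon scoring steps and skips the check for a negation accompanying a positive lexicon. The word lists are hypothetical. Storing the aspect score in the no-negation case as well is an assumption; the steps above state it only for the negation case.

```python
import re

# Hypothetical sample lists.
NEGATIONS = {"never", "nothing"}
POSITIVE = {"good", "great", "caring"}
NEGATIVE = {"bad", "poor", "rude"}

def score_text(text):
    """Compact stand-in for the scoring steps (no accompanying-negation check)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if any(token in NEGATIONS for token in tokens):
        return -1
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

def process_one_aspect(sentence, aspect, sentence_with_scores, aspect_score):
    """Score the sentence and give its single aspect the same score."""
    score = score_text(sentence)
    sentence_with_scores.append({"sentence": sentence, "score": score})
    # Assumption: the aspect inherits the sentence score in both branches.
    aspect_score.append({"sentence": sentence, "aspect": aspect, "score": score})
    return score

sentence_with_scores = []
aspect_score = []
process_one_aspect("The teachers are caring and good.", "teachers",
                   sentence_with_scores, aspect_score)
process_one_aspect("The fees are never reasonable.", "fees",
                   sentence_with_scores, aspect_score)
```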

BLOCK 4: Process Sentences with Two Aspects

Steps:

1. Process sentences with 2 Aspects (in case of more than 2, only the first two occurring aspects should be taken)

2. Store the Aspect, Aspect Position in the sentence and Aspect Length in a temp array ($occ_pos_temp_array)

3. Compare the two Aspects to see which one occurs first

   a. Store Aspect Occurrence (First/Second), Aspect, Aspect Position and Aspect Length in an array ($occurrence_pos_array)

4. Extract the substring between the two Aspects

5. Run the Same Opinion Conjunction List against the substring until a match is found or until list exhaustion, whichever happens first

6. For a Same Opinion Conjunction 'match found':

   a. Set $same_opinion_sentence (a flag) to true

7. For a Same Opinion Conjunction 'match not found':

   a. Run the Diff Opinion Conjunction List against the substring until a match is found or until list exhaustion, whichever happens first

   b. For a match found, set $same_opinion_sentence to false and assign the matching diff opinion conjunction to $diff_opinion_conj_match
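A rough Python sketch of Block 4 is given below (the original is in PHP). The conjunction lists are hypothetical samples, and the returned dictionary is an assumed shape standing in for $same_opinion_sentence and $diff_opinion_conj_match. A result with both values left as None means no conjunction was found between the aspects.

```python
import re

# Hypothetical sample conjunction lists.
SAME_OPINION_CONJUNCTIONS = ["and", "as well as"]
DIFF_OPINION_CONJUNCTIONS = ["but", "however", "although"]

def find_conjunction(text, conjunctions):
    """Return the first conjunction found as a whole word or phrase, else None."""
    for conjunction in conjunctions:
        if re.search(r"\b" + re.escape(conjunction) + r"\b", text):
            return conjunction
    return None

def classify_two_aspect_sentence(sentence, aspects):
    """Order the first two aspects and inspect the text between them."""
    lowered = sentence.lower()
    # Step 2: aspect, position and length for every aspect present.
    occ_pos_temp = [
        {"aspect": a, "position": lowered.find(a), "length": len(a)}
        for a in aspects if a in lowered
    ]
    if len(occ_pos_temp) < 2:
        raise ValueError("expected a sentence containing at least two aspects")
    # Step 3: keep the two aspects that occur first, in order of position.
    first, second = sorted(occ_pos_temp, key=lambda o: o["position"])[:2]
    # Step 4: substring between the two aspects.
    between = lowered[first["position"] + first["length"]:second["position"]]
    result = {
        "first": first["aspect"],
        "second": second["aspect"],
        "same_opinion_sentence": None,
        "diff_opinion_conj_match": None,
    }
    # Steps 5-6: same-opinion conjunctions are checked first.
    if find_conjunction(between, SAME_OPINION_CONJUNCTIONS) is not None:
        result["same_opinion_sentence"] = True
        return result
    # Step 7: otherwise look for a different-opinion conjunction.
    diff = find_conjunction(between, DIFF_OPINION_CONJUNCTIONS)
    if diff is not None:
        result["same_opinion_sentence"] = False
        result["diff_opinion_conj_match"] = diff
    return result

diff_case = classify_two_aspect_sentence(
    "The teachers are great but the fees are high.", ["fees", "teachers"]
)
same_case = classify_two_aspect_sentence(
    "The teachers and the fees are both fine.", ["teachers", "fees"]
)
```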

BLOCK 5: Process Same Opinion Sentences

Steps:

1. Process Same Opinion Sentences

2. Run each sentence against a negation source until a match is found or until source exhaustion, whichever happens first

3. For a Sentence with a negation match:

   a. Assign a score of -1 and store the sentence and its score in an array ($sentence_with_scores), and

   b. Give the Aspect also a score of -1, store the sentence, aspect and its score in an array ($aspect_score)

4. For a Sentence with no negation match:

   a. Run the sentence against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScore and negativeLexiconScore accordingly

   b. Run the sentence against a negative lexicon source

      i. For every match, decrement negativeLexiconScore by 1

   c. Store each sentence with its score (sum of positiveLexiconScore and negativeLexiconScore) in an array ($sentence_with_scores)

   d. Store Sentence, Aspect and Aspect Score in an array ($aspect_score)

BLOCK 6: Process Different Opinion Sentences

Steps:

1. Process Different Opinion Sentences

2. Find the position of the Different Opinion Conjunction word

3. Split the sentence into two: First-half up to the conjunction word, Second-half from the conjunction word till the end

4. Run First-half against a negation source until a match is found or until source exhaustion, whichever happens first

5. For a First-half with a negation match:

   a. Assign a score of -1 to $first_half_score, and

   b. Give the Aspect also a score of -1, store the whole sentence, aspect and its score in an array ($aspect_score)

6. Run Second-half against a negation source until a match is found or until source exhaustion, whichever happens first

7. For a Second-half with a negation match:

   a. Assign a score of -1 to $second_half_score, and

   b. Give the Aspect also a score of -1, store the whole sentence, aspect and its score in an array ($aspect_score)

8. For a First-half with no negation match:

   a. Run First-half against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScoreFirstHalf and negativeLexiconScoreFirstHalf accordingly

   b. Run First-half against a negative lexicon source

      i. For every match, decrement negativeLexiconScoreFirstHalf by 1

9. For a Second-half with no negation match:

   a. Run Second-half against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScoreSecondHalf and negativeLexiconScoreSecondHalf accordingly

   b. Run Second-half against a negative lexicon source

      i. For every match, decrement negativeLexiconScoreSecondHalf by 1

10. Store each sentence with its score (sum of $first_half_score, $second_half_score, positiveLexiconScoreFirstHalf, negativeLexiconScoreFirstHalf, positiveLexiconScoreSecondHalf and negativeLexiconScoreSecondHalf) in an array ($sentence_with_scores)

11. Store Sentence, Aspect and Aspect Score in an array ($aspect_score)
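The split-and-score idea of Block 6 can be sketched in Python as below (the original is in PHP). The word lists are hypothetical, and score_text is a compact stand-in that skips the check for a negation accompanying a positive lexicon. Assigning the first-half score to the first aspect and the second-half score to the second aspect is an assumption about how the halves map to aspects.

```python
import re

# Hypothetical sample lists ("high" is treated as negative for fees).
NEGATIONS = {"never", "nothing"}
POSITIVE = {"good", "great", "caring"}
NEGATIVE = {"high", "poor", "rude"}

def score_text(text):
    """Compact stand-in for the scoring steps (no accompanying-negation check)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if any(token in NEGATIONS for token in tokens):
        return -1
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

def process_diff_opinion(sentence, first_aspect, second_aspect, conjunction,
                         sentence_with_scores, aspect_score):
    """Split at the conjunction word and score each half separately."""
    match = re.search(r"\b" + re.escape(conjunction) + r"\b",
                      sentence, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("conjunction not found in sentence")
    first_half = sentence[:match.start()]    # up to the conjunction word
    second_half = sentence[match.start():]   # from the conjunction word on
    first_half_score = score_text(first_half)
    second_half_score = score_text(second_half)
    sentence_with_scores.append(
        {"sentence": sentence, "score": first_half_score + second_half_score}
    )
    # Assumption: each aspect takes the score of the half it appears in.
    aspect_score.append(
        {"sentence": sentence, "aspect": first_aspect, "score": first_half_score}
    )
    aspect_score.append(
        {"sentence": sentence, "aspect": second_aspect, "score": second_half_score}
    )

sentence_with_scores = []
aspect_score = []
process_diff_opinion("The teachers are great but the fees are high.",
                     "teachers", "fees", "but",
                     sentence_with_scores, aspect_score)
```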

BLOCK 7: Process Sentences with Punctuation Conjunction

Steps:

1. Process Punctuation Conjunction Sentences

2. Run the Punctuation Conj Match List against the substring between the two Aspects

3. For a Punctuation Conj Match found:

   a. Set $punctuation_conjunction_match to true, and

   b. Set $matched_punctuation to the matched punctuation

   c. Find the punctuation position in the sentence

   d. Split the sentence into two: First-half up to the punctuation position and Second-half from the punctuation position to the end

4. Run First-half against a negation source until a match is found or until source exhaustion, whichever happens first

5. For a First-half with a negation match:

   a. Assign a score of -1 to $first_half_score, and

   b. Give the Aspect also a score of -1, store the whole sentence, aspect and its score in an array ($aspect_score)

6. Run Second-half against a negation source until a match is found or until source exhaustion, whichever happens first

7. For a Second-half with a negation match:

   a. Assign a score of -1 to $second_half_score, and

   b. Give the Aspect also a score of -1, store the whole sentence, aspect and its score in an array ($aspect_score)

8. For a First-half with no negation match:

   a. Run First-half against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScore and negativeLexiconScore accordingly

   b. Run First-half against a negative lexicon source

      i. For every match, decrement negativeLexiconScore by 1

9. For a Second-half with no negation match:

   a. Run Second-half against a positive lexicon source

      i. For every positive lex source match, check to see if there is any negation accompanying it

      ii. Update positiveLexiconScore and negativeLexiconScore accordingly

   b. Run Second-half against a negative lexicon source

      i. For every match, decrement negativeLexiconScore by 1

10. Store each sentence with its score (sum of $first_half_score, $second_half_score, positiveLexiconScore and negativeLexiconScore) in an array ($sentence_with_scores)

11. Store Sentence, Aspect and Aspect Score in an array ($aspect_score)
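Steps 2 and 3 of Block 7 (finding a punctuation conjunction between the two aspects and splitting there) might be sketched in Python as follows; the original is in PHP. The punctuation list and the returned dictionary shape are hypothetical. Scoring of the two halves would then proceed as in steps 4 to 11.

```python
# Hypothetical sample list of punctuation conjunctions.
PUNCTUATION_CONJUNCTIONS = [",", ";"]

def split_at_punctuation(sentence, first_aspect, second_aspect):
    """Split the sentence at the first listed punctuation between the aspects.

    Returns None when either aspect is missing or no punctuation is found.
    """
    lowered = sentence.lower()
    start = lowered.find(first_aspect)
    if start == -1:
        return None
    between_start = start + len(first_aspect)
    end = lowered.find(second_aspect, between_start)
    if end == -1:
        return None
    between = sentence[between_start:end]
    for mark in PUNCTUATION_CONJUNCTIONS:
        offset = between.find(mark)
        if offset != -1:
            position = between_start + offset
            return {
                "matched_punctuation": mark,
                "position": position,
                "first_half": sentence[:position],
                "second_half": sentence[position:],
            }
    return None

split = split_at_punctuation(
    "The teachers are friendly, the fees are reasonable.", "teachers", "fees"
)
```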

Ending Note(s)

# At this point of the algorithm you have two arrays, one with Sentences and their Scores ($sentence_with_scores) and the other with Aspects and their Scores ($aspect_score).

# When you process the $sentence_with_scores array, you will be able to find the overall sentiment score for all the reviews combined.

# And, when you process the $aspect_score array, you will be able to determine the aspect-wise scores for all the reviews combined.

# The scores indicate the sentiment expressed in the reviews.

# In this document, we are not providing further implementation details, because how you process the Sentiment and Aspect scores is totally dependent on your implementation.

# Make sure you have enough parent-child info stored in the $sentence_with_scores and $aspect_score arrays, which will help determine the relationships at the end.
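Since the post-processing is left to the implementer, the Python sketch below shows only one simple option: plain summation of the stored scores. The record shapes (dictionaries with "score" and "aspect" keys) and the sample data are assumptions for illustration, not results from the original system.

```python
from collections import defaultdict

def overall_score(sentence_with_scores):
    """Sum all sentence scores into one overall sentiment score."""
    return sum(entry["score"] for entry in sentence_with_scores)

def aspect_totals(aspect_score):
    """Sum the scores per aspect across all stored records."""
    totals = defaultdict(int)
    for entry in aspect_score:
        totals[entry["aspect"]] += entry["score"]
    return dict(totals)

# Made-up sample records.
sample_sentences = [
    {"sentence": "The teachers are great.", "score": 1},
    {"sentence": "The fees are never reasonable.", "score": -1},
    {"sentence": "Good teachers, good people.", "score": 2},
]
sample_aspects = [
    {"sentence": "The teachers are great.", "aspect": "teachers", "score": 1},
    {"sentence": "The fees are never reasonable.", "aspect": "fees", "score": -1},
    {"sentence": "Good teachers, good people.", "aspect": "teachers", "score": 2},
]
total = overall_score(sample_sentences)
per_aspect = aspect_totals(sample_aspects)
```

Other choices, such as averaging per review or normalising by sentence count, fit the same two arrays equally well.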

sub-BLOCK: Check for a Negation word accompanying a positive lexicon

Note: This is an internal function that gets called in every scoring block of the algo. Below are the steps involved in it.

Steps:

1. Find the Positive Lexicon position in the sentence

2. Extract the substring up to the positive lexicon

3. Split the substring into words and store them in an array

4. Striking Distance - the number of words before the positive lexicon to search for a negation word

5. If the word array is longer than the Striking Distance, search only the last Striking Distance words; otherwise search the whole array. If a negation is found, update negativeLexiconScore; else increment positiveLexiconScore
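The sub-block can be sketched in Python as below (the original is in PHP). The negation list, the striking distance of 3 and the choice to decrement negativeLexiconScore by 1 when a negation is found are assumptions; the steps above only say that the score is updated.

```python
import re

# Hypothetical list of negations that may accompany a positive lexicon.
NEGATIONS_WITH_POSITIVE = {"not", "no", "hardly"}
STRIKING_DISTANCE = 3  # assumed number of words to look back

def is_negated(sentence, positive_lexicon, striking_distance=STRIKING_DISTANCE):
    """Check the words just before the positive lexicon for a negation."""
    lowered = sentence.lower()
    position = lowered.find(positive_lexicon)          # step 1
    if position == -1:
        raise ValueError("positive lexicon not found in sentence")
    preceding = re.findall(r"[a-z']+", lowered[:position])  # steps 2-3
    # Steps 4-5: restrict the search to the last striking_distance words
    # when the array is longer than that, else search the whole array.
    if len(preceding) > striking_distance:
        window = preceding[len(preceding) - striking_distance:]
    else:
        window = preceding
    return any(word in NEGATIONS_WITH_POSITIVE for word in window)

def update_scores(sentence, positive_lexicon, positive_score, negative_score):
    """Return the updated (positiveLexiconScore, negativeLexiconScore) pair."""
    if is_negated(sentence, positive_lexicon):
        return positive_score, negative_score - 1
    return positive_score + 1, negative_score

negated = update_scores("The food is not good", "good", 0, 0)
plain = update_scores("The food is good", "good", 0, 0)
```

A negation that lies further back than the striking distance is ignored, which keeps a distant "not" from flipping an unrelated positive word.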

Reference(s)

Bing Liu (Prof. Dept. of Comp. Sc. - UIC, Chicago) - Sentiment Analysis and Opinion Mining.