Sentiment Analysis Using Hybrid Structure of Machine Learning Algorithms

Sentiment Analysis A descriptive approach towards its Classification Techniques Sangeeth Nagarajan MR1-CSE Roll No :13 Guided By Asst Prof: Rejimoan R July 29, 2014 Sangeeth Nagarajan (SCTEC) Sentiment Analysis July 29, 2014 1 / 54

DESCRIPTION

Sentiment analysis is the process used to determine the attitude, opinion or emotion expressed by a person about a particular topic. The presentation deals with the general approach and with different machine learning based classification algorithms. The slides are based on the work "Sentiment analysis using Neuro-Fuzzy and Hidden Markov models of text" by Rustamov S, Mustafayev E and Clements M A.

TRANSCRIPT

Page 1: Sentiment Analysis Using Hybrid Structure of Machine Learning Algorithms

Sentiment Analysis: A Descriptive Approach Towards Its Classification Techniques

Sangeeth Nagarajan

MR1-CSE Roll No :13

Guided by Asst. Prof. Rejimoan R

July 29, 2014

Sangeeth Nagarajan (SCTEC) Sentiment Analysis July 29, 2014 1 / 54


Content

1 Introduction

2 Sentiment Analysis

3 Related Works

4 General Procedure

5 Data Preparation And Feature Extraction

6 Fuzzy Control System for Sentiment Analysis

7 Neuro Fuzzy Inference System for Sentiment Analysis

8 Hidden Markov Model for Sentiment Analysis

9 Hybrid Structure

10 Conclusion

11 References


Case Study 1

In the late 1980s, a person is planning to buy a black-and-white television. What can be done to verify the quality and performance of the system?


Case Study 1

Solutions:

He can check with people who are already using the system.

He can discuss it directly with a customer care representative.


Case Study 2

Imagine you want to buy a smartphone with the latest features. What will you do to learn about the features provided by different companies?


Case Study 2

Solutions:

You can check with people who are already using the system.

You can discuss it directly with a customer care representative.

In addition, you can browse websites that provide comparative features of smartphones and user reviews.


Sentiment Analysis

The process of determining the overall rating of a commodity from user reviews is called sentiment analysis.


Introduction I

Process used to determine the attitude, opinion or emotion expressed by a person about a particular topic.

Uses natural language processing and text analytics to identify and extract subjective information in source materials.

Automatically characterizes the overall feeling or mood of consumers toward a specific brand or company and determines whether they are viewed positively or negatively.

Companies and organizations are interested in finding out customers' opinions about products and services via social media.


Introduction II

The main goals are:

Detecting whether a segment of text contains an expression of opinion.

Detecting the overall polarity of the text: positive or negative.


Sentiment Analysis I

Sentiment analysis has many other names:

1 Opinion Extraction

2 Opinion Mining

3 Sentiment Mining

4 Subjective Analysis


Sentiment Analysis II

Why Sentiment Analysis?

Movies: Is this review positive or negative?

Products: What do people think about the new iPhone?

Politics: What do people think about the candidate or issue?

Prediction: Predict election outcomes or market trends from sentiment.


Related Works I

1 Learning Methods: The different learning types are as follows

Supervised learning: Learning a classifier from training data and assigning class labels to test data.

Unsupervised learning: Learning without training data.

Semi-supervised learning: Combining both labeled and unlabeled training data.

2 Classification methods: These include natural language processing and pattern-based methods, and machine learning algorithms such as

Naive Bayes (NB)

Maximum Entropy (ME)

Support Vector Machines (SVM)

Fuzzy Inference System Classification

Neural Fuzzy Inference System Classification

Hidden Markov Model


Related Works II

3 Feature Extraction Methods: Several categories of features are used in sentiment analysis studies. These include

Syntactic Features

Semantic Features

Link-Based Features

Stylistic Features

Features Based on Occurrences of Words in the Corpus


General Procedure I

Figure : Steps And Techniques used in Sentiment Analysis


General Procedure II

1 Text Preprocessing: Divided into two subcategories.

Tokenization: The documents are split into tokens, which are used for further processing.

Removal of Stop Words: Some of the more frequently used stop words for English include "a", "of", "the", "I", "it", "you", and "and". These are generally regarded as "functional words" which do not carry meaning. It is practical to remove those words which appear too often.
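The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration: the regular expression used for tokenization and the function names are assumptions, and the stop-word list contains just the seven examples named on the slide.

```python
import re

# Assumed stop-word list: only the seven examples given on the slide.
STOP_WORDS = {"a", "of", "the", "i", "it", "you", "and"}

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

tokens = remove_stop_words(tokenize("I think the plot of the movie was great and moving"))
print(tokens)
```

A real system would normally use a fuller stop-word list than this one.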


General Procedure III

2 Text Transformation

The score of each sentence is calculated as the sum of the weights of the terms in that sentence.

The weight of each term is the product of the TF and IDF of that word, computed over the adjectives extracted using part-of-speech tags.

$$TF(t) = \frac{P}{N} \tag{1}$$

where P = number of times the adjective term occurs in document (d), and N = total number of adjectives in document (d).

$$IDF(t) = \log \frac{ND}{DF(t)} \tag{2}$$

where ND = total number of documents in the document collection, and DF(t) = number of documents in the collection in which adjective term (t) occurs.
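As a rough sketch of equations (1) and (2), the snippet below scores a sentence from lists of adjectives. It assumes the adjectives have already been extracted by a part-of-speech tagger (not shown), uses the natural logarithm because the slide does not state a base, and all function names and sample data are made up for illustration.

```python
import math

def tf(term, doc_adjectives):
    """Eq. (1): occurrences of the adjective divided by all adjectives in the document."""
    return doc_adjectives.count(term) / len(doc_adjectives)

def idf(term, corpus):
    """Eq. (2): log(ND / DF(t)). The term is assumed to occur in at least one document."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)

def sentence_score(sentence_adjectives, doc_adjectives, corpus):
    """Sum of TF * IDF weights over the adjectives of one sentence."""
    return sum(tf(t, doc_adjectives) * idf(t, corpus) for t in sentence_adjectives)

# Hypothetical corpus: each document is reduced to its list of adjectives.
corpus = [["good", "great", "good"], ["bad", "good"], ["bad", "awful"]]
score = sentence_score(["good", "great"], corpus[0], corpus)
print(round(score, 4))
```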


General Procedure IV

3 Feature Selection

The simplest statistical approach to feature selection is to use the most frequently occurring words in the corpus as polarity indicators. The majority of the approaches to sentiment analysis involve a two-step process:

i. Identify the parts of the document that contribute positive or negative sentiment.

ii. Combine these parts of the document in ways that increase the odds of the document falling into one of these two polar categories.


General Procedure V

4 Sentiment Classification

Classification of sentences into positive, negative and neutral polarity.

There is no clear boundary between the concepts of "positive", "negative" and "neutral".

We can use fuzzy set classification or machine learning methods for this process.

In the fuzzy set approach, a membership function is defined for each set, and each item is assigned to the set for which its membership value is highest.


General Procedure VI

5 Parameters for Evaluation

Table: Contingency table

| Classified label | Correct label: Positive | Correct label: Negative |
|---|---|---|
| Positive | TP (True Positive) | FP (False Positive) |
| Negative | FN (False Negative) | TN (True Negative) |


General Procedure VII

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$Precision = \frac{TP}{TP + FP} \tag{4}$$

$$Recall = \frac{TP}{TP + FN} \tag{5}$$

$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \tag{6}$$
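Equations (3) to (6) translate directly into code. The sketch below takes the four cells of the contingency table as input; the counts in the example are invented purely to show the arithmetic.

```python
def evaluate(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from contingency counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                  # Eq. (3)
    precision = tp / (tp + fp)                                  # Eq. (4)
    recall = tp / (tp + fn)                                     # Eq. (5)
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for 200 test reviews.
acc, prec, rec, f1 = evaluate(tp=80, fp=20, fn=10, tn=90)
print(acc, prec, rec, f1)
```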


Data Preparation And Feature Extraction I

Feature Extraction algorithm consists of two parts:

i. Data Preparation.

ii. Calculation of Feature Vectors.

Data Preparation

The sentiment polarity dataset v2.0 is used.

Machine learning based classification uses two sets:

1) Training set: Used by an automatic classifier to learn the differentiating characteristics of documents.

2) Test set: Used to validate the performance of the automatic classifier.


Data Preparation And Feature Extraction II

Operations carried out are as follows:

* Combine all files from the corpus and make one text file.

* Convert the text to an array of words.

* Sort the array of words in ascending order.

* Code: V = {v1, ..., vM}, where M is the number of different words (terms) in the corpus.
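The operations listed above can be sketched as follows. The documents are passed in as strings rather than read from files, and the tokenizing regular expression and the function name are assumptions made for the example.

```python
import re

def build_vocabulary(documents):
    """Combine all documents, split them into words, and return the sorted distinct words."""
    words = []
    for text in documents:
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return sorted(set(words))

# Two tiny stand-in "files" from a hypothetical corpus.
V = build_vocabulary(["A great movie", "a dull movie"])
M = len(V)  # number of different words (terms) in the corpus
print(V, M)
```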


Data Preparation And Feature Extraction III

Calculation of Feature Vectors

N is the number of classes.

M is the number of different words (terms) in the corpus.

R is the number of observed sequences in the training process.

$W = \{w^{r}_{1}, w^{r}_{2}, \ldots, w^{r}_{T_r}\}$ are the reviews in the training dataset, where $T_r$ is the length of the r-th review, r = 1, 2, ..., R.

$\mu_{i,j}$ describes the association between the i-th term (word) and the j-th class.

$c_{i,j}$ is the number of times the i-th term occurred in the j-th class.


Data Preparation And Feature Extraction IV

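$e_i$ is the normalized entropy of the i-th term in the corpus:

$$e_i = -\frac{1}{\lg N} \sum_{j=1}^{N} \frac{c_{i,j}}{t_i} \lg \frac{c_{i,j}}{t_i}, \quad i = 1, \ldots, M;\ j = 1, \ldots, N.$$

$t_i = \sum_{j} c_{i,j}$ denotes the number of occurrences of the i-th term in the corpus.

Calculate the membership degree of each term by an analytical formula:

$$\mu_{i,j} = \begin{cases} \dfrac{\dfrac{c_{i,j}}{\sum_{v=1}^{M} c_{v,j}} (1 - e_i)}{\sum_{t=1}^{N} \left\{ \dfrac{c_{i,t}}{\sum_{v=1}^{M} c_{v,t}} (1 - e_i) \right\}}, & n_i \ge n_{min} \\ 0, & n_i < n_{min} \end{cases} \tag{7}$$

where $n_{min} = 40$.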
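The following sketch shows one way the entropy and the membership degree could be computed from a table of counts. It rests on several assumptions that the slide does not confirm: the membership degree is read as the entropy-weighted, class-normalized count divided by its sum over classes; $n_i$ is taken to be the total occurrence count $t_i$ of the term; and $0 \cdot \lg 0$ is treated as 0. The logarithm base cancels in $e_i$, so the natural logarithm is used. The counts are invented.

```python
import math

def normalized_entropy(counts_i):
    """e_i for one term, given its per-class counts c_{i,1..N} (N >= 2)."""
    t_i = sum(counts_i)
    total = 0.0
    for c in counts_i:
        if c > 0:  # 0 * log 0 is taken as 0
            p = c / t_i
            total += p * math.log(p)
    return -total / math.log(len(counts_i))

def membership_degrees(counts, n_min=40):
    """counts[i][j] = c_{i,j}; returns mu[i][j] under the assumptions stated above."""
    n_terms, n_classes = len(counts), len(counts[0])
    class_totals = [sum(counts[v][j] for v in range(n_terms)) for j in range(n_classes)]
    mu = []
    for i in range(n_terms):
        if sum(counts[i]) < n_min:  # assumption: n_i is the term's total count t_i
            mu.append([0.0] * n_classes)
            continue
        e_i = normalized_entropy(counts[i])
        weighted = [counts[i][j] / class_totals[j] * (1 - e_i) for j in range(n_classes)]
        norm = sum(weighted)
        mu.append([w / norm if norm > 0 else 0.0 for w in weighted])
    return mu

# Hypothetical counts for three terms over two classes (positive, negative).
counts = [[90, 10], [10, 90], [5, 1]]
mu = membership_degrees(counts)
print(mu)
```

The third term occurs fewer than 40 times, so it is pruned and gets membership 0 in both classes.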


Fuzzy Control System for Sentiment Analysis I

Fuzzy inference is the process of formulating the mapping from given input(s) to output(s) using fuzzy logic.

The process involves membership functions, logic operations, and if-then rules.

In the first stage the membership functions are estimated; then fuzzy operations are applied and the parameters are adjusted by the back-propagation algorithm.


Fuzzy Control System for Sentiment Analysis II

Figure : Realization scheme of fuzzy control process

1 The membership degrees of the terms ($\mu^{r}_{i,j}$) of the r-th review are calculated by (7).


Fuzzy Control System for Sentiment Analysis III

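2 The maximum membership degree with respect to the classes is found for every term of the r-th review:

$$\bar{\mu}^{r}_{i,\bar{j}} = \mu^{r}_{i,\bar{j}}, \quad \bar{j} = \arg\max_{1 \le v \le N} \mu^{r}_{i,v}, \quad i = 1, \ldots, M. \tag{8}$$

3 Means of maxima are calculated for all classes:

$$\mu^{r}_{j} = \frac{\sum_{k \in Z^{r}_{j}} \mu^{r}_{k,j}}{l^{r}_{j}}, \quad Z^{r}_{j} = \{ i : \mu^{r}_{i,j} = \max_{1 \le v \le N} \mu^{r}_{i,v} \}, \quad j = 1, \ldots, N. \tag{9}$$

where $l^{r}_{j} = |Z^{r}_{j}|$ is the number of elements of the set $Z^{r}_{j}$.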
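Steps 2 and 3 can be sketched as below for a single review. Two choices here are assumptions rather than something the slides specify: ties between classes go to the lowest class index, and terms whose membership is zero in every class (for example pruned terms) are skipped.

```python
def means_of_maxima(mu_review):
    """mu_review[i][j]: membership of the i-th term of the review in class j.

    For every term, find the class with the largest membership (Eq. 8), then
    average those maxima per class (Eq. 9). Classes with an empty set Z_j get 0.
    """
    n_classes = len(mu_review[0])
    sums = [0.0] * n_classes
    sizes = [0] * n_classes
    for row in mu_review:
        if max(row) == 0:  # assumption: ignore terms with no membership at all
            continue
        j_best = max(range(n_classes), key=lambda j: row[j])
        sums[j_best] += row[j_best]
        sizes[j_best] += 1
    return [s / l if l else 0.0 for s, l in zip(sums, sizes)]

# Hypothetical memberships of four terms in two classes.
means = means_of_maxima([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.0, 0.0]])
print(means)
```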


Fuzzy Control System for Sentiment Analysis IV

The centre of gravity method is used for the defuzzification operation.

It avoids the ambiguities which may arise when an output degree of membership comes from more than one crisp output value.

The objective function is defined as follows:

$$E(y) = \frac{1}{2} \sum_{r=1}^{R} \left( \frac{\sum_{j=1}^{N} \mu^{r}_{j} y_j}{\sum_{j=1}^{N} \mu^{r}_{j}} - d_r \right)^{2} \longrightarrow \min_{y \in \mathbb{R}^{N}}, \tag{10}$$

$$y = (y_1, y_2, \ldots, y_N), \quad d_r \in \{1, 2, \ldots, N\}$$


Fuzzy Control System for Sentiment Analysis V

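The partial derivatives of this function are calculated in the following form:

$$\frac{\partial E(y)}{\partial y_t} = \sum_{r=1}^{R} \frac{\mu^{r}_{t}}{\sum_{j=1}^{N} \mu^{r}_{j}} \left( \frac{\sum_{j=1}^{N} \mu^{r}_{j} y_j}{\sum_{j=1}^{N} \mu^{r}_{j}} - d_r \right), \quad t = 1, 2, \ldots, N. \tag{11}$$

Rounding y gives the index of the class obtained as the result:

$$y = \frac{\sum_{j=1}^{N} \mu_j y^{*}_{j}}{\sum_{j=1}^{N} \mu_j} \tag{12}$$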
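A minimal sketch of how the centre-of-gravity output and its training could be implemented with plain gradient descent is given below. The gradient used is the derivative of the squared-error objective with respect to each $y_t$. The learning rate, number of epochs, starting point (the class indices themselves) and the sample memberships are all assumptions chosen for the illustration.

```python
def defuzzify(mu, y):
    """Centre-of-gravity output for one review, as in Eq. (12)."""
    return sum(m * v for m, v in zip(mu, y)) / sum(mu)

def objective(mu_reviews, labels, y):
    """Squared-error objective, as in Eq. (10)."""
    return 0.5 * sum((defuzzify(mu, y) - d) ** 2 for mu, d in zip(mu_reviews, labels))

def train_centres(mu_reviews, labels, n_classes, lr=0.1, epochs=500):
    """Minimize the objective over y by gradient descent, using Eq. (11)."""
    y = [float(j + 1) for j in range(n_classes)]  # assumed start: the class indices
    for _ in range(epochs):
        grad = [0.0] * n_classes
        for mu, d in zip(mu_reviews, labels):
            total = sum(mu)
            err = defuzzify(mu, y) - d
            for t in range(n_classes):
                grad[t] += mu[t] / total * err
        y = [v - lr * g for v, g in zip(y, grad)]
    return y

# Hypothetical per-class means of maxima for four training reviews, with labels d_r.
mu_reviews = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
labels = [1, 1, 2, 2]
y_star = train_centres(mu_reviews, labels, n_classes=2)
predicted = [round(defuzzify(mu, y_star)) for mu in mu_reviews]
print(y_star, predicted)
```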


Fuzzy Control System for Sentiment Analysis VI

Acceptance strategy (s):

$$s = \begin{cases} i_s \in I, & \text{if } y \in (i_s - \Delta_1,\ i_s + \Delta_1) \\ \text{reject}, & \text{otherwise} \end{cases} \tag{13}$$

where $i_s$ is the index of the appropriate class, I = {1, 2, ..., N}. Here $\Delta_1 \in [0; 0.5)$ is the main quantity that influences the reliability of the system. Results of sentiment analysis of movie reviews with different values of $\Delta_1$ are shown in Table 2.
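The acceptance strategy amounts to checking whether the defuzzified output lies close enough to a class index. In the sketch below, `None` stands for "reject" and the function name is hypothetical.

```python
def accept(y, n_classes, delta1=0.4):
    """Return the class index i_s if y lies in (i_s - delta1, i_s + delta1), else None (reject)."""
    for i_s in range(1, n_classes + 1):
        if i_s - delta1 < y < i_s + delta1:
            return i_s
    return None

print(accept(1.2, 2), accept(1.5, 2), accept(1.42, 2, delta1=0.45))
```

Because the threshold is below 0.5, the intervals around neighbouring class indices never overlap, so at most one class can accept a given output.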


Fuzzy Control System for Sentiment Analysis VII

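Table: Results of FCS for classification of movie reviews

| Fold | Corr(%) [Δ1 = 0.4] | Rej(%) [Δ1 = 0.4] | Err(%) [Δ1 = 0.4] | Corr(%) [Δ2 = 0.45] | Rej(%) [Δ2 = 0.45] | Err(%) [Δ2 = 0.45] | Correct(%) [no rejection] |
|---|---|---|---|---|---|---|---|
| 1 | 65 | 24.5 | 10.5 | 74.5 | 12 | 13.5 | 81 |
| 2 | 73.5 | 19 | 7.5 | 79.5 | 9 | 11.5 | 84 |
| 3 | 66 | 24.5 | 9.5 | 73.5 | 12.5 | 14 | 81 |
| 4 | 71.5 | 22 | 6.5 | 77 | 11.5 | 11.5 | 83.5 |
| 5 | 72.5 | 19.5 | 8 | 81 | 7.5 | 11.5 | 84.5 |
| 6 | 70 | 19.5 | 10.5 | 77.5 | 7.5 | 15 | 81.5 |
| 7 | 70.5 | 19 | 10.5 | 76.5 | 9 | 14.5 | 81 |
| 8 | 69 | 22 | 9 | 75.5 | 11.5 | 13 | 82 |
| 9 | 66 | 24 | 10 | 73 | 12 | 15 | 81.5 |
| 10 | 71.5 | 19.5 | 9 | 80 | 7.5 | 12.5 | 84.5 |
| Average | 69.55 | 21.35 | 9.1 | 76.8 | 10 | 13.2 | 82.45 |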


Neuro Fuzzy Inference System for Sentiment Analysis I

Fuzzy logic systems are good at explaining their decisions, but they cannot automatically acquire the rules they use to make those decisions.

Neural networks are good at recognizing patterns, but they are not good at explaining how they reach their decisions.

Intelligent hybrid systems, in which two or more techniques are combined in an appropriate manner, can overcome the limitations of the individual techniques.


Neuro Fuzzy Inference System for Sentiment Analysis II

Figure : The Structure of ANFIS


Neuro Fuzzy Inference System for Sentiment Analysis III

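Table: Results of ANFIS for classification of movie reviews

| Fold | Corr(%) [Δ2 = 0.5; Δ3 = 0.5] | Rej(%) [Δ2 = 0.5; Δ3 = 0.5] | Err(%) [Δ2 = 0.5; Δ3 = 0.5] | Corr(%) [Δ2 = 0.1; Δ3 = 0.5] | Rej(%) [Δ2 = 0.1; Δ3 = 0.5] | Err(%) [Δ2 = 0.1; Δ3 = 0.5] | Correct(%) [no rejection] |
|---|---|---|---|---|---|---|---|
| 1 | 63.5 | 26.5 | 10 | 73.5 | 12.5 | 14 | 81 |
| 2 | 68 | 26 | 6 | 79 | 10 | 11 | 85.5 |
| 3 | 65 | 27 | 8 | 72.5 | 15.5 | 12 | 81 |
| 4 | 70.5 | 23.5 | 6 | 77 | 11 | 12 | 83.5 |
| 5 | 64 | 29.5 | 6.5 | 80 | 9 | 11 | 86 |
| 6 | 69 | 21 | 10 | 76 | 10 | 14 | 82.5 |
| 7 | 70 | 21 | 9 | 77 | 8 | 15 | 81.5 |
| 8 | 65.5 | 26 | 8.5 | 75 | 12 | 13 | 82.5 |
| 9 | 66 | 22.5 | 11.5 | 73.5 | 13 | 13.5 | 81 |
| 10 | 68.5 | 23 | 8.5 | 80 | 7.5 | 12.5 | 85 |
| Average | 67 | 24 | 8.4 | 76.35 | 10.85 | 12.8 | 83 |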


Hidden Markov Model For Sentiment Analysis I

Bayes’ theorem

Bayes’ theorem gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A|B) and P(B|A). In its most common form, it is:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{14}$$

It helps to use a known outcome to infer the events that led up to that outcome.

Example

We need to determine which party is ruling, given that a tax cut occurred.

Let there be two parties, A and B.


Hidden Markov Model For Sentiment Analysis II

Since there are two parties, we take the probability of each party ruling to be 0.5, i.e.

* P(A) = 0.5

* P(B) = 0.5

From previous history we can obtain the probability of a tax cut given that party A or B was elected. Let

* P(t|A) = 0.25, so P(t′|A) = 0.75

* P(t|B) = 0.85, so P(t′|B) = 0.15

Figure : Tree Model


Hidden Markov Model For Sentiment Analysis III

From the tree model, we can calculate the probability P(t).

P(t) = P(t|A) ∗ P(A) + P(t|B) ∗ P(B)

= (0.25 ∗ 0.5) + (0.85 ∗ 0.5)

= 0.125 + 0.425

= 0.55

Therefore, by Bayes’ theorem,

P(B|t) = P(t|B) ∗ P(B)/P(t)

= (0.85 ∗ 0.5)/0.55

≈ 0.773

P(A|t) = P(t|A) ∗ P(A)/P(t)

= (0.25 ∗ 0.5)/0.55

≈ 0.227
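The same calculation can be checked with a few lines of Python. The function name and the dictionary layout are illustrative choices; the numbers are the ones from the example.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem: return P(hypothesis | evidence) for each hypothesis and P(evidence)."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)  # total probability P(t)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}, evidence

prior = {"A": 0.5, "B": 0.5}         # P(A), P(B)
likelihood = {"A": 0.25, "B": 0.85}  # P(t|A), P(t|B)
post, p_t = posterior(prior, likelihood)
print(p_t, post)
```

The two posteriors sum to 1, which is a quick sanity check on the arithmetic.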


Hidden Markov Model For Sentiment Analysis IV

Markov Model

Figure : Markov Model

3 states- Bull,Bear and Even

3 observations- Up, Down and Unchanged.

For a given sequence of observations, up-down-down, the state sequence is Bull-Bear-Bear.


Hidden Markov Model For Sentiment Analysis V

Figure : Hidden Markov Model

The key difference is that, given the observation sequence up-down-down, we cannot say exactly which state sequence produced it; the state sequence is therefore hidden.


Hidden Markov Model For Sentiment Analysis VI

We can calculate the probability that the model produced the sequence, as well as which state sequence was most likely to have produced the observations.

HMMs are applied in many areas of signal processing, and in particular speech processing. They have also been applied with success to low-level NLP tasks such as part-of-speech tagging, phrase chunking, and extracting target information from documents.
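Finding the most likely hidden state sequence is usually done with the Viterbi algorithm. The sketch below applies it to the Bull/Bear/Even market example. The slides do not give the model's probabilities, so every number in `start_p`, `trans_p` and `emit_p` is a made-up value for illustration only.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for the observations and its probability."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            p, back = max((prev[q][0] * trans_p[q][s], q) for q in states)
            layer[s] = (p * emit_p[s][o], back)
        best.append(layer)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical parameters (not taken from the slides).
states = ["Bull", "Bear", "Even"]
start_p = {"Bull": 0.5, "Bear": 0.2, "Even": 0.3}
trans_p = {
    "Bull": {"Bull": 0.6, "Bear": 0.2, "Even": 0.2},
    "Bear": {"Bull": 0.5, "Bear": 0.3, "Even": 0.2},
    "Even": {"Bull": 0.4, "Bear": 0.1, "Even": 0.5},
}
emit_p = {
    "Bull": {"up": 0.7, "down": 0.1, "unchanged": 0.2},
    "Bear": {"up": 0.1, "down": 0.6, "unchanged": 0.3},
    "Even": {"up": 0.3, "down": 0.3, "unchanged": 0.4},
}
path, prob = viterbi(["up", "down", "down"], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these assumed numbers the decoded path for up-down-down is Bull-Bear-Bear; other parameter values could give a different path.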


Hidden Markov Model For Sentiment Analysis VII

The parameters of the HMM applied in the system are as follows:

1 N is the number of states.

2 M is the number of different words (terms) in the reviews taking part in the training process for the given problem.

3 V is the set of all possible observations, $V = \{v_1, \ldots, v_M\}$ (in the understanding problem, the elements of this set are the different words that occur in the reviews taking part in the training process).

4 $\Pi = \{\Pi_i\}_{i=1}^{N}$ is the initial state distribution: $\Pi_i = P(q_1 = i)$.


Hidden Markov Model For Sentiment Analysis VIII

5 $A = [a_{i,j}]$ is the state transition probability matrix, $a_{i,j} = P(q_{t+1} = j \mid q_t = i)$, $1 \le i, j \le N$.

6 $B = \{b_j(o_t)\}_{j=1}^{N}$ are the state-dependent observation probabilities. Here, for every state j, $b_j(o_t) = P(o_t \mid q_t = j)$ is the probability distribution of words occurring in that state.

7 $O^{(r)} = [o^{(r)}_{1}, o^{(r)}_{2}, \ldots, o^{(r)}_{T_r}]$ are the observation sequences, where R is the number of observed sequences, $T_r$ is the length of the r-th observed sequence, $T_r \le T$, T is a given quantity, r = 1, 2, ..., R.

The HMM parameters are estimated for each of the corresponding classes and trained by the Baum-Welch algorithm.

The calculated probabilities are passed to a decision-making block.
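One plausible reading of the decision step is that each class has its own HMM, the likelihood of a review is computed under each model with the forward algorithm, and the class with the largest likelihood wins. The sketch below follows that reading. Baum-Welch training is not shown; the parameters are taken as given, and all numbers, labels and function names are hypothetical.

```python
def forward_likelihood(obs, pi, A, B):
    """P(O | model) by the forward algorithm; pi[i], A[i][j], B[j][o] as defined above."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Pick the class whose HMM assigns the highest likelihood to the observations."""
    scores = {label: forward_likelihood(obs, *params) for label, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 2-state models over a 2-word vocabulary: 0 = "good", 1 = "bad".
pi = [0.5, 0.5]
A = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "positive": (pi, A, [[0.8, 0.2], [0.6, 0.4]]),
    "negative": (pi, A, [[0.3, 0.7], [0.1, 0.9]]),
}
label, scores = classify([0, 0, 1], models)  # the review "good good bad"
print(label, scores)
```

For long reviews the raw probabilities underflow, so a practical version would work with scaled or log probabilities.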


Hidden Markov Model For Sentiment Analysis IX

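Table: Results of HMM for polarity reviews (correct classification, %)

| Fold | 1 state | 2 states | 3 states | 5 states |
|---|---|---|---|---|
| 1 | 79.5 | 81 | 84 | 78 |
| 2 | 83.5 | 83.5 | 82 | 82 |
| 3 | 77.5 | 81.5 | 81 | 79.5 |
| 4 | 80.5 | 82.5 | 84.5 | 81 |
| 5 | 84.5 | 83 | 86.5 | 81 |
| 6 | 82.5 | 82.5 | 83.5 | 81 |
| 7 | 82.5 | 83 | 82 | 80 |
| 8 | 83.5 | 84.5 | 84.5 | 84 |
| 9 | 77 | 79 | 78.5 | 77 |
| 10 | 83.5 | 83.5 | 83 | 83 |
| Average | 81.45 | 82.4 | 82.95 | 80.65 |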


Hybrid Structure I

Hybrid-I. This system accepts a result only when it is confirmed by the FCS, ANFIS and HMM approaches. If any of these models rejects the review, the system does not accept any decision. This prevents errors in the understanding process and therefore makes the system more reliable.

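Table: Results of Hybrid-I

| Fold | FCS(%) | ANFIS(%) | HMM-3(%) | Hybrid-I Corr(%) | Hybrid-I Rej(%) | Hybrid-I Err(%) |
|---|---|---|---|---|---|---|
| 1 | 81 | 81.5 | 84 | 75.5 | 14 | 10.5 |
| 2 | 84 | 85.5 | 82 | 77.5 | 12 | 10.5 |
| 3 | 81 | 81 | 81 | 74.5 | 12.5 | 13 |
| 4 | 83.5 | 83.5 | 84.5 | 76.5 | 15 | 8.5 |
| 5 | 84.5 | 86 | 86.5 | 80 | 11 | 9 |
| 6 | 81.5 | 82.5 | 83.5 | 78 | 9 | 13 |
| 7 | 81 | 81.5 | 82 | 74.5 | 14 | 11.5 |
| 8 | 82 | 82.5 | 84.5 | 78.5 | 10 | 11.5 |
| 9 | 81.5 | 81 | 78.5 | 74 | 12 | 14 |
| 10 | 84.5 | 85 | 83 | 80 | 7.5 | 12.5 |
| Average | 82.45 | 83 | 82.95 | 76.9 | 11.7 | 11.4 |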
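The Hybrid-I decision rule can be sketched as a unanimity check. The slide states that a rejection by any model leads to rejection; treating disagreement between the models as a rejection as well is an assumption made here. `None` stands for "rejected".

```python
def hybrid_one(decisions):
    """decisions: class labels from FCS, ANFIS and HMM; None means that model rejected.

    Accept only when no model rejects and all models agree (the agreement
    requirement is an assumed reading of the slide).
    """
    if any(d is None for d in decisions):
        return None
    return decisions[0] if len(set(decisions)) == 1 else None

print(hybrid_one(["positive", "positive", "positive"]))
print(hybrid_one(["positive", None, "positive"]))
```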


Hybrid Structure II

Hybrid-II. The method suggested in this system is sequential. If one classifier fails to classify a document, it passes the document on to the next classifier, until the document is classified or no other classifier remains. This approach minimizes the number of rejected reviews.

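Table: Results of Hybrid-II

| Fold | ANFIS Corr(%) | ANFIS Rej(%) | ANFIS Err(%) | HMM-3(%) | Hybrid-II(%) |
|---|---|---|---|---|---|
| 1 | 60 | 32 | 8 | 84 | 84 |
| 2 | 61 | 35 | 4 | 82 | 84 |
| 3 | 60.5 | 34.5 | 5 | 81 | 81.5 |
| 4 | 66.5 | 30 | 3.5 | 84.5 | 87 |
| 5 | 58 | 37 | 5 | 86.5 | 87.5 |
| 6 | 66 | 26 | 8 | 83.5 | 84 |
| 7 | 63.5 | 29.5 | 7 | 82 | 83 |
| 8 | 63.5 | 30 | 6.5 | 84.5 | 84.5 |
| 9 | 61 | 30 | 9 | 78.5 | 79.5 |
| 10 | 63 | 30 | 7 | 83 | 84.5 |
| Average | 62.3 | 31.4 | 6.3 | 82.95 | 83.95 |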
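The sequential scheme can be sketched as a simple cascade over classifier functions. The two classifiers below are toy stand-ins written only for this example (they are not ANFIS or an HMM); a classifier signals rejection by returning `None`.

```python
def hybrid_two(review, classifiers):
    """Try each classifier in order and return the first decision that is not a rejection."""
    for clf in classifiers:
        label = clf(review)
        if label is not None:
            return label
    return None  # every classifier rejected the review

def cautious(review):
    """Hypothetical stand-in for a classifier with a rejection option."""
    if "great" in review:
        return "positive"
    if "awful" in review:
        return "negative"
    return None

def fallback(review):
    """Hypothetical stand-in for a classifier that always decides."""
    return "positive" if review.count("good") >= review.count("bad") else "negative"

print(hybrid_two("a great film", [cautious, fallback]))
print(hybrid_two("bad bad good", [cautious, fallback]))
```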


Questions


Conclusion

Sentiment analysis is the process used to determine the attitude, opinion or emotion expressed by a person about a particular topic.

Sentiment analysis, or opinion mining, uses natural language processing and text analytics to identify and extract subjective information in source materials.

Several machine learning algorithms can be used for document-level sentiment classification: SVM, NB, ME, HMM, FCS and ANFIS.

The combination of multiple classifiers can result in better accuracy than that achieved by any individual classifier.


Reference I

1 A. Abbasi, H. C., and Salem, A. Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Trans. Inf. Syst., 26(3):1-34 (2008).

2 B. Pang, L. L., and Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of CoRR (2002).

3 Blunsom, P. Hidden Markov Models. Lecture notes. 2004.

4 C. Whitelaw, N. G., and Argamon, S. Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM Conference on Information and Knowledge Management (2005).

5 D.M. Blei, Ng, Y. A., and Jordan, M. Latent Dirichlet allocation. In Journal of Machine Learning Research (2003).

6 D. Rutkovskiy, M. Pilinskiy, L. Neural networks, genetic algorithms and fuzzy systems. 2006.


Reference II

7 Efron, M. Cultural orientations: Classifying subjective documents by cocitation analysis. In Proceedings of the AAAI Fall Symposium Series on Style and Meaning in Language, Art, Music, and Design (2004).

8 Fuller, R. Neural Fuzzy Systems. 1995.

9 He, Y. Incorporating sentiment prior knowledge for weakly-supervised sentiment analysis. ACM TALIP (2012).

10 J. Carrillo, L. P., and Gervas, P. A hybrid approach to emotional sentence polarity and intensity classification. In CoNLL (2010).

11 K.R. Aida-zade, S. R., and U, Ch, B. The application of hidden Markov model in human-computer dialogue understanding system. In Trans. of ANAS, series of physical-mathematical and technical sciences, Baku, vol. XXXII, No 3 (2012).


Reference III

12 Lin, C., and He, Y. Joint sentiment/topic model for sentiment analysis. In CIKM 09: Proceeding of the 18th ACM Conference on Information and Knowledge Management, New York, USA, ACM (2009).

13 M. Helmi, S. M. T. A. Human activity recognition using a fuzzy inference system. FUZZ-IEEE 2009, Korea (2009).

14 M. Taboada, J. Brooke, M. T. K. V., and Stede, M. Lexicon-based methods for sentiment analysis. In Computational Linguistics (2011).

15 Martineau, J., and Finin, T. Delta tfidf: An improved feature space for sentiment analysis. In Proceedings of the 3rd AAAI International Conference on Weblogs and Social Media (2009).

16 Meena, A., and Prabhakar, T. Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis. In Proceedings of ECIR (2007).


Reference IV

17 Ms K Mouthami, M. K. N. D., and Bhaskaran, D. M. Sentiment analysis and classification based on textual reviews. IEEE ASSP Magazine (2013).

18 Mullen, T., and Collier, N. Sentiment analysis using support vector machines with diverse information sources. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP, Barcelona, Spain, July. Association for Computational Linguistics (2004).

19 O.F. Zaidan, J. E., and Piatko, C. Using annotator rationales to improve machine learning for text categorization. In Proceedings of NAACL HLT (2007).

20 Paltoglou, G., and Thelwall, M. A study of information retrieval weighting schemes for sentiment analysis. ACL (2010).

21 Pang, B., and Lee, L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL (2004).


Reference V

22 Pang, B., and Lee, L. Opinion mining and sentiment analysis. Now Publishers Inc. (2008).

23 Prabowo, R., and Thelwall, M. Sentiment analysis: A combined approach. In Journal of Informetrics (2009).

24 Samir Rustamov, E. M., and Clements, M. A. Sentiment analysis using neuro-fuzzy and hidden Markov model of text. IEE Proceedings. Nanobiotechnology (2014).

25 Sh. Gao, W. Wu, C. L. T. C. A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization. ACM Transactions on Information Systems, Vol. 24, No. 2 (2006).

26 T. Wilson, J. W., and Hoffman, P. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the HLT-EMNLP (2005).


Reference VI

27 Turney, P. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the ACL (2002).
