Continuous Sentiment Intensity Prediction Based on Deep Learning


Yunchao He (何云超)

2015.9.15 @ Yuan Ze University

“unbelievably disappointing”

“Full of zany characters and richly applied satire, and some great plot twists”

“this is the greatest screwball comedy ever filmed”

“It was pathetic. The worst part about it was the boxing scenes.”

Sentiment analysis: using NLP, statistics, or machine learning methods to extract, identify, or otherwise characterize the sentiment content of a text unit.

Sometimes called opinion mining, although the emphasis in that case is on extraction.

Other names: opinion extraction, sentiment mining, subjectivity analysis


Movie: is this review positive or negative?

Products: what do people think about the new iPhone?

Public sentiment: how is consumer confidence? Is despair increasing?

Politics: what do people think about this candidate or issue?

Prediction: predict election outcomes or market trends from sentiment


Outline:

Short text classification based on semantic clustering

Sentiment intensity prediction using CNN

Transfer learning*

* Future work

People express opinions in complex ways:

In opinion texts, lexical content alone can be misleading.

Intra-textual and sub-sentential reversals, negation, and topic changes are common.

Rhetorical devices such as sarcasm, irony, and implication are frequent.


The traditional pipeline:

Tokenization

Feature extraction: n-grams, semantic features, syntactic features, etc.

Classification using different classifiers: Naïve Bayes, MaxEnt, SVM

Drawback: feature sparsity. A bag-of-words vector is almost entirely zeros, as sketched below:

S1: I really like this movie
[... 0 0 1 1 1 1 1 0 0 ...]
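A minimal sketch of this sparse representation using scikit-learn's CountVectorizer (toy corpus assumed):

```python
# Bag-of-words vectors: each sentence touches only a few vocabulary
# entries, so the vectors are mostly zeros (feature sparsity).
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I really like this movie",
          "this is the greatest screwball comedy ever filmed"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(X.toarray())  # rows are mostly zeros
```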


The same surface word can also carry different sentiment in different contexts, e.g. "good":

S1: This phone has a good keypad

S2: He will move and leave her for good

Idea: use a clustering algorithm to aggregate short texts into larger clusters, where each cluster shares one topic and one sentiment polarity. This reduces the sparsity of the short-text representation while keeping it interpretable.

S1: it works perfectly! Love this product

S2: very pleased! Super easy to, I love it

S3: I recommend it

Merged vocabulary: it, works, perfectly, love, this, product, very, pleased, super, easy, to, I, recommend

S1: [1 1 1 1 1 1 0 0 0 0 0 0 0]
S2: [0 0 0 1 0 0 1 1 1 1 1 1 0]
S3: [1 0 0 0 0 0 0 0 0 0 0 1 1]

S1+S2+S3: [... 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 ...]
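A minimal sketch of this merge, using the sentences above (tokenization details assumed):

```python
# Summing the binary vectors of same-cluster texts yields one denser
# cluster-level vector.
import numpy as np

vocab = ["it", "works", "perfectly", "love", "this", "product",
         "very", "pleased", "super", "easy", "to", "i", "recommend"]

def binary_vector(text):
    tokens = set(text.lower().replace("!", " ").replace(",", " ").split())
    return np.array([int(w in tokens) for w in vocab])

sents = ["it works perfectly! Love this product",
         "very pleased! Super easy to, I love it",
         "I recommend it"]
merged = np.clip(sum(binary_vector(s) for s in sents), 0, 1)
print(merged)  # far fewer zeros than any single sentence's vector
```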


Training data is labeled with positive and negative polarity.

The K-means clustering algorithm is used to cluster the positive and the negative texts separately (alternatives: KNN, LDA, ...), as in the example and sketch below.

Unclustered training texts:

works perfectly! Love this product
completely useless, return policy
very pleased! Super easy to, I am pleased
was very poor, it has failed
highly recommend it, high recommended!
it totally unacceptable, is so bad

Topical clusters:

(positive) works perfectly! Love this product
(positive) very pleased! Super easy to, I am pleased
(positive) highly recommend it, high recommended!
(negative) completely useless, return policy
(negative) was very poor, it has failed
(negative) it totally unacceptable, is so bad
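A minimal sketch of per-polarity clustering with scikit-learn (toy corpus from above; the number of clusters is assumed):

```python
# Cluster positive and negative texts separately so that each resulting
# cluster keeps a single sentiment polarity.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

pos = ["works perfectly! Love this product",
       "very pleased! Super easy to, I am pleased",
       "highly recommend it, high recommended!"]
neg = ["completely useless, return policy",
       "was very poor, it has failed",
       "it totally unacceptable, is so bad"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(pos + neg)

pos_ids = KMeans(n_clusters=2, n_init=10).fit_predict(X[:len(pos)])
neg_ids = KMeans(n_clusters=2, n_init=10).fit_predict(X[len(pos):])
print(pos_ids, neg_ids)  # cluster assignment per text
```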


Classifier: Multinomial Naive Bayes

Probabilistic classifier: compute the probability of a label given a clustered text:

$$\hat{s} = \arg\max_{s \in S} P(s \mid C_i) = \arg\max_{s \in S} P(s) \prod_{j=1}^{N} P(C_{i,j} \mid s)$$

(Bayes' theorem plus the conditional independence assumption)

$$P(s) = \frac{N_s}{N} \qquad P(C_{i,j} \mid s) = \frac{N(C_{i,j}, s) + 1}{\sum_{x \in V} N(x, s) + |V|}$$

where $C_{i,j}$ is the $j$-th word of cluster $C_i$, $V$ is the vocabulary, and the $+1$ / $+|V|$ terms are Laplace smoothing.
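A minimal sketch with scikit-learn's MultinomialNB, which implements this smoothed model (toy cluster texts assumed):

```python
# Multinomial Naive Bayes over clustered texts; alpha=1.0 is the +1
# Laplace smoothing in the formula above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

clusters = ["works perfectly love this product very pleased",
            "completely useless very poor failed unacceptable"]
labels = ["positive", "negative"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(clusters)

clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["it works, highly recommend"])))
```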


Given an unlabeled text $x_j$, we use Euclidean distance to find the most similar positive cluster $C_m$ and the most similar negative cluster $C_n$.

The sentiment of $x_j$ is estimated from the probabilistic change in the two clusters when $x_j$ is merged into them (vs. KNN).

This is called the two-stage-merging method, as each unlabeled text is merged twice.

$$f(x_j) = \begin{cases} 0, & |P(NC_m) - P(C_m)| > |P(NC_n) - P(C_n)| \\ 1, & \text{otherwise} \end{cases}$$

where $NC_m$ and $NC_n$ denote the clusters $C_m$ and $C_n$ after merging $x_j$.
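A minimal sketch of the idea (the exact probability function is assumed; here the shift in a cluster's word distribution stands in for the probabilistic change):

```python
# Merge the unlabeled text into the nearest positive and nearest negative
# cluster and compare how much each cluster's word distribution shifts.
import numpy as np

def merge_shift(cluster_counts, text_counts):
    """L1 change in the cluster's word distribution after merging the text."""
    before = cluster_counts / cluster_counts.sum()
    merged = cluster_counts + text_counts
    after = merged / merged.sum()
    return np.abs(after - before).sum()

pos_cluster = np.array([5.0, 4.0, 0.0, 0.0])  # counts over a toy vocabulary
neg_cluster = np.array([0.0, 1.0, 6.0, 3.0])
x_j = np.array([1.0, 1.0, 0.0, 0.0])

# The text gets the polarity of the cluster it perturbs least.
label = ("positive" if merge_shift(pos_cluster, x_j) < merge_shift(neg_cluster, x_j)
         else "negative")
print(label)  # -> positive
```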


Dataset: Stanford Twitter Sentiment corpus (STS)

Baseline: bag of unigrams and bigrams without clustering

Evaluation metrics: accuracy, precision, recall

The average precision and accuracy are 1.7% and 1.3% higher than the baseline method.

Method       Accuracy   Precision   Recall
Our method   0.816      0.820       0.813
Bigrams      0.805      0.807       0.802


Continuous sentiment intensity provides a fine-grained representation of sentiment.

A valence-arousal (VA) representation can easily be converted to discrete categories.

"unbelievably disappointing" -> Model -> V: -0.5, A: 0.3
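As one illustration of that conversion, a minimal sketch with assumed thresholds (the slides do not fix a mapping):

```python
# Map a continuous valence-arousal pair to a coarse discrete label.
def va_to_category(valence: float, arousal: float) -> str:
    polarity = "positive" if valence > 0 else "negative"
    strength = "high-arousal" if arousal > 0 else "low-arousal"
    return f"{strength} {polarity}"

print(va_to_category(-0.5, 0.3))  # high-arousal negative
```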


Lexicon-based methods find the relationship between word-level and sentence-level sentiment values; the word-level information comes from a sentiment lexicon, e.g. ANEW.

Paltoglou 2013: weighted arithmetic mean, weighted geometric mean

Malandrakis 2013: linear regression

Paltoglou, G., Theunis, M., Kappas, A., & Thelwall, M. (2013). Predicting emotional responses to long informal text. IEEE Transactions on Affective Computing, 4(1), 106-115.

Malandrakis, N., Potamianos, A., Iosif, E., & Narayanan, S. (2013). Distributional semantic models for affective text analysis. IEEE Transactions on Audio, Speech, and Language Processing, 21(11), 2379-2392.

To find the relationship between words and sentence-level sentiment.

                  CNN method      Lexicon-based methods
Word              Dense vector    VA value
Relationship      Auto-learned    Manually specified
Training data     Many            Few or none
Word order        Considered*     Not considered
Interpretation    Black box       Easy


To find the relationship between words and sentence-level sentiment.

Sentence Matrix -> Convolution Operator -> Max Pooling -> Regression

Word Representation: dense vector, distributed representation

[Figure: 2-D view of a word embedding space; semantically similar words cluster together, e.g. {boat, ship, vessel}, {good, happy, glad}, {Beijing, Shanghai}, together with the word vectors of an example Chinese sentence: 我们的 / 不像 / 明镜 / 不可以 / 美丑 / 善恶 / 全部 / 包容 (roughly: "ours / not like / a clear mirror / cannot / beauty and ugliness / good and evil / all / embrace").]

Semantic information of a word is encoded in its dense vector.

Sentence Matrix -> Convolution Operator -> Max Pooling -> Regression

[Figure: sentence matrix formed by stacking the word vectors of the example Chinese sentence.]

$$c_i = f(w \cdot S_{i:i+m-1,:} + b)$$

Dimensionality is reduced, and parameter sharing reduces the number of model parameters.

f: activation function (ReLU, tanh, sigmoid, ...), e.g. ReLU: $f(x) = \max(0, x)$

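A minimal NumPy sketch of this narrow convolution (toy dimensions assumed):

```python
# 1-D convolution c_i = f(w · S[i:i+m-1,:] + b) over a sentence matrix S.
import numpy as np

n, d, m = 8, 4, 3            # words, embedding dim, filter width
S = np.random.randn(n, d)    # sentence matrix: one row per word vector
w = np.random.randn(m, d)    # one convolution filter
b = 0.1

relu = lambda x: np.maximum(0.0, x)
c = np.array([relu(np.sum(w * S[i:i + m]) + b) for i in range(n - m + 1)])
print(c.shape)  # (6,) -> feature map of length n - m + 1
```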

Sentence Matrix -> Convolution Operator -> Max Pooling -> Regression

Aggregate the information and capture the most important features

[Figure: max pooling over the feature map with 5×1 filters and stride 1.]
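A minimal sketch of that pooling step (toy feature-map values assumed):

```python
# Max pooling with a 5-wide window and stride 1 keeps, per window,
# only the strongest feature activation.
import numpy as np

c = np.array([3, 6, 7, 9, 7, 5, 4])   # toy feature map
width, stride = 5, 1
pooled = np.array([c[i:i + width].max()
                   for i in range(0, len(c) - width + 1, stride)])
print(pooled)  # [9 9 9]
```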


Sentence Matrix -> Convolution Operator -> Max Pooling -> Regression

[Figure: the pooled features $x_1, x_2, \ldots, x_n$ feed a linear output layer.]

linear: $h(x_i, w) = w^T x_i = \tilde{y}_i$

Objective function: mean squared error (MSE)


Learning algorithm: stochastic gradient descent (SGD)

Parameters learned from the labeled data (the sketch below ties the pieces together):

Word vectors

Convolution filter weights

Linear regression weights
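A minimal Keras sketch of the whole pipeline: embedding -> 1-D convolution -> max pooling -> linear regression, trained with SGD on MSE (vocabulary size, dimensions, filter count, and learning rate are all assumed):

```python
# Embedding, convolution, and regression weights are all trained jointly.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Embedding(input_dim=20000, output_dim=300),      # word vectors
    layers.Conv1D(filters=100, kernel_size=3, activation="relu"),  # filters
    layers.GlobalMaxPooling1D(),                            # max pooling
    layers.Dense(1, activation="linear"),                   # regression
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.01), loss="mse")
# model.fit(X_train, y_train, ...)  # X_train: padded word-index sequences
```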

Labeled data: Chinese: the CVAT dataset; English: the VADER datasets (Tweets, Movie, Amazon, NYT).

Dataset   Size    #Words   Avg. length   Dims
CVAT      720     21094    192.1         V+A
Tweets    4000    15284    13.62         V
Movie     10605   29864    18.86         V
Amazon    3708    8555     17.3          V
NYT       5190    20941    17.48         V


All datasets are split into training, validation, and test sets for model training, hyper-parameter selection, and model evaluation respectively.

Evaluation metrics:

MSE, mean squared error

MAE, mean absolute error

Pearson's correlation coefficient r

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \tilde{y}_i)^2$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \tilde{y}_i|$$

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})(\tilde{y}_i - \bar{\tilde{y}})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\tilde{y}_i - \bar{\tilde{y}})^2}}$$

where $y_i$ is the ground-truth rating and $\tilde{y}_i$ the predicted rating.
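A minimal sketch of the three metrics with NumPy and SciPy (toy ratings assumed; scipy.stats.pearsonr matches the r formula above):

```python
# MSE, MAE, and Pearson's r between ground-truth and predicted ratings.
import numpy as np
from scipy.stats import pearsonr

y_true = np.array([6.2, 4.1, 7.8, 5.0])   # ground-truth valence ratings
y_pred = np.array([5.9, 4.5, 7.2, 5.3])   # model predictions

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
r, _ = pearsonr(y_true, y_pred)
print(f"MSE={mse:.3f}  MAE={mae:.3f}  r={r:.3f}")
```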


Valence ratings prediction

Dataset    CNN              wGW              RMAR             LCEL             RMV
           MSE  MAE  r      MSE  MAE  r      MSE  MAE  r      MSE  MAE  r      MSE  MAE  r
CVAT       1.17 0.88 0.73   2.30 1.23 0.62   1.89 1.14 0.63   1.81 0.95 0.66   1.49 0.98 0.72
Tweets     1.00 0.76 0.79   2.54 1.25 0.65   1.30 0.89 0.69   1.25 0.85 0.75   1.18 0.86 0.74
Movie      2.14 1.18 0.67   6.46 2.02 0.17   3.54 1.73 0.16   2.54 1.36 0.42   2.25 1.26 0.62
Amazon     1.50 0.95 0.67   3.75 1.51 0.35   2.66 1.38 0.27   1.45 1.14 0.45   2.20 1.19 0.56
NYT        0.84 0.72 0.36   3.47 1.54 0.28   0.79 0.71 0.26   0.83 0.75 0.37   0.61 0.63 0.60

Arousal ratings prediction

CVAT       0.98 0.81 0.64   1.34 0.94 0.31   1.20 0.89 0.35   1.07 0.91 0.62   0.98 0.79 0.53

The CNN method improves VA prediction performance over the lexicon-based baselines and the RMV method.

Baseline methods:
wGW: weighted geometric mean
RMAR: regression on mean affective ratings
LCEL: linear combination using an expanded lexicon
RMV: regression on mean vectors


Future work: using transfer learning techniques to improve VA prediction performance.

Motivation: there are numerous datasets for sentiment classification but only a few for VA prediction, and sentiment polarity may be useful for VA prediction.

Method: pre-train a classification CNN, then use the parameters of the pre-trained network as the initial values of the VA-prediction CNN and continue training on the VA corpus (sketched below).
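A minimal Keras sketch of that transfer step (all shapes, data, and hyper-parameters are assumed placeholders):

```python
# Pre-train a polarity classifier, then reuse its embedding/convolution
# trunk to initialize the VA regressor before training on the VA corpus.
from tensorflow.keras import layers, models

def build_trunk():
    return [layers.Embedding(20000, 300),
            layers.Conv1D(100, 3, activation="relu"),
            layers.GlobalMaxPooling1D()]

# 1) Pre-train on the large polarity-classification corpus.
clf = models.Sequential(build_trunk() + [layers.Dense(2, activation="softmax")])
clf.build((None, 50))
clf.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
# clf.fit(polarity_texts, polarity_labels, ...)   # abundant labeled data

# 2) Copy the trunk weights into the VA regressor and keep training.
reg = models.Sequential(build_trunk() + [layers.Dense(1, activation="linear")])
reg.build((None, 50))
reg.compile(optimizer="sgd", loss="mse")
for src, dst in zip(clf.layers[:-1], reg.layers[:-1]):
    dst.set_weights(src.get_weights())
# reg.fit(va_texts, va_ratings, ...)              # small VA-labeled corpus
```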


何云超 yunchaohe@gmail.com

Thank you

