Transcript
Page 2: sentiment analysis report

Student’s Declaration

We hereby declare that the work being presented in this report entitled Sentiment Analysis Tool is an authentic record of our own work carried out under the supervision of Ms. SMITA TIWARI.

Signature of students

Ravindra Chaudhary

DATE: Sachin Singh

Information Technology

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

Signature of HOD Signature of Supervisor

(Dr. P.C. Vashist) (Ms. Smita Tiwari)

Information Technology Associate Professor

Date .............. Information Technology


ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the B. Tech. Project undertaken during B. Tech, Fourth Year. We owe a special debt of gratitude to Professor Ms. SMITA TIWARI and the Department of Information Technology, ABES Engineering College, Ghaziabad for her constant support and guidance throughout the course of our work. Her sincerity, thoroughness and perseverance have been a constant source of inspiration for us. It is only through her cognizant efforts that our endeavors have seen the light of day. We also take the opportunity to acknowledge the contribution of Professor Dr. P.C. Vashisth, Head, Department of Information Technology, ABES Engineering College, Ghaziabad for her full support and assistance during the development of the project. We also do not like to miss the opportunity to acknowledge the contribution of all faculty members of the department for their kind assistance and cooperation during the development of our project. Last but not least, we acknowledge our friends for their contribution in the completion of the project.

RAVINDRA CHAUDHARY

SACHIN SINGH


TABLE OF CONTENTS

Inner Title Page i

Declaration ii

Acknowledgment iii

Abstract iv

1. Introduction 1-5

1.1. Motivation 1

1.2 Domain introduction 2-5

2. Objective 6

3. Methodology 7-8

3.1 Method of Sentiment Analysis 7

3.1.1. Data Acquisition 7

3.1.2. Tokenizer 7

3.1.3. Pre Processing 7

3.1.4. Feature Extraction 7

3.1.5. Classification and Prediction 8

4. Detail of project report work 9-

4.1. Data acquisition 9-11

4.2. Human Labelling 12-14

4.3. Feature Extraction 15-25

4.4. Classification 26-28


4.5. Tweet Mode Web Application 28-30

4.5.1. Tweet score 30

4.5.2. Tweet Compare 30

4.5.3. Tweet stats 30

5. Result Discussion 36-38

6. Conclusion and future Recommendation 39-41

7. References 42-45


LIST OF TABLES

Table 1: A typical 2x2 confusion matrix…………………...…………4


LIST OF FIGURES


ABSTRACT

This project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets according to the sentiment expressed in them: positive or negative. Twitter is an online micro-blogging and social-networking platform which allows users to write short status updates of maximum length 140 characters. It is a rapidly expanding service with over 200 million registered users [24], out of which 100 million are active users, and half of them log on to Twitter on a daily basis, generating nearly 250 million tweets per day [20]. Due to this large amount of usage we hope to achieve a reflection of public sentiment by analyzing the sentiments expressed in the tweets. Analyzing public sentiment is important for many applications, such as firms trying to find out the response to their products in the market, predicting political elections and predicting socioeconomic phenomena like the stock exchange. The aim of this project is to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.


Chapter 1: INTRODUCTION

1.1 Motivation

We have chosen to work with Twitter since we feel it is a better approximation of public sentiment as opposed to conventional internet articles and web blogs. The reason is that the amount of relevant data is much larger for Twitter, as compared to traditional blogging sites. Moreover, the response on Twitter is more prompt and also more general (since the number of users who tweet is substantially more than those who write web blogs on a daily basis). Sentiment analysis of the public is highly critical in macro-scale socioeconomic phenomena like predicting the stock market rate of a particular firm. This could be done by analyzing overall public sentiment towards that firm with respect to time and using economics tools for finding the correlation between public sentiment and the firm’s stock market value. Firms can also estimate how well their product is responding in the market, in which areas of the market it is having a favorable response and in which a negative response (since Twitter allows us to download a stream of geo-tagged tweets for particular locations). If firms can get this information they can analyze the reasons behind the geographically differentiated response, and so they can market their product in a more optimized manner by looking for appropriate solutions like creating suitable market segments. Predicting the results of popular political elections and polls is also an emerging application of sentiment analysis. One such study, conducted in Germany for predicting the outcome of federal elections, concluded that Twitter is a good reflection of offline sentiment.


1.2 Domain Introduction

This project of analyzing sentiments of tweets comes under the domain of “Pattern Classification” and “Data Mining”. Both of these terms are very closely related and intertwined, and they can be formally defined as the process of discovering “useful” patterns in large sets of data, either automatically (unsupervised) or semi-automatically (supervised). The project would rely heavily on techniques of “Natural Language Processing” in extracting significant patterns and features from the large data set of tweets and on “Machine Learning” techniques for accurately classifying individual unlabeled data samples (tweets) according to whichever pattern model best describes them.

The features that can be used for modeling patterns and classification can be divided into two main groups: formal language based and informal blogging based. Language based features are those that deal with formal linguistics and include prior sentiment polarity of individual words and phrases, and parts of speech tagging of the sentence. Prior sentiment polarity means that some words and phrases have a natural innate tendency for expressing particular and specific sentiments in general. For example the word “excellent” has a strong positive connotation while the word “evil” possesses a strong negative connotation. So whenever a word with positive connotation is used in a sentence, chances are that the entire sentence would be expressing a positive sentiment. Parts of Speech tagging, on the other hand, is a syntactical approach to the problem. It means to automatically identify which part of speech each individual word of a sentence belongs to: noun, pronoun, adverb, adjective, verb, interjection, etc. Patterns can be extracted from analyzing the frequency distribution of these parts of speech (either individually or collectively with some other part of speech) in a particular class of labeled tweets. Twitter based features are more informal and relate to how people express themselves on online social platforms and compress their sentiments in the limited space of 140 characters offered by Twitter. They include Twitter hashtags, retweets, word capitalization, word lengthening [13], question marks, presence of URLs in tweets, exclamation marks, internet emoticons and internet shorthand/slang.

Classification techniques can also be divided into two categories: supervised vs. unsupervised and non-adaptive vs. adaptive/reinforcement techniques. The supervised approach is when we have pre-labeled data samples available and we use them to train our classifier. Training the classifier means using the pre-labeled samples to extract features that best model the patterns and differences between each of the individual classes, and then classifying an unlabeled data sample according to whichever pattern best describes it. For example, if we come up with a highly simplified model that neutral tweets contain 0.3 exclamation marks per tweet on average while sentiment-bearing tweets contain 0.8, and if the tweet we have to classify contains 1 exclamation mark, then (ignoring all other possible features) the tweet would be classified as subjective, since 1 exclamation mark is closer to the model of 0.8 exclamation marks. Unsupervised classification is when we do not have any labeled data for training. In addition to this, adaptive classification techniques deal with feedback from the environment. In our case feedback from the environment can be in the form of a human telling the classifier whether it has done a good or poor job in classifying a particular tweet, and the classifier needs to learn from this feedback. There are two further types of adaptive techniques: passive and active. Passive techniques are the ones which use the feedback only to learn about the environment (in this case this could mean improving our models for tweets belonging to each of the three classes) but do not use this improved learning in the current classification algorithm, while the active approach continuously keeps changing its classification algorithm according to what it learns in real time.
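The simplified exclamation-mark example above can be sketched in a few lines of Python. This is only an illustration of the idea of assigning a sample to the closest class model; the dictionary `CLASS_MEANS` and the function name `classify_by_exclamations` are hypothetical names chosen for this sketch, and the averages are the illustrative figures from the example, not measured values.

```python
# Per-class average number of exclamation marks, taken from the
# simplified example in the text (illustrative figures only).
CLASS_MEANS = {"neutral": 0.3, "subjective": 0.8}


def classify_by_exclamations(tweet: str) -> str:
    """Return the class whose average exclamation count is closest."""
    count = tweet.count("!")
    # Pick the label with the smallest distance between its average
    # and the count observed in this tweet.
    return min(CLASS_MEANS, key=lambda label: abs(CLASS_MEANS[label] - count))


print(classify_by_exclamations("What a match!"))
print(classify_by_exclamations("Train departs at 9 am"))
```

A tweet with one exclamation mark is 0.2 away from the subjective average and 0.7 away from the neutral one, so it is assigned to the subjective class, as in the example.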

There are several metrics proposed for computing and comparing the results of our experiments. Some of the most popular metrics include: Precision, Recall, Accuracy, F1-measure, True rate and False alarm rate (each of these metrics is calculated individually for each class and then averaged for the overall classifier).

Table 1: A Typical 2x2 Confusion Matrix

                  Machine says yes    Machine says no
Human says yes    tp                  fn
Human says no     fp                  tn


Precision (P) = $\frac{tp}{tp + fp}$

Recall (R) = $\frac{tp}{tp + fn}$

Accuracy (A) = $\frac{tp + tn}{tp + tn + fp + fn}$

F1 = $\frac{2 \cdot P \cdot R}{P + R}$

True Rate (T) = $\frac{tp}{tp + fn}$

False-alarm Rate (F) = $\frac{fp}{fp + tn}$
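As a minimal sketch of how the first four metrics can be computed from the four cells of the confusion matrix in Table 1, the following Python function may help. The function name `classification_metrics` and the sample counts are assumptions made for illustration; they are not results from this project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard every division so that an empty cell combination yields 0.0
    # instead of raising ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts for a single class.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

For a multi-class problem the same function would be called once per class and the results averaged, as described in the text.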


Chapter 2: OBJECTIVE

• To implement a Naive Bayes algorithm for automatic classification of text into positive and negative classes.

• To use sentiment analysis to determine whether the attitude of the masses toward the subject of interest is positive, negative or neutral.


Chapter 3: METHODOLOGY

3.1 Methods of Sentiment Analysis

3.1.1. DATA ACQUISITION

• Download the text using the Twitter API.

3.1.2. TOKENIZER

• Using a POS (part of speech) tagger.

3.1.3. PRE-PROCESSING

• Remove slang (non-English) words.
• Replace emoticons by their polarity.
• Remove URLs, hashtags (#) and numbers.
• Replace a sequence of repeated characters, e.g. “coooool” by “cool”.
• Remove nouns and prepositions.
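Some of the pre-processing steps listed above can be sketched with the standard `re` module. This is a rough illustration under stated assumptions: the emoticon table `EMOTICON_POLARITY` is a tiny made-up lexicon, the whole hashtag token is removed (the report does not say whether only the `#` symbol is dropped), and the slang and noun/preposition removal steps are left out because they need a dictionary and a POS tagger.

```python
import re

# Tiny illustrative emoticon lexicon (an assumption, not the project's list).
EMOTICON_POLARITY = {":)": "POSITIVE", ":D": "POSITIVE",
                     ":(": "NEGATIVE", ":S": "NEGATIVE"}


def preprocess(tweet: str) -> str:
    """Apply a subset of the pre-processing steps to one tweet."""
    # Remove URLs first so their punctuation is not mistaken for emoticons.
    tweet = re.sub(r"https?://\S+", " ", tweet)
    # Replace each known emoticon by its polarity token.
    for emoticon, polarity in EMOTICON_POLARITY.items():
        tweet = tweet.replace(emoticon, f" {polarity} ")
    # Remove hashtags and numbers.
    tweet = re.sub(r"#\w+", " ", tweet)
    tweet = re.sub(r"\d+", " ", tweet)
    # Collapse runs of three or more identical characters to two,
    # so that "coooool" becomes "cool".
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)
    # Normalize whitespace.
    return " ".join(tweet.split())


print(preprocess("This is coooool :) #fun http://t.co/abc 123"))
```

Collapsing to two characters rather than one keeps words such as "cool" intact while still shortening exaggerated spellings.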

3.1.4. FEATURE EXTRACTION

• Percentage of capitalized words
• Number of -ve/+ve capitalized words
• Number of +ve/-ve hashtags
• Number of +ve/-ve emoticons
• Number of negations
• Number of special characters, for example: @ # % ^ *
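Three of the features listed above do not need a sentiment lexicon and can be sketched directly. In this illustration the negation list `NEGATIONS`, the function name `extract_features` and the rule that a "capitalized word" is a fully upper-case alphabetic word of more than one letter are all assumptions, since the report does not define them.

```python
# Illustrative negation words (an assumption, not the project's list).
NEGATIONS = {"not", "no", "never", "cannot"}
# Special characters taken from the example in the feature list.
SPECIAL_CHARS = set("@#%^*")


def extract_features(tweet: str) -> dict:
    """Compute a few lexicon-free features for one tweet."""
    words = tweet.split()
    # Fully upper-case alphabetic words longer than one letter.
    capitalized = [w for w in words
                   if len(w) > 1 and w.isalpha() and w.isupper()]
    percent_capitalized = 100.0 * len(capitalized) / len(words) if words else 0.0
    negations = 0
    for w in words:
        token = w.lower().strip(".,!?")
        if token in NEGATIONS or token.endswith("n't"):
            negations += 1
    special = sum(1 for ch in tweet if ch in SPECIAL_CHARS)
    return {"percent_capitalized": percent_capitalized,
            "negations": negations,
            "special_characters": special}


print(extract_features("I do NOT like this #phone @store"))
```

The polarity-dependent features (positive or negative capitalized words, hashtags and emoticons) would additionally need a polarity lexicon, which is not shown here.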


3.1.5. CLASSIFICATION AND PREDICTION

• The model is built to predict the sentiment of new tweets.
• The extracted features are then fed to the classifier.

Figure 1: Data Flow Diagram
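To make the classification step more concrete, here is a minimal word-count Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is a sketch of the general technique named in the objectives, not the project's implementation: the class name `NaiveBayes`, the use of raw word tokens as features and the four toy training tweets are all assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in tokens:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for illustration).
docs = [["great", "phone"], ["love", "this"],
        ["boring", "phone"], ["hate", "this"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["love", "great"]))
print(model.predict(["boring"]))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps unseen words from forcing a probability of zero.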


Chapter 4: DETAILS OF PROJECT REPORT WORK

The process of designing a functional classifier for sentiment analysis can be broken down into five basic categories. They are as follows:

I. Data Acquisition
II. Human Labelling
III. Feature Extraction
IV. Classification

4.1. Data Acquisition:

Data in the form of raw tweets is acquired by using the Python library “tweetstream”, which provides a package for the simple Twitter API [26]. This API allows two modes of accessing tweets: SampleStream and FilterStream. SampleStream simply delivers a small, random sample of all the tweets streaming in real time. FilterStream delivers tweets which match certain criteria. It can filter the delivered tweets according to three criteria:

• Specific keyword(s) to track/search for in the tweets
• Specific Twitter user(s) according to their user-ids
• Tweets originating from specific location(s) (only for geo-tagged tweets).

A programmer can specify any single one of these filtering criteria or a combination of several. But for our purpose we have no such restriction and will thus stick to the SampleStream mode. Since we wanted to increase the generality of our data, we acquired it in portions at different points of time instead of acquiring all of it at one go. If we had used the latter approach, the generality of the tweets might have been compromised, since a significant portion of the tweets would be referring to some certain trending topic and would thus have more or less the same general mood or sentiment. This phenomenon was observed when we were going through our sample of acquired tweets. For example, the sample acquired near Christmas and New Year’s had a significant portion of tweets referring to these joyous events, which were thus of a generally positive sentiment. Sampling our data in portions at different points in time would thus try to minimize this problem.

A tweet acquired by this method has a lot of raw information in it which we may or may not find useful for our particular application. It comes in the form of the Python “dictionary” data type with various key-value pairs. A list of some key-value pairs is given below:

• Whether a tweet has been favorited
• User ID
• Screen name of the user
• Original text of the tweet
• Presence of hashtags
• Whether it is a re-tweet
• Language under which the Twitter user has registered their account
• Geo-tag location of the tweet
• Date and time when the tweet was created


Since this is a lot of information, we keep only the information that we need and discard the rest. For our particular application we iterate through all the tweets in our sample and save the actual text content of the tweets in a separate file, given that the language of the Twitter user’s account is specified to be English. The original text content of the tweet is given under the dictionary key “text” and the language of the user’s account is given under “Lang”.
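The extraction step described above can be sketched as a small generator over tweet dictionaries. The flat dictionary layout, the lower-case key `"lang"`, the language code `"en"` and the function name `english_tweet_texts` are assumptions made for this illustration; the actual structure returned by the streaming library may differ.

```python
def english_tweet_texts(tweets):
    """Yield the text of every tweet whose account language is English."""
    for tweet in tweets:
        # Keep only the text field and discard all other metadata.
        if tweet.get("lang") == "en" and "text" in tweet:
            yield tweet["text"]


# Hypothetical sample of acquired tweets.
sample = [
    {"text": "Good morning everyone", "lang": "en", "user_id": 1},
    {"text": "Bonjour tout le monde", "lang": "fr", "user_id": 2},
]
texts = list(english_tweet_texts(sample))
print(texts)
```

In practice the yielded strings would be written to a separate file, one tweet per line, for the labelling stage.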

Since human labelling is an expensive process, we further filter the tweets to be labelled so that we have the greatest amount of variation in tweets without loss of generality. The filtering criteria applied are stated below:

• Remove retweets (any tweet which contains the string “RT”)
• Remove very short tweets (tweets with length less than 20 characters)
• Remove non-English tweets (by comparing the words of the tweets with a list of 2,000 common English words; tweets with less than 15% of content matching are discarded)
• Remove similar tweets (by comparing every tweet with every other tweet; a tweet with more than 90% of its content matching some other tweet is discarded)
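The four filtering criteria above could be combined roughly as follows. This is a sketch under assumptions: `COMMON_ENGLISH` is a ten-word stand-in for the 2,000-word list, the 15% rule is read as the share of a tweet's words found in that list, and similarity is measured with `difflib.SequenceMatcher`, which may not be the measure used in the project.

```python
from difflib import SequenceMatcher

# Small stand-in for the list of 2,000 common English words.
COMMON_ENGLISH = {"the", "is", "a", "this", "for", "in",
                  "day", "great", "and", "to"}


def english_ratio(text: str) -> float:
    """Fraction of the words in the tweet found in the common-word list."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(1 for w in words if w in COMMON_ENGLISH) / len(words)


def filter_for_labelling(tweets):
    """Apply the retweet, length, language and similarity filters in order."""
    kept = []
    for text in tweets:
        if "RT" in text or len(text) < 20:
            continue  # retweet or very short tweet
        if english_ratio(text) < 0.15:
            continue  # probably not English
        if any(SequenceMatcher(None, text, other).ratio() > 0.90
               for other in kept):
            continue  # near-duplicate of an already kept tweet
        kept.append(text)
    return kept


tweets = [
    "RT @user this is a great day for a walk",
    "too short",
    "Das Wetter heute wirklich wunderbar schoen",
    "this is a great day for a walk in the park",
    "this is a great day for a walk in the park!",
]
print(filter_for_labelling(tweets))
```

Comparing every tweet with every kept tweet is quadratic in the number of tweets, which is acceptable for samples of a few thousand tweets but would need a faster scheme for much larger collections.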

After this filtering roughly 30% of tweets remain for human labelling on average per sample, which made a total of 10,173 tweets to be labelled.


4.2. Human Labelling:

For the purpose of human labelling we made three copies of the tweets so that they can be labelled by four individual sources. This is done so that we can take the average opinion of people on the sentiment of the tweet, and in this way the noise and inaccuracies in labelling can be minimized. Generally speaking, the more copies of labels we can get the better it is, but we have to keep the cost of labelling in mind, hence we arrived at the reasonable figure of three.

We labelled the tweets in four classes according to the sentiments expressed/observed in the tweets: positive, negative, neutral/objective and ambiguous. We gave the following guidelines to our labelers to help them in the labelling process:

• Positive: If the entire tweet has a positive/happy/excited/joyful attitude or if something is mentioned with positive connotations. Also if more than one sentiment is expressed in the tweet but the positive sentiment is more dominant. Example: “4 more years of being in shithole Australia then I move to the USA! :D”.

• Negative: If the entire tweet has a negative/sad/displeased attitude or if something is mentioned with negative connotations. Also if more than one sentiment is expressed in the tweet but the negative sentiment is more dominant. Example: “I want an android now this iPhone is boring :S”.

• Neutral/Objective: If the creator of the tweet expresses no personal sentiment/opinion in the tweet and merely transmits information. Advertisements of different products would be labelled under this category. Example: “US House Speaker vows to stop Obama contraceptive rule... http://t.co/cyEWqKlE”.

• Ambiguous: If more than one sentiment is expressed in the tweet which are equally potent with no one particular sentiment standing out and becoming more obvious. Also if it is obvious that some personal opinion is being expressed here but due to lack of reference to context it is difficult/impossible to accurately decipher the sentiment expressed. Example: “I kind of like heroes and don’t like it at the same time...” Finally if the context of the tweet is not apparent from the information available. Example: “That’s exactly how I feel about avenger’s ha-ha”.

• <Blank>: Leave the tweet unlabeled if it belongs to some language other than English so that it is ignored in the training data.

Besides this, labelers were instructed to keep personal biases out of labelling and make no assumptions, i.e. judge the tweet not from any past extra personal information but only from the information provided in the current individual tweet.

Once we had labels from four sources our next step was to combine opinions of three people to get an averaged opinion. The way we did this is through majority vote.

So for example if a particular tweet had two labels in agreement, we would label the overall tweet as such. But if all three labels were different, we labelled the tweet as “unable to reach a majority vote”. We arrived at the following statistics for each class after going through majority voting.


• Positive: 2543 tweets
• Negative: 1877 tweets
• Neutral: 4543 tweets
• Ambiguous: 451 tweets
• Unable to reach majority vote: 390 tweets
• Unlabeled non-English tweets: 369 tweets

So if we include only those tweets for which we have been able to achieve a positive, negative or neutral majority vote, we are left with 8963 tweets for our training set. Out of these 4543 are objective tweets and 4420 are subjective tweets (sum of positive and negative tweets).
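The majority-vote rule and the resulting training-set size can be checked with a short sketch. The function name `majority_label` and the returned string `"no majority"` are hypothetical names for this illustration; the class counts are the ones reported above.

```python
from collections import Counter


def majority_label(labels):
    """Return the label given by at least two of the three labelers."""
    label, votes = Counter(labels).most_common(1)[0]
    # With three labels, two matching votes already form a majority.
    return label if votes >= 2 else "no majority"


print(majority_label(["positive", "positive", "neutral"]))
print(majority_label(["positive", "negative", "neutral"]))

# Class counts reported after majority voting.
counts = {"positive": 2543, "negative": 1877, "neutral": 4543}
subjective = counts["positive"] + counts["negative"]
training_set_size = subjective + counts["neutral"]
print(subjective, training_set_size)
```

The sums reproduce the figures in the text: 4420 subjective tweets and a training set of 8963 tweets.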

We also calculated the human-human agreement for our tweet labelling task.

4.3. Feature Extraction

Now that we have arrived at our training set we need to extract useful features from it which can be used in the process of classification. But first we will discuss some text formatting techniques which will aid us in feature extraction:

• Tokenization: It is the process of breaking a stream of text up into words, symbols and other meaningful elements called “tokens”. Tokens can be separated by whitespace characters and/or punctuation characters. It is done so that we can look at tokens as individual components that make up a tweet [19].


• U r l ’ s a nd us e r r e f e r e nc e s ( i d e n t i f i e d b y t o k e ns “ h t t p ” a nd

“ @ ” ) a r e r e m o ve d i f w e a r e i n t e r e s t e d i n o n l y a na l yz i ng

t he t e x t o f t he t w e e t .

• P unc t ua t i o n m a r k s a nd d i g i t s / num e r a l s m a y b e r e m o ved

i f f o r e xa m p l e w e w i s h t o c o m p a r e t he t w e e t t o a l i s t o f

E ng l i s h w o r d s .

• L o w e r c a s e C o nve r s i o n : Tw e e t m a y b e no r m a l i ze d b y

c o nve r t i ng i t t o l o w e r c a s e w h i c h m a k e s i t ’ s c o m p a r i son

w i t h a n E ng l i s h d i c t i o na r y e a s i e r .

• S t e m m i ng : I t i s t he t e x t no r m a l i z i ng p r o c e s s o f r e d uc i ng

a d e r i ve d w o r d t o i t s r o o t o r s t e m [ 2 8 ] . F o r e xa m p l e a

s t e m m e r w o u l d r e d uc e t he p h r a s e s “ s t e m m e r ”,

“ s t e m m e d ” , “ s t e m m i ng ” t o t he r o o t w o r d “ s t e m ” .

A d va n t a g e o f s t e m m i ng i s t ha t i t m a k e s c o m p a r i son

b e t w e e n w o r d s s i m p l e r , a s w e d o no t ne e d t o d e a l w i t h

c o m p l e x g r a m m a t i c a l t r a ns f o r m a t i o ns o f t he w o r d . In o u r

c a s e w e e m p l o ye d t he a l g o r i t hm o f “ p o r t e r s t e m m i ng ” o n

b o t h t he t w e e t s a nd t he d i c t i o na r y , w he ne ve r t he r e w a s

a ne e d o f c o m p a r i s o n .

• Stop-words removal: Stop words are a class of extremely common words which hold no additional information when used in a text and are thus claimed to be useless [19]. Examples include “a”, “an”, “the”, “he”, “she”, “by”, “on”, etc. It is sometimes convenient to remove these words because they are used almost equally in all classes of text, for example when computing the prior sentiment polarity of words in a tweet according to their frequency of occurrence in different classes and using this polarity to calculate the average sentiment of the tweet over the set of words used in that tweet.

• Parts-of-Speech Tagging: POS tagging is the process of assigning a tag to each word in the sentence indicating which grammatical part of speech that word belongs to, e.g. noun, verb, adjective, adverb, coordinating conjunction, etc.
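The normalization steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set is a hypothetical seven-word sample taken from the examples in the text, and `toy_stem` is a crude suffix stripper standing in for a real stemmer. It is not the Porter algorithm used in the project.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list (taken from the examples above).
STOP_WORDS = {"a", "an", "the", "he", "she", "by", "on"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, NOT Porter."""
    for suffix in ("ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant left behind ("stemm" -> "stem").
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def preprocess(tweet):
    """Lowercase, drop digits and punctuation, remove stop words, stem."""
    text = tweet.lower()
    text = re.sub(r"[0-9]", " ", text)
    text = text.translate(PUNCT_TABLE)
    return [toy_stem(tok) for tok in text.split() if tok not in STOP_WORDS]
```

For example, `preprocess("The stemmer stemmed 2 words, stemming on!")` reduces the three forms of "stem" to the same token, which is what makes later dictionary lookups simpler.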

Now that we have discussed some of the text formatting techniques employed by us, we will move to the list of features that we have explored. As we will see below, a feature is any variable which can help our classifier in differentiating between the different classes. There are two kinds of classification in our system (as will be discussed in detail in the next section): the objectivity/subjectivity classification and the positivity/negativity classification. As the names suggest, the former is for differentiating between objective and subjective classes while the latter is for differentiating between positive and negative classes.

The list of features explored for objective/subjective classification is as below:

• Number of exclamation marks in a tweet
• Number of question marks in a tweet
• Presence of exclamation marks in a tweet
• Presence of question marks in a tweet
• Presence of URL in a tweet
• Presence of emoticons in a tweet
• Unigram word models calculated using Naive Bayes
• Prior polarity of words through online lexicon MPQA
• Number of digits in a tweet
• Number of capitalized words in a tweet
• Number of capitalized characters in a tweet
• Number of punctuation marks/symbols in a tweet
• Ratio of non-dictionary words to the total number of words in the tweet
• Length of the tweet
• Number of adjectives in a tweet
• Number of comparative adjectives in a tweet
• Number of superlative adjectives in a tweet
• Number of base-form verbs in a tweet
• Number of past tense verbs in a tweet
• Number of present participle verbs in a tweet
• Number of past participle verbs in a tweet
• Number of 3rd person singular present verbs in a tweet
• Number of non-3rd person singular present verbs in a tweet
• Number of adverbs in a tweet
• Number of personal pronouns in a tweet
• Number of possessive pronouns in a tweet
• Number of singular proper nouns in a tweet
• Number of plural proper nouns in a tweet
• Number of cardinal numbers in a tweet
• Number of possessive endings in a tweet
• Number of wh-pronouns in a tweet
• Number of adjectives of all forms in a tweet
• Number of verbs of all forms in a tweet
• Number of nouns of all forms in a tweet
• Number of pronouns of all forms in a tweet
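Several of the surface features in this list need nothing more than string operations. The sketch below shows how a handful of them might be computed. The function name, the URL pattern and the emoticon pattern are illustrative assumptions, and "capitalized word" is taken here to mean a word whose first character is uppercase; the report does not state which definition was used.

```python
import re

# Assumed patterns: a simple URL matcher and a few Western-style emoticons.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")


def surface_features(tweet):
    """Compute a few of the listed surface features for one raw tweet."""
    words = tweet.split()
    return {
        "num_exclamation": tweet.count("!"),
        "num_question": tweet.count("?"),
        "has_exclamation": "!" in tweet,
        "has_question": "?" in tweet,
        "has_url": bool(URL_RE.search(tweet)),
        "has_emoticon": bool(EMOTICON_RE.search(tweet)),
        "num_digits": sum(ch.isdigit() for ch in tweet),
        # Assumption: "capitalized" means the first character is uppercase.
        "num_capitalized_words": sum(w[0].isupper() for w in words),
        "num_capital_chars": sum(ch.isupper() for ch in tweet),
        "length": len(tweet),
    }
```

The part-of-speech counts in the list would additionally require a POS tagger, which is outside the scope of this sketch.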

The list of features explored for positive/negative classification is given below:

• Overall emoticon score (where 1 is added to the score in case of a positive emoticon, and 1 is subtracted in case of a negative emoticon)
• Overall score from online polarity lexicon MPQA (where the presence of a strong positive word in the tweet increases the score by 1.0 and the presence of a weak negative word would decrease the score by 0.5)
• Unigram word models calculated using Naive Bayes
• Number of total emoticons in the tweet
• Number of positive emoticons in a tweet
• Number of negative emoticons in a tweet
• Number of positive words from MPQA lexicon in tweet
• Number of negative words from MPQA lexicon in tweet
• Number of base-form verbs in a tweet
• Number of past tense verbs in a tweet
• Number of present participle verbs in a tweet
• Number of past participle verbs in a tweet
• Number of 3rd person singular present verbs in a tweet
• Number of non-3rd person singular present verbs in a tweet
• Number of plural nouns in a tweet
• Number of singular proper nouns in a tweet
• Number of cardinal numbers in a tweet
• Number of prepositions or coordinating conjunctions in a tweet
• Number of adverbs in a tweet
• Number of wh-adverbs in a tweet
• Number of verbs of all forms in a tweet
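The two score features at the top of this list can be illustrated as follows. The emoticon sets and the four-word lexicon are hypothetical stand-ins (the real MPQA lexicon is far larger). The text only gives the weights for strong positive (+1.0) and weak negative (-0.5) words; the symmetric values for weak positive (+0.5) and strong negative (-1.0) are an assumption of this sketch.

```python
# Hypothetical emoticon sets for illustration only.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

# Hypothetical mini-lexicon in the spirit of MPQA: word -> (polarity, strength).
TOY_LEXICON = {
    "love": ("positive", "strong"),
    "nice": ("positive", "weak"),
    "hate": ("negative", "strong"),
    "dull": ("negative", "weak"),
}

# Strong = 1.0 and weak = 0.5 follow the text; applying them symmetrically
# to both polarities is an assumption.
WEIGHTS = {"strong": 1.0, "weak": 0.5}


def emoticon_score(tokens):
    """+1 for each positive emoticon, -1 for each negative emoticon."""
    score = 0
    for tok in tokens:
        if tok in POSITIVE_EMOTICONS:
            score += 1
        elif tok in NEGATIVE_EMOTICONS:
            score -= 1
    return score


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Sum signed strength weights of all tokens found in the lexicon."""
    score = 0.0
    for tok in tokens:
        entry = lexicon.get(tok.lower())
        if entry is None:
            continue
        polarity, strength = entry
        sign = 1.0 if polarity == "positive" else -1.0
        score += sign * WEIGHTS[strength]
    return score
```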

Next we will give the mathematical reasoning of how we calculate the unigram word models using Naive Bayes. The basic concept is to calculate the probability of a word belonging to any of the possible classes from our training sample. Using mathematical formulae we will demonstrate an example of calculating the probability of a word belonging to the objective and subjective classes. Similar steps would need to be taken for the positive and negative classes as well.

We will start by calculating the probability of a word in our training data belonging to a particular class:

Figure 2: Probability Formula 1

We now state Bayes’ rule [19]. According to this rule, if we need to find the probability of whether a tweet is objective, we need to calculate the probability of the tweet given the objective class and the prior probability of the objective class. The term P(tweet) can be substituted with P(tweet | obj) + P(tweet | subj); strictly this holds up to a constant factor, and only when the two classes have equal prior probabilities, in which case the factor cancels against the prior.

Figure 3: Probability Formula 2

Now if we assume independence of the unigrams inside the tweet (i.e. the occurrence of a word in a tweet will not affect the probability of occurrence of any other word in the tweet), we can approximate the probability of the tweet given the objective class as a mere product of the probabilities of all the words in the tweet belonging to the objective class. Moreover, if we assume equal class sizes for both the objective and subjective classes, we can ignore the prior probability of the objective class. Hence we are left with the following formula, in which there are two distinct terms and both of them are easily calculated through the formula mentioned above.

Figure 4: Probability Formula 3
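The formula images themselves are not reproduced in this text. One reading consistent with the surrounding description, offered as an assumption rather than as the authors' exact notation, is the following chain: a per-class word probability estimated from counts, Bayes' rule, the independence approximation, and the resulting two-term expression under equal class priors. Here w_1, ..., w_n denote the words of the tweet and count(w, c) is the number of occurrences of word w in training tweets of class c.

```latex
\begin{align}
P(w \mid \mathrm{obj}) &= \frac{\mathrm{count}(w, \mathrm{obj})}{\sum_{w'} \mathrm{count}(w', \mathrm{obj})} \\
P(\mathrm{obj} \mid \mathrm{tweet}) &= \frac{P(\mathrm{tweet} \mid \mathrm{obj})\, P(\mathrm{obj})}{P(\mathrm{tweet})} \\
P(\mathrm{tweet} \mid \mathrm{obj}) &\approx \prod_{i=1}^{n} P(w_i \mid \mathrm{obj}) \\
P(\mathrm{obj} \mid \mathrm{tweet}) &\approx \frac{\prod_{i=1}^{n} P(w_i \mid \mathrm{obj})}{\prod_{i=1}^{n} P(w_i \mid \mathrm{obj}) + \prod_{i=1}^{n} P(w_i \mid \mathrm{subj})}
\end{align}
```

The last line contains exactly two distinct terms, the objective and the subjective product, each built from the per-word probabilities of the first line.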

Now that we have the probability of objectivity given a particular tweet, we can easily calculate the probability of subjectivity given that same tweet by simply subtracting the earlier term from 1. This is because the probabilities of the two classes must always add to 1. So if we know P(obj | tweet), we automatically know P(subj | tweet).

Figure 5: Probability Formula 4

Finally we calculate P(obj | tweet) for every tweet and use this term as a single feature in our objectivity/subjectivity classification.
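As a rough illustration of this feature, the sketch below estimates per-class word probabilities from tokenized training tweets and combines them into P(obj | tweet) under the independence and equal-prior assumptions described above. The function names are hypothetical, no smoothing is applied, and returning 0.5 when a tweet has zero likelihood under both classes is a choice made for this sketch only.

```python
from collections import Counter
from math import prod


def word_probabilities(docs):
    """Relative frequency of each word within one class's training tweets."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def p_objective(tweet, p_word_obj, p_word_subj):
    """P(obj | tweet) assuming word independence and equal class priors."""
    like_obj = prod(p_word_obj.get(w, 0.0) for w in tweet)
    like_subj = prod(p_word_subj.get(w, 0.0) for w in tweet)
    denom = like_obj + like_subj
    # Assumption of this sketch: fall back to 0.5 when neither class
    # assigns the tweet any probability mass.
    return like_obj / denom if denom > 0 else 0.5


# Toy training data (invented for illustration).
obj_docs = [["market", "report"], ["market", "data"]]
subj_docs = [["love", "market"], ["hate", "report"]]
p_obj = word_probabilities(obj_docs)
p_subj = word_probabilities(subj_docs)
feature = p_objective(["market", "report"], p_obj, p_subj)
```

With this toy data the objective likelihood is 0.5 × 0.25 and the subjective likelihood is 0.25 × 0.25, so the feature value is 2/3; P(subj | tweet) is then 1/3.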

There are two main potential problems with this approach. The first is that if we include every unique word used in the data set, then the list of words will be too large, making the computation too expensive and time-consuming. To solve this we only include words which have been used at least 5 times in our data. This reduces the size of our dictionary for objective/subjective classification from 11,216 to 2,320 words, while for positive/negative classification the unigram dictionary size is reduced from 6,502 to 1,235 words.
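The frequency cut-off described here amounts to a single counting pass. A minimal sketch, assuming tweets are already tokenized into lists of words (the function name is hypothetical):

```python
from collections import Counter


def build_vocabulary(tokenized_tweets, min_count=5):
    """Keep only words that occur at least `min_count` times in the data."""
    counts = Counter(w for tweet in tokenized_tweets for w in tweet)
    return {w for w, c in counts.items() if c >= min_count}
```

Words below the threshold are simply ignored when the unigram model is built, which is what shrinks the dictionary.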

The second potential problem arises if, in our training set, a particular word appears only in a certain class and does not appear at all in the other class (for example if the word is misspelled only once). In such a scenario our classifier will always classify a tweet to that particular class (regardless of any other features present in the tweet) just because of the presence of that single word. This is a very harsh approach and results in over-fitting. To avoid this we make use of the technique known as “Laplace Smoothing”. We replace the formula for calculating the probability of a word belonging to a class with the following formula:

Figure 6: Probability Formula 5

In this formula “x” is a constant factor called the smoothing factor, which we have arbitrarily selected to be 1. How this works is that even if the count of a word in a particular class is zero, the numerator still has a small value, so the probability of a word belonging to some class will never be equal to zero. Instead, if the probability would have been zero according to the earlier formula, it is replaced by a very small non-zero probability.
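The effect of the smoothing factor can be seen in a short sketch. The exact denominator of the report's formula is not recoverable from the text, so this uses the standard add-x form, (count + x) / (total + x · |V|), as an assumption; the function name is hypothetical.

```python
from collections import Counter


def smoothed_word_probabilities(docs, vocabulary, x=1.0):
    """Add-x (Laplace) smoothed word probabilities for one class.

    Assumes the standard form (count + x) / (total + x * |V|), which may
    differ in detail from the formula in the report's figure.
    """
    counts = Counter(w for doc in docs for w in doc if w in vocabulary)
    total = sum(counts.values())
    denom = total + x * len(vocabulary)
    return {w: (counts[w] + x) / denom for w in vocabulary}


# Toy example: "bad" never occurs in this class but still gets probability > 0.
probs = smoothed_word_probabilities(
    [["good", "good"], ["ok"]], vocabulary={"good", "bad", "ok"}
)
```

Because every vocabulary word receives at least x in its numerator, a single unseen word can no longer force a tweet's class likelihood to zero.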

The final issue we have in feature selection is choosing the best features from a large number of features. Our ultimate aim is to achieve the greatest accuracy of our classifier while using the least number of features. This is because adding a new feature adds to the dimensionality of our classification problem and thus adds to the complexity of our classifier. This increase in complexity may not necessarily be linear and may even be quadratic, so it is preferred to keep the number of features to a minimum. Another issue we have with too many features is that the classifier may over-fit the training data and become confused when classifying an unknown test set, so the accuracy of the classifier may even decrease. To solve this issue we select the most pertinent features by computing the information gain of all the features under exploration and then selecting the features with the highest information gain. We used the WEKA machine learning tool for this task of feature selection [17].

We explored a total of 33 features for objectivity/subjectivity classification and used WEKA to calculate the information gain from each of these features.
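For readers unfamiliar with the measure, information gain is the reduction in class-label entropy obtained by splitting the data on a feature. The standalone sketch below computes it for a discrete feature; it is not WEKA's implementation (which also handles numeric attributes by discretizing them), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    on each distinct value of a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

A feature that separates the classes perfectly has a gain equal to the label entropy, while a feature unrelated to the class has a gain of zero; ranking features by this value and keeping the top few is the selection strategy described above.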

This graph is basically the super-imposition of 10 different graphs, each one arrived at through one fold of the 10-fold cross validation we performed. Since all the graphs overlap nicely, the results of each fold are almost the same, which shows us that the features we select will perform best in all the scenarios. We selected the best 5 features from this graph, which are as follows:

1. Unigram word models (for prior probabilities of words belonging to objective/subjective classes)
2. Presence of URL in tweet
3. Presence of emoticons in tweet
4. Number of personal pronouns in tweet
5. Number of exclamation marks in tweet

Similarly we explored 22 features for positive/negative classification and used WEKA to calculate the information gain from each of these features.

This graph is basically the super-imposition of 10 different graphs, each one arrived at through one fold of the 10-fold cross validation we performed. Since all the graphs overlap nicely, the results of each fold are almost the same, which shows us that the features we select will perform best in all the scenarios. We selected the best 5 features, out of which 2 were redundant, and we were left with only 3 features for our positive/negative classification, which are as follows:

1. Unigram word models (for prior probabilities of words belonging to positive or negative classes)
2. Number of positive emoticons in tweet
3. Number of negative emoticons in tweet

The redundant features, which we chose to ignore because they provided no extra information in the presence of the above features, are as follows:

• Emoticon score for the tweet
• MPQA score for the tweet

4.4. Classification:

Pattern classification is the process through which data is divided into different classes according to some common patterns which are found in one class and which differ to some degree from the patterns found in the other classes. The ultimate aim of our project is to design a classifier which accurately classifies tweets into the following four sentiment classes: positive, negative, neutral and ambiguous.

There can be two kinds of sentiment classification in this area: contextual sentiment analysis and general sentiment analysis. Contextual sentiment analysis deals with classifying specific parts of a tweet according to the context provided. For example, for the tweet “4 more years of being in shithole Australia then I move to the USA :D” a contextual sentiment classifier would identify Australia with a negative sentiment and USA with a positive sentiment. On the other hand, general sentiment analysis deals with the general sentiment of the entire text (a tweet in this case) as a whole. Thus for the tweet mentioned earlier, since there is an overall positive attitude, an accurate general sentiment classifier would identify it as positive. For our particular project we will only be dealing with the latter case, i.e. general (overall) sentiment analysis of the tweet as a whole.

The classification approach generally followed in this domain is a two-step approach. First, Objectivity Classification is done, which deals with classifying a tweet or a phrase as either objective or subjective. After this, Polarity Classification is performed (only on tweets classified as subjective by the objectivity classification) to determine whether the tweet is positive, negative or both (some researchers include the both category and some don’t). This was presented by Wilson et al., who report enhanced accuracy over a simple one-step approach [16].

We propose a novel approach which is slightly different from the approach proposed by Wilson et al. [16]. We propose that in the first step each tweet should undergo two classifiers: the objectivity classifier and the polarity classifier. The former would try to classify a tweet between the objective and subjective classes, while the latter would do so between the positive and negative classes. We use the short-listed features for these classifications and use the Naive Bayes algorithm, so that after the first step we have two numbers from 0 to 1 representing each tweet. One of these numbers is the probability of the tweet belonging to the objective class and the other is the probability of the tweet belonging to the positive class. Since we can easily calculate the two remaining probabilities, those of the subjective and negative classes, by simple subtraction from 1, we don’t need those two probabilities.

So in the second step we treat each of these two numbers as separate features for another classification, in which the feature size is just 2. We use WEKA and apply the following machine learning algorithms for this second classification to arrive at the best result:

• K-Means Clustering
• Support Vector Machine
• Logistic Regression
• K Nearest Neighbors
• Naive Bayes
• Rule Based Classifiers
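To make the second step concrete, the sketch below runs one of the listed algorithms, k nearest neighbors, on the two-dimensional feature (P(objective), P(positive)). It is a plain-Python illustration, not the WEKA implementation used in the project, and the training points, labels and function name are invented for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    train: list of ((p_objective, p_positive), label) pairs
    query: a (p_objective, p_positive) pair
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented first-step outputs with hand-assigned labels.
train = [
    ((0.90, 0.50), "neutral"), ((0.80, 0.60), "neutral"), ((0.85, 0.40), "neutral"),
    ((0.20, 0.90), "positive"), ((0.10, 0.80), "positive"), ((0.15, 0.85), "positive"),
    ((0.20, 0.10), "negative"), ((0.10, 0.20), "negative"), ((0.15, 0.15), "negative"),
]
```

In this toy layout, tweets with a high objectivity probability fall in the neutral region, while subjective tweets are separated into positive and negative by the second coordinate.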

To better understand how this works, we show a plot of an actual test set from one of our cross-validations on the 2-dimensional space mentioned above. The labels are the actual ground truth and the distribution shows how the classified data points are actually scattered throughout the space. As we go right the tweets become increasingly objective and as we go up the tweets become more positive. The results for our classification approach are mentioned in the next section of this report.

4.5. Tweet Mode Web Application:

We designed a web application which performs real-time sentiment analysis on Twitter for tweets that match particular keywords provided by the user. For example, if a user is interested in performing sentiment analysis on tweets which contain the word “Obama”, he/she will enter that keyword and the web application will perform the appropriate sentiment analysis and display the results for the user.

The web application has been implemented using the Google App Engine service [21] because it can be used as a free web hosting service and it provides a layer of abstraction to the developer from the low-level web operations, so it is easier to learn. We implemented our algorithm in Python and integrated it with the GUI for our website using HTML and JavaScript through the Jinja2 template engine [23]. We used the Google Visualization Chart API for presenting our results in a graphical, easy-to-understand manner [22].

We have three ways of performing sentiment analysis on our website and we will discuss each of them one by one:

• Tweet Score
• Tweet Compare
• Tweet Stats

4.5.1. Tweet Score:

This feature calculates the popularity score of the keyword, which is a number from 100 to -100. A more positive popularity score suggests that the keyword is highly positively popular on Twitter, while a more negative popularity score suggests that the keyword is highly negatively popular on Twitter. A popularity score close to 0 suggests that the keyword has either mixed opinions or is not a popular topic on Twitter. The popularity score is dependent on two ratios:

• Number of positive classified tweets / Number of negative classified tweets
• Number of tweets acquired / Time in past needed to explore the REST API

The first ratio suggests that if the number of positive tweets is larger than the number of negative tweets on a particular keyword, the keyword has an overall positive opinion, and vice versa. The second ratio suggests that the less time in the past we need to explore the REST API to get the 1,000 tweets, the more people are talking about the keyword on Twitter, and hence the more popular the keyword is on Twitter. However, it gives no information about the positivity or negativity of the keyword, and so the higher the second ratio is, the more the popularity score from the first ratio is shifted to the extreme ends (away from zero); whether this is in the positive or negative direction depends on whether there are more positive or negative tweets. Finally, a maximum of 10 tweets are displayed for each class (positive, negative and neutral) so that the user develops confidence in our classifier.
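The report does not give the exact formula that combines the two ratios, so the following is only a hypothetical sketch of one way a score with the described behaviour could be built. It replaces the positive/negative ratio with a normalized balance in [-1, 1] and scales it by a saturating function of the tweet rate; the function name, the `rate_scale` parameter and the combination rule are all assumptions.

```python
def popularity_score(num_positive, num_negative, num_tweets, hours_spanned,
                     rate_scale=100.0):
    """Hypothetical popularity score in the range (-100, 100).

    balance    : sentiment direction from positive vs. negative counts
    popularity : grows toward 1 as tweets per hour increase
    rate_scale : assumed tweets-per-hour value at which popularity is 0.5
    """
    classified = num_positive + num_negative
    if classified == 0 or hours_spanned <= 0:
        return 0.0
    balance = (num_positive - num_negative) / classified   # in [-1, 1]
    rate = num_tweets / hours_spanned                       # tweets per hour
    popularity = rate / (rate + rate_scale)                 # in [0, 1)
    return 100.0 * balance * popularity
```

With this form, a keyword with mixed opinions (balance near 0) or a low tweet rate (popularity near 0) scores close to 0, while a fast-moving keyword with one-sided sentiment is pushed toward either extreme, which matches the qualitative description above.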

4.5.2. Tweet Compare:

This feature compares the popularity score of two or three different keywords and replies with which keyword is currently most popular on Twitter. This can have many interesting applications, for example having our web application make recommendations to users between movies, songs and products/brands.

4.5.3. Tweet Stats:

This feature is for long-term sentiment analysis. We input a number of popular keywords on Twitter, for which a backend operation runs every hour, calculates the popularity score for the tweets generated on that keyword within a one-hour time frame and stores the results against every hour in a database. We can have a maximum of about 300 such keywords as per Google’s bandwidth requirements. Once we have a reasonable amount of data we can use it to plot graphs of popularity score against time and visualize the effect of change in popularity score with respect to certain events. Once we have collected enough data we can also use it to predict correlation with socio-economic phenomena like stock exchange rates and political elections. Work on this has been done before by Tumasjan et al. [4] and Bollen et al. [9].
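The hourly storage described for this feature can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the project ran on Google App Engine, whose storage layer differs, and the table name, column names and hour-string format below are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one score per keyword per hour."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_score ("
        "keyword TEXT NOT NULL, hour TEXT NOT NULL, score REAL NOT NULL, "
        "PRIMARY KEY (keyword, hour))"
    )
    return conn


def record_score(conn, keyword, hour, score):
    """Insert or overwrite the score for a keyword in a given hour."""
    conn.execute(
        "INSERT OR REPLACE INTO hourly_score (keyword, hour, score) "
        "VALUES (?, ?, ?)",
        (keyword, hour, score),
    )
    conn.commit()


def score_series(conn, keyword):
    """Return (hour, score) pairs for a keyword in chronological order."""
    rows = conn.execute(
        "SELECT hour, score FROM hourly_score WHERE keyword = ? ORDER BY hour",
        (keyword,),
    )
    return rows.fetchall()
```

A scheduled job would call `record_score` once per keyword per hour, and `score_series` returns the time series needed to plot popularity score against time.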

Figure 7: Home Page

Figure 8: Search Page

Figure 9: Result

Figure 10: Positive Tweets

Figure 11: SQL Database

Figure 12: SQL Database 2

Figure 13: Positive Dataset 1

Figure 14: Positive Dataset 2

Figure 15: Negative Dataset 1

Figure 16: Negative Dataset 2

Chapter 5: RESULT DISCUSSION

We will first present our results for the objective/subjective and positive/negative classifications. These results act as the first step of our classification approach. We only use the short-listed features for both of these results. This means that for the objective/subjective classification we have 5 features and for positive/negative classification we have 3 features. For both of these results we use the Naïve Bayes classification algorithm, because that is the algorithm we are employing in our actual classification approach at the first step. Furthermore, all the figures reported are the result of 10-fold cross validation. We take an average of the 10 values we get from the cross validation.
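The averaging over folds can be sketched as follows. This is a generic illustration of k-fold cross validation, not the WEKA procedure itself: the function names are hypothetical, folds are taken as contiguous blocks without shuffling, and `train_and_predict` is a placeholder for whatever classifier is being evaluated.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    Assumes n_samples >= k so that no fold is empty.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(samples, labels, train_and_predict, k=10):
    """Average test accuracy over k folds.

    train_and_predict(train_x, train_y, test_x) must return one predicted
    label per element of test_x.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample is used for testing exactly once, and the reported figure is the mean of the per-fold accuracies.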

In addition to the above information, we impose a condition while reporting the results of polarity classification (which differentiates between positive and negative classes): only tweets labelled subjective are used to calculate these results. However, in the case of the final classification approach, any such condition is removed and both objectivity and polarity classifications are applied to all tweets regardless of whether they are labelled objective or subjective.

If we compare these results to those provided by Wilson et al. [16] (results are displayed in Table 2 and Table 3 of this report), we see that the accuracy of the neutral class falls from 82.1% to 73% if we use our classification instead of theirs. However, for all other classes we report significantly better results. Although the results presented by Wilson et al. are not from Twitter data, they are for phrase-level sentiment analysis, which is very close in concept to Twitter sentiment analysis. Next we will compare our results with those presented by Go et al. [2]. The results presented by this paper are as follows:

If we compare these results to ours, we see that they are more or less similar. However, we arrive at comparable results with just 10 features and about 9,000 training tweets. In contrast, they used about 1.6 million noisy labels. Their labels were noisy in the sense that tweets containing positive emoticons were labelled positive, while those with negative emoticons were labelled negative. The rest of the tweets (which did not contain any emoticon) were discarded from the dataset. In this way they hoped to achieve high results without human labelling, but at the cost of using a very large amount of data.
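The noisy labelling scheme described above can be illustrated with a short sketch. The emoticon lists, the function names, and the decision to discard tweets with conflicting emoticons are our assumptions for illustration; they are not taken from Go et al. [2].

```python
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}   # assumed list, for illustration only
NEGATIVE_EMOTICONS = {":(", ":-(", "=("}         # assumed list, for illustration only


def noisy_label(tweet):
    """Return 'positive', 'negative', or None (discard) based on emoticons alone."""
    tokens = tweet.split()
    has_pos = any(tok in POSITIVE_EMOTICONS for tok in tokens)
    has_neg = any(tok in NEGATIVE_EMOTICONS for tok in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # No emoticon at all, or conflicting emoticons (the latter is our assumption): discard.
    return None


def build_noisy_dataset(tweets):
    """Keep only the tweets that receive a noisy label, paired with that label."""
    labelled = []
    for tweet in tweets:
        label = noisy_label(tweet)
        if label is not None:
            labelled.append((tweet, label))
    return labelled
```

No human judgement enters this process, which is why such labels are cheap to obtain in bulk but can be wrong for any individual tweet.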

Next we will present our results for the complete classification. We note that the best results are reached when a Support Vector Machine is applied at the second stage of the classification process. Hence the results below pertain only to SVM. These results use a total of two features: P(objectivity | tweet) and P(positivity | tweet). But if we include all the features employed in step 1 of the classification process, we have a list of 8 short-listed features (3 for polarity classification and 5 for objectivity classification). The following results are reported after conducting 10-fold cross validation:

In comparison with these results, Kouloumpis et al. [7] report an average F-measure of 68%. However, when they include another portion of their data in their classification process (which they call the HASH data), their average F-measure drops to 65%. In contrast, we achieve an average F-measure of more than 70%, which is better than either of these results. Moreover, we make use of only 8 features and 9,000 labelled tweets, while their process involves about 15 features in total and more than 220,000 tweets in their training set. Our unigram word models are also simpler than theirs, because they incorporate negation into their word models. However, as in the case of (1-9), their tweets are not labelled by humans, but rather undergo noisy labelling in two ways: labels acquired from positive and negative emoticons, and from hashtags.
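For reference, the average F-measure used in the comparison above can be computed as in the sketch below. We assume an unweighted (macro) average over the classes; the text does not state which averaging was used, and the function names are ours.

```python
def f_measure(gold, predicted, target):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(gold, predicted, classes):
    """Unweighted mean of the per-class F-measures (assumed averaging scheme)."""
    return sum(f_measure(gold, predicted, c) for c in classes) / len(classes)
```

With unequal class sizes, an unweighted average gives the smaller classes the same influence on the final figure as the majority class.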

Finally, we conclude that our classification approach provides an improvement in accuracy while using only the simplest features and a small amount of data. However, there are still a number of things we would like to consider as future work, which we mention in the next section.


Chapter 6: CONCLUSION AND FUTURE RECOMMENDATIONS

The task of sentiment analysis, especially in the domain of micro-blogging, is still in the developing stage and far from complete. So we propose a couple of ideas which we feel are worth exploring in the future and may result in further improved performance. Right now we have worked with only the very simplest unigram models; we can improve those models by adding extra information such as the closeness of the word to a negation word. We could specify a window prior to the word under consideration (a window could, for example, be of 2 or 3 words), and the effect of negation may be incorporated into the model if it lies within that window. The closer the negation word is to the unigram word whose prior polarity is to be calculated, the more it should affect the polarity. For example, if the negation is right next to the word, it may simply reverse the polarity of that word, and the farther the negation is from the word, the smaller its effect should be.
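One possible way to realise this negation window is sketched below. The negation word list, the linear decay of the effect with distance, and the function name are all assumptions made for illustration; other decay schemes would fit the description equally well.

```python
NEGATION_WORDS = {"not", "no", "never", "cannot"}   # assumed list, for illustration only


def negation_adjusted_polarity(tokens, index, prior_polarity, window=3):
    """Adjust the prior polarity of tokens[index] using the nearest preceding negation.

    prior_polarity is a signed score (positive > 0, negative < 0).
    """
    start = max(0, index - window)
    # Scan backwards so that the nearest negation word is found first.
    for pos in range(index - 1, start - 1, -1):
        if tokens[pos].lower() in NEGATION_WORDS:
            distance = index - pos                      # 1 means directly adjacent
            strength = 1.0 - (distance - 1) / window    # 1.0 when adjacent, smaller farther away
            # strength 1.0 reverses the sign; smaller strengths only shrink the score.
            return prior_polarity * (1.0 - 2.0 * strength)
    # No negation word inside the window: the prior polarity is left unchanged.
    return prior_polarity
```

With `window=3`, an adjacent negation turns a score of 0.8 into -0.8, while a negation three words back only reduces it, and a negation farther away has no effect.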

Apart from this, we are currently focusing only on unigrams, and the effect of bigrams and trigrams may be explored. As reported in the literature review section, using bigrams along with unigrams usually enhances performance. However, for bigrams and trigrams to be effective features we need a much larger labelled dataset than our meager 9,000 tweets. Right now we are exploring Parts of Speech separately from the unigram models; in future we can try to incorporate POS information within our unigram models. So, instead of calculating a single probability for each word like P(word | obj), we could have multiple probabilities for each word according to the Part of Speech the word belongs to. For example we may have P(word | obj, verb), P(word | obj, noun) and P(word | obj, adjective). Pang et al. [5] used a somewhat similar approach and claim that appending POS information to every unigram results in no significant change in performance (with Naive Bayes performing slightly better and SVM showing a slight decrease in performance), while there is a significant decrease in accuracy if only adjective unigrams are used as features. However, these results are for classification of reviews and should be verified for sentiment analysis on microblogging websites like Twitter.
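The POS-conditioned probabilities proposed above could be estimated as in the following sketch. It assumes the tweets have already been tagged (the tag names are placeholders), uses plain relative frequencies without smoothing, and the function name is ours.

```python
from collections import defaultdict


def pos_conditioned_likelihoods(tagged_tweets):
    """Estimate P(word | class, pos) from (tagged_tokens, label) pairs.

    Each token is a (word, pos_tag) pair; tagging is assumed to be done beforehand.
    """
    counts = defaultdict(lambda: defaultdict(int))   # (label, pos) -> word -> count
    for tagged_tokens, label in tagged_tweets:
        for word, pos in tagged_tokens:
            counts[(label, pos)][word.lower()] += 1

    likelihoods = {}
    for key, word_counts in counts.items():
        total = sum(word_counts.values())
        # Relative frequency within one (class, POS) pair; no smoothing is applied here.
        likelihoods[key] = {w: c / total for w, c in word_counts.items()}
    return likelihoods
```

Because every (class, POS) pair gets its own distribution, the counts per word become sparser than in the plain unigram model, which is one reason a larger labelled dataset would help.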

One more feature that is worth exploring is whether information about the relative position of a word in a tweet has any effect on the performance of the classifier. Although Pang et al. explored a similar feature and reported negative results, their results were based on reviews, which are very different from tweets, and they worked with an extremely simple model.
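A position feature of this kind could be built by tagging each unigram with a coarse position bucket, as sketched below. The three buckets and their boundaries (first quarter, middle half, last quarter) are an assumption chosen for illustration, and the function name is ours.

```python
def position_tagged_unigrams(tokens):
    """Append a coarse position bucket to every token of a tweet."""
    n = len(tokens)
    tagged = []
    for i, tok in enumerate(tokens):
        rel = i / n   # relative position in [0, 1); the loop is skipped when n == 0
        if rel < 0.25:
            bucket = "first"
        elif rel >= 0.75:
            bucket = "last"
        else:
            bucket = "middle"
        tagged.append(f"{tok.lower()}_{bucket}")
    return tagged
```

The tagged tokens can then be used in place of plain unigrams, so that the same word at the start and at the end of a tweet is treated as two different features.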

One potential problem with our research is that the sizes of the three classes are not equal. The objective class, which contains 4,543 tweets, is about twice the size of the positive and negative classes, which contain 2,543 and 1,877 tweets respectively. The problem with unequal classes is that the classifier tries to increase the overall accuracy of the system by increasing the accuracy of the majority class, even if that comes at the cost of a decrease in accuracy for the minority classes. That is the very reason why we report significantly higher accuracies for the objective class than for the positive or negative classes. To overcome this problem and have the classifier exhibit no bias towards any of the classes, it is necessary to label more data (tweets) so that all three of our classes are almost equal.
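The extent of the imbalance can be made concrete with a small calculation over the class sizes quoted above. The function names are ours; the sketch only reports the share of each class and the accuracy a trivial classifier would reach by always predicting the majority class.

```python
CLASS_SIZES = {"objective": 4543, "positive": 2543, "negative": 1877}   # counts quoted in the text


def class_shares(sizes):
    """Fraction of the labelled dataset that falls into each class."""
    total = sum(sizes.values())
    return {label: count / total for label, count in sizes.items()}


def majority_baseline_accuracy(sizes):
    """Accuracy obtained by always predicting the largest class."""
    return max(sizes.values()) / sum(sizes.values())
```

Since the objective class alone makes up just over half of the labelled tweets, a classifier can reach a seemingly reasonable overall accuracy while performing poorly on the positive and negative classes.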


In this research we are focusing on general sentiment analysis. There is potential for work in the field of sentiment analysis with partially known context. For example, we noticed that users generally use our website for specific types of keywords, which can be divided into a few distinct classes, namely: politics/politicians, celebrities, products/brands, sports/sportsmen, and media/movies/music. So we can attempt to perform separate sentiment analysis on tweets that belong to only one of these classes (i.e. the training data would not be general but specific to one of these categories) and compare the results with those we get if we apply general sentiment analysis to them instead.

Last but not the least, we can attempt to model human confidence in our system. For example, if we have 5 human labellers labelling each tweet, we can plot the tweet in the 2-dimensional objectivity/subjectivity and positivity/negativity plane while differentiating between tweets in which all 5 labels agree, only 4 agree, only 3 agree, or no majority vote is reached. We could develop our own custom cost function for coming up with optimized class boundaries, such that the highest weightage is given to those tweets in which all 5 labels agree, and as the number of agreements decreases, so do the weights assigned. In this way the effects of human confidence can be visualized in sentiment analysis.
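A minimal sketch of such agreement-based weighting follows. The linear weight (the fraction of labellers who agree), the decision to drop tweets without a majority, and the function names are assumptions made for illustration; the actual cost function remains future work.

```python
from collections import Counter


def agreement_weight(labels):
    """Return (majority_label, weight) for the human labels of one tweet."""
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count * 2 <= len(labels):
        # No strict majority among the labellers: the tweet gets no weight.
        return None, 0.0
    # Assumed linear weighting: 5 of 5 -> 1.0, 4 of 5 -> 0.8, 3 of 5 -> 0.6.
    return top_label, top_count / len(labels)


def weighted_error(examples, predict):
    """Agreement-weighted error rate over (features, human_labels) pairs."""
    total = 0.0
    wrong = 0.0
    for features, labels in examples:
        label, weight = agreement_weight(labels)
        if label is None:
            continue
        total += weight
        if predict(features) != label:
            wrong += weight
    return wrong / total if total else 0.0
```

Under this scheme a mistake on a tweet that all 5 labellers agree on costs more than a mistake on a tweet where only 3 agree, which is the behaviour the proposed cost function is meant to have.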


Chapter 7: REFERENCES

[1] Albert Bifet and Eibe Frank. Sentiment Knowledge Discovery in Twitter Streaming Data. Discovery Science, Lecture Notes in Computer Science, 2010, Volume 6332/2010, 1-15, DOI: 10.1007/978-3-642-16184-1_1

[2] Alec Go, Richa Bhayani and Lei Huang. Twitter Sentiment Classification using Distant Supervision. Project Technical Report, Stanford University, 2009.

[3] Alexander Pak and Patrick Paroubek. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. In Proceedings of international conference on Language Resources and Evaluation (LREC), 2010.

[4] Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner and Isabell M. Welpe. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. In Proceedings of AAAI Conference on Weblogs and Social Media (ICWSM), 2010.

[5] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.

[6] Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou and Ping Li. User Level Sentiment Analysis Incorporating Social Networks. In Proceedings of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 2011.

[7] Efthymios Kouloumpis, Theresa Wilson and Johanna Moore. Twitter Sentiment Analysis: The Good the Bad and the OMG! In Proceedings of AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

[8] Hatzivassiloglou, V., & McKeown, K.R. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, 1997.

[9] Johan Bollen, Alberto Pepe and Huina Mao. Modelling Public Mood and Emotion: Twitter Sentiment and socio-economic phenomena. In Proceedings of AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

[10] Luciano Barbosa and Junlan Feng. Robust Sentiment Detection on Twitter from Biased and Noisy Data. In Proceedings of the international conference on Computational Linguistics (COLING), 2010.

[11] Peter D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the Annual Meeting of the Association of Computational Linguistics (ACL), 2002.

[12] Rudy Prabowo and Mike Thelwall. Sentiment Analysis: A Combined Approach. Journal of Informetrics, Volume 3, Issue 2, April 2009, Pages 143-157.

[13] Samuel Brody and Nicholas Diakopoulos. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs. In Proceedings of Empirical Methods on Natural Language Processing (EMNLP), 2011.

[14] Soo-Min Kim and Eduard Hovy. Determining the Sentiment of Opinions. In Proceedings of International Conference on Computational Linguistics (ICCL), 2004.


[15] Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of international conference on Language Resources and Evaluation (LREC), 2010.

[16] Theresa Wilson, Janyce Wiebe and Paul Hoffmann. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In the Annual Meeting of Association of Computational Linguistics: Human Language Technologies (ACL-HLT), 2005.

[17] Ian H. Witten, Eibe Frank & Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques.

[18] Richard O. Duda, Peter E. Hart & David G. Stork. Pattern Classification.

[19] Steven Bird, Ewan Klein & Edward Loper. Natural Language Processing with Python.

[20] Ben Parr. Twitter Has 100 Million Monthly Active Users; 50% Log In Every day. <http://mashable.com/2011/10/17/twitter-costolo-stats/>

[21] Google App Engine <https://developers.google.com/appengine/>

[22] Google Chart API <https://developers.google.com/chart/>

[23] Jinja2: Templating Language for Python <http://jinja.pocoo.org/>

[24] Maggie Shields, Technology Reporter, BBC News. Twitter co-founder Jack Dorsey rejoins company. <http://www.bbc.co.uk/news/business-12889048>


[25] Multi Perspective Question Answering (MPQA) Online Lexicon <http://www.cs.pitt.edu/mpqa/subj_lexicon.html>

[26] TweetStream: Simple Twitter Streaming API Access <http://pypi.python.org/pypi/tweetstream>

[27] Twitter REST API <https://dev.twitter.com/docs/api>

[28] Twitter Sentiment, an online application performing sentiment classification of Twitter. <http://twittersentiment.appspot.com/>

