sentiment analysis report

Post on 16-Apr-2017

  • Project Report

    On

    Sentiment Analysis Tool

    Submitted as partial fulfillment for the award of

    BACHELOR OF TECHNOLOGY DEGREE

    Session 2015-16

    In

    Information Technology

    By RAVINDRA CHAUDHARY (1203213037)

    SACHIN SINGH (1203213039)

    Under the guidance of

    Ms. SMITA TIWARI

    ABES ENGINEERING COLLEGE, GHAZIABAD

    AFFILIATED TO

    Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, LUCKNOW, UTTAR PRADESH (Formerly UPTU)

  • Students' Declaration

    We hereby declare that the work being presented in this report, entitled
    "Sentiment Analysis Tool", is an authentic record of our own work carried
    out under the supervision of Ms. SMITA TIWARI.

    Signature of students

    Ravindra Chaudhary

    Sachin Singh

    Information Technology

    DATE:

    This is to certify that the above statement made by the candidates is
    correct to the best of my knowledge.

    Signature of HOD              Signature of Supervisor

    (Dr. P.C. Vashist)            (Ms. Smita Tiwari)

    Information Technology        Associate Professor

    Date ..............           Information Technology

  • ACKNOWLEDGEMENT

    It gives us a great sense of pleasure to present the report of the
    B. Tech. Project undertaken during B. Tech, Fourth Year. We owe a special
    debt of gratitude to Professor Ms. SMITA TIWARI and the Department of
    Information Technology, ABES Engineering College, Ghaziabad, for her
    constant support and guidance throughout the course of our work. Her
    sincerity, thoroughness and perseverance have been a constant source of
    inspiration for us. It is only through her cognizant efforts that our
    endeavors have seen the light of day. We also take the opportunity to
    acknowledge the contribution of Professor Dr. P.C. Vashisth, Head,
    Department of Information Technology, ABES Engineering College,
    Ghaziabad, for her full support and assistance during the development of
    the project. We also do not want to miss the opportunity to acknowledge
    the contribution of all faculty members of the department for their kind
    assistance and cooperation during the development of our project. Last
    but not least, we acknowledge our friends for their contribution to the
    completion of the project.

    RAVINDRA CHAUDHARY

    SACHIN SINGH

  • TABLE OF CONTENTS

    Inner Title Page
    Declaration
    Acknowledgment
    Abstract

    1. Introduction
        1.1. Motivation
        1.2. Domain introduction
    2. Objective
    3. Methodology
        3.1. Method of Sentiment Analysis
            3.1.1. Data Acquisition
            3.1.2. Tokenizer
            3.1.3. Pre Processing
            3.1.4. Feature Extraction
            3.1.5. Classification and Prediction
    4. Detail of project report work
        4.1. Data acquisition
        4.2. Human Labelling
        4.3. Feature Extraction
        4.4. Classification
        4.5. Tweet Mode Web Application
            4.5.1. Tweet score
            4.5.2. Tweet Compare
            4.5.3. Tweet stats
    5. Result Discussion
    6. Conclusion and future Recommendation
    7. References

  • LIST OF TABLES

    Table 1: A typical 2x2 confusion matrix

  • LIST OF FIGURES

  • ABSTRACT

    This project addresses the problem of sentiment analysis on Twitter, that
    is, classifying tweets according to the sentiment expressed in them:
    positive or negative. Twitter is an online micro-blogging and
    social-networking platform that allows users to write short status
    updates of at most 140 characters. It is a rapidly expanding service with
    over 200 million registered users [24], of which 100 million are active
    users; half of them log on to Twitter daily, generating nearly 250
    million tweets per day [20]. Given this volume of usage, we hope to
    obtain a reflection of public sentiment by analyzing the sentiments
    expressed in the tweets. Analyzing public sentiment is important for many
    applications, such as firms trying to find out the market response to
    their products, predicting political elections, and predicting
    socioeconomic phenomena such as the stock exchange. The aim of this
    project is to develop a functional classifier for accurate and automatic
    sentiment classification of an unknown tweet stream.

  • Chapter 1: INTRODUCTION

    1.1 Motivation

    We have chosen to work with Twitter because we feel it is a better
    approximation of public sentiment than conventional internet articles and
    web blogs. The reason is that the amount of relevant data is much larger
    for Twitter than for traditional blogging sites. Moreover, the response
    on Twitter is more prompt and also more general (since the number of
    users who tweet is substantially greater than the number who write web
    blogs on a daily basis). Sentiment analysis of the public is highly
    critical in macro-scale socioeconomic phenomena such as predicting the
    stock market rate of a particular firm. This could be done by analyzing
    overall public sentiment towards that firm with respect to time and
    using economics tools for finding the correlation between