Transcript
Page 1: Introduction to Sentiment Analysis

SENTIMENT ANALYSIS

A Seminar Report Submitted

in Partial Fulfillment of the Requirements

for the Degree of

Bachelor of Engineering

in

Computer Engineering

Submitted by

Patil Makrand Anil

Guided by

Ms. A. A. Chavan

DEPARTMENT OF COMPUTER ENGINEERING

SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE

2013 - 2014

SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE

DEPARTMENT OF COMPUTER ENGINEERING

CERTIFICATE

This is to certify that the Seminar entitled Sentiment Analysis has been carried out

by

Patil Makrand Anil

under my guidance in partial fulfillment of the degree of Bachelor of Engineering in

Computer Engineering of North Maharashtra University, Jalgaon during the academic

year 2013 - 2014. To the best of my knowledge and belief this work has not been

submitted elsewhere for the award of any other degree.

Date:

Place: Dhule

Guide

Ms. A. A. Chavan

Head: Prof. B. R. Mandre

Principal: Dr. Hitendra D. Patil

Acknowledgement

The completion of this report on “Sentiment Analysis” has given me profound knowledge. I am sincerely thankful to Prof. B. R. Mandre and my guide, Ms. A. A. Chavan, who cooperated with me and guided me at different stages during the preparation of this report. My sincere thanks go to the staff of the Computer Engineering Department, without whose help I could not have even conceived of accomplishing this report. This work is virtually the result of their inspiration and guidance. I would also like to thank the entire library staff and all those who were directly or indirectly a part of this work.

Patil Makrand Anil

Contents

Acknowledgement
Abstract
1 Introduction
    1.1 What is Sentiment Analysis
    1.2 Need of Sentiment Analysis
    1.3 Summary
2 Literature Survey
3 Methodology
    3.1 Machine Learning
    3.2 Natural Language Processing
    3.3 Summary
4 Implementation
    4.1 Machine Learning Approach
    4.2 Natural Language Processing Approach
    4.3 Summary
5 Applications
    5.1 Summary
6 Advantages & Disadvantages
    6.1 Advantages
    6.2 Summary
7 Conclusion
Bibliography

List of Figures

4.1 Implementation Architecture using Machine Learning Approach
4.2 Implementation Architecture using NLP Approach

Abstract

Our day-to-day life has always been influenced by what people think. Ideas and opinions of others have always affected our own opinions. The explosion of Web 2.0 has led to increased activity in podcasting, blogging, tagging, contributing to RSS, social bookmarking, and social networking. As a result, there has been a surge of interest in mining these vast resources of data for opinions. Sentiment Analysis, or Opinion Mining, is the computational treatment of opinions, sentiments and subjectivity in text. In this report, we discuss various approaches to the computational treatment of sentiments and opinions: supervised, data-driven techniques such as Naive Bayes and Support Vector Machines, and the lexicon-based SentiWordNet approach.

Chapter 1

Introduction

1.1 What is Sentiment Analysis

Sentiment Analysis is a Natural Language Processing and Information Extraction task that aims to obtain a writer's feelings, expressed in positive or negative comments, questions and requests, by analyzing a large number of documents. For example, “I am so happy today, good morning to everyone” is a generally positive text. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall polarity of a document. Sentiment analysis is also known as opinion mining. At its most basic, it is the task of identifying whether the opinion expressed in a text is positive or negative. Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.

1.2 Need of Sentiment Analysis

According to recent statistics from the social media tracking company Technorati, four out of every five Internet users use social media in some form. This includes friendship networks, blogging and micro-blogging sites, and content and video sharing sites. The World Wide Web has now transformed into a more participative and co-creative Web that allows a large number of users to contribute in a variety of forms. Even those who are virtually novices in the technicalities of Web publishing are creating content on the Web. In fact, the value of a website is now determined largely by its user base, which in turn decides the amount of data available on it. It may perhaps be true to say that data is the new “Intel Inside”.[1]

One interesting form of user contribution on the Web is the review. Many sites allow users to write about their experiences with, or opinions of, a product or service in the form of a review. The Web is now full of user reviews for items ranging from mobile phones, holiday trips and hotel services to movies. These reviews not only express the opinions of a group of users but are also a valuable source for harnessing collective intelligence. For example, a user looking for a hotel in a particular tourist city may prefer to go through the reviews of the available hotels before deciding to book one of them. Similarly, a user wishing to buy a particular model of digital camera may first look at the reviews posted by other users before making a buying decision. This not only gives the user more relevant information about different products and services at the click of a mouse, but also helps in arriving at a more informed decision. Sometimes users prefer to write about their experiences with a product or service in a blog post rather than in an explicit review. In both cases, however, the data is basically textual. Popular sites like carwale.com and imdb.com are now full of user reviews, of cars and movies respectively.[3]

Though these reviews and posts are undoubtedly useful and valuable, it is quite difficult for a new user (or a prospective customer) to read all of them in a short span of time. Fortunately, there is a solution to this information overload problem that can present a comprehensive summary of a large number of reviews. The new Information Retrieval formulations, popularly called sentiment classifiers, not only label a review automatically as positive or negative, but also extract and highlight the positive and negative aspects of a product or service. Sentiment analysis is now an important part of Information Retrieval based formulations in a variety of domains. It is traditionally used for automatic extraction of opinion types about a product and for highlighting positive or negative aspects or features of a product.

It is widely believed that sentiment analysis is needed and useful. It is also widely accepted that extracting sentiment from text is a hard semantic problem, even for human beings. In general, sentiment analysis is useful for extracting the sentiments available on blogging sites, social networks and discussion forums, to the benefit of both companies and their customers or users.

1.3 Summary

This chapter introduced Sentiment Analysis: what it is and why it is needed.

Chapter 2

Literature Survey

Balamurali et al. (2011) present an innovative idea: sense-based sentiment analysis. This implies shifting from the lexeme feature space to a semantic space, i.e. from simple words to their synsets. Work in Sentiment Analysis had long concentrated on the lexeme feature space or on identifying relations between words using parsing. According to the authors, integrating sense into Sentiment Analysis had become necessary because of the following scenarios:

• A word may have some sentiment-bearing and some non-sentiment-bearing senses

• There may be different senses of a word that bear sentiment of opposite polarity

• The same sense can be manifested by different words (appearing in the same synset)

Using senses as features helps to exploit the idea of senses/concepts and the hierarchical structure of WordNet. The authors used the following feature representations and compared their performance with that of lexeme-based features:

• A group of word senses that have been manually annotated (M)

• A group of word senses that have been annotated by automatic word sense disambiguation (WSD) (I)

• A group of manually annotated word senses and words, both used separately as features (Sense + Words(M))

• A group of automatically annotated word senses and words, both used separately as features (Sense + Words(I))
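
As a rough illustration of how these representations differ, the sketch below builds word, sense, and sense + words features from one toy document. It assumes the document has already been sense-annotated (by hand for the M variants, by a WSD system for the I variants). The sense identifiers and function names are made up for the example and are not taken from the paper.

```python
from collections import Counter

# A toy sense-annotated document: (word, sense_id) pairs.
# The sense ids are invented placeholders, not real WordNet synset ids.
# None marks a word for which no sense is available (non-coverage).
annotated_doc = [
    ("great", "great.a.01"),
    ("movie", "movie.n.01"),
    ("cool", "cool.a.03"),
    ("plot", None),
]

def word_features(doc):
    """Lexeme feature space: count the surface words."""
    return Counter(word for word, _ in doc)

def sense_features(doc):
    """Sense feature space: count sense ids, skipping uncovered words."""
    return Counter(sense for _, sense in doc if sense is not None)

def sense_plus_word_features(doc):
    """Sense + Words: senses and words kept as separate features."""
    features = Counter()
    for word, sense in doc:
        features["w:" + word] += 1
        if sense is not None:
            features["s:" + sense] += 1
    return features
```

In this sketch the word "plot" has no sense, so it disappears from the pure sense representation but survives in the combined one, which is the kind of non-coverage the combined representations are meant to absorb.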

Sense + Words(M) and Sense + Words(I) were used to overcome WordNet's non-coverage of some noun synsets. The authors also used synset-replacement strategies to deal with the case where a synset in a test document is not found in the training documents. In that case the unknown target synset is replaced with its closest counterpart among the WordNet synsets, as measured by a similarity metric.

Support Vector Machines were used to classify the feature vectors, and IWSD was used for automatic WSD. Extensive experiments compared the performance of the four feature representations with the lexeme representation. The best accuracy was obtained by combining manually annotated senses with words, Sense + Words(M) (90.2 percent, an increase of 5.3 percent over the baseline accuracy), followed by Sense(M), Sense + Words(I), Sense(I) and the lexeme feature representation. LESK was found to perform best among the three metrics used in the replacement strategies.

The improvement was attributed in part to feature abstraction and dimensionality reduction, which lead to noise reduction. The work achieved its target of bringing a new dimension to Sentiment Analysis by introducing sense-based Sentiment Analysis.

Chapter 3

Methodology

There are primarily two types of approaches for sentiment classification of opinionated texts [1]:

1. Using a machine learning based text classifier such as Naive Bayes or a Support Vector Machine

2. Using Natural Language Processing

3.1 Machine Learning

Machine learning, a branch of artificial intelligence, concerns the construction and study of

systems that can learn from data. For example, a machine learning system could be trained

on email messages to learn to distinguish between spam and non-spam messages. After

learning, it can then be used to classify new email messages into spam and non-spam folders.

Machine learning focuses on prediction, based on known properties learned from the training

data.

Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An example would be assigning a given email to the “spam” or “non-spam” class.

An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term classifier sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.

Training means fitting a classifier on particular inputs so that it can later be tested on unknown inputs (which it has never seen before), which it classifies or predicts based on what it has learned. Classifying data is a common task in machine learning. Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in.

Machine learning based text classifiers follow the supervised learning paradigm: the classifier must be trained on labeled training data before it can be applied to the actual classification task. The training data is usually a portion of the original data that has been labeled by hand. After suitable training, the classifier can be used on the actual test data. Naive Bayes is a statistical classifier, whereas the Support Vector Machine is a vector space classifier. The statistical text classification scheme of Naive Bayes (NB) can be adapted to sentiment classification because the problem can be viewed as a 2-class text classification problem with positive and negative classes.[2] The Support Vector Machine (SVM) is a vector space model based classifier, which requires text documents to be transformed into feature vectors before they are used for classification. Usually the text documents are transformed into multidimensional tf.idf vectors. The classification problem then becomes one of assigning every text document, represented as a vector, to a particular class. SVM is a type of large margin classifier: the goal is to find a decision boundary between two classes that is maximally far from any document in the training data.
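
To make the tf.idf transformation concrete, the following sketch turns tokenized documents into sparse tf.idf vectors. It assumes raw term counts and idf = log(N / df), which is one common weighting among several; the function name and the two sample documents are hypothetical. An SVM would then be trained on vectors of this kind.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumes raw term frequency and idf = log(N / df); other
    weighting variants exist.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

# Two invented documents, already split into tokens.
docs = [
    "good movie good plot".split(),
    "bad movie".split(),
]
vectors = tfidf_vectors(docs)
```

With this weighting, a term such as "movie" that occurs in every document gets weight zero, so it contributes nothing to separating the classes.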

This approach needs:

1. A good classifier such as Naive Bayes or a Support Vector Machine

2. A training set for each class

Various training sets are available on the Internet, such as movie review and Twitter datasets.

The classes are positive and negative, and a training set is needed for each of them.
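
A minimal sketch of such a classifier is given below, assuming a multinomial Naive Bayes model with add-one smoothing. The function names and the four training sentences are invented for illustration; a real system would be trained on a labeled corpus such as a movie review dataset.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns the model parameters."""
    class_counts = Counter()   # label -> number of training documents
    word_counts = {}           # label -> Counter of word occurrences
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def classify(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data: one small training set per class.
training_set = [
    ("good great fun".split(), "positive"),
    ("great acting".split(), "positive"),
    ("bad boring".split(), "negative"),
    ("boring plot bad acting".split(), "negative"),
]
model = train_naive_bayes(training_set)
```

Calling classify(model, "great fun".split()) then returns the label whose training words best explain the input.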

3.2 Natural Language Processing

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.

This approach uses the publicly available lexical resource SentiWordNet, which provides sentiment polarity values for every term occurring in the document. In this lexical resource, each term t occurring in WordNet is associated with three numerical scores, obj(t), pos(t) and neg(t), describing the objective, positive and negative polarities of the term, respectively. These three scores are computed by combining the results produced by eight ternary classifiers.[3]
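
The structure of such a lexicon entry can be sketched as follows. The entries and the helper function are hypothetical and the numbers are invented; the only property carried over from SentiWordNet is that the three scores of an entry sum to 1. Keying entries by a term and a part-of-speech letter is a simplification assumed here, since SentiWordNet itself attaches scores to synsets.

```python
from typing import NamedTuple

class SentiScores(NamedTuple):
    pos: float
    neg: float
    obj: float

# Hypothetical entries for illustration; not values read from SentiWordNet.
toy_lexicon = {
    ("good", "a"): SentiScores(pos=0.75, neg=0.0, obj=0.25),
    ("poor", "a"): SentiScores(pos=0.0, neg=0.625, obj=0.375),
    ("table", "n"): SentiScores(pos=0.0, neg=0.0, obj=1.0),
}

def polarity(term, pos_tag, lexicon=toy_lexicon):
    """Return 'positive', 'negative' or 'objective' for a term."""
    scores = lexicon.get((term, pos_tag))
    # Unknown terms are treated as objective; this is a choice of the sketch.
    if scores is None or scores.obj >= max(scores.pos, scores.neg):
        return "objective"
    return "positive" if scores.pos > scores.neg else "negative"
```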

WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. WordNet is freely and publicly available for download. Its structure makes it a useful tool for computational linguistics and natural language processing, because it groups words together based on their meanings.

A synset is simply a set of one or more synonyms.

This approach uses semantics to understand the language. The major NLP tasks that help in extracting sentiment from a sentence are [1]:

1. Extracting part of the sentence that reflects the sentiment

2. Understanding the structure of the sentence

3. Different tools which help process the textual data

In short, the positive and negative scores for each synset are looked up in SentiWordNet according to the word's part-of-speech tag. The scores are then summed over the text, and the sentiment polarity is the class (positive or negative) with the higher total.

3.3 Summary

This chapter discussed the approaches to Sentiment Analysis. There are two: one uses Machine Learning and the other uses Natural Language Processing.

Chapter 4

Implementation

Sentiment Analysis can be implemented using two approaches [1]:

1. Machine Learning Approach

2. Natural Language Processing Approach

4.1 Machine Learning Approach

The machine learning approach needs a dataset and a classifier to train. The basic idea is to first collect a dataset, such as a movie review or Twitter dataset; such datasets are freely available on the Internet. The dataset is then pre-processed and used to prepare a training set. The classifier is trained on the training set and, after training, is given the test dataset to classify.
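
The collect, train and test steps described above can be sketched as a hold-out evaluation. This is an illustration under assumptions: the 25% test fraction, the four sample reviews and the keyword rule standing in for a trained classifier are all hypothetical.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(classifier, test_set):
    """Fraction of test samples whose predicted label matches the true one."""
    if not test_set:
        return 0.0
    correct = sum(1 for text, label in test_set if classifier(text) == label)
    return correct / len(test_set)

def keyword_classifier(text):
    # Placeholder standing in for a classifier trained on train_set.
    return "positive" if "good" in text else "negative"

# Invented labeled reviews.
reviews = [
    ("good film", "positive"),
    ("good food", "positive"),
    ("bad film", "negative"),
    ("bad food", "negative"),
]
train_set, test_set = train_test_split(reviews)
```

Keeping the test samples out of training is what makes the measured accuracy an estimate of performance on unseen reviews.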

The following figure shows the basic implementation model of Sentiment Analysis using the Machine Learning approach.

Figure 4.1: Implementation Architecture using Machine Learning Approach

Datasets are freely available on the Internet. One example is City Grid Media, an online media company that connects web and mobile publishers with local businesses by linking them through City Grid. It provides APIs, reviews and ratings (1-10), and its domain is restaurants.

Pre-processing involves dividing the sentence into tokens, case conversion, removal of punctuation, and conversion of words to their full forms.
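
These pre-processing steps might look like the sketch below. It assumes whitespace tokenization, lowercasing as the case conversion, and a tiny hand-made table of short forms; the table and the function name are hypothetical, and a real system would need a much larger table.

```python
import string

# A tiny, hypothetical table of short forms and their full forms.
FULL_FORMS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "i'm": "i am",
}

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def preprocess(sentence):
    """Lowercase, expand short forms, strip punctuation, return tokens."""
    tokens = []
    for raw in sentence.lower().split():
        # Trim punctuation at the edges so "can't," still matches the table.
        word = raw.strip(string.punctuation)
        word = FULL_FORMS.get(word, word)
        # Remove punctuation left inside the word, then split expansions.
        word = word.translate(STRIP_PUNCT)
        tokens.extend(word.split())
    return tokens
```

For instance, preprocess("I can't say it's BAD, really!") expands the two short forms and drops the comma and exclamation mark before the tokens are passed on.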

4.2 Natural Language Processing Approach

The Natural Language Processing approach uses the SentiWordNet lexicon, which contains a positive and a negative score for each term occurring in WordNet. The implementation extracts the adjectives from the sentence and looks each of them up in SentiWordNet to find its positive and negative scores. The total positive and negative scores of the sentence are then calculated, and whichever is greater becomes the label for the sentence.
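
The procedure can be sketched as follows, under several assumptions: the sentence is already part-of-speech tagged, adjectives carry Penn Treebank style tags starting with "JJ", and the score table is a hand-made stand-in with invented numbers rather than data read from SentiWordNet. The "neutral" outcome for ties is also a choice made for this sketch.

```python
# Hypothetical (positive, negative) scores for a few adjectives.
# These numbers are invented for the example, not read from SentiWordNet.
ADJECTIVE_SCORES = {
    "great": (0.75, 0.0),
    "friendly": (0.5, 0.0),
    "slow": (0.0, 0.5),
    "terrible": (0.0, 0.875),
}

def classify_sentence(tagged_tokens, lexicon=ADJECTIVE_SCORES):
    """tagged_tokens: list of (word, tag) pairs for one sentence."""
    total_pos = total_neg = 0.0
    for word, tag in tagged_tokens:
        if not tag.startswith("JJ"):
            continue  # only adjectives contribute to the score
        pos, neg = lexicon.get(word.lower(), (0.0, 0.0))
        total_pos += pos
        total_neg += neg
    if total_pos > total_neg:
        return "positive"
    if total_neg > total_pos:
        return "negative"
    return "neutral"  # tie, or no known adjective in the sentence

# Invented, pre-tagged example sentence.
tagged = [
    ("The", "DT"), ("food", "NN"), ("was", "VBD"), ("great", "JJ"),
    ("but", "CC"), ("service", "NN"), ("was", "VBD"), ("slow", "JJ"),
]
label = classify_sentence(tagged)
```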

The following figure shows the basic implementation architecture of Sentiment Analysis using the Natural Language Processing approach.

Figure 4.2: Implementation Architecture using NLP Approach

4.3 Summary

This chapter discussed in detail the approaches to implementing Sentiment Analysis. There are two: one uses Machine Learning and the other uses Natural Language Processing.

Chapter 5

Applications

Word of mouth is the process of conveying information from person to person, and it plays a major role in customer buying decisions. In commercial situations, word of mouth involves consumers sharing attitudes, opinions or reactions about businesses, products or services with other people. Word of mouth communication functions on the basis of social networking and trust: people rely on family, friends and others in their social network. Research also indicates that people appear to trust seemingly disinterested opinions from people outside their immediate social network, such as online reviews. This is where Sentiment Analysis comes into play. The growing availability of opinion-rich resources like online review sites, blogs and social networking sites has made this decision-making process easier for us. With the explosion of Web 2.0 platforms, consumers have a soapbox of unprecedented reach and power from which to share opinions. Major companies have realized that these consumer voices shape the voices of other consumers.[2]

Sentiment Analysis thus finds its use in the consumer market for product reviews, in marketing for learning consumer attitudes and trends, in social media for finding the general opinion about recent hot topics, and in the movie industry for finding out whether a recently released movie is a hit.[2]

Applications can be classified into the following categories:

1. Review-related websites: movie reviews, product reviews, etc.

2. As a sub-component technology: detecting antagonistic, heated language in mails, spam detection, context-sensitive information detection, etc.

3. Businesses and organizations:

   • Brand analysis

   • New product perception

   • Product and service benchmarking

   • Market intelligence

   • Businesses spend a huge amount of money to find consumer sentiments and opinions

     – Consultants, surveys, focus groups, etc.

4. Individuals: interested in others' opinions when

   • Purchasing a product or using a service

   • Finding opinions on political topics

5. Ad placement: placing ads in user-generated content

   • Place an ad when one praises a product.

   • Place an ad from a competitor if one criticizes a product.

5.1 Summary

This chapter described the various applications of Sentiment Analysis.

Chapter 6

Advantages & Disadvantages

6.1 Advantages

1. A lower cost than traditional methods of getting customer insight.

2. A faster way of getting insight from customer data.

3. The ability to act on customer suggestions.

4. Identifies an organisation’s Strengths, Weaknesses, Opportunities & Threats (SWOT Analysis)

5. As 80% of all data in a business consists of words, the Sentiment Engine is an essential tool for making sense of it all.

6. More accurate and insightful customer perceptions and feedback.

6.2 Summary

This chapter listed the advantages of Sentiment Analysis.

Chapter 7

Conclusion

Sentiment analysis is an interdisciplinary field that crosses natural language processing, artificial intelligence and text mining. We have seen that it can be used for analyzing opinions in blogs, newspapers, articles, product reviews, social media websites and movie-review websites, where a third person narrates his or her views. We also studied the Natural Language Processing and Machine Learning approaches to Sentiment Analysis, and saw that it is easier to implement Sentiment Analysis via the SentiWordNet approach than via the classifier approach. Sentiment analysis has many applications and is an important field of study. It attracts strong commercial interest because companies want to know how their products are being perceived, and prospective consumers want to know what existing users think.

Bibliography

[1] P. W., V. K. Singh, R. Piryani, and A. Uddin, “Sentiment analysis of movie reviews and blog posts,” IEEE International Advance Computing Conference (IACC), vol. 3, 2013.

[2] A. A. G. and M. Karamibekr, “Sentiment analysis of social issues,” International Conference on Social Informatics, 2012.

[3] M. R. and A. Hamouda, “Reviews classification using SentiWordNet lexicon,” The Online Journal on Computer Science and Information Technology (OJCSIT), vol. 2, August 2011.
