sentiment analysis for the italian language

Università degli Studi di Udine, Dipartimento di Matematica e Informatica, Dottorato di Ricerca in Informatica. Ph.D. Thesis: Sentiment Analysis for the Italian language. Candidate: dott. Paolo Casoto. Supervisor: Professor Carlo Tasso. January 8, 2012.


DESCRIPTION

The PhD thesis of Dr. Paolo Casoto on Sentiment Analysis. The work presented in this thesis provides several contributions to the task of Sentiment Analysis applied, more specifically, to product reviews written in the Italian language. In particular, the following contributions are proposed:

• A generic framework aimed at defining, training and testing automatic tools for Sentiment Analysis based on supervised classifiers has been designed and implemented. The SENT-IT framework provides a complete set of integrated tools for linguistic analysis and machine learning, which can be applied to easily generate new automatic tools for sentiment classification and to evaluate their performance experimentally. A comprehensive description of the SENT-IT framework and its modules is provided in Chapter 3. The SENT-IT framework is based on open-source solutions and will be freely released soon for research purposes.

• A set of automatically annotated corpora of product reviews written in Italian, grouped by product domain (e.g., movies, cars, cell phones), has been collected and shared with other researchers. Each product review consists of a short text, a set of additional and optional information, such as date, author name and age, and an overall polarity rating indicator, representing the polarity expressed by the author within the review. The corpora, developed in order to evaluate the proposed methodologies for Sentiment Analysis, could be used in the future by other researchers as a Gold Standard, which was not available for the Italian language until the beginning of this thesis. The review corpora were publicly released in 2008 in XML format and are available at the author's site.

• A document feature representation schema suitable for Sentiment Analysis applied to the Italian language has been proposed and experimentally evaluated. The set of selected features, described in detail in Chapter 3, consists of representation features described as suitable in the literature for the English language, together with ad-hoc features defined according to the specific characteristics of the Italian language.

• A domain independent meta-classifier for Sentiment Analysis has been implemented by applying a stacking approach to previously trained domain-dependent classifiers. The stacking approach has been investigated in order to improve the effectiveness of the ensemble classifier on both unknown and already known domains.

• A lexical resource of polarity oriented terms for the Italian language has been developed, by proposing a shortest path algorithm based on a graph representation of the input terms. Semantic relations connecting terms, such as synonymy, antonymy and similarity, have been used to generate the graph representation.

TRANSCRIPT

Page 1: Sentiment Analysis for the Italian language

Università degli Studi di Udine

Dipartimento di Matematica e Informatica

Dottorato di Ricerca in Informatica

Ph.D. Thesis

Sentiment Analysis for the Italian language

Candidate:

dott. Paolo Casoto

Supervisor:

Professor Carlo Tasso

January 8, 2012


Author’s address:

Dipartimento di Matematica e Informatica
Università degli Studi di Udine
Via delle Scienze, 206
33100 Udine
Italia


Abstract

Sentiment Analysis is the discipline aimed at analyzing and classifying the orientation of the opinions expressed in a document or, more generally, in a textual entity.

Each textual entity can be classified as positive, negative or neutral, according to the orientation of the opinions it expresses. By means of Sentiment Analysis, the sentence "Il motore della Fiat Punto è brillante e piacevole da guidare" ("The engine of the Fiat Punto is lively and pleasant to drive") is automatically classified as positive, while the sentence "Il cambio è impreciso e ne compromette la guidabilità" ("The gearbox is imprecise and compromises its drivability") is classified as negative, due to the different orientation of the opinions they express.

The research activities described in this thesis aim at investigating and proposing different techniques for Sentiment Analysis applied to documents written in the Italian language. The need for automatic tools for Sentiment Analysis is justified by the huge amount of opinionated content available on the Web (e.g., review sites, blogs, forums) and its continuous growth. Users cannot deal with such an amount of data; automated tools able to summarize the polarity rating expressed by other reviewers in a set of heterogeneous information sources are required.

Sentiment Analysis has many potential applications, ranging from tracking users' opinions and preferences about products or political candidates as expressed in online forums, to customer relationship management or terrorism prevention. Moreover, Sentiment Analysis includes several different tasks, also referred to as Opinion Mining, which are aimed at investigating and identifying subjective elements, such as opinions or judgements, that may appear in a given document.

Sentiment Analysis has been investigated since 2001 by many researchers worldwide; in particular, most of the research activity is focused on the English language. This thesis and its related publications represent the first approach to Sentiment Analysis for the Italian language. In particular, we aim at developing and evaluating a set of methodologies, based on both linguistic and machine learning algorithms, for defining domain dependent and independent classifiers for opinion polarity of product reviews.

In order to support the experimental activity, the SENT-IT framework has been designed and implemented; it provides a complete toolbox for document analysis, feature extraction, and classifier training and evaluation. The SENT-IT framework has been used to evaluate the proposed methodology for opinion polarity analysis in both domain dependent and domain independent settings. The results confirm, in terms of classification accuracy, that automatic tools for Sentiment Analysis in the Italian language can reach performance similar to that described in the literature for the English language.


Acknowledgments

During the last four years I thought many times about leaving this thesis uncompleted; fortunately I did not listen to those voices in my mind. I must admit these four years have been a long and adventurous journey, both professionally and personally and, in the end, I'm glad I did it.

First of all I want to thank Professor Carlo Tasso, my advisor since my bachelor thesis in 2003, who helped me during this work with his experience and advice. I want to thank my colleagues at the Artificial Intelligence Laboratory: Andrea, Antonina, Felice and Nirmala for their support, advice, help and friendship.

I want to thank Luca and Ivan for their help, especially in a very bad moment of my life. Probably neither I nor my thesis would be here without their motivational work.


Contents

1 Introduction
    1.1 The importance of Opinion and Sentiment in Modern Information Access
    1.2 Terminology
    1.3 Applications of Opinion Mining and Sentiment Analysis methodologies
    1.4 Contribution
    1.5 Outline of the thesis

2 Sentiment Analysis: challenges, solutions and tasks
    2.1 Introduction
    2.2 Challenges in Sentiment Analysis
    2.3 Sentiment Polarity Classification
        2.3.1 Sentiment Polarity Regression
    2.4 Opinion Mining
    2.5 Affect computing
    2.6 Multilingual Sentiment Analysis

3 A Supervised Approach to Overall Opinion Polarity Analysis
    3.1 Introduction
    3.2 Related Work
        3.2.1 Turney
        3.2.2 Pang et al.
        3.2.3 Dave et al.
        3.2.4 Salvetti et al.
    3.3 The SENT-IT Framework
        3.3.1 Product Review Crawler
        3.3.2 Analysis Module
    3.4 Experiments
        3.4.1 The Movie Review Corpus
        3.4.2 Results
    3.5 A novel visualization approach for polarity classified reviews
        3.5.1 Basics of graph theory
        3.5.2 Zz-structures
        3.5.3 Data Visualization Module
    3.6 Conclusions

4 Domain Independent Sentiment Analysis
    4.1 Introduction
    4.2 Related Work
        4.2.1 Aue and Gamon
        4.2.2 Engstrom
        4.2.3 Agrin
        4.2.4 Blitzer
    4.3 Domain Independent OvOP
    4.4 Experiments
        4.4.1 Test set
        4.4.2 Results
    4.5 Conclusions

5 Automatic Generation of Lexical Resources for Sentiment Analysis
    5.1 Introduction
        5.1.1 Prior subjectivity status contextualization
    5.2 Related Work
        5.2.1 Hatzivassiloglou and McKeown
        5.2.2 Turney and Littman
        5.2.3 Kamps et al.
        5.2.4 Takamura et al.
        5.2.5 Esuli et al.
    5.3 Determining the polarity orientation
    5.4 Experiments
        5.4.1 The OpenOffice dictionary
        5.4.2 The SinonimiMaster dictionary
        5.4.3 Test set
        5.4.4 Seed sets and parameters
        5.4.5 Results
    5.5 OvOP analysis based on sentiment oriented terms
    5.6 Experiments
        5.6.1 Test set
        5.6.2 Results
    5.7 Conclusions

6 Conclusions

A Publications

Bibliography


List of Figures

2.1 Different approaches to text categorization and polarity classification.
2.2 Template adopted in [87] and [88] for opinion oriented information extraction.
2.3 The EmpathyBuddy email agent [49, 48] in action.

3.1 Overall architecture of the SENT-IT framework.
3.2 OvOP Workflows available in the SENT-IT framework.
3.3 Overall architecture of the Product Review Crawler module.
3.4 Distribution of preassigned OvOP in MRC.
3.5 Feature selection and accuracy for both NB and SVM classifiers.
3.6 An example of zz-structure.
3.7 An example of H-view on focus v7.
3.8 Set of reviews retrieved from the MRC with the query "Johnny Depp".
3.9 A view related to dimensions "Johnny Depp" and "Pirati dei Caraibi".

4.1 The meta-classification OvOP process.

5.1 Prior subjectivity status contextualization process.
5.2 A subset of the nodes and edges constituting the WordNet graph analyzed in [38].
5.3 The classification of the term "efficient" provided by SentiWordNet.
5.4 The term polarity evaluation process.
5.5 Data provided by the SinonimiMaster dictionary for the term efficiente (efficient).
5.6 The OvOP analysis process.
5.7 Accuracy of OvOP analysis.


List of Tables

2.1 List of polarity conveying terms collected by two human experts in [63].
2.2 List of polarity conveying terms collected by human expert and statistical analysis of document corpus in [63].

3.1 Average three-fold cross-validation accuracies achieved by Pang et al. in [63].
3.2 Average accuracy of trained classifiers.
3.3 Average accuracy+ and accuracy− of trained classifiers.
3.4 Top 50 features extracted from the training set with the highest IG value.
3.5 Average accuracy of the U3 and UBT3 based classifiers after feature selection.

4.1 Average three-fold cross-validation accuracies for each domain dependent OvOP classifier trained according to the UBT3 feature set.
4.2 Top 30 features extracted from each training set with the highest IG value.
4.3 Average three-fold cross-validation accuracy for each domain dependent OvOP classifier applied to different domains.
4.4 Classification accuracy of a classifier trained on three domains and tested on the fourth domain.
4.5 Classification accuracy of a classifier trained on the four domains.
4.6 Classification accuracy of a meta-classifier evaluated on the four domains.

5.1 Adjectives provided by users in L1 with two or more occurrences.
5.2 Positive and negative adjectives with the highest orientation value O(t) generated from the OpenOffice dictionary.
5.3 Positive and negative adjectives with the highest orientation value O(t) generated from the SinonimiMaster dictionary.
5.4 Coverage and accuracy of both generated sentiment-classified lexical resources with respect to test set L1.
5.5 Accuracy of generated sentiment-classified lexical resources with respect to test set L2.
5.6 Extraction rules used for OvOP analysis.
5.7 Accuracy of lexical resource based OvOP analysis.


Chapter 1

Introduction

1.1 The importance of Opinion and Sentiment in Modern Information Access

Most of the decision-making processes we carry out during our lives are based on subjective information: opinions and sentiments. Knowing what other people think, and how they perceive reality, is required in order to support several activities we carry out daily, such as renting a movie or buying a new digital camera. Most people ask a friend for a recommendation about which product to buy, which candidate to vote for, or which book to read before making a decision [62].

The importance of gathering and knowing people's opinions and preferences is an issue well known to companies, organizations and public authorities. Companies, for example, collect opinions about their products and brands to support their marketing strategy and to plan new activities aimed at improving the way the company is perceived by its customers. Politicians can use opinions collected from their electors, and even from the electors of their competitors, in order to define their programs and plans.

The ability to predict how shareholders and the markets will react to good and bad news is a key issue for both companies (e.g., banks, financial traders, brokerage agencies, insurance companies) and public authorities; financial trends, in fact, are particularly sensitive to the opinions, judgements and even fears perceived by investors.

Opinions could also be used by public authorities to analyze the happiness of the citizens and to identify the issues perceived by the population, in order to prevent critical situations like terrorism.

Until ten years ago, opinion sharing was limited in terms of both the amount of available information and the potential sources of such information: friends, newspapers, television. The growth of the World Wide Web, on the other hand, made it possible for everyone to access a huge number of information sources (e.g., forums, blogs, review sites) providing opinions and, more generally, subjective information.


However, the amount of available input data cannot be easily handled by a single person; automatic tools are required in order to filter relevant information and aggregate it in a way suitable for decision making. For example, given a set of 1000 reviews regarding a brand-new digital camera, an automatic tool could extract the number of positive and negative opinions and present them to the customer.

Analysis of opinions can move even further, by giving users the ability to distinguish between opinions associated with different features of a given product: for example, given a specific digital camera, specific methodologies can be applied to identify that reviewers are enthusiastic about the quality of the lenses but, at the same time, very disappointed by the low precision of the autofocus system.
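The aggregation step described above can be illustrated with a minimal Python sketch. It assumes that each opinion has already been extracted and classified, which is the hard part and is not shown here; the `opinions` data and the `summarize` helper are hypothetical names introduced only for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, already-classified opinions: (product feature, polarity).
# In a real system these pairs would come from Opinion Mining and
# Sentiment Analysis components, which are not part of this sketch.
opinions = [
    ("lenses", "positive"),
    ("lenses", "positive"),
    ("autofocus", "negative"),
    ("autofocus", "negative"),
    ("autofocus", "positive"),
]


def summarize(opinions):
    """Count polarities over all opinions and separately for each feature."""
    overall = Counter(polarity for _, polarity in opinions)
    per_feature = defaultdict(Counter)
    for feature, polarity in opinions:
        per_feature[feature][polarity] += 1
    return overall, dict(per_feature)


overall, per_feature = summarize(opinions)
print(overall)
for feature, counts in per_feature.items():
    print(feature, dict(counts))
```

The overall counter corresponds to the summary a customer would see for the whole product, while the per-feature counters correspond to the finer analysis in which lenses and autofocus are judged separately.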

Another issue concerning the large amount of available opinions is trust: opinions provided by a friend or by a well-known movie critic tend to be more trusted than opinions provided by strangers. Even in this case, analysis of opinions can help users: by automatically analyzing and classifying thousands of documents, wrong or fake opinions provided by single untrustworthy users can be counterbalanced by taking into account the large amount of knowledge provided by the crowd. Understanding how the trustworthiness of a seller is perceived by its customers, according to the reviews written by other customers, is becoming a critical issue in e-commerce, especially in systems like eBay, where trust is considered a key factor at the beginning of a new transaction.

Opinion Mining and Sentiment Analysis identify the new field of research devoted to designing and evaluating tools for automatic opinion analysis. It started approximately in 2001, with contributions from researchers coming from the domains of machine learning, computational linguistics and information retrieval. Most of the research on Sentiment Analysis and Opinion Mining has focused on the English language; only a small part of the available works deals with the problem of Sentiment Analysis for other languages, where the amount of available tools, resources and previous experience is significantly smaller.


1.2 Terminology

Several different terms have been proposed during the last ten years by authors involved in the field of sentiment analysis in order to describe their work and the differences with work done by others; in fact no uniform terminology is available.

According to Wiebe, the subjectivity of a text is defined as the set of elements describing the private state of the writer. Assumptions, beliefs, thoughts, experiences, opinions, and judgments expressed in texts are typical clues of subjectivity [91].

Sentiment is defined as the subset of subjective clues that can be measured in terms of positive, neutral or negative orientation.

The automatic analyses of the opinions, sentiment and subjectivity of a given text are known, respectively, as Opinion Mining, Sentiment Analysis and Subjectivity Analysis.

The term Opinion Mining (OM) was introduced in 2003 to describe the activity aimed at "processing a set of search results for a given item, generating a list of product attributes (quality, features, et al.) and aggregating opinions about each of them (poor, mixed, good)" [19]. Opinion Mining, according to the definition provided by Dave, is concerned with the analysis of the opinions expressed by a document, regardless of the specific topic (topicality) of the analyzed document: for this reason OM is classified as a non-topical text analysis task¹. The term has also been used in several works available in the literature, including [55], [32] and [28].

The term Sentiment Analysis (SA) was introduced in 2001 to describe the process aimed at automatically evaluating the polarity expressed by a set of given documents. The term Sentiment was borrowed by Das and Chen [16] from the economic domain: their work is aimed at defining a prediction algorithm able to determine the future market-share tendency given a set of documents concerning public companies extracted from economic newspapers. More specifically, the analysis of future market tendency is usually referred to in the economic domain as market sentiment. The term Sentiment Analysis has been used in several works in the literature, including [82], [84], [63] and [97].

While OM is mainly focused on recognizing opinions expressed in a given text with respect to specific attributes (e.g., recognizing the opinions related to the engine in a car review and separating them from the opinions regarding tyres), SA is focused, on the other hand, on classifying a given document according to the polarity it expresses (either negative or positive).

¹ Other text analysis tasks that can be seen as non-topical are genre recognition, aimed at identifying and classifying the type of the analyzed document, and author recognition, aimed at recognizing the author of a document among a set of potential authors, each one characterized by their own writing style.


In order to properly classify the different issues and solutions proposed in the literature in the specific domain of OM and SA, two different dimensions can be analyzed: the granularity of the input documents and the opinion-related goal each solution is aimed at solving (e.g., determining the subjectivity, polarity or force of the input document).

Granularity defines which textual entities will be considered and analyzed by the application:

• document level granularity [63]: each document is seen as the base element for analysis of the opinion-related properties. Each property is evaluated on the document seen as a whole, even if it consists of sentences expressing different opinion-related properties.

• sentence level granularity [60]: each sentence constituting a given document is seen as the base element for analysis of the opinion-related properties (e.g., given a car review, a polarity rating is assigned to each of the sentences constituting it).

• proposition and text span level granularity: propositions and text spans, consisting of two or more words, are seen as the base element for analysis of the opinion-related properties.

• term level granularity [33]: analysis of the opinion-related properties is performed on a vocabulary of terms. The terms are not considered with respect to the contexts in which they appear; properties evaluated on terms are general. Analysis at term level granularity can lead to the construction of opinion-related lexical resources. Such resources can be used by applications devoted to OM or SA to improve their effectiveness (e.g., the term "buono" ("good") expresses a positive polarity, while the term "casa" ("house") does not express any subjectivity).

• term sense level granularity [26]: the most detailed level at which OM and SA analysis can be performed, exploited by Esuli et al. during the development of the SentiWordNet resource. Compared with term level granularity, each different sense of a term is considered as the base element for analysis of the opinion-related properties.

Opinion-related dimensions define which aspect of subjectivity the subtask will focus on. Three opinion-related dimensions have been mainly investigated in the literature:

• subjectivity [98], [60], [42]: subjectivity analysis is the activity aimed at recognizing whether a given textual entity, according to the selected granularity, contains subjective expressions. The subjectivity analysis task, for example, could be used to determine that the sentence "Il motore della Fiat Punto è brillante e piacevole da guidare" contains an opinion, while the sentence "Il motore della Fiat Punto eroga una potenza di 120 CV" ("The engine of the Fiat Punto delivers 120 CV of power") provides only objective information. The term OM is usually associated with applications involved in subjectivity analysis.

• polarity [82], [63]: polarity analysis is the activity aimed at evaluating the polarity (in terms of positive or negative orientation) expressed by a textual entity. For example, polarity analysis applied to the sentences "Il motore della Fiat Punto è brillante e piacevole da guidare" and "Il cambio è impreciso e ne compromette la guidabilità" recognizes that the former expresses a positive orientation, while the latter expresses a negative orientation. The term SA is usually associated with applications involved in polarity (sometimes referred to as orientation) analysis.

• force [61]: force analysis is aimed at recognizing the intensity of the subjective elements contained in a given textual entity. This dimension is usually investigated in association with subjectivity or polarity. Force analysis can be exploited, for example, to compare the intensity of the two sentences "Il cambio è un po' impreciso" ("The gearbox is a bit imprecise") and "Il cambio è terribilmente impreciso" ("The gearbox is terribly imprecise") and to conclude that the latter expresses a more intense polarity than the former, which provides a lighter orientation.

Subjectivity and polarity analysis can usually be seen as typical classification problems: given a textual entity D as input, a specific class C (with C ∈ {objective, subjective} or C ∈ {positive, negative}) is assigned to D. Force analysis, on the other hand, has a regression-like nature: given a textual entity D as input, the analysis task is aimed at assigning a force value F_D to D.
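The difference between the classification-like and regression-like tasks can be made concrete with a small Python sketch. It is only a toy illustration, not one of the methods proposed in this thesis: the `LEXICON` and `MODIFIERS` dictionaries, their weights, and the function names are all assumptions introduced here, and a real system would rely on far richer linguistic analysis.

```python
# Toy, hand-made resources (assumptions of this sketch, not thesis data).
LEXICON = {"brillante": 1.0, "piacevole": 1.0, "impreciso": -1.0}
MODIFIERS = {"terribilmente": 2.0, "po'": 0.5}


def tokenize(text):
    """Lower-case, split on whitespace and strip common punctuation."""
    return [token.strip(".,;:!?\"") for token in text.lower().split()]


def scored_terms(text):
    """Return signed scores of lexicon terms, scaled by a directly preceding modifier."""
    tokens = tokenize(text)
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            scale = MODIFIERS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            scores.append(LEXICON[token] * scale)
    return scores


def classify_polarity(text):
    """Classification: map a textual entity D to a class C."""
    # A total of zero falls into "negative"; this tie-break is arbitrary.
    return "positive" if sum(scored_terms(text)) > 0 else "negative"


def force(text):
    """Regression-like task: map a textual entity D to a real value F_D."""
    return sum(abs(score) for score in scored_terms(text))


print(classify_polarity("Il motore della Fiat Punto è brillante e piacevole da guidare"))
print(force("Il cambio è un po' impreciso"))
print(force("Il cambio è terribilmente impreciso"))
```

Under these toy weights the two gearbox sentences share the same negative class but receive different force values, which is exactly the distinction between assigning a class C and assigning a value F_D.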


1.3 Applications of Opinion Mining and Sentiment Analysis methodologies

Opinion Mining and Sentiment Analysis have been successfully applied to several application domains, in both research and industrial contexts. The application domains explored since 2001 range from product recommendation to product marketing, from brand and reputation analysis to business and government intelligence, and from analysis of market sentiment to terrorism prevention.

The most significant domain of application of Sentiment Analysis is represented by product recommendation and review: Sentiment Analysis can be applied to automatically summarize the polarity expressed by a set of reviews concerning a given product or a specific property of the product, identified by means of Opinion Mining.

Summarized and aggregated information can be used by customers to evaluate how a product is perceived by other customers and to decide whether it should be bought. Customers can easily base their decision-making process on aggregated data without reading the whole set of product reviews.

Summarization and analysis of the reviews concerning a specific product become even more significant for companies building or selling the product itself: they can identify, by means of both Opinion Mining and Sentiment Analysis activities, which properties of their products are perceived by customers as a benefit or, on the other hand, as an issue. At the same time, the described techniques can be applied to reviews concerning products built by competitors, in order to compare the feedback provided by users. A company could find, for example, that its products are perceived as better than the products of its competitors because of a faster spare parts delivery system. On the other hand, the company could discover that one of its products is perceived as too expensive by potential customers with respect to the set of provided functionalities.

Summarization of review polarity has been analyzed by Turney in [84]: reviews are segmented into sentences, and for each sentence the semantic orientation rating is evaluated by comparing its similarity to a positive reference word ("excellent") with its similarity to a negative reference word ("poor"). Similarity between phrases and words is defined by means of the PMI-IR algorithm [82]. The overall polarity of the review is evaluated as the average semantic orientation of the sentences constituting it.
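The idea of comparing similarity to "excellent" with similarity to "poor" can be sketched in Python under the usual pointwise mutual information formulation, where the difference of the two PMI values reduces to a single log-ratio of co-occurrence counts. The hit counts, the `smoothing` constant and the function names below are hypothetical and only illustrate the arithmetic; they are not figures or code taken from [84] or [82].

```python
import math


def semantic_orientation(hits_near_pos, hits_near_neg, hits_pos, hits_neg,
                         smoothing=0.01):
    """PMI(unit, "excellent") - PMI(unit, "poor"), written as one log-ratio.

    hits_near_pos / hits_near_neg: co-occurrence counts of the text unit with
    the positive / negative reference word; hits_pos / hits_neg: counts of the
    reference words alone. The smoothing constant, an assumption of this
    sketch, avoids division by zero when a co-occurrence count is zero.
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def review_polarity(orientations):
    """Average the orientations of the units of a review and threshold at zero."""
    average = sum(orientations) / len(orientations)
    return "positive" if average > 0 else "negative"


# Hypothetical counts: the unit co-occurs four times more often with
# "excellent" than with "poor", and both reference words are equally frequent.
so = semantic_orientation(80, 20, 1000, 1000, smoothing=0.0)
print(so)
print(review_polarity([so, -0.5, 1.0]))
```

Dividing by the counts of the reference words alone compensates for one of them being globally more frequent than the other, so that only the association with the analyzed unit influences the sign.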

Pang et al. [63] proposed a machine learning based approach for sentiment classification of movie reviews; supervised binary classifiers were applied to reviews in order to evaluate their orientation. Pang showed how machine learning could improve on the effectiveness of the method proposed by Turney in the specific domain of movie reviews.
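As an illustration of what a supervised binary classifier for reviews looks like, the following Python sketch implements a very small multinomial Naïve Bayes classifier with add-one smoothing over unigram counts. Naïve Bayes is used here only as one common choice of supervised classifier; the class name, the whitespace tokenization and the four training sentences are assumptions made for this example and do not reproduce the experimental setup of [63].

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace-separated unigrams."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for document, label in zip(documents, labels):
            self.word_counts[label].update(document.lower().split())
        self.vocabulary = {word
                           for counts in self.word_counts.values()
                           for word in counts}
        return self

    def predict(self, document):
        total_documents = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_documents in self.class_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing over the shared vocabulary.
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_documents / total_documents)
            for word in document.lower().split():
                if word in self.vocabulary:  # unseen words are ignored
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hypothetical training set of movie review snippets.
train_documents = [
    "film bellissimo e emozionante",
    "attori bravi e trama bellissima",
    "film noioso e lento",
    "trama banale e attori pessimi",
]
train_labels = ["positive", "positive", "negative", "negative"]

classifier = NaiveBayes().fit(train_documents, train_labels)
print(classifier.predict("film emozionante con attori bravi"))
print(classifier.predict("film lento e banale"))
```

With so little training data the example only shows the mechanics: class priors and smoothed word likelihoods are estimated from labeled reviews and then combined in log space to pick the most probable class.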

Review classification and clustering have also been explored in [5]: four different corpora of reviews belonging to different domains have been analyzed and evaluated. The different types of data considered range from movie reviews to short, phrase-level user feedback from web surveys. The authors present an innovative clustering approach aimed at graphically representing the sentiment orientation of different aspects of the input reviews.

The movie domain has been analyzed in [54]: a new trend prediction algorithm based on the sentiment expressed by messages available in blogs is presented. More specifically, the authors present a new approach to predict the sales of a movie by analyzing both the amount of references to it available in the blogosphere and the sentiment orientation of the textual context (at text span level granularity) surrounding each movie reference. The authors showed that a correlation exists between a movie’s financial performance and the amount of positively oriented references to it.

Politics has also been influenced by the growing popularity of both Sentiment Analysis and Opinion Mining; in particular, two different tasks have been explored by several authors in literature: identifying the opinions of the voters and clarifying the position expressed by a politician with respect to a specific topic.

The first task has been explored, for example, in [47, 56]; in particular Mullen et al. evaluated the performance of a classification method based on Naïve Bayes classifiers aimed at inferring the political affiliation of a blogger. The experimental activity described by the authors leads to poor results in terms of classification accuracy, with a best performance of 64.48%. A better accuracy, 65.57%, has been reached by replacing the Naïve Bayes classifier with a simple classification rule: assign a user to the political affiliation opposite to that of the users they tend to quote or be quoted by.
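The quotation-based rule can be sketched in a few lines of Python. The example is only an illustration under simplifying assumptions: affiliations are binary (“left” and “right”), the quoting relations are given as a hypothetical list of pairs, and a user receives the label opposite to the majority label among the users they quote or are quoted by. Names and data are invented and do not come from [47, 56].

```python
from collections import Counter

OPPOSITE = {"left": "right", "right": "left"}

def classify_by_quotes(user, quotes, known):
    """Return the affiliation opposite to the majority label of the user's quote neighbours."""
    # A pair (a, b) means "a quotes b"; both directions count as neighbours.
    neighbours = [b for a, b in quotes if a == user] + [a for a, b in quotes if b == user]
    votes = Counter(known[n] for n in neighbours if n in known)
    if not votes:
        return None  # no labelled neighbour: the rule cannot decide
    majority, _ = votes.most_common(1)[0]
    return OPPOSITE[majority]

# Hypothetical quoting relations and known affiliations.
quotes = [("anna", "bruno"), ("carla", "anna"), ("anna", "dario")]
known = {"bruno": "right", "carla": "right", "dario": "left"}

prediction = classify_by_quotes("anna", quotes, known)
print(prediction)
```

The rule relies on the observation reported above that bloggers tend to quote their opponents; it needs no text features at all.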

In [80] the authors investigated the problem of determining whether a politician speaking during a debate agrees or not with the contents of the debate: in particular they showed how integrating constraints based on speaker identity and on direct textual references between statements can significantly increase the accuracy of support/opposition classification.

A similar domain has been explored in [46], where a new application for automatically analyzing a large set of documents has been developed in order to identify those documents which support or oppose a specific rule proposition.

Sentiment Analysis and Opinion Mining could also be used to implement opinion related search engines; an opinion related search engine can be defined as a search engine providing users the ability to filter the retrieved data according to a specific subjectivity or orientation (e.g.: a user asks for documents expressing negative opinions about “Fiat Punto”). An example of opinion related search engine has been proposed in [4]; the system described by the authors has been evaluated on blog contents included in the 2006 TREC Blog track. In particular the authors showed how query reformulation including opinion-related terms could be effectively used to improve accuracy in the retrieval of opinion related contents.

Sentiment Analysis has been exploited in order to implement tools for checking the coherence of the review expressed by a user: for example SA could be used to automatically check whether the contents of a review are compatible with the rating expressed by the user to summarize the review.

Sentiment Analysis and Opinion Mining have also been integrated into product recommender systems, in order to provide augmented recommendations based on both collaborative filtering and analysis of user feedback. Products with many positive reviews will be recommended with a higher probability than products with many negative reviews.

Moreover, Sentiment Analysis and Opinion Mining could be adopted in order to identify flames (messages with improper language) in email communications, forums, blogs and websites. Even the accuracy of Information Extraction could be improved by integrating Opinion Mining into the extraction workflow, as described in [70]: sentences characterized by the highest subjectivity are discarded, limiting the extraction process to objective sentences.

Opinion Mining has also been exploited in question answering in order to develop opinion related question answering systems [77] [75]: systems which answer user questions by providing both objective and subjective information. For example, given the question “Com’è il motore della Fiat Punto?”2, an opinion related question answering system could provide both positively and negatively oriented perspectives on the same topic. In [98] an opinion mining application operating at both document and sentence level is described; its goal is the identification of subjective textual entities which could be exploited in order to answer questions expressed by users.

Other activities where Sentiment Analysis and Opinion Mining have been integrated in order to improve effectiveness include summarization [73] and citation analysis [64], where Sentiment Analysis could be exploited in order to identify whether an author agrees or not with a hypothesis or a result expressed by other authors.

2 “How good is Fiat Punto’s engine?”


1.4 Contribution

The work presented in this thesis provides several contributions to the specific task of Sentiment Analysis applied, more specifically, to product reviews written in the Italian language. In particular the following contributions have been proposed:

• a generic framework aimed at defining, training and testing automatic tools devoted to Sentiment Analysis based on supervised classifiers has been designed and implemented. The SENT-IT framework provides a complete set of integrated tools for linguistic analysis and machine learning, which could be applied in order to easily generate new automatic tools for sentiment classification and to experimentally evaluate their performance. A comprehensive description of the SENT-IT framework and its modules is provided in Chapter 3. The SENT-IT framework is based on open-source solutions and will be freely released soon for research purposes.

• a set of automatically annotated corpora constituted by product reviews written in the Italian language, grouped by product domain (e.g.: movies, cars, cell phones, et al.), has been collected and shared with other researchers. Each product review is constituted by a short text, a set of additional and optional information, such as date, author name and age, and an overall polarity rating indicator, aimed at representing the polarity expressed by the author within the review. The corpora, which have been developed in order to evaluate the proposed methodologies for Sentiment Analysis, could be used in the future by other researchers as a Gold Standard, not available for the Italian language until the beginning of this thesis. The review corpora were publicly released in 2008 in XML format and are available at the author’s site3.

• a document feature representation schema suitable for Sentiment Analysis applied to the Italian language has been proposed and experimentally evaluated. The set of selected features, described in detail in Chapter 3, is constituted by representation features described as suitable in literature, in the case of the English language, and ad-hoc defined features, proposed according to the specific particularities of the Italian language.

• a domain independent meta-classifier devoted to Sentiment Analysis has been implemented by applying a stacking approach to previously trained domain-dependent classifiers. The stacking approach has been investigated in order to improve the effectiveness of the ensemble classifier on unknown or already known domains.

• a lexical resource of polarity oriented terms for the Italian language has been developed, by proposing a shortest path algorithm based on a graph representation of the input terms. Semantic relations connecting terms, like synonymy, antonymy and similarity, have been used in order to generate the graph representation. In particular our research has been focused on evaluating the polarity orientation of attributes; in fact attributes, as described in detail in both Chapters 3 and 5, carry most of the overall polarity of the documents analyzed in this work.

• a novel information visualization and navigation approach, based on zz-structures, has been proposed. The navigation module is aimed at providing users an effective and personalized way to browse the set of reviews according to the polarity expressed by each review.

3 http://users.dimi.uniud.it/ paolo.casoto/research.html

The research activities described in this thesis and in [13] and [14] represent the first solution published in literature to the problem of Sentiment Analysis applied to documents written in the Italian language. For this reason the results presented in Chapters 3, 4 and 5 could only be compared with similar results presented in literature but evaluated on different languages. In particular, results have been compared with similar approaches applied to documents written in English.

The SENT-IT framework has also been considered as part of several theoretical proposals for novel information and knowledge sharing systems, presented in [68], [14], and [6]. The outcomes of the SENT-IT framework could be used to properly annotate, in an automatic way, a set of input documents. Annotations provided by SENT-IT have been grouped with the annotations generated by Information Extraction tools on the same documents and used to infer and proactively suggest more annotations to users.

In addition to Sentiment Analysis, the author focused, during his PhD research activity, on the domain of Digital Libraries and Digital Preservation of Cultural Heritage. Activities and results achieved in these domains are outside the scope of this thesis and will not be described. The full set of publications is listed in Appendix A.


1.5 Outline of the thesis

The remainder of this thesis is constituted by five chapters. In Chapter 2 a detailed description of the Sentiment Analysis activity is provided. Issues which affect the accuracy of the Sentiment Analysis task are described and compared with topic-based classification activities well known in literature, like document categorization. The problem of applying Sentiment Analysis techniques to non-English languages, where lexical or linguistic resources are often missing, is analyzed by describing some of the solutions available in literature.

In Chapter 3 we focus on the specific problem of Overall Opinion Polarity classification at document level; more specifically we are interested in defining and evaluating supervised classifiers aimed at classifying a movie review as positive or negative according to the sentiment orientation its author expresses. In order to address this problem several document representation schemas and classification methods have been proposed and evaluated; the SENT-IT framework is presented in detail and results are analyzed and discussed. The chapter ends with a brief description of the information visualization module, based on zz-structures, which has been proposed in [14] in order to improve user navigation of movie reviews.

In Chapter 4 we move from the problem of effectively identifying the Overall Opinion Polarity of reviews in a specific domain to a more difficult activity: generating and training a classifier able to perform Overall Opinion Polarity analysis on documents concerning different or unknown domains. In particular we describe our meta-classification approach, based on the stacking method of ensemble classification. The results are analyzed and discussed with respect to similar experiences available in literature.
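The general idea of stacking can be illustrated with a minimal Python sketch, which is not the implementation described in Chapter 4. Two hypothetical domain-dependent classifiers (here simple lexicon scorers standing in for trained models) produce one score each, and a meta-classifier (a perceptron, chosen only for brevity) is trained on those scores rather than on the text itself. Lexicons, training sentences and function names are all assumptions made for the example.

```python
def make_lexicon_classifier(positive, negative):
    """Hypothetical stand-in for a trained domain-dependent classifier; returns a score in [-1, 1]."""
    def score(text):
        tokens = text.lower().split()
        pos = sum(t in positive for t in tokens)
        neg = sum(t in negative for t in tokens)
        total = pos + neg
        return 0.0 if total == 0 else (pos - neg) / total
    return score

movie_clf = make_lexicon_classifier({"gripping", "moving"}, {"boring", "slow"})
car_clf = make_lexicon_classifier({"reliable", "comfortable"}, {"noisy", "expensive"})
BASE_CLASSIFIERS = [movie_clf, car_clf]

def meta_features(text):
    """The meta-level representation of a document: one score per base classifier."""
    return [clf(text) for clf in BASE_CLASSIFIERS]

def train_meta(samples, epochs=20, lr=0.1):
    """Train a perceptron on base-classifier outputs; labels are +1 (positive) or -1 (negative)."""
    weights, bias = [0.0] * len(BASE_CLASSIFIERS), 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = meta_features(text)
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if (1 if activation > 0 else -1) != label:
                weights = [w + lr * label * xi for w, xi in zip(weights, x)]
                bias += lr * label
    return weights, bias

def predict(text, weights, bias):
    activation = sum(w * xi for w, xi in zip(weights, meta_features(text))) + bias
    return "positive" if activation > 0 else "negative"

# Hypothetical labelled documents from two domains.
training = [
    ("a gripping and moving story", 1),
    ("boring and slow", -1),
    ("reliable and comfortable car", 1),
    ("noisy and expensive", -1),
]
weights, bias = train_meta(training)
print(predict("comfortable and gripping", weights, bias))
print(predict("expensive and boring", weights, bias))
```

The meta-classifier never sees the words of a document; it only learns how much to trust each domain-dependent classifier, which is what allows it to be applied to documents mixing vocabulary from different domains.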

In Chapter 5 we move forward from automatic classification of Overall Opinion Polarity at document level to term level. More specifically we propose an original method for determining the polarity orientation of a set of terms extracted from the vocabulary of the Italian language, based on shortest path models applied to the link graph representing the selected terms. This activity is aimed at creating a polarity oriented lexicon for the Italian language, which could be used in order to improve the effectiveness of the Overall Opinion Polarity analysis at document level, by identifying unigrams carrying domain independent features for document representation.
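One possible reading of a shortest path model over a term graph is sketched below in Python; it is an assumption made for illustration and not the method detailed in Chapter 5. A term is labelled by comparing its graph distance from a positive seed (“buono”) with its distance from a negative seed (“cattivo”). The toy graph contains only invented synonymy links, and the seed words and function names are hypothetical.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    visited, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def term_polarity(graph, term, pos_seed="buono", neg_seed="cattivo"):
    """Label a term according to the closer seed; seeds are assumptions of this sketch."""
    d_pos = shortest_path_length(graph, term, pos_seed)
    d_neg = shortest_path_length(graph, term, neg_seed)
    if d_pos is None and d_neg is None:
        return "unknown"
    if d_neg is None:
        return "positive"
    if d_pos is None:
        return "negative"
    if d_pos < d_neg:
        return "positive"
    if d_neg < d_pos:
        return "negative"
    return "neutral"

# Hypothetical undirected synonymy graph (each link listed in both directions).
SYNONYMS = {
    "buono": ["ottimo", "valido"],
    "ottimo": ["buono", "eccellente"],
    "eccellente": ["ottimo"],
    "valido": ["buono", "discreto"],
    "discreto": ["valido", "mediocre"],
    "mediocre": ["discreto", "scadente"],
    "scadente": ["mediocre", "cattivo"],
    "cattivo": ["scadente", "pessimo"],
    "pessimo": ["cattivo"],
}

for term in ("eccellente", "pessimo", "veloce"):
    print(term, term_polarity(SYNONYMS, term))
```

Antonymy links, which would connect terms of opposite polarity, are deliberately left out of this sketch; handling them would require tracking a sign along each path.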

In Chapter 6 a brief summary of the obtained results is reported and analyzed, and possible future paths of research are described.


Chapter 2

Sentiment Analysis: challenges, solutions and tasks

Abstract

In this Chapter a detailed survey of challenges in Sentiment Analysis and related solutions presented in literature is provided. The chapter is aimed at describing the critical issues which affect the accuracy of the SENT-IT framework, described in Chapter 3, in overall opinion classification. A brief description of the complementary tasks which are related to Sentiment Analysis but not directly covered by our activities on the SENT-IT framework is also provided. The Chapter ends with the analysis of works presented in literature for Sentiment Analysis applied to non-English languages.

2.1 Introduction

Interest in Sentiment Analysis is constantly increasing in both the industrial and research sectors, due to its wide range of potential applications, the first of them being business intelligence.

Since the beginning of research activity in this specific area it became clear, as stated by Turney in [82] and then confirmed by many other works, that Sentiment Analysis is different from classic document classification activities, such as text categorization.

“Text categorization (also referred as text classification, or topic spotting) is the activity of labelling natural language texts with thematic categories (also referred as classes) from a predefined set.” [72]

Categories are defined according to the specific goals of the users or applications which will perform categorization: different tasks are based on different sets of categories. In fact the number of categories which could be used in text categorization could vary from a small set constituted by just two categories (binary classification) to larger sets including a thousand or more categories (e.g.: the categories required to classify a newspaper article according to its topic, by using a structured taxonomy or ontology). Moreover text categorization could be performed according to two different approaches, described in Figure 2.1:

• single label: an input text t_i is assigned to exactly one category C_j, since categories do not overlap each other;

• multi label: an input text t_i could be assigned to one or more categories C_1, C_2, C_3, ..., which may overlap each other.
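The difference between the two approaches can be made concrete with a short Python sketch. It assumes that some classifier has already produced a score for each category; the scores, the category names and the threshold are hypothetical values chosen for the example.

```python
def single_label(scores):
    """Assign the text to exactly one category: the one with the highest score."""
    return max(scores, key=scores.get)

def multi_label(scores, threshold=0.5):
    """Assign the text to every category whose score reaches the (assumed) threshold."""
    return {category for category, score in scores.items() if score >= threshold}

# Hypothetical category scores for one input text.
scores = {"sport": 0.8, "politics": 0.6, "economy": 0.1}

print(single_label(scores))
print(sorted(multi_label(scores)))
```

In the single label case the categories partition the documents, while in the multi label case the same text may appear under several categories at once.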

Sentiment Analysis, on the other hand, is usually based on a relatively small set of categories (e.g.: “positive” and “negative” in a binary classification approach; “5 stars” ... “1 star” in a multiclass classification approach). Such classes are domain independent and generalize across users and applications. The “positive” category, for example, clearly represents the set of positively oriented documents in both the movie and the car review domains. Generalization of categories across users and applications may not, on the contrary, be a valid hypothesis when dealing with text categorization. Consider, for example, two users who are interested in classifying their documents according to two different taxonomies describing the documents’ topicality. The same category C_i, for example “sport”, could be located at different levels of the taxonomies and be associated with different documents.

Figure 2.1: Different approaches to text categorization and polarity classification.


Another difference that arises when comparing Sentiment Analysis with traditional text categorization tasks is the strong relationship which connects categories to each other. More specifically, while text categorization is based on unrelated (or hierarchically related, when dealing with taxonomies) categories, Sentiment Analysis uses categories representing opposite concepts (e.g.: “positive” and “negative” in binary classification) or categories connected by an order relation (e.g.: “5 stars” ... “1 star” in multiclass classification).

In fact, as stated by Pang and Lee in [62]: “the regression-like nature of strength of feeling, degree of positivity, and so on seems rather unique to sentiment categorization (although one could argue that the same phenomenon exists with respect to topic-based relevance)”.

Opinion Mining also has many characteristics that differ from another activity commonly applied to unstructured texts: Information Extraction (IE). Information Extraction could be defined as “the process of filling the fields and records of a database from unstructured or loosely formatted text” [50].

The IE process analyses the input text in order to identify and extract references to entities (e.g. people, places, dates, currencies, et al.) appearing in the text and their relationships (e.g. person X is going to place Y). Extracted data are structured by means of a frame-like structure, the template: a list of slots filled with the strings extracted from the input document during the IE activity. Each template is strongly coupled with the specific domain and task it is aimed at: templates devoted to the medical domain, where slots contain references to diseases, drugs, viruses or chemical compounds, do not perform effectively when applied to unstructured bibliographic references.

In fact Information Extraction is described in literature as a strongly domain dependent activity; many works, like [29] and [31], describe this issue of IE, introducing and evaluating new approaches aimed at domain-independent Information Extraction from documents available on the Web.

Opinion oriented Information Extraction, on the other hand, is based on templates whose fields (e.g.: appraiser, appraised, orientation, attitude, strength, et al.) generalize well across different domains. The described template could be adopted to extract opinions expressed by a text in both the car and the movie review domains. In [87] and [88] a particular domain-independent template, defined as appraisal group, is used to extract opinion related information from a set of input texts and to perform Sentiment Analysis classification. The frame-like structure used to describe an appraisal group and the set of textual entities which could be assigned to the attitude slot are represented in Figure 2.2.
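A frame-like opinion template of this kind can be sketched as a small Python data structure. The slot names follow the fields listed above; the class name, the slot types and the example values are assumptions made for illustration and do not reproduce the exact appraisal group structure used in [87] and [88].

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class OpinionTemplate:
    """Hypothetical opinion-oriented IE template; slots left empty are None."""
    appraiser: Optional[str] = None    # who expresses the opinion
    appraised: Optional[str] = None    # what the opinion is about
    attitude: Optional[str] = None     # textual span carrying the evaluation
    orientation: Optional[str] = None  # "positive" or "negative"
    strength: Optional[str] = None     # e.g. "low", "neutral", "high"

# The same slots are filled for two different domains (invented values).
movie_opinion = OpinionTemplate(appraiser="reviewer", appraised="plot",
                                attitude="boring", orientation="negative",
                                strength="high")
car_opinion = OpinionTemplate(appraiser="reviewer", appraised="engine",
                              attitude="reliable", orientation="positive")

print([f.name for f in fields(OpinionTemplate)])
print(movie_opinion)
print(car_opinion)
```

No slot refers to movies or cars specifically, which is the sense in which such a template generalizes across domains, unlike a medical template whose slots name diseases or drugs.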


Figure 2.2: Template adopted in [87] and [88] for opinion oriented information extraction.


2.2 Challenges in Sentiment Analysis

Both sentiment polarity classification and opinion oriented information extraction,as described in the previous section, generalize well across different domains, differ-ent users and even different information needs.

By observing this domain independence of the activities constituting the research field of Sentiment Analysis, the following assumption could be formulated: the polarity (or, in the same way, the subjectivity for opinion oriented information extraction) of a text is given by the polarity of the single words constituting it. In order to classify the polarity of an opinion expressed in a given text, a set of specific keywords should be identified.

In order to illustrate this assumption, which has been proven partially wrong by several publications, such as [63], consider the following example from the cell phone review domain:

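Considero il Nokia 5250 un vero affare, visto che possiede tutte le funzioni più ricercate in un telefono di ultima generazione e costa come una serata al ristorante. Il 5230 è un ottima via di mezzo ad un prezzo basso, secondo me ne venderanno una miriade.

(English translation: “I consider the Nokia 5250 a real bargain, since it has all the most sought-after features of a latest-generation phone and costs as much as an evening at a restaurant. The 5230 is an excellent middle ground at a low price; in my opinion they will sell a huge number of them.”)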

The topic of this review could be easily identified by the entity “Nokia 5250”1, while the presence of words like “affare” (bargain) and “ottima” (excellent) clearly suggests that the review’s author is expressing a positive opinion. Other textual elements, such as “più ricercate” (most sought-after), “ultima generazione” (latest generation) and “ne venderanno una miriade” (they will sell a huge number of them) help in conveying the polarity of the opinion expressed by the author.
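The identification of the appraised entity can be sketched with a simple rule of the kind outlined in footnote 1: a manufacturer name taken from a gazetteer, followed by a number. The Python example below is a minimal sketch; the gazetteer content, the function name and the shortened review text are assumptions, and real model names that are not purely numeric would not be matched by this rule.

```python
import re

# Hypothetical gazetteer of cell phone manufacturers.
GAZETTEER = ["Nokia", "Apple"]

def extract_phone_entities(text, gazetteer=GAZETTEER):
    """Return the spans formed by a gazetteer entry followed by a number."""
    names = "|".join(re.escape(name) for name in gazetteer)
    pattern = re.compile(rf"\b(?:{names})\s+\d+\b")
    return pattern.findall(text)

review = "Considero il Nokia 5250 un vero affare. Il 5230 è un ottima via di mezzo."
entities = extract_phone_entities(review)
print(entities)
```

The second model number in the review is not extracted, because it is not preceded by a manufacturer name; resolving such references would require an additional step.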

By looking at the review above it seems that distinguishing positive from negative reviews is relatively easy for humans, especially in comparison to other traditional text categorization problems, like topic categorization applied to documents concerning very similar topics. However, as stated by Pang in [63], the identification of keywords conveying sentiment polarity is difficult even for human classifiers. In [63] the authors asked two human classifiers to collect, independently of each other, a list of indicators of positive or negative orientation in a given document. Both lists of keywords proposed by the human experts, although intuitively plausible, have been used in order to classify a set of input documents from the movie review domain: the evaluated accuracy, however, is only about 60%, with respect to the baseline of 50% based on random classification. The two lists of polarity bearing terms collected by the human experts are reported in Table 2.1.

1 The appraised entity is easily extracted, in the specific domain of cell phones, by implementing a named entity extraction [15] module based on a small set of extraction rules. Named entity extraction could be achieved, for example, by identifying the textual spans containing a reference to a manufacturer of cell phones (e.g. Nokia, Apple), listed in a gazetteer, followed by a number (e.g.: 5240, 3310).

Table 2.1: List of polarity conveying terms collected by two human experts in [63].

Human 1

Positive: dazzling, brilliant, phenomenal, excellent, fantastic

Negative: suck, terrible, awful, unwatchable, hideous

Human 2

Positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting

Negative: bad, cliched, sucks, boring, stupid, slow

In order to improve the accuracy of the classification activity based on keyword identification, a new list of polarity conveying terms has been collected and evaluated: words have been chosen according to a preliminary examination of the frequency counts characterizing the test set. The second list, shown in Table 2.2, has a size comparable to the previously described lists but, at the same time, presents some particularities. Terms whose semantics do not directly bear a polarity orientation have been included as indicators of positive (“still”) or negative (question and exclamation marks) orientation.

Although such textual entities would probably not have been proposed by human experts, their usage as polarity indicators arises from the statistical analysis of term frequency applied to the test corpus. The accuracy achieved by this last list of keywords is almost 70%.

Table 2.2: List of polarity conveying terms collected by a human expert combined with statistical analysis of the document corpus in [63].

Positive: love, wonderful, best, great, superb, still, beautiful

Negative: bad, worst, stupid, waste, boring, ?, !
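Classification by keyword identification can be sketched as a simple count of the terms of Table 2.2 occurring in a document. The Python example below is a minimal sketch: the tokenizer, the handling of ties and the sample sentences are assumptions, and the exact procedure followed in [63] may differ.

```python
import re

# Terms taken from Table 2.2.
POSITIVE = {"love", "wonderful", "best", "great", "superb", "still", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "?", "!"}

def tokenize(text):
    """Lowercase words plus '?' and '!' as separate tokens (an assumed tokenization)."""
    return re.findall(r"[a-z']+|[?!]", text.lower())

def classify(text):
    """Compare the number of positive and negative keywords found in the text."""
    tokens = tokenize(text)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "tie"

# Invented sample sentences.
print(classify("A wonderful cast and a great, beautiful story."))
print(classify("What a waste of time! Boring and stupid."))
print(classify("The plot never really takes off."))
```

The third sentence contains no listed keyword and cannot be decided, which gives a first hint of why keyword lists alone reach limited accuracy.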

Pang et al. showed that polarity classification based on keyword identification could be outperformed by adopting a machine learning based approach. More specifically, by using unigram models and Naïve Bayes classifiers, accuracy over 80% has been achieved. However such accuracy, even if better than the performance achieved with keyword identification, is still lower than the performance expected in typical topic-based binary classification [72].
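To make the unigram approach concrete, the following Python sketch implements a small multinomial Naïve Bayes classifier with add-one smoothing over word counts. It is only an illustration: the four training sentences are invented, whitespace tokenization is assumed, and the setup does not reproduce the experimental configuration of [63].

```python
import math
from collections import Counter

def train(samples):
    """Collect per-class document counts, per-class unigram counts and the vocabulary."""
    class_docs, word_counts, vocabulary = Counter(), {}, set()
    for text, label in samples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in text.lower().split():
            counts[token] += 1
            vocabulary.add(token)
    return class_docs, word_counts, vocabulary

def predict(text, model):
    """Return the class maximizing log P(class) + sum of log P(token | class)."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocabulary:  # unseen tokens are ignored
                score += math.log((word_counts[label][token] + 1) /
                                  (class_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training set.
training = [
    ("a gripping and moving film", "positive"),
    ("wonderful acting and a great plot", "positive"),
    ("a boring and stupid film", "negative"),
    ("slow plot and awful acting", "negative"),
]
model = train(training)
print(predict("a great and moving plot", model))
print(predict("boring and awful film", model))
```

Unlike a fixed keyword list, the weights of all unigrams are estimated from labelled data, so that terms a human expert would not propose can still contribute to the decision.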

But why is the sentiment classification problem harder than traditional topic-based binary classification, considering, in particular, that the “positive” and “negative” classes are so semantically different from each other?


The most significant difference from topic classification of textual entities, as stated in [82], [63], [71] and others, is that “sentiment can often be expressed in a more subtle manner, making it difficult to be identified by any of a sentence or document’s terms when considered in isolation” [62].

In order to better understand how sentiment orientation could be expressed without requiring the presence of specific polarity bearing terms, consider the following examples in both the English and Italian languages2:

• “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.” The example has been taken from the perfume review domain; it clearly expresses a negative orientation, although no ostensibly negative words occur.

• “She runs the gamut of emotions from A to B.” In this example too, taken from the movie review domain, no ostensibly negative words occur but the author in fact expresses a strongly negative opinion.

• “Everytime I read ‘Pride and Prejudice’ I want to dig her up and beat her over the skull with her own shin-bone.” This example, extracted from a review written by Mark Twain about Jane Austen’s books, expresses a strongly negative opinion.

• “La nuova Fiat Punto rappresenta l’anello di congiunzione fra il carro bestiame del secolo scorso e le automobili tedesce del giorno d’oggi.” (“The new Fiat Punto is the missing link between the cattle wagon of the last century and today’s German cars.”) In this example, taken from the car review domain, a strongly negative opinion is expressed although no ostensibly negative words occur.

• “Certamente un pomeriggio con la suocera può rivelarsi più entusiasmante di questa sceneggiatura.” (“An afternoon with one’s mother-in-law can certainly turn out to be more exciting than this screenplay.”) Another example, taken from the movie review domain, where a negative opinion is expressed. In this particular review the sentiment is subtle and very difficult to understand. In fact irony is used to convey the negative opinion the author wants to express about the plot of the movie. She/he compares the time spent seeing the movie with the time spent with the mother-in-law, which is usually associated with negative moments and feelings.

• “Il nuovo Nokia N8 è una bomba, l’ultima ancora di salvataggio per il bilancio dell’azienda finlandese.” (“The new Nokia N8 is a bomb, the last lifeline for the balance sheet of the Finnish company”, where “è una bomba” is an idiom for something outstanding.) In this example a positive opinion arises; more specifically the author wants to express how the appeal of the described product could attract many new customers. Even in this case no reference to positive terms is present in the review. Moreover words like “bomba” (bomb) and “ultima” (last) are used to convey a positive polarity, even if both, in their English version, have been classified, by using SentiWordNet [25, 26, 27, 28], as strongly objective and slightly oriented to negative polarity.

2 English examples have been extracted from [62], while the Italian examples have been manually extracted during the preliminary analysis of the corpora described in Chapters 3 and 4.

The provided examples show how polarity orientation could be conveyed without requiring opinions, even if strongly oriented, to be associated with specific keywords or phrases.

Moreover, as stated by Kim and Hovy in [43], another issue affecting the classification of polarity is the difficulty in recognizing and distinguishing the objective and subjective parts of a given text. According to the authors, this task proves particularly difficult for human classifiers too; more specifically they express their doubts by claiming that “human annotators often disagreed on whether a belief statement was or was not an opinion”.

The conclusions expressed by Kim and Hovy have not been widely accepted by the research community. In [74] [75] the authors present their results based on manual identification of opinionated sentences and their respective polarities in 24 different documents (13 for study A and 11 for study B, performed by the same experts two months later). The experimental activity shows an inter-evaluator agreement of 83% for study A and 85% for study B, which is higher than the agreement reported by Kim and Hovy.

Indeed, even the objective sentences of a text could provide opinions and polarity clues; even “facts”, strongly objective sentences, do not guarantee the absence of opinion. In order to clarify this statement, consider the following examples written in both English and Italian:

• “I must familiarise my mind with the fact that Miss Austen is not a poetess. I must ‘learn to acknowledge her as one of the greatest artists, of the greatest painters of human character, and one of the writers with the nicest sense of means to an end that ever lived’.”

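• “Il Nokia N8 non è un comune telefono. Il suo schermo da 3.5’ e la connettività wireless Wi-Fi / 3G lo rendono un vero computer portatile. La batteria garantisce oltre 6 ore di autonomia.” (“The Nokia N8 is not an ordinary phone. Its 3.5’ screen and its Wi-Fi / 3G wireless connectivity make it a true portable computer. The battery guarantees over 6 hours of autonomy.”)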

In both examples a strong opinion is expressed by objective and subjective sentences at the same time. Consider the second example, extracted from the cell phone review domain; it is constituted by 3 different sentences, which could all be classified as facts conveying only objective information (e.g.: the first sentence, “Il Nokia N8 non è un comune telefono”, is an example of a fact, providing the definition of an entity, a cell phone). However, although constituted by objective sentences, the text expresses a positive opinion: in particular it shows that the described product provides a wider set of capabilities not available in similar products, leading to an advanced and unique product.

Similarly, the first example shows how the phrase “the fact that” does not necessarily guarantee the objective truth of what follows it. The objective sentence “Miss Austen is not a poetess” provides emphasis to the following sentences, by augmenting their orientation; the sentence, in other terms, is not aimed at describing a real fact, but at conveying an opinion in a more suitable or elegant way.

In addition to the difficulties in recognizing sentences which could be classified as opinions, another issue, widely discussed in literature, is the identification of the opinion holder and the opinion object (also referred to as appraiser and appraised in [87, 88]). This issue has been studied, in particular, in [47, 56, 80, 57] in order to identify the holder of an opinion in transcriptions of political debates.

The general notion of positive and negative opinions is consistent across different domains, as described in Section 2.1; however the sentiment and subjectivity of a given text depend, as shown by the previous examples, on the context where the text is located. This assumption can be generalized at domain level: in each domain different terms can be used to convey the same opinion and polarity or, on the other hand, the same terms can convey different semantics in different domains and contexts [62].

The following examples describe terms or sentences whose polarity changes across different domains:

• “Go read the book” [62]. This simple sentence clearly expresses a positive opinion when it appears in a book review. The same sentence, however, expresses a quite negative sentiment when used in the review of a movie. The same sentence, thus, could be used to express completely opposite opinions and sentiments in different contexts or domains.

• “La memoria RAM installata a bordo è di 512 MB”. This simple sentence presents both the previously described issues: it is a fact, expressing objective information about a product, but it provides, at the same time, an implicit opinion concerning the quality of the described product. Moreover, it assumes different orientations when applied to different domains. For example, it could describe a high-end cell phone, assuming a positive orientation, in the cell phone domain, while it could assume a quite negative orientation when used in the computer domain, where 512 MB is considered, nowadays, a poor amount of RAM.

This phenomenon is more frequent when documents contain off-topic or cross-topic sections, for example when the author switches to the vocabulary of a different domain within the body of the review.

The last issue affecting Sentiment Analysis concerns the importance of modeling the structure of the discourse expressed by the author of a text. In traditional text categorization the order in which different subjects are presented is not important; terms which occur relatively frequently in the text concur in determining the topic of the document. In Sentiment Analysis the order in which opinions are presented influences the polarity expressed by the document; the same sentences in a different order could lead to a completely opposite overall sentiment polarity. The following examples provide a better understanding of the described phenomenon:

• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.” The orientation of the text, except for the last sentence, is clearly positive; in fact the presence of several positive indicators like “brilliant”, “great”, “first grade” and “good” supports this hypothesis. However, the overall sentiment, which is clearly negative, is provided only by the last sentence. The last sentence, in fact, is crucial for determining the overall polarity of the review but, at the same time, does not provide any explicit negative polarity indicator.

• “Il cambio è preciso e silenzioso, lo sterzo pronto alle sollecitazioni. L’elettronica di bordo non delude. Tuttavia il prodotto finale è ancora troppo arretrato rispetto ai diretti concorrenti sul mercato.” Similarly to the previous example, in this review extracted from the car domain the overall negative polarity is expressed by the last sentence. The first sentences, in particular, provide a positive opinion about the car under description. This issue has been described by Turney in [82] as the problem of distinguishing sentences providing opinions concerning the whole from sentences providing opinions concerning single elements. This issue affects, in particular, specific domains, as shown by Turney: “good beaches do not necessarily add up to a good vacation. On the other hand good banking services add up to a good bank”.
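The first example above can be replayed with a minimal sketch of an order-insensitive keyword count. The lexicon, the function names and the scoring rule are hypothetical and chosen only for illustration; they are not part of any system described in this thesis. The sketch merely suggests why counting polarity indicators misses the role of the final sentence.

```python
import re

# Hypothetical toy lexicon, chosen only to cover the example review.
POSITIVE = {"brilliant", "great", "good"}
NEGATIVE = {"bad", "poor", "awful"}

REVIEW = (
    "This film should be brilliant. It sounds like a great plot, "
    "the actors are first grade, and the supporting cast is good as well, "
    "and Stallone is attempting to deliver a good performance. "
    "However, it can't hold up."
)


def keyword_polarity(text):
    """Return (#positive terms - #negative terms), ignoring word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive - negative


def reverse_sentences(text):
    """Reverse the sentence order; a bag-of-words score cannot notice it."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(reversed(sentences))


# Both calls yield the same positive score, although the review is negative.
score = keyword_polarity(REVIEW)
score_reordered = keyword_polarity(reverse_sentences(REVIEW))
```

Under these assumptions the score stays positive whatever the sentence order, which is exactly the limitation discussed above.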

2.3 Sentiment Polarity Classification

Sentiment polarity classification is the task of classifying the opinion expressed by an opinionated text as “positive” or “negative”, or of locating its position on the continuum between these two polarities [62].

Polarity classification of opinionated texts could improve the effectiveness of several activities based on the analysis of large amounts of textual data, the most important one being Business Intelligence; in fact, as described in Chapter 1, polarity classification has been exploited in the literature in order to improve the effectiveness of several applications. In [22] polarity-classified sentences have been used to define novel sentiment information retrieval models in the framework of probabilistic language models, aimed at improving the accuracy of polarity-oriented retrieval.

Sentiment polarity classification can be used to refer broadly to binary categorization (e.g.: the opinion expressed by text A is classified as “positive” or “negative”), to regression (e.g.: the opinion expressed by text A is classified as “2” on a scale between “0”, which represents an “extremely negative” opinion, and “10”, which represents an “extremely positive” opinion), or to ranking (e.g.: the opinion expressed by text A is more positive than the opinion expressed by text B on the same topic).
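The three formulations can be related to each other with a small sketch. It assumes, purely for illustration, that some upstream model already produces a real-valued polarity score in [-1, 1]; the function names and the linear mapping onto the 0-10 scale are hypothetical.

```python
def as_binary(score):
    """Binary categorization: collapse the score into two classes."""
    return "positive" if score >= 0 else "negative"


def as_rating(score):
    """Regression-style output: map a score in [-1, 1] onto the 0-10 scale."""
    clamped = max(-1.0, min(1.0, score))
    return round((clamped + 1.0) * 5)


def as_ranking(scores_by_text):
    """Ranking: order text identifiers from most to least positive."""
    return sorted(scores_by_text, key=scores_by_text.get, reverse=True)


scores = {"A": -0.6, "B": 0.3}
labels = {name: as_binary(value) for name, value in scores.items()}
ratings = {name: as_rating(value) for name, value in scores.items()}
order = as_ranking(scores)
```

With these toy scores, text A receives the rating 2 used in the example above and is ranked below text B.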

Sentiment polarity classification, in both its binary and regression formulations, is usually based on two opposite classes, like “positive” and “negative” or “like” and “dislike”, whose semantics is quite clear. However, the dichotomies used in classification can also assume different nuances, for example in the political domain, as in [56], [57] and [80], where the authors are interested in determining whether or not an opinion holder supports the topic under discussion during a debate, or in [44], where the authors are interested in predicting which party will win an election by looking at informal opinions left by users on an election prediction website. The authors of [44] report a prediction accuracy of 81.68%, obtained by adopting a classification approach based on the SVM method, improved by integration with a novel technique which generalizes n-gram feature patterns.

In fact, all the described variants of the standard sentiment polarity classification task can be addressed by means of similar machine learning techniques, such as Naïve Bayes and SVM classifiers.

In [41] the authors focused on determining the reasons why a product is liked or not liked by reviewers. More specifically, the work is aimed at identifying and classifying which expressions of a review describe the “pros” and “cons” of a given product, like, for example:

• “The battery life of this laptop is only 2 hours long.”

• “La tastiera è affidabile e poco rumorosa.”

The authors applied a Maximum Entropy approach; more specifically, in order to easily deal with a multi-class classification problem (sentences can express a “pro” reason - PR, a “con” reason - CR, or no reason - NR), a two-step binary classification approach has been used: the first classifier is aimed at distinguishing CR or PR sentences from NR sentences, which are not relevant in pro and con extraction. The second classifier, instead, separates CR sentences from PR sentences. The ability to extract the elements which represent the “pros” and “cons” in a given text leads to a significant improvement in Sentiment Analysis: the ability to evaluate the agreement between the overall rating expressed by a reviewer and the actual contents of his/her review. Such an analysis could be used, for example, to determine the reputation of a reviewer and, consequently, the trust placed in her/his review.
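The two-step scheme can be sketched as a cascade of two binary decisions. In the sketch below the two decision functions are passed in as parameters; the keyword-based stand-ins are hypothetical and only replace, for illustration, the Maximum Entropy classifiers used in [41].

```python
def two_step_classify(sentence, is_reason, is_pro):
    """Cascade of two binary decisions: NR vs. reason, then PR vs. CR."""
    if not is_reason(sentence):
        return "NR"
    return "PR" if is_pro(sentence) else "CR"


# Hypothetical keyword-based stand-ins for the two trained classifiers.
PRO_CLUES = {"reliable", "quiet", "fast"}
CON_CLUES = {"only", "noisy", "slow"}


def toy_is_reason(sentence):
    words = set(sentence.lower().rstrip(".").split())
    return bool(words & (PRO_CLUES | CON_CLUES))


def toy_is_pro(sentence):
    words = set(sentence.lower().rstrip(".").split())
    return len(words & PRO_CLUES) >= len(words & CON_CLUES)


label = two_step_classify(
    "The battery life of this laptop is only 2 hours long.",
    toy_is_reason,
    toy_is_pro,
)
```

Keeping the two decisions separate means the second classifier is trained and applied only on sentences that actually carry a reason.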

Although most of the polarity orientation of a given text is provided by subjective (opinionated) contents, sentiment polarity classification can be applied to objective texts too. Some of the examples reported and described in Section 2.2 support the importance of applying sentiment polarity classification to objective information too. The following sentences represent two further examples aimed at providing a better understanding of how objective information could help in determining the sentiment polarity of a given text:

• “The Nokia N8 has got a large and brilliant display, with a resolution of 320 by 480 pixels.”

• “Il Nokia N8 ha uno schermo ampio e brillante, con una risoluzione di 320 per 480 pixel.”

Both sentences are objective: they express a fact about the capabilities of the display; such information, however, could prove useful in determining the overall polarity of the product. In [45] a sentiment classification activity applied to objective texts is described: the authors are interested in developing a novel prediction model, able to predict the trend of a public company's stock with respect to a set of news items concerning the company itself. In order to evaluate the accuracy of the proposed prediction model the Multex Significant Developments corpus, consisting of more than 12,000 news items, has been used as test set. The predicted results have been compared, for each company, with the actual trend of the company's stock during the same period. The measured accuracy ranges from 52% to 70.3%, depending on the different parameters and labelling methods considered in the experiment.

The overall polarity orientation of a given text could also be determined by the presence of comparative sentences, like “Canon EOS optics are better than those of Sony and Nikon”, which represent a relationship between different opinions expressed across the same document or across different documents by the same author. Given the previous example and an opinionated text written by the same author about the optics of a Nikon digital camera, previously classified as positively oriented, we could plausibly assume that the evaluation expressed about the Canon EOS digital camera would be positively oriented too.

This particular problem has been investigated in [37] and [36]: different supervised learning approaches have been exploited in order to identify and extract comparative sentences in a domain-dependent environment. Results have been evaluated on three different domains: news articles, consumer reviews of products, and Internet forum postings.

2.3.1 Sentiment Polarity Regression

Sentiment polarity classification can be generalized from a binary classification problem to a multi-class classification problem, where the ratings assigned to a text in order to describe the polarity it expresses represent the classes. Multi-class classification based on ordinal ratings is in fact a form of ordinal regression.

Moving towards a multi-class classification problem improves the effectiveness of the classification activity, by providing a more detailed rating schema (e.g.: documents could be classified by using classes which span from “extremely positive” to “extremely negative”). At the same time, however, it affects the accuracy of the trained classifiers by increasing their complexity.

An interesting property of the multi-class reformulation of the sentiment polarity classification activity is represented by the following observation: although each class representing a specific rating is characterized by a specific vocabulary (the set of keywords which could be used to infer that the polarity expressed by a text matches the class), texts containing a mixture of terms from opposite classes could be assigned to a third class. Consider, for example, a classification problem based on three different classes: the “positive” class, the “negative” class and the “neutral” class. Neutral texts could fall into the neutral class because they contain multiple references to neutral terms (e.g.: “normal”, “standard”, “mediocre”) or because they contain a mix of terms from both the positive and the negative class, whose mixture leads to an overall neutral polarity.

The neutral class is a critical class in sentiment polarity classification, whose semantics could be particularly subtle. For this reason, as described in Chapter 3, it has not been considered in this thesis. The neutral class, in fact, could represent, at the same time, three different situations:

1. the text does not express any information concerning its polarity orientation;

2. the text includes both positive and negative opinions, mitigating each other without leading to a clear opinion;

3. the text explicitly expresses a neutral opinion: the author wants to express an opinion which can be classified neither as positive nor as negative.
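The ambiguity of the neutral class can be illustrated with a minimal sketch based on a toy lexicon count. The lexicon, the `margin` parameter and the decision rule are hypothetical; the point is only that the first two situations listed above end up with the same label, although they are very different texts.

```python
# Hypothetical toy lexicon of Italian polarity terms.
POSITIVE = {"ottimo", "affidabile", "preciso"}
NEGATIVE = {"pessimo", "rumoroso", "lento"}


def three_way_polarity(text, margin=0):
    """Label a text by comparing its counts of positive and negative terms."""
    tokens = text.lower().split()
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    if positive - negative > margin:
        return "positive"
    if negative - positive > margin:
        return "negative"
    return "neutral"


no_opinion = three_way_polarity("consegnato ieri")          # situation 1
mixed_opinion = three_way_polarity("ottimo ma lento")       # situation 2
clear_opinion = three_way_polarity("ottimo e affidabile")
```

Here both `no_opinion` and `mixed_opinion` are labelled neutral, so the label alone does not reveal which situation occurred.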

Another aspect of the neutral class has been observed in [12]; the authors show how neutral comments are usually perceived as slightly negative by users. According to the authors, who based their work on the observation of the dynamics of sellers’ reputation on eBay, the effects of a neutral feedback are similar to the effects of a negative feedback.

Cabral’s assumption has been further corroborated by our work, as described in both Chapters 3 and 4: most of the neutral comments we collected in our experimental activity could be seen as negative feedback, softened by social influences such as shame (e.g.: don’t let the others know I made a bad deal buying a specific product) and fear (e.g.: according to [12] “a buyer leaving a negative comment has a 40% chance of being hit back, while a buyer leaving a neutral comment only has a 10% chance of being retaliated upon by the seller.”).

2.4 Opinion Mining

In Section 2.3 we have seen how most of the works described in the literature which are aimed at sentiment polarity classification are based on the assumption that opinionated texts are provided as input. The importance of deciding whether a given document contains subjective information has been summarized by Mihalcea in [52]:

“the problem of distinguishing subjective versus objective instances has often proved to be more difficult than subsequent polarity classification, so improvements in subjectivity classification promise to positively impact sentiment classification.”

The role of adjectives as carriers of orientation and, moreover, their effects on sentence subjectivity have been examined by Hatzivassiloglou and Wiebe in [34]. The authors are interested in determining whether a given sentence is subjective by analyzing the adjectives it contains. Several projects, described in detail in [91], explore the problem of determining the subjectivity of a given text, sentence or sub-sentence in different domains, like in [92], [90], [98] and [93]. Moreover, in [91] a comprehensive survey of subjectivity recognition using different clues and features is provided.

Wilson [94] addresses the problem of determining clause-level opinion strength (e.g.: “how mad are you?”). In particular the author shows how the problem of determining opinion strength is different from inferring the polarity of an opinion. A text classified as a neutral opinion by polarity classification is not necessarily an objective text, where a clear opinion is missing: it can convey a strong “mediocre” opinion or it can describe some aspects of the product as positive and some others as negative, mitigating each other.

Subjectivity detection and ranking at the document level is a task derived from genre classification, the process devoted to inferring the genre of a given text. In [98] the authors obtained a high accuracy (97%) with a Naïve Bayes classifier on a test set consisting of articles from the Wall Street Journal. The authors aimed at properly discriminating between articles of type News and Business (facts) and articles of type Editorial and Letter to the Editor (opinions), as previously performed in [92] and [91].

2.5 Affect computing

Another research field related to the Sentiment Analysis domain is represented by computational affect (also defined as affect analysis). Computational affect moves towards the identification and extraction of opinionated pieces of text and the subsequent sentiment polarity classification, focusing on the identification of specific emotions appearing in the text as part of an opinion.

Most of the research works are inspired by the six universal emotions described by Ekman in his study on the possible expressions of the human face [23]: anger, disgust, fear, happiness, sadness, and surprise. In [49], [48] a subset of the Open Mind Common-sense Corpus is used to generate four different affect classification models, based on the six emotions described by Ekman. The input text is split into sentences and for each sentence the affective classification is performed; several techniques have been introduced, such as analysis of the global mood at document level, to smooth the transition between consecutive sentences.

Figure 2.3 shows an example of the system developed in [49], [48], called the EmpathyBuddy email browser; the system is aimed at representing in real time the affective quality of the text being typed by the user by means of Chernoff-style face feedback.

Another novel approach described in [49], [48] is the ability to define and to extract patterns representing meta-emotions: complex emotions, such as frustration, relief or horror, which could be represented as a mix of the six basic emotions. Frustration, for example, is defined as the repetition of words expressing anger with a low strength.

Figure 2.3: The EmpathyBuddy email agent [49, 48] in action.

Affect computing has also been investigated by Valitutti [86]: the authors developed WordNet-Affect, a subset of the WordNet resource labelled by means of a taxonomy of 11 categories representing affective concepts, defined as a-labels. A set of 1,314 WordNet synsets, including 3,340 different terms, has been labelled with the set of pre-defined a-labels.

WordNet-Affect was initially developed by manually labelling a set of more than 1,900 terms selected from different resources, like dictionaries. Labelled terms have been linked to their respective WordNet synsets, each one with an associated frame of related information (e.g.: Italian and English version, a-label).

The WordNet hierarchy has been exploited in order to identify new affective synsets not included in the WordNet-Affect core; the following relations between synsets have been investigated, according to the assumption that they preserve the affective meaning of the related synsets: antonymy, similarity, derived-from, pertains-to, attribute and see-also.

2.6 Multilingual Sentiment Analysis

Most of the research activities on Sentiment Analysis available in the literature are focused on documents written in the English language; in fact most of the resources required in Opinion Mining and Sentiment Analysis, like lexicons and manually labelled corpora, are easily available only for the English language.

The lack of linguistic resources is described as a critical issue in most of the research experiences concerning Sentiment Analysis of non-English languages. In this thesis, as described in Chapters 3 and 5, many experimental activities have been carried out in order to deal with the lack of available resources for the Italian language. The development of new tools and resources for a foreign language requires several years of work; for this reason we focused more on supervised and unsupervised machine learning approaches than on more complex analysis tools, like parsers or information extraction engines.

Sentiment Analysis techniques applied to foreign languages have been investigated in [79] and [39] for the Japanese language, in [35, 99, 100] for the Chinese language, in [1] for the Arabic language and in [43] for the German language.

Many researchers investigated novel methods to automatically generate the resources required in Sentiment Analysis for a new language: lexical resources already defined for the English language have been projected to the target languages by means of different cross-lingual projection strategies. In [52] both bilingual dictionaries and parallel corpora have been investigated in order to perform sentiment analysis for the Romanian language.

The simplest strategy devoted to cross-lingual Sentiment Analysis is automatic machine translation [7]: during pre-processing the input text is translated into English and, subsequently, classified by means of a Sentiment Analysis classifier. Nine different languages have been investigated, by means of a state-of-the-art automatic translation system, the WebSphere Translation Service developed by IBM. The authors show how Sentiment Analysis applied to automatically translated texts performs in a consistent way across different languages. Moreover, the authors show how the proposed approach can be generalized beyond the specific translation service used.
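The translate-then-classify strategy can be sketched as a simple composition of two components. Both components are passed in as parameters; the dictionary-based translator and the keyword classifier below are hypothetical stand-ins and are not related to the WebSphere Translation Service or to the classifier used in [7].

```python
def classify_via_translation(text, translate, classify_english):
    """Translate the input into English, then apply an English classifier."""
    return classify_english(translate(text))


# Hypothetical stand-ins, used only to make the sketch runnable.
TOY_DICTIONARY = {"ottimo": "excellent", "pessimo": "terrible", "prodotto": "product"}


def toy_translate(text):
    """Word-by-word lookup; unknown words are left untouched."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def toy_english_classifier(text):
    return "positive" if "excellent" in text.split() else "negative"


label = classify_via_translation("Ottimo prodotto", toy_translate, toy_english_classifier)
```

Since the translator is just a parameter, any translation service could be plugged in without touching the classifier.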

Another interesting result presented in [7] is the analysis of the cross-cultural orientation of automatically annotated terms: English and Italian are the languages most biased towards a negative sentiment orientation, while Korean is the language most biased towards a positive sentiment orientation.

In [52] the subjectivity lexicon used in the OpinionFinder system [89] is translated into the Romanian language by using both an authoritative and a web-based English-Romanian dictionary containing, respectively, 41,500 and 4,500 entries. The OpinionFinder lexicon consists of expressions made up of one or more words, labelled according to their subjectivity and strength.

Several issues have been faced in order to properly perform cross-lingual projection, such as word lemmatization, resolution of translation ambiguities and, finally, translation of multi-word expressions. Multi-word expressions have been translated word-by-word from the English language to the Romanian language and validated by counting the number of occurrences of the translated expressions on the AltaVista search engine. By manually annotating 150 sample expressions extracted from the generated lexicon, the authors found that subjectivity clues tend to be less reliable in the target language. In other words, part of the subjectivity is lost in translation, and this issue is stronger for weakly subjective expressions.

In addition to lexicon translation, in [52] corpus-based cross-lingual projection has been investigated, in order to overcome the limitations observed in lexicon translation: the authors translated a set of 107 English documents into Romanian and manually annotated them in order to be used as gold standard in evaluation. English documents have been automatically annotated by means of the OpinionFinder system; annotations have been projected to the corresponding Romanian texts. A Bayes classifier has been trained on the translated sentences and evaluated; the authors show how corpus-based cross-lingual projection performs better than lexicon translation.

According to the authors, corpus-based cross-lingual projection improves on the effectiveness of lexicon translation: the context in which a text is used in the original language could reduce its ambiguity in the new language.
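Corpus-based projection can be sketched as copying automatically produced labels across the two sides of a parallel corpus. The sketch below is only illustrative: the keyword annotator is a hypothetical stand-in for OpinionFinder, and Italian target sentences are used in place of the Romanian ones of [52].

```python
def project_annotations(parallel_pairs, annotate_source):
    """Label each source sentence and attach the label to its translation."""
    return [(target, annotate_source(source)) for source, target in parallel_pairs]


# Hypothetical stand-in for an automatic English subjectivity annotator.
SUBJECTIVE_CLUES = {"wonderful", "terrible", "love", "hate"}


def toy_annotator(sentence):
    words = set(sentence.lower().rstrip(".").split())
    return "subjective" if words & SUBJECTIVE_CLUES else "objective"


PARALLEL_PAIRS = [
    ("This movie is wonderful.", "Questo film è meraviglioso."),
    ("The train leaves at eight.", "Il treno parte alle otto."),
]

# The projected pairs can serve as training data for a target-language classifier.
training_set = project_annotations(PARALLEL_PAIRS, toy_annotator)
```

The target-language classifier never sees the source lexicon: it learns directly from whole labelled sentences, which is where the context mentioned above comes into play.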

Cross-lingual projection allows researchers to investigate Sentiment Analysis in new languages without requiring large linguistic resources or complex analysis tools to be developed. On the other hand, cross-lingual projection presents several issues, including translation ambiguities and cross-cultural differences, like irony, which are not easily projectable onto different languages. Such issues affect, as shown by the described research experiences, the effectiveness of both Sentiment Analysis and Opinion Mining activities, leading to lower performance with respect to the English language.

Chapter 3

A Supervised Approach to Overall Opinion Polarity Analysis

Abstract

In this Chapter we present a new supervised approach for evaluating the overall opinion polarity (OvOP) of a set of documents written in the Italian language. The proposed method is based on two different supervised learners: a Naïve Bayes classifier and an SVM classifier. The set of features introduced in order to represent the input documents includes several contributions previously presented in the literature for the English language. During the experimental activity we tried to evaluate and to adapt the set of selected features to the specific context of the Italian language. The collected results provide a good evaluation of the effectiveness of our approach for the Italian language.

3.1 Introduction

The amount of product reviews freely available online has grown continuously over the last ten years: the Internet has become the main means of communication used by people to express their opinions about every kind of product and service. At the same time the Internet represents, nowadays, the most valuable source of information, and more specifically of subjective information, for each customer interested in buying a product.

It is interesting to notice how product reviews can be used in different ways by users, according to their specific information needs: for example, as described in [71], a customer who is already interested in a certain product may want to read some negative reviews just to pinpoint possible drawbacks, but has no interest in spending time reading positive reviews. In contrast, customers interested in watching a good movie may want to read reviews that express a positive overall opinion polarity.

However, the abundance of such reviews may prove to be, at the same time, an issue; reading each review in order to evaluate its subjectivity and, moreover, the polarity it conveys is a complex and time-consuming task. Tools aimed at automatically determining the polarity of a given review according to its contents are required; for each review a polarity value, expressed as positive or negative, is required in order to represent the orientation it bears. Such a value is defined as Overall Opinion Polarity (OvOP): the polarity that is assigned to the opinion expressed in a document seen as a whole. The process aimed at determining the OvOP of a document is referred to as OvOP analysis (also referred to as OvOP classification or OvOP identification [71]).

OvOP analysis, according to the taxonomy introduced in Chapter 1, is classified as a subtask of Sentiment Analysis with a document-level granularity. In this Chapter we design and implement a set of binary supervised classifiers aimed at distinguishing between positively and negatively oriented product reviews. For example, given the following user-generated review concerning an MP3 player:

Il prezzo di questo oggetto è attualmente di circa 130 euro, ed il design e le funzionalità lo rendono un ottimo regalo per tutti. A me piace molto soprattutto per le sue funzionalità ma lo apprezzo molto anche per la durata della batteria, che garantisce ben 12 ore di musica in riproduzione continua.

we are interested in automatically identifying that the OvOP expressed by the document is positive.

3.2 Related Work

3.2.1 Turney

In [82] the author presents an unsupervised algorithm aimed at classifying reviews as “recommended” or “not recommended” based on the average semantic orientation of the phrases they contain. The semantic orientation of a phrase, described in detail in Section 5.2.2, is defined as the mutual information between the phrase and the word “excellent” minus the mutual information between the phrase and the word “poor”. Mutual information is calculated according to the PMI-IR metric, introduced in [83], based on the number of hits from the AltaVista search engine, by using the NEAR operator in query formulation. The semantic orientation of a phrase indicates its polarity (positive vs. negative) and, at the same time, the strength of the opinion it conveys.

Phrases are extracted from reviews by using a set of five extraction rules based on POS tagging; the extraction rules are aimed at identifying occurrences of adjectives and adverbs, when used together or in association with common and proper nouns. The semantic orientation of a document is calculated as the average semantic orientation of its phrases.
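The definition above can be sketched in terms of raw hit counts. When the two PMI terms are subtracted, the hit count of the phrase itself cancels out, leaving a single log-ratio. The function names, the toy hit counts and the small smoothing constant (added to avoid division by zero) are assumptions made for illustration; no real search engine is queried.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """PMI(phrase, "excellent") - PMI(phrase, "poor"), from hit counts.

    near_excellent / near_poor: hits for the phrase NEAR each reference word.
    hits_excellent / hits_poor: hits for each reference word alone.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def review_orientation(phrase_orientations):
    """Average semantic orientation of the phrases extracted from a review."""
    return sum(phrase_orientations) / len(phrase_orientations)


# Hypothetical hit counts for two phrases of the same review.
orientations = [
    semantic_orientation(40, 10, 1000, 1000),
    semantic_orientation(5, 20, 1000, 1000),
]
recommended = review_orientation(orientations) > 0
```

A review is labelled “recommended” when the average is positive and “not recommended” otherwise.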

The algorithm has been tested on 410 product reviews covering different domains, including a set of 120 reviews concerning movies. 59% of the reviews constituting the test set are positive; the results show an average accuracy of 74.39% across the different domains. The movie domain presents the lowest performance, with an accuracy of 65.83%; this result is in contrast with the average accuracy obtained for the car domain (84.00%) and for the bank domain (80.00%).

The author shows, in particular, how his domain-independent approach presents some limitations when the same words are used to convey different orientations in different domains. An example is the word “unpredictable”, which is considered positive in a movie review context (“unpredictable plot”) but negative in a car review (“unpredictable steering”).

3.2.2 Pang et al.

In [63] the authors present the first application of machine learning based techniques aimed at determining the OvOP of a set of movie reviews, a domain already explored in [84] by means of the Pointwise Mutual Information approach. Movie reviews collected from the web are classified as positive or negative according to the rating indicator assigned by their authors; in particular, the star rating provided by reviewers to summarize the sentiment of each review has been considered. The corpus of movie reviews collected by the authors represents the Gold Standard for the evaluation of OvOP classification systems; many works, including [71], use the corpus in order to evaluate the performance of the proposed approaches.

The first assumption proven wrong by the authors is that a few words expressing strong sentiment are enough to classify documents according to their OvOP. In order to test this assumption, a list of words bearing polarity orientation has been compiled by each of two human annotators. Classification based only on the presence of the identified words yields, however, relatively poor results, with accuracy varying from 55% to 65%, partly due to the low coverage offered by the lists of words collected by the human annotators, each limited to 20 terms. The results show that sentiment-carrying words alone are not enough to perform sentiment classification effectively.

A third list, consisting of 7 positive and 7 negative words and symbols (exclamation and question marks are included as indicators of negative documents), has been compiled by looking at the polarity clues occurring most frequently in the input corpus. The list has been used for OvOP classification, leading to an accuracy of 69%.

A machine learning approach is defined in order to improve the effectiveness of the OvOP classification process. Each document is represented as a feature vector; the set of features used in each experimental activity is reported in Table 3.1. Three different learning methods for OvOP classification are exploited: Naïve Bayes (NB), Maximum Entropy (ME) classification and Support Vector Machines (SVM).

Table 3.1: Average three-fold cross-validation accuracies (%) achieved by Pang et al. in [63].

Features                # of features  Freq. or Pres.  NB    ME    SVM
(1) unigrams            16165          freq.           78.7  N/A   72.8
(2) unigrams            16165          pres.           81.0  80.4  82.9
(3) unigrams+bigrams    32330          pres.           80.6  80.8  82.7
(4) bigrams             16165          pres.           77.3  77.4  77.1
(5) unigrams+POS        16695          pres.           81.5  80.4  81.9
(6) adjectives          2633           pres.           77.0  77.7  75.1
(7) top 2633 unigrams   2633           pres.           80.3  81.0  81.4
(8) unigrams+position   22430          pres.           81.0  80.1  81.6

The set of unigrams used in feature representation has been filtered in order to include only those terms occurring 4 or more times in the corpus; similarly, the set of bigrams has been limited to items with 7 or more occurrences. A simple treatment of negation is also implemented in order to represent the polarity inversion that negation determines in a text. The results provided by the authors show that accounting for feature presence leads to better performance than classification based on feature frequency. This result points in a different direction from previous work on topic-based classification, where term frequency, as in the TF-IDF model, is described as a good representation feature. The best results are produced using SVMs with unigram presence features (82.9% accuracy) and with unigram presence combined with part-of-speech information (81.9% accuracy).
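The representation described above can be sketched in a few lines. The negation scheme used here (prefixing `NOT_` to every token between a negation word and the next punctuation mark) is assumed as one common way of implementing it; the negation word list, the tokenizer and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of negation words, used only for illustration.
NEGATIONS = {"not", "no", "never", "isn't", "didn't", "can't"}
PUNCTUATION = set(".,;:!?")


def tag_negation(text):
    """Prefix NOT_ to tokens between a negation word and the next punctuation."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    tagged, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif negated:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
            negated = token in NEGATIONS
    return tagged


def presence_features(documents, min_count=4):
    """Build binary (presence) vectors over unigrams seen at least min_count times."""
    tokenized = [tag_negation(document) for document in documents]
    counts = Counter(token for tokens in tokenized for token in tokens)
    vocabulary = sorted(t for t, c in counts.items() if c >= min_count)
    vectors = []
    for tokens in tokenized:
        present = set(tokens)
        vectors.append([int(term in present) for term in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = presence_features(["good good good", "good bad"])
```

In the toy call, “good” is kept (4 occurrences) and “bad” is filtered out; the first document gets the value 1 for “good”, not 3, which is the difference between presence and frequency.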

Moreover, the authors show that adjectives are not, in the analyzed domain, the features leading to the best results; previous works like [33] and [83] focused instead on the role of adjectives as carriers of most of the OvOP of a given document.

The results presented by the authors show a higher level of accuracy than similar experiments in the area of OvOP classification; in fact they show that machine learning significantly improves the performance of the OvOP classification process, even though it is a more challenging task than topic classification. According to the authors, OvOP is expressed in a more subtle way and cannot be easily identified simply by looking at keywords appearing in the document.

3.2.3 Dave et al.

Another example of machine learning applied to OvOP classification of product reviews is presented in [19]: the authors describe a set of pre-processing activities aimed at improving the effectiveness of the feature representation models originally proposed in [63]. Moreover, the authors evaluate their approach on several sets of domain-dependent reviews, including electronic goods, books and music. Test sets vary in size (from 139 reviews for the domain of entertainment laptops to more than 14.000 reviews regarding digital cameras), in the ratio between the number of positive and negative reviews, and in source (reviews have been collected from both the C|Net and Amazon websites).

Several substitutions have been evaluated in order to select a proper set of features suitable for document representation. Metadata substitutions, for example, have been used to identify numbers appearing in the reviews: each occurrence of a number is replaced with the NUMBER token. Substitutions are aimed at overcoming the variations and dependencies of language. However, statistical and linguistic substitutions, based on term frequency and POS tagging, do not provide any significant improvement in the accuracy of OvOP classification with respect to the baseline represented by the results described in [63].
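A number substitution of this kind can be sketched in a few lines of Python. The regular expression below is an assumption made for illustration (the exact pattern used in [19] is not reported here): it matches standalone integers and numbers with dot or comma separators, and leaves digits embedded in words untouched.

```python
import re

# Assumed pattern: standalone digit groups, optionally separated by "." or ",".
NUMBER_PATTERN = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def substitute_numbers(text):
    """Replace every standalone number with the NUMBER token."""
    return NUMBER_PATTERN.sub("NUMBER", text)


example = substitute_numbers("Costa 199,90 euro e pesa 2 kg")
```

After the substitution, reviews quoting different prices or sizes share the same NUMBER feature instead of producing one sparse feature per value.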

N-grams of different lengths have been exploited, in addition to several smoothing and scoring functions: none of them leads to significant improvements. The best results have been obtained by coupling product name substitution with a substring generation algorithm, reaching an accuracy of 85,3%, which represents a small but significant improvement over document representation based exclusively on bigrams.

The authors assert that the accuracy of OvOP classification partly depends on the inconsistency in rating that characterizes the training data: different users assign, in a personal way, the same rating indicator to documents containing differently oriented features. Sparsity of data and the skewed distribution of input documents can also be seen as issues for OvOP classification: although negative reviews are generally longer, they are written in a more varied language and are therefore harder to classify.

3.2.4 Salvetti et al.

In [71] the problem of OvOP classification of the movie review corpus collected in [63] has been investigated in more depth. In particular, the authors evaluate the effects of lexical preprocessing and filtering on the accuracy of two statistical classifiers trained on movie reviews. Two lexical filtering strategies, based respectively on WordNet and on POS tags, have been evaluated with Naïve Bayes and Markov Models as learning methods.

Each document is POS tagged using the Brill Tagger [11]: each term is replaced by the pair {POS tag, lemmatized form}. By analyzing movie reviews, the authors show that even the most positive reviews have portions with negative polarity or no clear polarity at all: the presence of parts with conflicting polarities, or with no polarity, within a review can be seen, according to the authors, as one of the major obstacles to accurate OvOP classification.

In order to reduce the noise introduced by terms that do not contribute to the OvOP of a given document, a set of filters based on POS tagging has been defined. Parts of speech that are least likely to contribute to the polarity of a review, like prepositions, are removed from the document representation, while proper nouns are substituted with a common placeholder, in order to limit the unnecessary variance and sparsity they introduce into the document representation. Several combinations of the described filtering rules have been evaluated.
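The POS-based filter described above can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be already tagged, the tags are assumed to follow the Penn Treebank convention (IN for prepositions, NNP and NNPS for proper nouns), and the placeholder name is hypothetical; the actual rule combinations evaluated in [71] are not reproduced.

```python
# Assumed Penn Treebank tags; the tag sets below are illustrative only.
DROPPED_TAGS = {"IN"}                  # prepositions: removed
PROPER_NOUN_TAGS = {"NNP", "NNPS"}     # proper nouns: replaced
PLACEHOLDER = "PROPER_NOUN"            # hypothetical placeholder token


def pos_filter(tagged_tokens):
    """Filter (term, tag) pairs and return {POS tag, term} style pairs."""
    filtered = []
    for term, tag in tagged_tokens:
        if tag in DROPPED_TAGS:
            continue                   # unlikely to carry polarity
        if tag in PROPER_NOUN_TAGS:
            term = PLACEHOLDER         # collapse all proper nouns together
        filtered.append((tag, term))
    return filtered


tagged = [("Penn", "NNP"), ("shines", "VBZ"), ("in", "IN"), ("thriller", "NN")]
features = pos_filter(tagged)
```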

Another kind of filter introduced by the authors is based on WordNet: the corresponding WordNet synset is assigned to each term. The ambiguity affecting polysemous terms is handled by always choosing the first synset proposed by WordNet; each synset, moreover, is generalized according to the hypernymy relation and replaced by its ancestor along the WordNet hierarchy. Synsets with the same ancestor are collapsed into the same synset.

Neither POS nor WordNet filtering provides improvements in terms of accuracy with respect to the use of all POS tags as features in document representation; an accuracy of 80,5% has been obtained using NB, while an accuracy of 79,5% represents the best performance of the classifier based on Markov models. However, both filtering methodologies show significant improvements in performance, with respect to the baseline, when applied to small training sets. In this work, too, the assumption that adjectives convey most of the polarity of a given document has been shown to be partly wrong; in particular, the accuracy achieved by considering only adjectives in OvOP classification is between 3% and 7% lower than that obtained using all available POS tags as features.

3.3 The SENT-IT Framework

In this Section we present in detail the SENT-IT framework, developed during the PhD activities described in this thesis. It has been designed and implemented in order to support, by means of a flexible and reusable platform, the development and evaluation of new methodologies for Sentiment Analysis on documents written in the Italian language. All the models for OvOP analysis described in this thesis have been implemented using the SENT-IT framework; we executed hundreds of experimental tests using the testing capabilities provided by the framework.

The SENT-IT framework gives researchers the ability to easily define new features suitable for document representation and to reuse existing ones. In the same way, researchers can easily plug into their experimental activity several different classification methodologies and algorithms included in the SENT-IT framework. The SENT-IT framework also includes a toolbox for the linguistic processing of texts written in the Italian language, integrated from existing libraries or developed from scratch.

In order to cope with the described requirements, the whole framework has been modeled around the concepts of experimental suite and experimental task. An experimental task is defined as the whole set of activities required to determine the OvOP of a set of input documents: it includes document analysis, document representation, OvOP classification and aggregation of the obtained results. The SENT-IT framework is responsible for the execution of the experimental tasks defined by users; it automatically generates the output of the OvOP classification process given a corpus, a set of execution parameters (e.g.: number of concurrent threads, location of input data, et al.) and the list of activities to be performed. The experimental activities described in this thesis have all been implemented as experimental tasks of the SENT-IT framework.

An experimental suite is defined as an ordered list of experimental tasks, applied to the same set of documents and within the same experimental environment; the concept of experimental suite has been introduced in order to simplify the execution of several experimental tasks on the same input data with different experimental setups.

The ideas of experimental suite and experimental task have been inspired by the architectural design that characterizes jUnit¹, a framework devoted to automatic code testing. In particular, we noticed that the architectural patterns adopted by the jUnit framework could be profitably introduced in our architecture too. Giving users the ability to easily set up and execute experiments aimed at determining the OvOP of a set of documents is, in fact, very similar to the definition of test cases aimed, instead, at automatic code testing.
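The relationship between tasks and suites can be sketched, in the spirit of the test case and test suite pattern, as follows. The sketch is written in Python for brevity, whereas the framework itself is mainly written in Java; all class names and the toy task are hypothetical and do not reproduce the actual SENT-IT classes.

```python
class ExperimentalTask:
    """Hypothetical base class: one complete OvOP experiment."""

    def __init__(self, name):
        self.name = name

    def run(self, documents, parameters):
        raise NotImplementedError


class ConstantLabelTask(ExperimentalTask):
    """Toy task: always predicts one label and reports the accuracy."""

    def __init__(self, name, label):
        super().__init__(name)
        self.label = label

    def run(self, documents, parameters):
        correct = sum(1 for _, gold in documents if gold == self.label)
        return correct / len(documents)


class ExperimentalSuite:
    """Ordered list of tasks executed on the same documents and settings."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def run(self, documents, parameters):
        return [(t.name, t.run(documents, parameters)) for t in self.tasks]


documents = [("a", "positive"), ("b", "positive"),
             ("c", "negative"), ("d", "positive")]
suite = ExperimentalSuite([
    ConstantLabelTask("always-positive", "positive"),
    ConstantLabelTask("always-negative", "negative"),
])
results = suite.run(documents, parameters={"threads": 1})
```

Each task sees the same documents and parameters, so results obtained with different setups remain directly comparable.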

The SENT-IT framework is written mainly in Java 1.6 SE, but it also contains submodules written in Python and Perl. These languages have been introduced for their enhanced ability to deal with text, an area in which the Java language is weaker, and for the need to integrate existing linguistic processing and analysis tools.

The overall architecture of the SENT-IT framework is illustrated in Figure 3.1. On the left-hand side, all the possible sources that could be used to store input documents are shown: local and remote files (HTML, TXT and XML) and document repositories (SQL-based databases). The right-hand side shows, instead, the OvOP that has been inferred for each input document.

The main modules of the SENT-IT framework are:

• the Document Persistence Layer (DPL): this module provides the capabilities required to store, retrieve and access documents and their additional information (e.g.: source, domain, POS tagged representation, et al.).

A SQL Document Base (DB) has been designed for document storage: in particular, the document base can be used by other modules of the SENT-IT framework to store the results of time-consuming processing activities, like POS tagging.

The DPL allows users to access the contents of input documents in a transparent way, independently of the medium or format used to store them. In particular, in addition to the Document Base, the DPL can manage documents stored locally as XML, TXT and HTML. Remote documents, identified by URL addresses, are managed by the Product Review Crawler (PRC), a submodule of the DPL described in detail in Section 3.3.1. The PRC is aimed, in particular, at downloading the documents and parsing their contents; data extracted by the PRC is stored by the DPL in the DB.

¹ http://junit.sourceforge.net/

Figure 3.1: Overall architecture of the SENT-IT framework.

• the Linguistic Processing Library (LPL): a set of tools used by other modules, in particular by the Analysis Module, to perform several kinds of linguistic analysis. The tools included in the LPL are specialized for the Italian language and have been selected from an analysis of the state of the art, according to their availability (most of the tools are the result of academic or open source projects). The LPL includes:

– a stop word remover based on a manually collected list of stop words that characterize the Italian language;

– a stemming algorithm for the Italian language based on the Porter algorithm [67], included in the Snowball library² for Java;

– a spell-checker based on the Italian Open Office spell-checker³. More specifically, we developed the programming interface required to interact with the Open Office spell-checker and to collect suggested corrections to the texts provided as input. The spell-checker has been used in our experimental activity in order to reduce the number of typographic and syntactic errors appearing in product reviews provided by users, with poor results and no clear improvement in terms of accuracy of the OvOP classification process;

– a sentence splitter, aimed at extracting the sentences constituting a document. Our sentence splitter is based on a set of 10 extraction patterns, aimed at identifying punctuation and capital letters. In order to reduce the issues related to acronyms in current use in the Italian language, a gazetteer has been included; when the extraction patterns identify an acronym listed in the gazetteer, its punctuation and capital letters are ignored by the sentence splitting task, which moves on to the next term;

– a Part-Of-Speech tagger properly trained for the Italian language, based on the TreeTagger system⁴. In particular, the parameter files for the Italian language developed by Marco Baroni⁵ have been used to train the POS tagging algorithm. A multithreaded wrapper for the TreeTagger system has been developed in order to provide interaction with the SENT-IT framework. POS tagging is a time-consuming activity, which could take several seconds per document: for this reason several instances of the TreeTagger system are executed in parallel, each one analyzing a single document. Moreover, in order to reduce computational time, POS tagging is executed only at the first occurrence of an input document: the results of the POS tagging activity are stored by the DPL in the DB and retrieved when necessary. This solution significantly reduces the time required to execute an experimental suite consisting of several experimental tasks on the same set of input documents.

² http://snowball.tartarus.org/
³ http://it.openoffice.org/linguistico/spellcheck.html
⁴ http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
⁵ Parameter files for the Italian language are freely available at http://clic.cimec.unitn.it/marco/tools and resources.html

• a collection of Data Structures: data structures can be used in several analysis scenarios, like the evaluation of the TF-IDF metric for each term that appears in a corpus. Data structures include inverted indexes, customized hash tables and sparse matrices, used to represent large amounts of sparse data, such as the vector representation of input documents, in an efficient way. For each data structure a persistence policy has been implemented, in order to allow users to store and reuse previously analyzed data in different experimental tasks. Each data structure, moreover, has been designed to be thread-safe and to guarantee synchronized access and update; these requirements arise from the multithreaded optimization that has been introduced into the Analysis Module in order to reduce the time required to perform the whole document analysis task.

• the Data Visualization Module (DVM), described in detail in Section 3.5: aimed at visualizing, in an innovative and user-friendly way, the results of the OvOP classification process on a set of input documents. In particular, we focused on the possibility of applying the visualization capabilities of Zz-structures [58] to the scenario of a movie review corpus: users can navigate the set of available information along different semantic relationships between the product reviews, one of them being the OvOP expressed by each document. The DVM has been used, moreover, to support, with a simple and intuitive interface, the whole set of activities concerning the manual annotation and classification of lexical resources described in Chapter 5.

• the Execution Module (EM): the core module of the SENT-IT framework, devoted to the execution of the experimental suites and tasks defined by the researcher. The EM, according to the settings provided by the user, instantiates the specific modules that will be used during the experimental activity and orchestrates their interactions. We identified and implemented four different workflows, which can be adopted as templates for experimental tasks: training, cross evaluation, evaluation and meta-cross evaluation. Figure 3.2 represents the sequence of activities constituting each template; the meta-cross evaluation template is described in detail in Chapter 4.

The EM is responsible for loading, by sending requests to the DPL, the set of input documents used in the experimental activity. When documents are available, the EM activates the Analysis Module defined for the experimental activity. When the OvOP analysis and classification process ends, the EM gathers the results and the information concerning the lifecycle of the experimental activity (e.g.: duration and memory footprint of each task, list of warning and error messages generated during the process and reusable intermediate results).

Figure 3.2: OvOP Workflows available in the SENT-IT framework.

Experimental suites and tasks executable by the EM can be defined either in Java, by properly extending the associated abstract classes, or by using a SENT-IT Experiment Descriptor file, an XML file used to describe the whole set of properties and modules constituting the experimental process.

• the Analysis Module (AM), described in detail in Section 3.3.2: aimed at performing different kinds of textual and statistical analysis on input documents. The AM collects the values that will be used to build the representation of a given document, with respect to the specific set of features adopted by the OvOP classification process. The AM has been designed to support concurrent execution on different subsets of the input corpus, in order to reduce the time spent executing complex analysis tasks, which could require several seconds per document.

• the Representation Module (RM): aimed at structuring the data collected by the AM for each input document into a vector space model suitable for the Classification Module. The RM allows researchers to define the set of features that will be used in document representation and the set of feature selection criteria aimed at reducing the document representation space to its most significant dimensions (e.g.: according to the Information Gain provided by each feature). The RM uses optimized sparse matrices to represent the vector space model of the input documents, in order to reduce memory consumption.

The RM, moreover, allows users to store the vector space model representing input documents in the Attribute-Relation File Format (ARFF)⁶. ARFF files can be loaded by the RM to regenerate the vector representation of input documents, to which new features and filters can then be added. The ARFF file represents a de facto standard in the machine learning community; indeed, several applications like the Weka suite⁷ and RapidMiner⁸ use the ARFF file as a standard for vector space representation.

⁶ A detailed description of the ARFF format is available at http://www.cs.waikato.ac.nz/ml/weka/arff.html
⁷ http://www.cs.waikato.ac.nz/ml/weka/
⁸ http://rapid-i.com/content/view/181/190/

• the Classification Module (CM): aimed at training and testing machine learning based classifiers for OvOP. The CM is based on the wide set of learning methods and pre-processing capabilities provided by the Weka API; for each experimental task, users can define the learning method to be used for OvOP classification and its related parameters. The CM provides, moreover, the ability to store and load already trained OvOP classifiers.
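As an illustration of how a sparse vector space model can be stored in the ARFF format mentioned for the Representation Module, the following Python sketch writes documents in the sparse ARFF notation, where each data line lists only the non-zero attributes as index and value pairs with 0-based indexes. The function name and the toy features are hypothetical, and the sketch assumes attribute names without spaces or special characters (which would otherwise require quoting); it does not reproduce the RM implementation.

```python
def to_sparse_arff(relation, attributes, rows):
    """Serialize sparse rows as an ARFF document.

    attributes: list of (name, type) pairs, e.g. ("film", "numeric")
    rows: list of dicts mapping a 0-based attribute index to its value
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        lines.append("@attribute {} {}".format(name, kind))
    lines += ["", "@data"]
    for row in rows:
        cells = ", ".join("{} {}".format(i, row[i]) for i in sorted(row))
        lines.append("{" + cells + "}")
    return "\n".join(lines) + "\n"


attributes = [
    ("bel", "numeric"),
    ("film", "numeric"),
    ("polarity", "{negative,positive}"),   # class attribute, always explicit
]
rows = [
    {0: 1, 2: "positive"},   # document containing only "bel"
    {1: 2, 2: "negative"},   # document containing "film" twice
]
arff_text = to_sparse_arff("reviews", attributes, rows)
```

Only non-zero features are written for each document, which keeps the file size proportional to the number of terms actually occurring in the reviews.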

3.3.1 Product Review Crawler

The Product Review Crawler (PRC), whose architecture is shown in Figure 3.3, is a submodule of the DPL responsible for monitoring and crawling a set of web sources and extracting newly published reviews from them. Reviews are used as input to the OvOP analysis activity. Potential sources include web sites, forums, blogs, and newsgroups.

Parsing and extraction rules for each web source are defined as Java regular expressions, XPath expressions or Java classes implementing a specific interface provided by the SENT-IT framework. A lookup table is used to assign each source its corresponding parser; the PRC, therefore, is not able to parse documents from unknown sources. The PRC can be used in two different modes:

• standalone mode: the PRC is invoked by the DPL, receiving as input the list of URL addresses representing the documents to download and parse. For each provided URL the PRC activates a new thread, aimed at downloading the target document. When the download is completed, parsing and data extraction are performed; documents and additional information are stored by the DPL in the document base.

• proactive mode: the extraction of the reviews is achieved by using a set of autonomous agents devoted to web crawling, each one assigned to a specific web source. The agents continuously monitor each web source looking for new content; when new content is available, the extraction task takes place. Agents use the DPL and the DB to store extracted content.
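The lookup table that assigns a parser to each source can be sketched as follows. The sketch is in Python rather than Java, works on an already downloaded page, and uses a hypothetical host name, HTML layout and regular expression; none of them corresponds to a real source handled by the PRC.

```python
import re
from urllib.parse import urlparse

# Hypothetical extraction rule for a hypothetical source layout.
EXAMPLE_RULE = re.compile(
    r'<div class="review" data-rating="(\d+)">(.*?)</div>', re.S)


def parse_example_reviews(html):
    """Extract rating and body of each review from the assumed layout."""
    return [{"rating": int(rating), "body": body.strip()}
            for rating, body in EXAMPLE_RULE.findall(html)]


# Lookup table: one parser per known source (host name is an assumption).
PARSERS = {"reviews.example.org": parse_example_reviews}


def extract_reviews(url, html):
    """Select the parser registered for the source of url and apply it."""
    host = urlparse(url).netloc
    parser = PARSERS.get(host)
    if parser is None:
        # Documents from unknown sources cannot be parsed.
        raise LookupError("no parser registered for source: " + host)
    return parser(html)


page = ('<div class="review" data-rating="4">Bel film</div>'
        '<div class="review" data-rating="1">\nNoioso\n</div>')
reviews = extract_reviews("http://reviews.example.org/film/1", page)
```

Supporting a new source then amounts to registering one more entry in the table, without touching the download logic.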

For each review the following information is extracted, when available:

1. the title, assigned by the author to the review to summarize its content and to give it emphasis;

2. the body of the review, which consists of a short natural language text;

3. the overall polarity rating indicator: some sources allow authors to summarize the polarity expressed in their reviews by means of different kinds of rating indicators: numeric values (from 0 to 5), marks (from A to F) or stars. In order to handle heterogeneous rating indicators in a common way, all values have been normalized to a range between 0 (very negative opinion) and 1 (very positive opinion);

4. the publishing date of the review;

5. personal data about the author, like name, age and city of residence.


Figure 3.3: Overall architecture of the Product Review Crawler module.

Only the first three items (1, 2 and 3) are currently used during the OvOP analysis process, while the last two (4 and 5) have been stored for future use. In this thesis we consider only two classes of reviews: positive and negative. A review is considered positive if its overall polarity rating indicator is greater than 0.6; a review is considered negative if its overall polarity rating indicator is lower than or equal to 0.4. Neutral reviews, with an overall polarity rating indicator greater than 0.4 and lower than or equal to 0.6, have not been considered in this thesis for the following reasons:

1. to reduce the complexity of the classification task, which can thus be relaxed from a multi-class classification problem (positive, negative and neutral class) to a binary classification problem (positive and negative class);

2. neutral reviews, as described in Section 2.3.1, could convey an ambiguous OvOP. More specifically, they can be used by the reviewer to provide objective information, to provide an unclear opinion in which negative and positive clues mitigate each other, or to intentionally provide a neutral opinion.
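The normalization of rating indicators and the thresholds used to assign class labels can be summarized by the following Python sketch. The linear rescaling and the evenly spaced six-grade scale for marks are assumptions made for illustration, since the exact mapping is not detailed above; only the 0.4 and 0.6 thresholds come from the text.

```python
def normalize_numeric(value, lowest, highest):
    """Linearly rescale a numeric or star rating to the range [0, 1]."""
    return (value - lowest) / (highest - lowest)


def normalize_mark(mark, scale="FEDCBA"):
    """Map a letter mark to [0, 1], assuming evenly spaced grades F..A."""
    return scale.index(mark) / (len(scale) - 1)


def polarity_class(rating):
    """Apply the thresholds of the text; neutral reviews yield None."""
    if rating > 0.6:
        return "positive"
    if rating <= 0.4:
        return "negative"
    return None  # neutral: discarded from the corpus


four_of_five = polarity_class(normalize_numeric(4, 0, 5))
two_of_five = polarity_class(normalize_numeric(2, 0, 5))
five_of_ten = polarity_class(normalize_numeric(5, 0, 10))
```

Under these assumptions a review rated 5/10 is normalized to 0.5 and therefore falls in the neutral band, which is excluded from training and evaluation.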

3.3.2 Analysis Module

The Analysis Module is responsible for analyzing the set of input documents and collecting, for each document, the data required to build its representation in the vector space model. The AM, indeed, is aimed at transforming the textual content of a document into a numeric representation that can be used to describe its overall polarity orientation: several aspects of the document are taken into account in order to build the vector space model representation.

The AM allows users to define customized analysis strategies, according to the specific set of features used in document representation. Analysis can be performed, for instance, by looking at both the linguistic and the statistical properties characterizing the document. The SENT-IT framework includes a set of analysis submodules, each specialized to focus on a specific aspect of input documents. Submodules extend the set of OvOP clues adopted as features for document representation, in the case of the English language, in [63] and [24]. Moreover, additional modules have been proposed in our work to improve the effectiveness of OvOP classification, by considering some of the peculiarities characterizing the Italian language and the genre of product reviews. The following submodules have been developed:

• the TF-IDF submodule: aimed at evaluating the term frequency–inverse document frequency of a term (or a stem) t with respect to a given document d belonging to the input corpus c. The TF-IDF submodule performs a two-step analysis. The first step involves the analysis of the terms appearing in each document of the corpus; it is carried out by multiple threads sharing a common dictionary of the terms already seen in the corpus. When a new term is found, a new entry representing the term is added to the common dictionary; if the term has already been found, the number of its total occurrences in the corpus is increased. The second step starts only when the whole corpus has been analyzed: for each document d and for each term t ∈ d, the TF-IDF of t with respect to d is calculated;

• the n-gram extraction submodule: aimed at identifying and collecting the set of n-grams appearing in the input corpus and the number of their occurrences in each document. This feature has been used successfully in [63]. Each n-gram extraction module works on a specific value of n; in our experimental activities we exploited unigrams (n = 1), bigrams (n = 2) and trigrams (n = 3).

• the punctuation submodule: aimed at identifying and counting the occurrences of punctuation symbols appearing in a document d. In particular, the submodule focuses on question and exclamation marks, which are used to convey subjectivity and appear quite frequently in product reviews. The punctuation module also recognizes and extracts a set of emoticons, i.e. sequences of punctuation symbols, like :-) and :-(, used in textual documents to express emotions. The use of emoticons as a clue for subjectivity has already been exploited in [53].

• the POS tag submodule: aimed at identifying and counting the occurrences of terms appearing in a document d labelled with a specific POS tag. The submodule allows pairs (term, POS tag) to be used as features in document representation; this specific kind of feature has been analyzed and evaluated in both [63] and [24].

• the text statistical submodule: based on the results presented in [24] for the English language, the submodule evaluates, for each document, a set of statistical properties, including:

– average word length: total number of characters in the document divided by the total number of words in the document;

– average sentence length: total number of words in the document divided by the total number of sentences in the document;

– long words: number of words exceeding seven characters that appear in the document;

– number of sentences.

This set of features is aimed, according to the assumption expressed in [24], at representing the complexity of a given document; in particular, most of the described features are popular in the field of genre recognition and classification and have also been applied to Sentiment Analysis and Opinion Mining. In [19] average sentence length has been exploited in order to perform Opinion Mining; negative sentences, according to the collected results, tend to be longer and written in a more varied language than positive sentences. In [30] the authors note, moreover, that objective documents tend to be shorter than subjective documents.
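The statistical properties listed above can be computed with a short Python sketch. It relies on two simplifying assumptions: sentences are split naively on ".", "!" and "?" (instead of using the pattern-based sentence splitter of the LPL), and only characters belonging to words are counted in the average word length.

```python
import re


def text_statistics(text):
    """Compute the four statistical features for one document."""
    # Naive sentence splitting: an assumption made for illustration only.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        return {"average_word_length": 0.0, "average_sentence_length": 0.0,
                "long_words": 0, "number_of_sentences": len(sentences)}
    return {
        "average_word_length": sum(len(w) for w in words) / len(words),
        "average_sentence_length": len(words) / len(sentences),
        "long_words": sum(1 for w in words if len(w) > 7),
        "number_of_sentences": len(sentences),
    }


stats = text_statistics("Un film splendido. Da vedere!")
```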

Performing document analysis is a time-consuming task, in particular when large corpora are considered. For this reason, while designing the architecture of the AM we focused on providing concurrent execution of multiple analysis threads, each operating on a single document at a time. In order to implement concurrent analysis, a set of ad hoc data structures for collecting results has been defined; such data structures are required, for instance, to evaluate properties involving the whole corpus of input documents, like the TF-IDF of each term.
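A two-step, multithreaded TF-IDF evaluation over a shared dictionary can be sketched in Python as follows. The weighting formula is an assumption: the sketch uses the common tf(t, d) · log(N / df(t)) variant, where df(t) is the number of documents containing t, since the exact formula adopted by the TF-IDF submodule is not given above. The class and function names are hypothetical.

```python
import math
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class SharedDictionary:
    """Thread-safe dictionary of corpus-level term statistics."""

    def __init__(self):
        self._lock = threading.Lock()
        self.document_frequency = Counter()
        self.documents = 0

    def add_document(self, terms):
        # Synchronized update: several analysis threads share this object.
        with self._lock:
            self.documents += 1
            self.document_frequency.update(set(terms))


def tf_idf(corpus, workers=4):
    shared = SharedDictionary()

    def analyze(document):
        counts = Counter(document)          # step 1: per-document counts
        shared.add_document(counts)
        return counts

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_document = list(pool.map(analyze, corpus))

    # Step 2 starts only after the whole corpus has been analyzed.
    n = shared.documents
    return [{t: c * math.log(n / shared.document_frequency[t])
             for t, c in counts.items()}
            for counts in per_document]


corpus = [["bel", "film"], ["film", "noioso"], ["bel", "bel", "cast"]]
scores = tf_idf(corpus)
```

The second step must wait for the first one because the inverse document frequency of a term is only known once every document has been seen.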

The AM allows, moreover, pre-processing tasks to be defined: such tasks are aimed at reducing the amount of unnecessary information provided by each document and, consequently, at improving the effectiveness of the analysis tasks. Pre-processing can include: stop word removal, POS tagging and POS filtering (e.g.: the analysis task can be focused only on adjectives, et al.), spell checking, entity extraction and replacement (e.g.: replacing all symbols used to represent currencies with a common placeholder, et al.). Stop word removal has been included as a pre-processing task in each experimental activity described in this thesis.
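Two of the pre-processing tasks mentioned above, currency replacement and stop word removal, can be combined as in the following Python sketch. The stop word list is a tiny illustrative subset, not the manually collected list used by the framework, and the placeholder name and the set of currency symbols are assumptions.

```python
import re

# Tiny illustrative subset of Italian stop words (assumption).
STOP_WORDS = {"il", "la", "un", "di", "e", "che"}

# Euro, dollar and pound symbols; the set of symbols is an assumption.
CURRENCY_PATTERN = re.compile("[\u20ac$\u00a3]")
PLACEHOLDER = "CURRENCY"   # hypothetical placeholder token


def preprocess(text):
    """Replace currency symbols, tokenize, then drop stop words."""
    text = CURRENCY_PATTERN.sub(" " + PLACEHOLDER + " ", text)
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = preprocess("Il biglietto costa 8\u20ac e il film vale")
```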

3.4 Experiments

3.4.1 The Movie Review Corpus

We performed our first set of experiments on the movie review domain. The movie review domain has already been investigated by several works in the literature [82, 71, 2, 9], and in particular in [63], and represents one of the “de facto” gold standards in OvOP evaluation. As stated in [63], the movie review domain is experimentally convenient because large collections of reviews are available, even for the Italian language. Moreover, most of the available reviews include a rating indicator (e.g.: stars, votes) assigned by authors in order to summarize their opinion. Rating indicators are used as class labels in training and evaluating supervised approaches to OvOP analysis, so that no additional manual annotation at document level is required.

The movie domain, according to [82], presents some specific issues which could affect the effectiveness of OvOP analysis. In particular, the author shows how the accuracy of the proposed methodology falls to 65,83% when applied to a set of 120 movie reviews. In [82] and [63] the authors identify the following issues affecting the movie domain:

1. the “good actor trapped in a bad movie” issue: the overall opinion about a movie expressed by the reviewer contrasts with the opinions about the actors included in the same document. The following review is an example of the described issue:

Un film discreto....che pero’ presentando molti vari personaggi della scena italiana...non riesce secondo me , a raggiungere la qualita di EX...sempre dello stesso regista e presentato nel 2009...Bravo..come sempre De Luigi...splendido nei ruoli comici...e peccato per le piccole partecipazioni di Bisio e della Littizzetto. In qualche scena si ride...ma ci si accorge anche di una sceneggiatura, scritta a piu mani...

The overall opinion expressed by the reviewer about the movie is not strongly positive (the rating indicator provided by the user is 5/10) and mainly oriented towards neutrality. However, one of the sentences of the review includes a strongly positive reference to one of the actors, using terms like bravo (good) and splendido (wonderful). The presence of such terms, which are domain-independent clues of positive orientation for the Italian language, influences the evaluation of the OvOP of the review.

In [63] the author shows how supervised methodologies for OvOP evaluation at document level are less prone to the influence of the “good actor trapped in a bad movie” issue than the lexical resource based methodologies introduced in [82], whose granularity is focused on sentences.


The described issue can be further generalized to cover a larger set of review domains: the opinion polarity of the whole is not necessarily the sum of the opinion polarity of the parts. Car and phone review domains, for instance, present the same issue.

2. Movie reviews usually include a brief summary of the plot of the described movie: an objective part of the review that does not contribute to the overall polarity. Words appearing in the plot summary may, however, be opinion polarity bearing and thus wrongly affect the evaluation of the OvOP.

The following review is an example of this issue. The first paragraph summarizes the plot of the movie and includes strongly oriented terms such as falsi (false), falsita (falseness), ipocrisia (hypocrisy) and bellezza (beauty). However, the reviewer has not used such terms to convey an opinion; only the final passage, “Una storia vera. Sean Penn nelle vesti di un grande regista. Un film da vedere”, carries the orientation expressed by the reviewer.

Un giovane che si allontana dalla societa, dai falsi pregiudizi, dal consumismo. L’effimero attaccamento alle cose materiali provoca falsita e ipocrisia. E’ allora che inizia la ricerca della verita, della bellezza, a contatto con la Natura, da soli, into the wild... Perdersi per ritrovare se stesso...”perche bisogna chiamare le cose con il loro vero nome”...ed e a questo punto che la ricerca si compie. Una storia vera. Sean Penn nelle vesti di un grande regista. Un film da vedere.

As the previous examples show, both issues are language independent and also affect documents written in Italian.

The Movie Review Corpus (MRC) has been collected from the FilmUp⁹ website, which hosts reviews, written in Italian, of more than 4500 movies. This choice is motivated both by the structure of the site, which allows reviews to be easily extracted, and by the presence, for each review, of an overall polarity rating indicator. Moreover, thanks to the capabilities of the harvesting infrastructure described in 3.3.1, other movie review services could easily be integrated in the future to further consolidate the collection.

In order to build a comprehensive corpus useful for training and evaluation, more than 3000 reviews referring to 300 different movies have been collected. The distribution of the OvOP assigned by the reviewers is not balanced, as observed in [63] for the English corpus and as reported in Figure 3.4.

The positive reviews are 2038 (64.7% of the entire corpus), while the negative reviews are only 694 (22% of the corpus).

⁹ www.filmup.leonardo.it/opinioni


Figure 3.4: Distribution of preassigned OvOP in MRC.

3.4.2 Results

A balanced training set has been collected for experimental purposes by randomly choosing from the MRC 500 reviews with polarity greater than 0.6 and 500 reviews with polarity lower than or equal to 0.4.

For each review, content and title are retrieved and merged: this solution does not give any additional importance to the words used by the author in the title of the review. Stop word removal and stemming are applied to reduce the dimension of the vector space model representing the training set.

The following approaches to document representation have been investigated:

• U3: the set of unigrams occurring 3 or more times in the whole training set;

• UB3: the set of unigrams and bigrams occurring 3 or more times in the whole training set;

• UBT3: the set of unigrams, bigrams and trigrams occurring 3 or more times in the whole training set.

The weight assigned to a document $d_i$ with respect to a selected n-gram $n_j$ is defined as

$$\begin{cases} 1 & \text{if } occ(n_j, d_i) > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $occ(n_j, d_i)$ is the number of occurrences of the n-gram $n_j$ in document $d_i$. The proposed weighting metric is simpler than other weighting metrics usually adopted in text classification, such as TF-IDF. As stated in [63], a representation based on n-gram occurrence is more effective than TF in OvOP classification. Preliminary results, obtained by comparing classifiers based on TF with classifiers based on n-gram occurrences, show that the latter weighting metric outperforms TF in each representation model. The contributions provided by the text statistic submodule and the punctuation submodule are included in U3, UB3 and UBT3.
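The representation described above can be illustrated with a minimal Python sketch. It is not the SENT-IT implementation (which relies on Weka); the function names and the toy documents are hypothetical, and the documents are assumed to be already stemmed and filtered for stop words.

```python
from collections import Counter

def ngrams(tokens, max_n):
    """Return all n-grams of length 1..max_n as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(docs, max_n, min_count=3):
    """Keep the n-grams occurring at least min_count times in the whole training set."""
    counts = Counter(g for tokens in docs for g in ngrams(tokens, max_n))
    return sorted(g for g, c in counts.items() if c >= min_count)

def binary_vector(tokens, vocabulary, max_n):
    """Weight is 1 if the n-gram occurs in the document, 0 otherwise."""
    present = set(ngrams(tokens, max_n))
    return [1 if g in present else 0 for g in vocabulary]

# Hypothetical toy documents (stemmed tokens), for illustration only.
docs = [
    ["film", "bellissim", "attor", "brav"],
    ["film", "bellissim", "colonn", "sonor"],
    ["film", "brutt", "attor", "pessim"],
    ["film", "bellissim", "film", "bellissim"],
]
vocab_u3 = build_vocabulary(docs, max_n=1)    # unigrams only (U3)
vocab_ubt3 = build_vocabulary(docs, max_n=3)  # unigrams, bigrams, trigrams (UBT3)
vec = binary_vector(docs[3], vocab_ubt3, max_n=3)
```

Note that the last toy document contains "film bellissim" twice, yet its weight is still 1: the representation records presence, not frequency.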

Six different classifiers have been trained, applying U3, UB3 and UBT3 to both the Naïve Bayes (NB) and the Support Vector Machine (SVM) algorithms. The Weka API provides an implementation of both NB and SVM; the SVM training algorithm we used is Sequential Minimal Optimization (SMO) [65]. Evaluation is performed with 3-fold cross-validation, provided by the cross evaluation workflow template of the EM.

The performance of the trained classifiers is measured in terms of accuracy, defined as the percentage of documents classified correctly. In particular, we are interested in evaluating accuracy+ and accuracy−, measured on the subsets of positive and negative reviews respectively. Table 3.2 shows the global accuracy of each trained classifier, grouped by machine learning algorithm.

Table 3.2: Average accuracy of trained classifiers.

              U3     UB3    UBT3
Naïve Bayes   82.2   82.4   82.5
SVM           84.9   84.4   84.4

Table 3.3 extends the results presented in Table 3.2 in terms of measured accuracy+ and accuracy−; these two indexes are useful to assess the difficulty of classifying positive and negative reviews respectively.

Table 3.3: Average accuracy+ and accuracy− of trained classifiers.

              U3                      UB3                     UBT3
              accuracy+   accuracy−   accuracy+   accuracy−   accuracy+   accuracy−
Naïve Bayes   85.4        79.0        86.2        78.6        86.2        78.8
SVM           85.2        84.6        83.8        85.0        83.6        85.2
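As a worked illustration of the two indexes, the following Python sketch computes accuracy, accuracy+ and accuracy− from gold and predicted labels (the function name and the toy labels are hypothetical). It also checks that, since the training set is balanced, the mean of accuracy+ and accuracy− in Table 3.3 is consistent with the global accuracy in Table 3.2.

```python
def class_accuracies(gold, predicted):
    """Return (accuracy, accuracy+, accuracy-) as percentages."""
    pairs = list(zip(gold, predicted))
    pos = [(g, p) for g, p in pairs if g == "pos"]
    neg = [(g, p) for g, p in pairs if g == "neg"]
    acc = 100.0 * sum(g == p for g, p in pairs) / len(pairs)
    acc_pos = 100.0 * sum(g == p for g, p in pos) / len(pos)
    acc_neg = 100.0 * sum(g == p for g, p in neg) / len(neg)
    return acc, acc_pos, acc_neg

# Hypothetical toy labels: 4 positive and 4 negative reviews.
gold = ["pos"] * 4 + ["neg"] * 4
pred = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "pos"]
acc, acc_pos, acc_neg = class_accuracies(gold, pred)

# Values copied from Table 3.3: (accuracy+, accuracy-).
table_3_3 = {
    ("NB", "U3"): (85.4, 79.0), ("NB", "UB3"): (86.2, 78.6), ("NB", "UBT3"): (86.2, 78.8),
    ("SVM", "U3"): (85.2, 84.6), ("SVM", "UB3"): (83.8, 85.0), ("SVM", "UBT3"): (83.6, 85.2),
}
# On a balanced set the global accuracy is the mean of the two class accuracies.
overall = {key: round((p + n) / 2, 1) for key, (p, n) in table_3_3.items()}
```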

The second part of our experimental evaluation is aimed at computing the performance achieved by a new set of classifiers, trained on a reduced vector space model filtered according to a feature selection criterion. Feature selection is the task of identifying the subset of features most effective for a given classification process, reducing the noise introduced by a sparse representation model. Feature selection allows trained classifiers to achieve better performance and reduces their computational requirements. In this work we adopt the Information Gain (IG) feature selection criterion. IG is defined as the number of bits of information obtained for category prediction by knowing the presence or absence of a feature in a document. Equation 3.1 reports the general formulation of IG with respect to the input feature t:

$$IG(t) = -\sum_{i=1}^{m} \Pr(c_i) \log \Pr(c_i) + \Pr(t) \sum_{i=1}^{m} \Pr(c_i|t) \log \Pr(c_i|t) + \Pr(\bar{t}) \sum_{i=1}^{m} \Pr(c_i|\bar{t}) \log \Pr(c_i|\bar{t}) \qquad (3.1)$$

with $\{c_i\}_{i=1}^{m}$ the set of available classes. Each feature t of the representation model can be ranked according to its IG(t) value. Only the n best features are used to represent the training set. Table 3.4 lists the 50 features with the highest IG value when feature ranking is applied to the UBT3 representation model.
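Equation 3.1 and the subsequent ranking step can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the experiments: probabilities are estimated as relative frequencies, logarithms are taken in base 2 (so that IG is expressed in bits), and the function name and toy data are hypothetical.

```python
import math

def information_gain(docs, labels, feature):
    """IG(t) as in Equation 3.1 for a binary (present/absent) feature."""
    n = len(docs)
    classes = sorted(set(labels))
    with_t = [lab for d, lab in zip(docs, labels) if feature in d]
    without_t = [lab for d, lab in zip(docs, labels) if feature not in d]

    def plogp_sum(subset):
        """Sum over classes of Pr(c) * log2 Pr(c), with 0 * log 0 taken as 0."""
        total = 0.0
        for c in classes:
            p = subset.count(c) / len(subset) if subset else 0.0
            if p > 0:
                total += p * math.log2(p)
        return total

    return (-plogp_sum(labels)
            + len(with_t) / n * plogp_sum(with_t)
            + len(without_t) / n * plogp_sum(without_t))

# Hypothetical toy training set: each document is the set of its features.
docs = [{"bellissim", "film"}, {"bellissim", "attor"},
        {"brutt", "film"}, {"brutt", "attor"}]
labels = ["pos", "pos", "neg", "neg"]

vocabulary = sorted(set().union(*docs))
scores = {t: information_gain(docs, labels, t) for t in vocabulary}
# Rank by decreasing IG (ties broken alphabetically) and keep the n best features.
ranked = sorted(vocabulary, key=lambda t: (-scores[t], t))
best = ranked[:2]
```

In this toy set "bellissim" and "brutt" separate the two classes perfectly (IG of 1 bit), while "film" and "attor" occur equally in both classes and carry no information.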

Table 3.4: Top 50 features extracted from the training set with the highest IG value.

 1 bellissim    11 bravissim        21 splendid    31 bel                 41 eccezional
 2 brutt        12 po               22 attor       32 favol               42 depp
 3 bell         13 piac             23 pir         33 harry               43 grindhous
 4 jack         14 noios            24 grand       34 johnny              44 noi
 5 pessim       15 ridicol          25 perfett     35 colonn              45 stup
 6 ottim        16 interpret        26 sparrow     36 inutil              46 insuls
 7 fantast      17 bast             27 sonor       37 straordinar         47 ’ottim film’
 8 evit         18 simpson          28 orrend      38 ’colonn sonor’      48 stupid
 9 peggior      19 butt             29 will        39 ’film bellissim’    49 jones
10 delusion     20 ’jack sparrow’   30 schifezz    40 brav                50 molt

Table 3.5 shows the accuracy of the trained classifiers, based respectively on the U3 and UBT3 representations, as n varies. The number of unigrams appearing at least 3 times in the training set is lower than 3000; for this reason, for U3, n is limited to 2000 features. Figure 3.5 displays the accuracy curves obtained from the data in Table 3.5.

The results obtained confirm that the assumptions introduced in [63], concerning the ability of unigrams to express the OP of a given text, remain sound for the Italian language. More specifically, representation models based entirely or partially on the occurrence of unigrams lead to the best performance with both of the adopted training approaches, NB and SVM. Although evaluated on different corpora, made up of documents written in English and with their own dimensions and polarity distribution, the classifiers trained in both [71] and [63] show performance comparable to that presented in this Section.

Our experimental activities show that SVM classifiers clearly outperform NB classifiers in the OvOP classification task, as previously stated by [63]. In particular, the values reported in Table 3.3 show that NB classifiers tend to classify positive reviews better than negative ones, while SVM classifiers tend to be more balanced, showing similar accuracy values for instances of both the positive and the negative class.

Table 3.5: Average accuracy of the U3 and UBT3 based classifiers after feature selection.

U3
Features      50     100    250    500    1K     2K
Naïve Bayes   81.0   83.8   83.8   85.6   86.7   85.4
SVM           83.3   86.2   85.5   86.8   87.5   85.7

UBT3
Features      50     100    250    500    1K     2K     3K
Naïve Bayes   80.1   84.4   84.5   85.4   86.6   86.8   85.5
SVM           82.4   85.5   85.9   87.2   87.9   89.0   87.8

Figure 3.5: Feature selection and accuracy for both NB and SVM classifiers.

Feature selection improves the accuracy achieved by the trained classifiers; in particular, the IG selection metric shows an average improvement in accuracy between 2% and 4.6%. The highest improvement in accuracy is achieved when IG is applied to UBT3 with SVM as learning algorithm.

Feature selection may also be useful to distinguish between features that are effective for OvOP classification and features that introduce noise in the document representation model. The results reported in Table 3.5, obtained by varying n, identify the set of features that achieves the best performance.

By looking at the set of most relevant features, listed in Table 3.4 and ranked with respect to the IG metric, it is possible to identify the stems of strongly polarized adjectives like bellissim (wonderful), brutt (ugly), pessim (worst), the stems of adverbs used as adjective amplifiers, such as po’ (quite), and the stems of domain specific terms or multi-word terms, such as ‘colonn sonor’ (soundtrack) or interpret (actor).

The accuracy achieved by the classifiers trained on the selected set of features is higher than the results presented in [71] and [63]. In the future we expect to further increase the size of our corpus, in order to investigate in more depth how the accuracy of OvOP classification changes with larger training sets.

3.5 A novel visualization approach for polarity classified reviews

3.5.1 Basics of graph theory

In the following we introduce some standard graph theory notation; for more details refer to [85]. A graph $G$ is a pair $G = (V, E)$, where $V$ is a finite non-empty set of elements called vertices and $E$ is a finite set of distinct unordered pairs $\{u, v\}$ of distinct elements of $V$ called edges.

A multigraph is a triple $MG = (V, E, f)$ where $V$ is a finite non-empty set of vertices, $E$ is the set of edges, and $f : E \to \{\{u, v\} \mid u, v \in V, u \neq v\}$ is a surjective function.

An edge-colored multigraph is a triple $ECMG = (MG, C, c)$ where $MG = (V, E, f)$ is a multigraph, $C$ is a set of colors, and $c : E \to C$ is an assignment of colors to the edges of the multigraph.

In a multigraph $MG = (V, E, f)$, edges $e_1, e_2 \in E$ are called multiple or parallel iff $f(e_1) = f(e_2)$. Thus, a graph is a particular multigraph $G = (V, E, f)$ without parallel edges.

Given an edge $e = \{u, v\} \in E$, we say that $e$ is incident to $u$ and $v$; moreover, $u$ and $v$ are neighboring vertices.


Given a vertex $x \in V$, we denote by $deg(x)$ its degree, i.e., the number of edges incident to $x$, and by $d_{max}$ the maximum degree of the graph, i.e., $d_{max} = \max_{z \in V}\{deg(z)\}$. In an edge-colored (multi)graph $ECMG$, where $c_k \in C$, we define $deg_k(x)$ as the number of edges of color $c_k$ incident to vertex $x$. A vertex of degree 0 is called isolated; a vertex of degree 1 is called pendant.

A path $P = \{v_1, v_2, \dots, v_s\}$ is a sequence of neighboring vertices of $G$, i.e., $\{v_i, v_{i+1}\} \in E$, $1 \le i \le s-1$. A graph $G = (V, E)$ is connected if, $\forall x, y \in V$, $\exists$ a path $P = \{x = v_1, v_2, \dots, v_s = y\}$, with $\{v_k, v_{k+1}\} \in E$, $1 \le k \le s-1$. Two vertices $x$ and $y$ in a connected graph are at distance $dist$ if the shortest path connecting them is composed of exactly $dist$ edges.

3.5.2 Zz-structures

Zz-structures [58] represent a graph-centric system of conventions for data and computing. A zz-structure is composed of linear sequences of cells, called ranks; in each rank, cells are connected by links of the same color. The set of ranks with links of the same color is called a dimension. The set of dimensions, each distinguished by its color, constitutes a zz-structure. The starting and ending cells of a rank are called headcell and tailcell respectively, and the direction from the starting (ending) cell to the ending (starting) cell is called posward (respectively, negward).

For any dimension, a cell can only have one connection in the posward direction,and one in the negward direction. This ensures that all paths are non-branching,and thus embodies the simplest possible mechanism for traversing links.

Formally, a zz-structure is defined as follows [17, 18]. Consider an edge-colored multigraph $ECMG = (MG, C, c)$ where $MG = (V, E, f)$ is a multigraph composed of a set of vertices $V$, a set of edges $E$ and a surjective function $f : E \to \{\{u, v\} \mid u, v \in V, u \neq v\}$. $C$ is a set of colors, and $c : E \to C$ is an assignment of colors to the edges of the multigraph. Finally, $deg(x)$ (respectively, $deg_k(x)$) denotes the number of edges incident to $x$ (respectively, of color $c_k$).

A zz-structure is an edge-colored multigraph $S = (MG, C, c)$, where $MG = (V, E, f)$, and $\forall x \in V$, $\forall k = 1, 2, \dots, |C|$, $deg_k(x) \in \{0, 1, 2\}$. Each vertex of a zz-structure is called a zz-cell and each edge a zz-link. The set of isolated vertices is $V_0 = \{x \in V : deg(x) = 0\}$.
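The degree constraint in the definition above can be checked mechanically. The following Python sketch is only illustrative: the encoding of an edge-colored multigraph as a list of (u, v, color) triples and the function name are assumptions made for the example.

```python
from collections import defaultdict

def is_zz_structure(edges):
    """edges: iterable of (u, v, color) triples.
    True iff every vertex has at most two incident edges of each color,
    i.e. deg_k(x) is 0, 1 or 2 for every vertex x and color c_k."""
    deg = defaultdict(int)  # (vertex, color) -> number of incident edges
    for u, v, color in edges:
        if u == v:
            return False  # edges must connect distinct vertices
        deg[(u, color)] += 1
        deg[(v, color)] += 1
    return all(d <= 2 for d in deg.values())

# Hypothetical toy data: a chain in dimension "year" plus one "co-authors" link.
valid = [("v1", "v2", "year"), ("v2", "v3", "year"), ("v3", "v4", "year"),
         ("v1", "v2", "co-authors")]
# A third "year" link on v2 makes the path branch, violating the constraint.
invalid = valid + [("v2", "v4", "year")]
```

Parallel edges of different colors between the same pair of vertices (here v1 and v2) are allowed, which is why a multigraph is needed.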

An example of a zz-structure, related to scientific papers, is given in Figure 3.6. Each vertex contains the reference to a paper and shows some summary information, like the initial part of the title, the authors and the publication year. Normal-red, dotted-green and dashed-blue lines represent different colors. Normal-red lines group papers of the same year ($v_1, \dots, v_4$ have been published in 2007, while $v_5, \dots, v_{11}$ are (or will be) published in 2008); dotted-green lines identify papers sharing at least two authors: for example, $v_7$, $v_9$ and $v_{10}$ share authors Dattolo and Tasso; finally, dashed-blue lines group papers sharing a topic (for example, papers $v_7$ and $v_{10}$ share the topic Web 2.0).


Figure 3.6: An example of zz-structure.

Dimensions

An alternative way of viewing a zz-structure is as a union of subgraphs, each of which contains edges of a unique color.

Consider a set of colors $C = \{c_1, c_2, \dots, c_{|C|}\}$ and a family of undirected edge-colored graphs $\{D^1, D^2, \dots, D^{|C|}\}$, where $D^k = (V, E^k, f, \{c_k\}, c)$, with $k = 1, \dots, |C|$, is a graph such that: 1) $E^k \neq \emptyset$; 2) $\forall x \in V$, $deg_k(x) \in \{0, 1, 2\}$. Then, $S = \bigcup_{k=1}^{|C|} D^k$ is a zz-structure.

Given a zz-structure $S = \bigcup_{k=1}^{|C|} D^k$, each graph $D^k$, $k = 1, \dots, |C|$, is a distinct dimension of $S$.

In Figure 3.6, we identify three dimensions: year, co-authors and topic, respectively represented by normal-red, dotted-green and dashed-blue lines.

Ranks

A rank is in a particular dimension and it must be a connected component.

Consider a dimension $D^k = (V, E^k, f, \{c_k\}, c)$, $k = 1, \dots, |C|$, of a zz-structure $S = \bigcup_{k=1}^{|C|} D^k$. Then, each of the $l_k$ connected components of $D^k$ is called a rank.

A dimension can contain one (if $l_k = 1$) or more ranks. Moreover, the number $l_k$ of ranks can differ between dimensions. In Figure 3.6, dimension year contains only two ranks, 2007 (constituted by papers $v_1, \dots, v_4$) and 2008 (constituted by papers $v_5, \dots, v_{11}$), while dimensions co-authors and topic contain four ranks each; more specifically, dimension co-authors contains dattolo-vitali (papers $v_1$, $v_2$ and $v_8$), canazza-dattolo ($v_3$ and $v_6$), dattolo-luccio ($v_4$, $v_5$ and $v_{11}$) and dattolo-tasso ($v_7$, $v_9$ and $v_{10}$).

Given a rank $R_i^k$, an alternative way of viewing a dimension is as a union of ranks: $D^k = \bigcup_{i=1}^{l_k} R_i^k \cup V_0^k$.
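The decomposition of a dimension into ranks and isolated vertices can be sketched in Python as a connected components computation restricted to the edges of one color. The sketch reads the definition above as treating components with at least one edge as ranks and degree-0 vertices as the isolated set; the function name, the integer encoding of papers (i stands for paper vi) and the link order inside each rank are assumptions made for illustration, since only the membership of each rank is given in the text.

```python
from collections import defaultdict

def ranks_of_dimension(vertices, edges, color):
    """Return (ranks, isolated) for the dimension identified by `color`.
    ranks: connected components containing at least one edge of that color;
    isolated: vertices with no incident edge of that color."""
    adj = defaultdict(set)
    for u, v, c in edges:
        if c == color:
            adj[u].add(v)
            adj[v].add(u)
    seen, ranks = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first visit of one component
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(adj[x] - component)
        seen |= component
        ranks.append(sorted(component))
    isolated = sorted(set(vertices) - seen)
    return ranks, isolated

vertices = range(1, 12)
edges = (
    [(i, i + 1, "year") for i in (1, 2, 3)]            # rank 2007
    + [(i, i + 1, "year") for i in range(5, 11)]       # rank 2008
    + [(1, 2, "co-authors"), (2, 8, "co-authors"),     # dattolo-vitali
       (3, 6, "co-authors"),                           # canazza-dattolo
       (4, 5, "co-authors"), (5, 11, "co-authors"),    # dattolo-luccio
       (7, 9, "co-authors"), (9, 10, "co-authors")]    # dattolo-tasso
)
year_ranks, year_isolated = ranks_of_dimension(vertices, edges, "year")
coauthor_ranks, _ = ranks_of_dimension(vertices, edges, "co-authors")
```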


Head and tail cells

If we focus on a vertex $x$, $R_i^k = \dots x^{-2} x^{-1} x x^{+1} x^{+2} \dots$ is expressed in terms of negward and posward cells of $x$: $x^{-1}$ is the negward cell of $x$ and $x^{+1}$ the posward cell. We also assume $x^0 = x$. In general, $x^{-i}$ ($x^{+i}$) is a cell at distance $i$ in the negward (posward) direction.

Given a rank $R_i^k = (V_i^k, E_i^k, f, \{c_k\}, c)$, a cell $x$ is the headcell of $R_i^k$ iff $\exists$ its posward cell $x^{+1}$ and $\nexists$ its negward cell $x^{-1}$. Analogously, a cell $x$ is the tailcell of $R_i^k$ iff $\exists$ its negward cell $x^{-1}$ and $\nexists$ its posward cell $x^{+1}$.
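The headcell and tailcell conditions translate directly into code. In the following Python sketch a single rank is encoded, as an assumption made for the example, by a dictionary mapping each cell to its posward neighbour; the function name is hypothetical. A rank that closes on itself (every cell having degree 2, which the degree constraint does not exclude) has neither a headcell nor a tailcell.

```python
def head_and_tail(posward):
    """posward maps each cell of one rank to its posward neighbour.
    Returns (headcell, tailcell), or (None, None) for a circular rank."""
    negward = {v: u for u, v in posward.items()}
    cells = set(posward) | set(negward)
    # headcell: has a posward cell but no negward cell
    heads = [x for x in cells if x in posward and x not in negward]
    # tailcell: has a negward cell but no posward cell
    tails = [x for x in cells if x in negward and x not in posward]
    return (heads[0], tails[0]) if heads else (None, None)

# Integers stand for the papers of Figure 3.6 (3 is v3, and so on);
# the order inside the dattolo-tasso rank is assumed for illustration.
canazza_dattolo = {3: 6}
dattolo_tasso = {7: 9, 9: 10}
circular = {1: 2, 2: 3, 3: 1}
```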

Views and navigation

An important contribution of zz-structures is the possibility to contextualize information and to retrieve all the information related to a given cell, starting from that cell. Personalized views are shown to users when they choose a vertex as the focus of their attention together with a set of preferred topics, keywords or tags.

Rectangular H-views[51] visualize a focus cell and its neighbors on two specifieddimensions.

In order to introduce the H-view definition, we use the notation $x \in R^i_{(x)}$ to indicate that $R^i_{(x)} \in D^i$ is the rank related to $x$ of color $c_i$.

Given a zz-structure $S = \bigcup_{k=1}^{|C|} D^k$, where $D^k = \bigcup_{i=1}^{l_k} R_i^k \cup V_0^k$, and where $R_i^k = (V_i^k, E_i^k, f, \{c_k\}, c)$, the H-view of size $l = 2m + 1$ and of focus $x \in V = \bigcup_{i=0}^{l_k} V_i^k$, on main vertical dimension $D^a$ and secondary horizontal dimension $D^b$ ($a, b \in \{1, \dots, l_k\}$), is defined as a tree whose embedding in the plane is a partially connected colored $l \times l$ mesh in which:

• the central vertex, in position $((m + 1), (m + 1))$, is the focus $x$;

• the horizontal central path (the $(m + 1)$-th row) from left to right, focused on vertex $x \in R^b_{(x)}$, is: $x^{-g} \dots x^{-1} x x^{+1} \dots x^{+p}$, where $x^s \in R^b_{(x)}$, for $s = -g, \dots, +p$ ($g, p \le m$);

• for each cell $x^s$, $s = -g, \dots, +p$, the related vertical path, from top to bottom, is: $(x^s)^{-g_s} \dots (x^s)^{-1} x^s (x^s)^{+1} \dots (x^s)^{+p_s}$, where $(x^s)^t \in R^a_{(x^s)}$, for $t = -g_s, \dots, +p_s$ ($g_s, p_s \le m$).

Intuitively, the H-view extracts ranks along the two chosen dimensions. Note that the name H-view comes from the fact that the columns recall the vertical bars of a capital letter H.
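The construction of an H-view can be sketched in Python as follows. This is a simplified illustration rather than the implementation of the Data Visualization Module: each dimension is encoded as a list of ranks, each rank as a list of cells ordered from negward to posward, and the view is returned as a dictionary from 1-based (row, column) positions to cells. The function names are hypothetical, integers stand for the papers of Figure 3.6, and the order of cells inside each rank is assumed.

```python
def window(rank, cell, m):
    """Cells of `rank` (ordered negward to posward) within distance m of `cell`."""
    i = rank.index(cell)
    return rank[max(0, i - m):i + m + 1]

def rank_of(ranks, cell):
    """Rank containing `cell`, or a singleton if the cell is isolated."""
    return next((r for r in ranks if cell in r), [cell])

def h_view(focus, vertical_ranks, horizontal_ranks, m):
    """Return {(row, col): cell} for the H-view of size l = 2m + 1."""
    grid = {}
    h_rank = rank_of(horizontal_ranks, focus)
    for cell in window(h_rank, focus, m):
        # the focus sits in column m + 1; neighbours are placed by rank offset
        col = (m + 1) + h_rank.index(cell) - h_rank.index(focus)
        v_rank = rank_of(vertical_ranks, cell)
        for other in window(v_rank, cell, m):
            row = (m + 1) + v_rank.index(other) - v_rank.index(cell)
            grid[(row, col)] = other
    return grid

year = [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11]]
co_authors = [[1, 2, 8], [3, 6], [4, 5, 11], [7, 9, 10]]
# Focus v7, vertical dimension co-authors, horizontal dimension year, m = 2.
view = h_view(7, co_authors, year, m=2)
central_row = [view[(3, col)] for col in range(1, 6)]
```

With these assumed orders the central row is v5 v6 v7 v8 v9 and v3 sits directly above v6.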

Similar to H-views are the I-views, so called because the rows recall the horizontal serifs of a capital letter I; a formal definition can be found in [17].


As an example of H-view, consider Figure 3.7, which refers to the zz-structure of Figure 3.6. The focus paper is $v_7$, and the chosen dimensions are year and co-authors.

Figure 3.7: An example of H-view on focus v7.

The view has size $l = 2m + 1 = 5$, the focus is $v_7$, and the horizontal central path is $v_7^{-2} v_7^{-1} v_7 v_7^{+1} v_7^{+2} = v_5 v_6 v_7 v_8 v_9$ ($g, p = 2$). The vertical path related to $v_7^{-1} = v_6$ is $(v_7^{-1})^{-1} (v_7^{-1}) = v_3 v_6$ ($g_s = 1$ and $p_s = 0$), that is, $(v_7^{-1})^{-1} = v_3$ is the headcell of the rank, as $g_s = 1 < m = 2$.

3.5.3 Data Visualization Module

The Data Visualization Module (DVM) is aimed at improving the way users interact with a corpus of product reviews; in particular, we focused on designing a novel approach in which the OvOP of reviews can effectively be used to support review browsing and retrieval. Moreover, we wanted to give users the possibility to annotate reviews according to their preferences by defining, for each review, a set of tags; tags can be used to automatically define new navigation criteria that users can exploit. The DVM supports users in two main functions:

• Dynamic and personalized access to the corpus. Knowledge inferred during the OvOPA activity for a given set of documents is represented by means of a zz-structure. A cell of the zz-structure is associated with each review; colored edges are used to represent semantic interconnections among reviews. Several dimensions aimed at improving the effectiveness of users’ navigation activity have been identified: reviews related to the same movie, reviews containing the same tags, reviews expressing similarly oriented opinions.

Figure 3.8 shows a view of the list of reviews found by searching for the query “Johnny Depp”. Each cell is composed of the movie’s title (e.g., Sweeney Todd, in the first zz-cell), an emoticon identifying the polarity of the review (positive or negative), a short excerpt of the review (“Spettacolare e dire poco! ...”), the publishing date (e.g., 01-03-2008), the data source (e.g., FilmUp), the search tool and an advanced tool button (identified by the “+” symbol). Title, emoticon, date, source and search tool are clickable and are associated with the related dimensions; as highlighted in Figure 3.8 (by the red color of the search tool icon), the current horizontal dimension contains the sequence of results for “Johnny Depp”. If the user clicks on the title “Pirati dei ...” in the third cell, the dimension related to reviews with the same title is visualized, as shown in Figure 3.9. The cell with the red border represents the last user selection, while all reviews with the same title are visualized in the vertical dimension and are marked by the red color of their titles. In this way, the user has access to a Cartesian view on two semantic dimensions: “Johnny Depp” and the title “Pirati dei ...”. Subsequent clicks will propose new views on the dimensions chosen by the user.

Figure 3.8: Set of reviews retrieved from the MRC with the query “Johnny Depp”.

• Creation of new interconnections among existing data and knowledge [17]. The advanced tool button enables users to: (a) add one or more reviews to a new dimension by tagging existing reviews; a new dimension is composed of all the reviews labelled with the same tag; (b) browse and visualize reviews by selecting a dimension from the set of user created tags.

3.6 Conclusions

In this Chapter we addressed the problem of defining supervised methodologies, based on machine learning, for OvOP classification. In particular, we focused on OvOP classification applied to product reviews written in Italian.

In order to simplify and organize the experimental activity, a software platform devoted to Sentiment Analysis has been designed and implemented: the SENT-IT framework. The framework supports the development and evaluation of OvOP classification engines; users can define their own experimental activity by selecting the components to use for each processing task. Four different workflows have been defined, in order to properly map several scenarios that can involve OvOP classification. The SENT-IT framework includes a toolbox of services and resources aimed at performing several linguistic tasks, such as POS tagging, stemming and stop word removal, among others; each tool has been trained specifically for the Italian language.

Figure 3.9: A view related to dimensions “Johnny Depp” and “Pirati dei Caraibi”.

The SENT-IT framework has been used to develop a set of domain dependent classifiers, based on the NB and SVM learning methods. These classifiers have been trained and tested on the collected Movie Review Corpus. The average accuracy achieved is higher than 80%, with a best result of 89% obtained with IG feature selection applied to the UBT3 feature set.

A novel visualization model, based on zz-structures, has been proposed and analyzed, in order to improve the way users browse and interact with a corpus of product reviews. In particular, we have presented a solution where OvOP can be used as a direction for the navigation of user generated reviews.


Chapter 4

Domain Independent Sentiment Analysis

Abstract

In this Chapter we investigate the problem of domain independent OvOP classification for the Italian language. More specifically, we are interested in extending the results obtained in Chapter 3: we aim at generalizing the OvOP classification process to documents that do not concern the domains used for training. Different methodologies are analyzed, in order to evaluate and improve the effectiveness of domain independent OvOP classification; in particular, meta-classification is exploited to generate OvOP classifiers based on existing, domain dependent trained classifiers. Three corpora, concerning cars, cell phones and books, have been collected and used to generate high accuracy domain dependent classifiers, using the UBT3 feature set introduced in Chapter 3 and SVM. Several different OvOP classifiers have been trained and evaluated by combining the review corpora in different ways for both training and testing.

4.1 Introduction

OvOP classification is a strongly domain dependent process; this phenomenon has been described in detail in Chapter 2: terms and sentences can convey different polarities in different domains. The sentence “go read the book” [62] contributes in different ways to the OvOP of a review according to its domain: positively if the review concerns a book, but negatively if the review describes a movie.

This issue has been described in [82] by observing the variations of the polarity assigned to the term “unpredictable” when used across different domains. [19] and [24] confirm that OvOP classification can in practice provide different performance when applied across different domains; the accuracy of sentiment classification, in particular, is influenced by topic dependency in many ways. Different lexicons and stylistic properties are in fact used in different domains to convey subjectivity. Temporal dependency, as stated in [69], is another issue strongly related to topic dependency, for example in the case of reviews covering electronic devices, such as cell phones or laptops, where continuous innovation can turn a valuable feature into a poor one in a few months.

An OvOP classifier, trained according to the most relevant features of a given domain, may not be able to recognize the polarity carriers of a different domain. Several approaches have been proposed in the literature to deal with the problem of training a domain independent OvOP classifier, aimed at providing good performance even on domains that differ from those adopted for training. Such approaches range from identifying good domain independent features common to each domain [96], to developing a set of domain specific classifiers, used to infer the orientation of a given document from an unknown domain [59].

None of the proposed approaches, however, provides a significant improvement on this issue; sentiment analysis is, in fact, strongly influenced by domain dependency. In this Chapter we investigate how this issue affects OvOP classification for the Italian language and the approaches that can be exploited to minimize the effects of domain dependency on OvOP.

4.2 Related Work

4.2.1 Aue and Gamon

In [5] the authors explore different approaches aimed at customizing a sentiment classification system to a new target domain in the absence of large amounts of labeled data. Four different domains are considered: movie reviews [63], book reviews and two different sets of user provided feedback collected from web surveys. The data sets differ from each other in terms of average length (varying from long reviews to single sentence feedback) and of the lexicon used in each domain.

Four different feature sets (all n-grams, unigrams, bigrams, trigrams) and six different log likelihood ratio (LLR) [21] cutoffs (no cutoff, top 20k/10k/5k/2k/1k features) have been combined during the training of the OvOP classifier. Each classifier has been trained on one domain and tested on every domain: its own training domain and the other three foreign domains. The accuracy of inter-domain OvOP classification varies between 81.42% and 52.16%; the accuracy of classifiers evaluated on their own domain is, on the other hand, very high. On the movie domain OvOP classification reaches an accuracy of 90.45%, exceeding the best results reported in [63] on the same data set.

Domain differences are substantial and, as stated in [5], a classifier trained on one domain may be barely able to exceed the baseline in another domain. For this reason a second experiment has been performed, training a general purpose OvOP classifier on the whole set of available training documents; the generated classifier has been evaluated with respect to each domain, showing an accuracy varying from 63.92% to 74.99%. An extended version of the general purpose OvOP classifier has been trained by limiting the features used during training to those that appear in the target domain, with no significant improvements.

A third experimental activity is based on the development of an ensemble classifier, a model that combines the output of several classifiers in order to determine the overall classification [20]. A set of three OvOP classifiers, based respectively on unigrams, bigrams and trigrams, has been trained for each domain. Nine classifiers, trained on three different domains, have been tested on the fourth domain; the vector constituted by the output of the nine classifiers is used to train a meta-classifier. The meta-classifier, trained on that set, calibrates the combination of scores from the individual classifiers on a held-out data set. With respect to the general purpose OvOP classifier, the accuracy of domain independent OvOP classification is significantly improved in three of the four domains, with a best performance of 80.03%.

4.2.2 Engstrom

An accurate investigation of domain independent OvOP classification is presented in [24]; the author is interested in defining a feature set that can be effectively used to infer OvOP across different domains. Moreover, the author goes further by evaluating the effectiveness of the same feature set for Opinion Mining; subjectivity, according to the author, can also be affected by differences between domains, although to a lesser extent than OvOP. Five corpora, covering different domains, have been used to evaluate the effectiveness of the proposed feature set; a three class classification architecture is described, in order to distinguish between the neutral, positive and negative classes. The Pairwise Coupling model is used to enable classification by means of binary SVMs.

Documents are represented using 3572 terms from two manually collected lexicons: a lexicon of sentiment bearing words, constituted by terms like “best” and “lovely”, and a lexicon of potential sentiment bearing words with at least one meaning that conveys a specific orientation. An accuracy of 61.5% has been obtained on inter-domain OvOP classification.

An extended model is presented, in order to overcome the limitations of the de-scribed approach; text statistics and punctuation, negations, intensifiers and Word-Net based extensions of the list of sentiment bearing words have been exploitedin different combinations. Such extension, however, did not lead to a significantimprovement, with a best accuracy of 69%. Better results have been obtained forthe subjectivity classification task; subjectivity classification appears less affectedby topic-dependency than OvOP classification.

4.2.3 Agrin

In [2] the author presents a set of features for document representation devoted to domain independent OvOP classification; in particular the work is aimed at implementing a methodology that could be applied to the blogosphere, where no domain assumption can be made. Two different data sets, including the movie review corpus described in [63], have been used to evaluate the inter-domain accuracy of trained OvOP classifiers. Several combinations of statistical and lexical based feature sets have been exploited and evaluated; in particular, the information provided by the General Inquirer about opinionated terms and their expansion by means of WordNet relationships are used in order to define a set of domain independent features.

Results, however, do not show any significant improvement, providing a maximum inter-domain accuracy of 68%, with respect to a maximum intra-domain accuracy of 81%, when statistical features are used. Domain independent features, moreover, provide a lower accuracy of 67% across different domains and different learning methods. This work confirms that inter-domain OvOP is a complex task, whose difficulty is dictated, according to the author, by the types of writing and nuances used within a document set.

4.2.4 Blitzer

Another contribution to the problem of domain independent OvOP has been provided in [9], where an extension to the structural correspondence learning (SCL) algorithm [10], devoted to domain independent OvOP, is presented. The SCL algorithm is based on the following assumption: if two terms, belonging to completely different domains, both present a high correlation with a third term common to both domains, such as the term "excellent", they will probably contribute in the same way to the OvOP. The common terms, which are used to transfer polarity from a set of labelled documents to another domain where no labelled documents are available, are defined as pivot features.

Two different metrics for pivot selection have been proposed and evaluated; in particular the SCL-MI criterion, based on the Mutual Information between potential pivot features and the source label, has been introduced in order to deal with the specific task of OvOP classification. Four different sets of product reviews have been generated by collecting reviews available on the Amazon website, covering the following domains: books, DVDs, electronics and kitchen appliances. For each domain 2000 reviews, equally distributed between positive and negative documents, are collected and used to generate domain dependent classifiers based on unigrams and bigrams.
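To make the SCL-MI selection criterion more concrete, the following minimal sketch ranks candidate pivot terms by the mutual information between their presence in a source document and the source label. It is an illustration under explicit assumptions, not the implementation of [9]: documents are modelled as sets of terms, a candidate must occur in at least `min_count` documents of both domains, and the function names, the threshold and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between the presence of `term` and the label."""
    n = len(docs)
    classes = set(labels)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for present in (True, False):
        p_x = sum(joint[(present, c)] for c in classes) / n
        for label in classes:
            p_xy = joint[(present, label)] / n
            if p_xy == 0.0:
                continue  # the term 0 * log(0) is taken as 0
            p_y = sum(joint[(p, label)] for p in (True, False)) / n
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def select_pivots(source_docs, source_labels, target_docs, k, min_count=2):
    """Keep the k terms, frequent in both domains, with the highest MI."""
    src_freq = Counter(t for d in source_docs for t in d)
    tgt_freq = Counter(t for d in target_docs for t in d)
    candidates = [t for t in src_freq
                  if src_freq[t] >= min_count and tgt_freq[t] >= min_count]
    ranked = sorted(
        candidates,
        key=lambda t: (-mutual_information(source_docs, source_labels, t), t),
    )
    return ranked[:k]


# Hypothetical toy data: labelled kitchen reviews, unlabelled electronics reviews.
source_docs = [{"excellent", "knife"}, {"excellent", "pan"},
               {"poor", "knife"}, {"poor", "pan"}]
source_labels = [1, 1, 0, 0]
target_docs = [{"excellent", "battery"}, {"poor", "screen"},
               {"excellent", "screen"}, {"poor", "battery"}]

pivots = select_pivots(source_docs, source_labels, target_docs, k=2)
print(pivots)
```

On this toy data only the terms excellent and poor occur in both domains, and both are fully informative about the source label, so they are the ones selected as pivots; domain specific terms such as knife are discarded by the frequency filter.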

Results show how the SCL algorithm can significantly improve the effectiveness of domain independent OvOP classification in each analyzed domain; in particular, when transferring from the kitchen domain to the electronics domain, the SCL-MI algorithm outperforms the domain classifier by a margin of 2,4%. Moreover, the unsupervised A-distance has been used to provide an evaluation of the divergence between domains.

4.3 Domain Independent OvOP

In this Chapter we focus on training and evaluating classifiers aimed at inter-domain OvOP classification. In particular we are interested in assessing how the feature sets exploited in Chapter 3 perform when applied to documents covering an unknown domain. Three different experiments, based on the experience described in [5], have been performed in order to evaluate the accuracy of OvOP classification in an inter-domain environment.

In order to quantify the complexity introduced by domain dependency in OvOP classification, four classifiers are trained, each on a different domain D using the UBT3 feature set, and cross-evaluated. More specifically, we are interested in evaluating the performance of each classifier with respect to both its target domain and its foreign domains. The obtained results constitute the baseline for the improvements provided by the following experiments.

Our second set of experiments deals with the training of a general purpose OvOP classifier, based on the whole set of available labelled documents, not grouped according to their domain. We assume that a general purpose classifier trained on multiple domains will be less affected by domain dependency than a classifier trained only with documents from one domain. Such an approach reduces, in fact, the dependency on the domain dependent features used in document representation, as shown in Table 3.4 for the movie review domain.

Our third experiment follows a different approach, based on a two-step OvOP classification process. Different OvOP classifiers are combined to compose an ensemble [20], where each classifier contributes to the evaluation of the OvOP of input documents. The OvOP classifiers constituting the ensemble can differ from one another along various parameters: in our scenario each classifier constituting the ensemble is trained on a different domain.

The OvOP classification process of a given document starts with the evaluation of the inter-domain OvOP by each domain specific classifier constituting the ensemble; each classifier, depending on its specific training domain, may or may not classify the document correctly. The scores provided by each classifier, in the continuous range [0,1], where 0 represents strongly negative documents and 1 strongly positive ones, are combined to obtain the overall evaluation provided by the ensemble.

As suggested in [95], several mechanisms can be used to combine the output provided by each classifier, the simplest one being voting: the output class is the class that has been chosen by the majority of the classifiers constituting the ensemble. This particular way of combining results, however, may not be appropriate in our specific scenario. In fact we assume that the predictions made by each classifier on domains different from the one used for its training may be grossly incorrect. For this reason we replace the voting mechanism, as described in [5], with a meta-learner, a learning algorithm aimed at discovering how the outputs of the base learners can be combined to provide the best performance [81].

Figure 4.1: The meta-classification OvOP process.
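The voting mechanism mentioned above, and the reason why it can fail in an inter-domain setting, can be illustrated with a few lines of code. This is only a sketch: the threshold of 0.5 used to turn a score in [0,1] into a class and the example scores are assumptions introduced for illustration.

```python
from collections import Counter


def scores_to_classes(scores, threshold=0.5):
    """Map each score in [0, 1] to a class label (assumed threshold: 0.5)."""
    return ["positive" if s >= threshold else "negative" for s in scores]


def majority_vote(predictions):
    """Return the class chosen by the majority of the classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical scores given to one positive review by three classifiers,
# each trained on a different (foreign) domain.
scores = [0.9, 0.4, 0.45]
vote = majority_vote(scores_to_classes(scores))
print(vote)
```

In this example the only classifier that is confident about the document is outvoted by two weakly negative classifiers, so the ensemble answers negative. Voting treats every base classifier as equally reliable, which is the limitation that motivates a learned combination.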

The input of the meta-classifier has as many attributes as the number of classifiers constituting the ensemble; the attribute values represent the predictions of these learners on the input document, as described by the diagram in Figure 4.1. In our approach the ensemble is constituted by three different domain based OvOP classifiers; such a solution, however, can be scaled up to include several different base learners.

In order to perform training and evaluation of the proposed approach, a new workflow, the meta-cross evaluation template, has been introduced into the SENT-IT framework. Only a small set of labelled documents from an unknown domain D is required to train a meta-classifier devoted to D, given a set of already trained OvOP classifiers.
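The two-step process can be sketched as follows. The code is a simplified illustration and not the SENT-IT implementation: the scores of the three base classifiers are hard-coded hypothetical values, and a logistic regression trained by gradient descent is used as meta-learner in place of the SVM adopted in this work, only to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(meta_features, labels, epochs=2000, lr=0.5):
    """Fit a logistic regression on base-classifier scores (batch gradient descent)."""
    n = len(meta_features)
    n_attrs = len(meta_features[0])  # one attribute per base classifier
    weights = [0.0] * n_attrs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_attrs
        grad_b = 0.0
        for x, y in zip(meta_features, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def meta_predict(weights, bias, x):
    """Return 1 (positive) or 0 (negative) for one vector of base scores."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical labelled documents from the new domain D. Each row holds the
# scores in [0, 1] returned by three already trained domain classifiers.
meta_features = [(0.9, 0.2, 0.5), (0.8, 0.7, 0.4),
                 (0.1, 0.8, 0.6), (0.2, 0.3, 0.5)]
labels = [1, 1, 0, 0]

weights, bias = train_meta_classifier(meta_features, labels)
predictions = [meta_predict(weights, bias, x) for x in meta_features]
print(predictions)
```

In this toy set only the first base classifier is informative for the new domain; the meta-learner discovers this from the labelled examples, whereas a majority vote would weigh the three scores equally.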

4.4 Experiments

4.4.1 Test set

In order to evaluate the issues that affect the OvOP classification process across different domains, three different data sets have been collected. By using the Product Review Crawler, described in detail in Section 3.3.1, we generated the following corpora:

• the Cell Review Corpus: a collection of 1340 reviews concerning cell phones and smart phones;

• the Car Review Corpus: a collection of 1583 reviews covering the domain of cars;

• the Book Review Corpus: a collection of 2125 reviews about books written by Italian authors.

Documents have been extracted from the Dooyoo¹ product review portal; for each corpus a subset constituted by 500 positive and 500 negative documents is used for training and evaluation. The distribution of reviews across the different ratings is, even in this case, not balanced: 63,45% of the collected documents, in fact, have been tagged by users as positive. The predominance of positive documents characterizes each of the analyzed domains and is consistent with the movie review corpus and, moreover, with the similar corpora described in [63], [84] and [19].

Documents from the four domains present several different properties: book and cell phone reviews tend to be longer, with an average number of sentences of, respectively, 17,45 and 14,32. Both movie and car reviews, on the other hand, are shorter, with an average number of sentences of, respectively, 10,11 and 9,21. Another interesting aspect that characterizes the four domains concerns the vocabulary used by reviewers: both the cell and car domains present a huge number of technical terms, aimed at describing the features of each product. Movie reviews, instead, provide a smaller amount of technical terms (e.g.: colonna sonora (soundtrack) or regista (director)) and a significant amount of proper names, referring to actors and directors. The book review corpus presents, moreover, a particular property: technical words are not present. Reviewers more frequently use common terms, aimed, in particular, at describing the plot of the book and the emotions the book inspired in them. Evidence of this is available in Table 4.2, where the most significant features, in terms of Information Gain, are listed for each of the three OvOP classifiers trained on the new corpora.

¹ www.dooyoo.it

4.4.2 Results

Table 4.1 shows the accuracy obtained by OvOP classifiers trained on the three new data sets; the UBT3 feature set, which provided the best performance on the movie review corpus, is used for document representation. Features have been ranked by the Information Gain they provide to the OvOP classification task; only the set of 2000 features with the highest IG has been considered in training.
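The feature selection step can be sketched as follows. The code is an illustrative example, not the SENT-IT implementation: each document is assumed to be a set of stemmed n-grams, a feature is treated as a binary presence indicator, and the toy documents and function names are hypothetical. In the experiments k is 2000.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """IG of the binary presence of `feature` with respect to the class."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_f, without_f) if part)
    return entropy(labels) - conditional


def top_k_features(docs, labels, k):
    """Return the k features with the highest IG (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary, key=lambda f: -information_gain(docs, labels, f))
    return ranked[:k]


# Hypothetical stemmed reviews from the cell phone domain.
docs = [{"ottim", "telefon"}, {"ottim", "batter"},
        {"pessim", "telefon"}, {"pessim", "batter"}]
labels = ["pos", "pos", "neg", "neg"]

selected = top_k_features(docs, labels, k=2)
print(selected)
```

Here ottim and pessim split the documents perfectly by class and obtain an IG of 1 bit, while telefon and batter are evenly spread over both classes and obtain an IG of 0.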

Table 4.1: Average three-fold cross-validation accuracies for each domain dependent OvOP classifier trained according to the UBT3 feature set.

       Movie   Cell    Car     Book
NB     86,8    77,79   77      75,69
SVM    89,0    88,84   82,11   80,49

Consistently with the results reported in Chapter 3, SVM provides better results than NB across the different domains, with a margin ranging between 2% and 5%. The difference in accuracy between NB and SVM presents, however, an anomaly in the cell phone domain, where the improvement due to the introduction of SVM is about 11,05%.
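The margins discussed above can be recomputed directly from the values of Table 4.1, as in the following short check (accuracies are written with a decimal point, as required by the language):

```python
# Accuracies from Table 4.1 (percentages).
accuracy = {
    "NB":  {"Movie": 86.8, "Cell": 77.79, "Car": 77.0, "Book": 75.69},
    "SVM": {"Movie": 89.0, "Cell": 88.84, "Car": 82.11, "Book": 80.49},
}

# Margin of SVM over NB for each domain, in percentage points.
margins = {
    domain: round(accuracy["SVM"][domain] - accuracy["NB"][domain], 2)
    for domain in accuracy["NB"]
}

for domain, margin in margins.items():
    print(f"{domain}: {margin:.2f}")
```

The margins are 2,20 for movies, 11,05 for cell phones, 5,11 for cars and 4,80 for books, which makes the anomaly of the cell phone domain evident.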

Table 4.2 reports the 30 features providing the highest Information Gain for each domain; two interesting observations arise from the analysis of the collected data. Both the cell and car domains, reasonably, present a huge number of technical and domain dependent terms, which are used by reviewers to describe the features of each class of products, for example stemmed n-grams like 'macchin fotograf' (camera), jtd, and 'common rail', or to refer to brands and models, like ford, daewo and u700. Orientation is conveyed in both domains, moreover, by stemmed attributes like ottim (best), buon (good), and pessim (worst).

The book domain, on the other hand, does not present technical and domain related terms, except for references to authors and book titles, such as pirandell (Luigi Pirandello) or 'giorg falett' (Giorgio Faletti). Common opinion bearing adjectives are not present in the list reported in Table 4.2; in fact opinion is conveyed by more complicated expressions such as 'non consigl' (not recommended). This assumption has been confirmed by manually analyzing a subset of reviews extracted randomly from the book corpus: opinion is usually conveyed by more subtle and complicated expressions, including irony, which cannot be easily identified and used in document representation. We think that the lower accuracy obtained by the OvOP classifier trained on the book domain is partly due to the lack of simple and clear polarity bearing clues.

Moreover, we think that another issue which contributes to the poor performance of the OvOP classification process in the book domain is related to the length of the reviews: book reviews, in fact, present a higher average number of sentences and words. Most of the content of each review is used to describe the plot of the book, without providing a proper contribution to the OvOP of the document. In fact it acts as noise in document representation, consequently reducing the effectiveness of the OvOP classification process.

However, even considering the issues related to the book domain, the OvOP classification approach described in Chapter 3 grants significant results in terms of accuracy across different domains. Results are similar to the best performances described in the literature for the English language, in particular in [63], [19] and [5].

Table 4.3 shows the accuracy obtained by OvOP classifiers trained using the UBT3 feature set when tested on documents covering a different domain. The loss of accuracy varies between 2,79% (when the movie review classifier is used to classify cell reviews with NB) and 31,1% (when the cell review classifier is used to classify book reviews with SVM), affecting classifiers trained using both NB and SVM.

The analysis of the obtained results confirms that OvOP classification of reviews from the book and car domains is a harder task, which cannot be proficiently achieved without a domain dependent classifier. The movie and cell domains, on the other hand, can be classified more easily according to the OvOP they convey; we think this could be related to the fact that in such domains subjectivity is expressed in a less subtle way: the features used in representation convey OvOP more effectively.

Another interesting result emerging from Table 4.3 concerns the performance of the classifier trained on the movie reviews: such classifier grants the best accuracy on each domain when the NB learning method is used, with a margin varying from 0,97% to 4,08%. However, results are completely different when SVM is exploited as learning method; the movie classifier is outperformed by the car based classifier on the cell domain and by the book based classifier on the car domain. We think this result is related to the ability of SVM to deal with the higher sparsity of data, with respect to NB. No distinct clusters of domains seem to arise from the results.

Table 4.4 reports the accuracy obtained by training an OvOP classifier on three corpora and testing it on the fourth corpus. Each training set is constituted by 3000 reviews: 1500 labelled as positive and 1500 as negative. Results confirm that the cell domain is easier to classify according to the OvOP of its reviews. Moreover we can reasonably assume, by comparing these results with those reported in Table 4.1, that, for the cell domain, the technical lexicon, even if frequently used as shown in Table 4.2, provides only a small part of the OvOP of each review. Most of the polarity, instead, is conveyed by simple and strongly oriented adjectives and adverbs. Our assumption has been further supported by manually analyzing each stemmed n-gram in the list of 2000 features used by the OvOP classifier trained on the cell domain.

The car domain, instead, presents the opposite behavior: when domain related features are not used for document representation, accuracy presents a loss of about 24,44% in the worst case. The loss is similar for both NB and SVM; we assume this issue could be explained by a higher dependency of the car domain on domain dependent terms as polarity carriers. OvOP classification of a car review requires, in order to be proficiently performed, a deeper relationship with the specific and technical terms used by reviewers.

Table 4.2: Top 30 features extracted from each training set with the highest IG value.

Cell                  Car                     Book
ottim                 diesel                  pirandell
gioc                  ford                    fu
impost                daewo                   sconsigl
pretes                comod                   padron
fotograf              nuov                    sess
consent               750                     costitu
film                  ottim                   don
'macchin fotograf'    graz                    quegl
vocal                 buon                    giusepp
mal                   ril                     social
qualsias              pres                    rappresent
scienz                'br non'                'metr ciel'
'telefon pens'        pessim                  'non piac'
4c                    jtd                     abit
'bl 4c'               prestazion              falett
'modell punt'         mitic                   p?
registr               matiz                   'non consigl'
integr                benzin                  mor
form                  gestion                 'non tratt'
svegl                 'anterior posterior'    'quot legg'
affar                 'fiat punt'             pescator
u700                  turb                    ottobr
'non eccels'          lanos                   copertin
tropp                 '39 ari condizion'      numer
'telefon cellular'    comodissim              provvident
chiamant              parchegg                scolast
anim                  16                      'giorg falett'
delus                 rail                    liber
scrittur              gioiellin               verit
chiam                 'common rail'           scend

Table 4.3: Average three-fold cross-validation accuracy for each domain dependent OvOP classifier applied to different domains.

NB       Movie   Cell    Car     Book
Movie    86,8    75      60,68   63,66
Cell     71,5    77,79   57,87   58,08
Car      69,6    71,92   77      59,87
Book     68,7    71,06   58,1    75,69

SVM      Movie   Cell    Car     Book
Movie    89      74,32   63,69   63,44
Cell     70,9    88,84   62,39   57,74
Car      67,7    76,92   82,11   59,31
Book     70,8    75,29   64,39   80,49

Table 4.4: Classification accuracy of a classifier trained on three domains and tested on the fourth domain.

Training             Testing   NB      SVM
Cell, Car, Book      Movie     72,8    71
Movie, Car, Book     Cell      76,82   76,53
Movie, Cell, Book    Car       54,46   57,67
Cell, Car, Movie     Book      60,75   60,31

Our assumptions have been further supported by the results reported in Table 4.5, where a single OvOP classifier, trained on all generated corpora, is tested across different domains with both NB and SVM learning methods.

Table 4.5: Classification accuracy of a classifier trained on the four domains.

Testing   NB      SVM
Movie     78,7    76,9
Cell      71,92   79,42
Car       65,6    69,3
Book      72,46   69,45

Our last experimental activity concerns the training and testing of a meta-classifier, aimed at performing OvOP classification on new domains, based on a small amount of labelled data from the target domain. Four different meta-classifiers have been trained, in order to deal with the different combinations of training and target domains; 250, 500 and 1000 labelled documents from the target domain have been used, respectively, during the experimental activity. Domain dependent classifiers have been trained using NB as learning method, while the meta-classifier is based on SVM. Table 4.6 reports the accuracy measured for each target domain and for each set of labelled documents used for training. Each meta-classifier has been tested on a set of 500 labelled documents from the target domain not used for training.

The results clearly show how the OvOP meta-classifier performs better than the general purpose classifier previously described on each analyzed domain, providing an accuracy ranging from 75,04% for car reviews to 84,0% for movie reviews. By comparing these results with those reported in Table 4.5, it clearly emerges that the improvement in accuracy is similar across different domains.
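The comparison between the meta-classifier and the general purpose classifier can be checked numerically with the values of Tables 4.5 and 4.6. The sketch below compares, for each domain, the best accuracy of each approach; choosing the best value per domain as the term of comparison is an assumption made for this example.

```python
# Table 4.5: general purpose classifier, accuracies as (NB, SVM).
general = {
    "Movie": (78.7, 76.9), "Cell": (71.92, 79.42),
    "Car": (65.6, 69.3), "Book": (72.46, 69.45),
}
# Table 4.6: meta-classifier, accuracies with (250, 500, 1000) labelled documents.
meta = {
    "Movie": (84.0, 81.4, 80.0), "Cell": (80.63, 80.63, 83.23),
    "Car": (74.85, 75.04, 74.85), "Book": (75.85, 75.45, 75.05),
}

# Gain of the best meta-classifier over the best general purpose classifier.
gains = {d: round(max(meta[d]) - max(general[d]), 2) for d in meta}

# Even the worst meta-classifier beats the best general purpose classifier.
always_better = all(min(meta[d]) > max(general[d]) for d in meta)

print(gains, always_better)
```

Under this reading the gain is between 3,39 and 5,74 percentage points, and every entry of Table 4.6 is higher than both entries of Table 4.5 for the same domain.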

No clear relationship between the number of labelled documents from the target domain used for training and the resulting accuracy can be inferred from the results; in two domains a lower number of labelled training documents leads to better performance, similarly to the results described in [5]. However, this does not hold for the cell review domain, where a larger set of labelled training documents is required to improve the effectiveness of the OvOP classification process.

Our results, obtained by applying OvOP to documents written in the Italian language, present similarities with analogous experiences performed on reviews written in the English language, as in [5] and [9]. In particular we have shown how our meta-classification process, based on domain based classifiers and the UBT3 feature set, performs relatively well even on the inter-domain OvOP classification process.

Table 4.6: Classification accuracy of a meta-classifier evaluated on the four domains.

Target   250     500     1000
Movie    84,0    81,4    80,0
Cell     80,63   80,63   83,23
Car      74,85   75,04   74,85
Book     75,85   75,45   75,05

4.5 Conclusions

In this Chapter we have addressed the problem of defining supervised methodologies, based on machine learning, aimed at domain independent OvOP classification of documents written in the Italian language. Product reviews covering four different domains (movies, cell phones, cars, and books) have been collected and analyzed.

The SENT-IT framework, described in Chapter 3, has been used to train and assess different OvOP classifiers; in particular we focused on evaluating the accuracy of each OvOP classifier when applied to reviews covering domains different from those used for training. OvOP classifiers are based on the UBT3 feature set, limited to the 2000 features with the highest Information Gain value. Results show how classification of inter-domain OvOP is harder than intra-domain OvOP, with a loss in accuracy between 2,79% and 31,1%.

Moreover we have trained a general purpose OvOP classifier, using all four corpora as training set and adopting the UBT3 feature set for document representation. We also exploited a set of four OvOP classifiers, each trained on three data sets and tested on the remaining domain.

A significant improvement in inter-domain OvOP classification has been achieved by adopting an ensemble classification approach; three domain dependent OvOP classifiers are trained on their respective domains and applied, in parallel, to a small set of labelled documents covering the fourth domain. For each document a triplet is generated by collecting the output of each domain dependent OvOP classifier. The set of triplets is used to train an SVM based meta-classifier, aimed at determining how the responses of each domain dependent classifier can contribute to the evaluation of the OvOP of a review in the fourth domain.

The results we obtained from our experimental activity suggest that inter-domain OvOP is harder than intra-domain OvOP classification even in the case of the Italian language. Although a general purpose classifier, trained on the whole set of domain dependent corpora, can improve the effectiveness of OvOP classification, our approach based on meta-classification performs better, requiring only a small set of labelled documents from the new domain.

Chapter 5

Automatic Generation of Lexical Resources for Sentiment Analysis

Abstract

In this Chapter we introduce an unsupervised approach for generating a sentiment oriented lexical resource for the Italian language. In particular we investigate two different algorithms, based on shortest path models, aimed at assigning an orientation indicator to a set of terms. Both proposed algorithms have been applied to the graphs built upon two different lexical resources available for the Italian language: the Open Office dictionary and a social web based dictionary. The sentiment oriented terms constituting the resources generated by both algorithms have been manually validated, in order to assess the assigned polarity. Sentiment oriented terms have then been used to perform unsupervised and domain independent OvOP analysis. Results collected during the experimental phase show how OvOP based on lexical resources provides lower performance than the supervised approaches presented in Chapter 3. Classification accuracy, in particular, significantly decreases when the OvOP analysis task is performed on negatively oriented documents.¹

5.1 Introduction

Several research experiences presented in the literature in the field of Sentiment Analysis are based on collections of terms (dictionaries, thesauri, et al.), which have been manually or automatically labelled according to the specific connotation ("positive" or "negative", but also "objective" or "subjective") they usually convey.

Such sets of terms could be constituted, for example, by positive terms (e.g.: good, beautiful, smart, best), which may be seen in a text as clues bearing positive orientation, and by negative terms (e.g.: bad, worst, negative, horrible) which, on the other hand, are usually used to express negative polarity. Terms could also be labelled according to the subjectivity they could convey to a given text, like "opinion", "think" or "like".

¹ Special thanks to Matteo Borsari for his contribution to the work described in this chapter.

Some of the resources which have been presented in the literature move beyond the binary classification of input terms, assigning to each term a value representing the strength of its polarity, for example by using a value in [-1,1], where the sign represents the positivity or negativity of the term [3] [38] [40].

The relevance of sentiment oriented lexical resources in developing systems devoted to Sentiment Analysis has been pointed out by many research papers. In particular, in [34] the authors showed how a set of opinion bearing adjectives could be proficiently used to identify the sentences in a given text that express subjectivity. The authors also showed how checking the presence of automatically labelled adjectives grants better performance in Opinion Mining than simply checking the presence of unlabelled adjectives².

Developing a sentiment and subjectivity oriented lexical resource is a time-consuming task which can hardly be achieved manually or in a short time; the most significant example of a manually annotated lexical resource aimed at representing the sentiment orientation of a set of terms, for the English language, is the General Inquirer³ lexicon [76]. Constituted by 3596 words, including adjectives, verbs, and adverbs, the General Inquirer has been used as a Gold Standard by many researchers to evaluate the precision of automatically generated lexical resources [83], [84], [3], [25], [26], [27], [28]. Each term included in the General Inquirer has been labelled as positive or negative, according to its prior orientation: 1614 terms have been classified as positive and 1982 as negative.

Sentiment oriented lexical resources have been used in OvOP analysis and Opinion Mining in several works: in [82] the OvOP of an input document has been assessed as the algebraic sum of the prior polarities assigned to the terms of the lexical resource which appear in the document. In [8], on the other hand, the sentiment oriented terms constituting a lexical resource have been used to define new features for document representation, aimed at improving the effectiveness of the SVM classifier. For instance, new features could be defined in order to represent specific properties of a given document with respect to the sentiment oriented lexical resource, such as: "the document contains more than 5 strongly oriented positive terms". The results described in [8] show how such features could effectively improve the performance of the subjectivity classification task; moreover they show how features based on automatically generated lexical resources could outperform features based on manually labelled terms.
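The first strategy, the algebraic sum of prior polarities, can be sketched in a few lines. The lexicon entries and their values are hypothetical, and the sketch assumes that every occurrence of a lexicon term contributes to the sum and that a null sum is reported as neutral; neither detail is taken from [82].

```python
def lexicon_score(tokens, lexicon):
    """Algebraic sum of the prior polarities of the lexicon terms in the document."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(tokens, lexicon):
    """Map the sign of the score to an overall polarity label."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical lexicon with prior polarities in [-1, 1].
lexicon = {"ottimo": 1.0, "buono": 0.5, "pessimo": -1.0, "scarso": -0.5}

review = "film ottimo con un finale scarso".split()
print(lexicon_score(review, lexicon), classify(review, lexicon))
```

The review contains one strongly positive and one mildly negative term, so the sum is 0.5 and the document is labelled as positive. Terms that are not in the lexicon do not contribute to the score.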

In this Chapter we introduce a novel unsupervised approach aimed at generating a sentiment oriented lexical resource, constituted by adjectives of the Italian language. For each adjective the proposed methodology evaluates an orientation indicator in the range [-1,1], where -1 represents a very negative orientation, while 1 represents a very positive orientation. Our approach represents a refinement of similar approaches presented in the literature, properly adapted in order to be applied to the limited set of existing lexical resources available for the Italian language. Two different algorithms for the automatic generation of sentiment oriented resources have been applied to two different existing sets of terms. The four generated resources have been manually annotated in order to assess the effectiveness of the unsupervised labelling approach. This work represents the first approach presented in the literature to automatically generate a sentiment oriented lexical resource for the Italian language.

² This result, however, is only partly confirmed by the results described in Chapter 3 for the specific task of Sentiment Analysis. Adjectives constitute, as shown in Table 3.4, only a part of the most relevant features which are used by the movie domain dependent classifier in order to perform OvOP analysis and classification.

³ http://www.wjh.harvard.edu/~inquirer/

5.1.1 Prior subjectivity status contextualization

Sentiment oriented lexical resources represent, for each term, its prior subjectivity status [93], defined as the polarity or subjectivity the term expresses when taken out of context. However, terms could vary their subjectivity or polarity according to the specific context and domain in which they are used, as discussed in Chapter 2. In fact terms like fotocamera (digital camera) or Johnny Depp, which represent some of the most significant features for the domain dependent classifiers defined in Chapters 3 and 4, cannot be properly characterized by a prior subjectivity status: the subjectivity, and more specifically the polarity, they convey is strongly bound to the specific context in which they are used. For this reason, as stated by Wilson in [93], such prior information should be contextualized, in order to move towards context polarity identification.

Contextualization of prior information could be performed at several different levels, by varying the dimension of the context under analysis. In fact, as stated in [66], several lexical elements, at both sentence and document level, could act as valence shifters, by modifying the prior orientation (and even the strength) of the terms which are used in a text. The most significant valence shifters described in [66] and the different levels at which the contextualization process could take place are described in Figure 5.1.

Negations and intensifiers are the simplest form of contextual valence shifters; they can, in fact, switch the orientation of a term or increase/decrease its strength. For example, the sentence Lui non è stupido (He's not stupid) has a positive orientation, obtained by switching the negative orientation of the term stupido (stupid)⁴. In the sentence La batteria è poco efficiente (The battery is not very efficient), on the other hand, the polarity of the term efficiente (efficient) has been reduced in strength by the intensifier poco (not very).
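The effect of these two simple valence shifters can be sketched as follows. The code is a deliberately naive illustration: the prior polarities, the multipliers associated with each intensifier and the window of three preceding tokens are all assumptions, and a term preceded by both a negation and an intensifier is left undetermined (None), since no combination rule is specified here.

```python
NEGATIONS = {"non"}
# Hypothetical multipliers: values below 1 weaken, values above 1 strengthen.
INTENSIFIERS = {"poco": 0.5, "molto": 1.5}
WINDOW = 3  # number of preceding tokens inspected (assumption)


def contextual_polarities(tokens, lexicon):
    """Return (term, polarity) pairs after applying the preceding valence shifters."""
    results = []
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        context = tokens[max(0, i - WINDOW):i]
        negated = any(t in NEGATIONS for t in context)
        factors = [INTENSIFIERS[t] for t in context if t in INTENSIFIERS]
        polarity = lexicon[token]
        if negated and factors:
            polarity = None  # interaction of negation and intensifier not modelled
        elif negated:
            polarity = -polarity
        elif factors:
            polarity *= factors[-1]  # closest intensifier wins
        results.append((token, polarity))
    return results


# Hypothetical prior polarities in [-1, 1].
lexicon = {"stupido": -1.0, "efficiente": 1.0, "intelligente": 1.0}

print(contextual_polarities("lui non è stupido".split(), lexicon))
print(contextual_polarities("la batteria è poco efficiente".split(), lexicon))
print(contextual_polarities("lui non è molto intelligente".split(), lexicon))
```

The negation turns the prior polarity of stupido from -1.0 into 1.0, while poco halves the strength of efficiente without changing its sign. Irony, modal operators and shifters located elsewhere in the text are outside the scope of this sketch.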

4 The sentence is an instance of a particular figure of speech called litotes: the expression of an idea by denying its opposite, principally by using multiple negations.

Figure 5.1: Prior subjectivity status contextualization process.

When negation and intensifiers are used together as valence shifters of the same term (e.g. Lui non è molto intelligente (He’s not very smart)), determining the new orientation of the term can be a difficult task, even for a human expert; intensifiers and negation, when applied to the same term, cannot be considered independent of one another. Some intensifiers can even act as negations, modifying both the strength and the orientation of the term to which they are applied: for example, the phrase Scarsamente sufficiente (Barely sufficient) expresses a negative orientation. The intensifier scarsamente (barely) reverses the orientation of the term sufficiente (sufficient), which has a slightly positive (or at least neutral) prior orientation.
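The sentence-level shifters discussed above can be illustrated with a minimal Python sketch. The prior polarities, the shifter list and the numeric multipliers are hypothetical values chosen only for illustration; they are not taken from [66] or [93]. The sketch also treats shifters as independent multipliers, which is exactly the simplification that breaks down for combinations such as non molto.

```python
# Hypothetical prior polarities in [-1, 1]; illustrative values only.
PRIOR = {"stupido": -0.8, "efficiente": 0.7, "sufficiente": 0.2}

# Hypothetical multipliers: a negative value flips the orientation,
# a magnitude below 1 weakens it and a magnitude above 1 strengthens it.
SHIFTERS = {"non": -1.0, "molto": 1.5, "poco": 0.5, "scarsamente": -0.5}


def contextual_polarity(tokens):
    """Return (term, polarity) pairs after applying the preceding shifters."""
    results = []
    factor = 1.0  # product of the shifters seen since the last scored term
    for token in tokens:
        word = token.lower()
        if word in SHIFTERS:
            factor *= SHIFTERS[word]
        elif word in PRIOR:
            # Clamp so that the contextual polarity stays in [-1, 1].
            score = max(-1.0, min(1.0, PRIOR[word] * factor))
            results.append((word, score))
            factor = 1.0
    return results


print(contextual_polarity("Lui non è stupido".split()))
print(contextual_polarity("La batteria è poco efficiente".split()))
print(contextual_polarity("Scarsamente sufficiente".split()))
```

With these made-up weights the negated stupido becomes positive, efficiente keeps its sign with reduced strength, and sufficiente turns negative, mirroring the three examples in the text.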

Irony and modal operators, which express possibility or necessity, can also act as valence shifters at sentence level; handling these phenomena properly may require complex analysis tools [66].

Contextualization of prior information can also be performed at document level; in particular, several linguistic elements can act as valence shifters for a given term even when located elsewhere in the text. Connectors (e.g. however, indeed, but, and) are the most important valence shifters at document level: a connector can be used to mitigate, strengthen or even deny the polarity of a given sentence. The following example illustrates the effect of connectors on prior-oriented terms: Sebbene egli sia molto bravo in matematica, è un terribile insegnante (Although he is brilliant at math, he is a horrible teacher). The first clause expresses a very positive orientation towards the subject of the sentence, while the second clearly expresses a negative opinion; the positive contribution provided by the first clause is globally neutralized by the connector sebbene (although), resulting in an overall negative score for the document.

Connectors have been exploited in [33] to support unsupervised labelling of the orientation of a given set of adjectives. In particular, the connectors “and” and “but” have been used to determine the number and the type (conjunction or opposition) of co-occurrences of each pair of adjectives from the input set in a large corpus; a clustering algorithm has then been used to separate the two classes of adjectives (positive and negative).

Reported speech can be seen as another example of a document-level valence shifter. Just as with modal operators at sentence level, reported speech can include several opinion-bearing terms, expressing positive or negative orientation, which do not contribute to the overall opinion of the document because they are not part of the author’s opinion.

5.2 Related Work

5.2.1 Hatzivassiloglou and McKeown

In [33], the authors describe a novel approach for determining the orientation of a set of adjectives; the orientation of an adjective is predicted by analysing how it co-occurs with other adjectives in a large corpus of unlabelled documents.

More specifically, the work is based on the assumption that adjectives which share the same orientation co-occur more frequently in conjunction (e.g. Lui è bravo e gentile (He’s good and gentle)). On the other hand, adjectives which co-occur more frequently in disjunction (e.g. Lui è bravo ma allo stesso tempo molto pigro (He’s good but very lazy at the same time)) probably convey opposite orientations.

In order to infer the orientation of adjectives from their co-occurrences in the large corpus of documents5 used during the experimental activity, the following tasks have been performed:

1. All conjunctions of adjectives are extracted from the corpus along with relevant morphological relations. A two-level finite-state grammar has been used to extract conjunctions: 13,426 pairs of co-occurring adjectives have been collected. In addition, 2,005 pairs of opposite adjectives have been manually collected, representing morphological relations like fortunato-sfortunato (lucky-unlucky), where a prefix completely switches the orientation of an adjective (this phenomenon occurs in both English and Italian).

2. The set of extracted conjunctions is split into a training set and a test set; the test set has been dynamically collected by including conjunctions where each of the conjoined adjectives appears at least α times together with any other adjective occurring in the test set. Varying the value of the parameter α from 2 to 5 yields four different test sets: α controls the “hardness” of the test set. In fact, as reported in [33], a lower value of α leads to a significant loss in accuracy, from 92.37% with α = 5 to 78.08% with α = 2.

3. The conjunctions in the training set have been used to train a log-linear classifier, aimed at determining whether a pair of adjectives has similar or opposite orientation. The trained classifier has been applied to the different test sets in order to infer the orientation of each conjunction. The output of the classification task can be represented as a graph whose nodes are the adjectives and whose edges represent opposite or similar orientation between adjectives.

4. The set of adjectives constituting each test set has been partitioned into two clusters by using a clustering algorithm.

5. Following the experimental observation that positive adjectives are more frequent in the document set than negatively oriented adjectives, the items contained in the larger cluster have been labelled as positive, while the elements of the smaller cluster have been classified as negative.

5 The Wall Street Journal Corpus of 1987, constituted by more than 21,000,000 words, is available from the ACL Data Collection Initiative as CD-ROM 1 (http://ldc.upenn.edu/Catalog/).

The accuracy of the described algorithm has been assessed, for each test set, by comparing the orientation automatically assigned to each adjective with 1,336 manually labelled adjectives (657 positive and 679 negative) occurring at least 20 times in the document set. The algorithm infers the orientation of adjectives with an accuracy varying from 78.08% when α = 2 to 92.37% for α = 5.
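The last two steps of the procedure can be sketched in Python under strong simplifications. The sketch below does not reproduce the method of [33]: the same/opposite links are supplied directly instead of being predicted by a log-linear classifier, and the clustering algorithm is replaced by a plain breadth-first two-colouring of the link graph. It assumes a connected graph with consistent links. The example links are hypothetical.

```python
from collections import defaultdict, deque


def label_adjectives(links):
    """links: iterable of (adjective_a, adjective_b, same_orientation)."""
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))

    cluster = {}
    for start in graph:
        if start in cluster:
            continue
        cluster[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour, same in graph[node]:
                if neighbour not in cluster:
                    # Same-orientation links keep the cluster, opposite ones switch it.
                    cluster[neighbour] = cluster[node] if same else 1 - cluster[node]
                    queue.append(neighbour)

    # The larger cluster is labelled as positive, as in step 5.
    sizes = [0, 0]
    for c in cluster.values():
        sizes[c] += 1
    positive = 0 if sizes[0] >= sizes[1] else 1
    return {adj: ("positive" if c == positive else "negative")
            for adj, c in cluster.items()}


# Hypothetical links: True for "and"-style pairs, False for "but"-style pairs.
LINKS = [("bravo", "gentile", True), ("bravo", "pigro", False),
         ("gentile", "onesto", True)]
print(label_adjectives(LINKS))
```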

5.2.2 Turney and Littman

To address the problem of generating a lexical resource of opinionated adjectives, the authors of [83, 84] adopt an information retrieval approach that does not rely on the computational linguistics tools exploited in [33]. The method is based on Pointwise Mutual Information (PMI), which assigns a value to the semantic association between two terms t_i and t_j. PMI is defined as:

$$\mathrm{PMI}(t_i, t_j) = \log \frac{\Pr(t_i, t_j)}{\Pr(t_i)\,\Pr(t_j)} \tag{5.1}$$

where Pr(t_i) represents the frequency of term t_i and Pr(t_i, t_j) the frequency of co-occurrences of terms t_i and t_j.

PMI is used to determine the prior orientation of a given term. Positively oriented terms, according to the authors’ assumption, have a higher semantic association with other positively oriented terms and, consequently, a higher PMI with them. Conversely, negatively oriented terms yield a higher PMI when the semantic association is calculated with respect to other negatively oriented terms.

In order to evaluate the orientation of a term by means of PMI, two sets of positive and negative “seed” terms have been defined:

Sp = {good, nice, excellent, positive, fortunate, correct, superior}

Sn = {bad, nasty, poor, negative, unfortunate, wrong, inferior}

The orientation O(t) of a target term t is given by

$$O(t) = \sum_{t_i \in S_p} \mathrm{PMI}(t, t_i) - \sum_{t_j \in S_n} \mathrm{PMI}(t, t_j) \tag{5.2}$$

which is the sum of the semantic association weights of t with the positive seed terms minus the sum of the semantic association weights of t with the negative seed terms.
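Equations 5.1 and 5.2 can be illustrated with a short Python sketch that estimates the probabilities from document counts. The counts below are made up, the base-2 logarithm is an assumption (the equation does not fix the base), and the small smoothing constant added to the joint count is a hypothetical safeguard against taking the logarithm of zero.

```python
import math


def pmi(hits_i, hits_j, hits_ij, n_docs, smoothing=0.01):
    """Pointwise Mutual Information (Eq. 5.1) estimated from document counts."""
    p_i = hits_i / n_docs
    p_j = hits_j / n_docs
    p_ij = (hits_ij + smoothing) / n_docs  # smoothing avoids log(0)
    return math.log2(p_ij / (p_i * p_j))


def orientation(term, positive_seeds, negative_seeds, hits, cooc, n_docs):
    """O(t) of Eq. 5.2: PMI with positive seeds minus PMI with negative seeds."""
    def assoc(seed):
        joint = cooc.get(frozenset((term, seed)), 0)
        return pmi(hits[term], hits[seed], joint, n_docs)

    return (sum(assoc(s) for s in positive_seeds)
            - sum(assoc(s) for s in negative_seeds))


# Made-up counts for a toy collection of 1000 documents.
N_DOCS = 1000
HITS = {"superb": 50, "good": 200, "bad": 100}
COOC = {frozenset(("superb", "good")): 30, frozenset(("superb", "bad")): 2}

print(orientation("superb", ["good"], ["bad"], HITS, COOC, N_DOCS))
```

In this toy collection the target term co-occurs with the positive seed more often than chance and with the negative seed less often than chance, so its orientation is positive.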

In [84] the PMI of each term with respect to the seed terms is computed using the following two methods:

1. the PMI-IR method, based on information retrieval techniques, uses a web search engine to estimate the frequency of terms t_i and t_j and the frequency of their co-occurrences. Given t_i and t_j, three queries are submitted to the search engine, namely “t_i”, “t_j” and “t_i NEAR t_j”; the number of matching documents returned for each query is used to compute the value of PMI. In [84] the authors used the AltaVista search engine6 because in 2002 it was the only search engine supporting the NEAR operator, which matches only those documents where the operand terms appear within ten terms of one another;

2. the PMI-LSA method applies Latent Semantic Analysis (LSA) to calculate the strength of the semantic association between words; more specifically, Singular Value Decomposition (SVD) is used to analyse the statistical relationships among words in a corpus. SVD is applied to a matrix X in which the row vectors are words and the column vectors are chunks of text (e.g. sentences, paragraphs, documents). Each cell holds the weight of the corresponding word in the respective chunk of text, calculated as the TF-IDF score of the word for the chunk.

The described approach has been tested on both the set of sentiment-oriented terms constituting the General Inquirer lexicon (GI) and the set of terms defined in [33] (HM).

PMI-IR has been calculated by using three different corpora indexed by the AltaVista search engine at the time of the experiment:

1. the whole set of English documents indexed by AltaVista, constituted by 350 million pages containing about 100 billion word occurrences (AV-Eng);

2. the set of documents from the “.ca” domains, constituted by 7 million pages containing about 2 billion word occurrences (AV-CA);

3. the set of documents collected by Touchstone Applied Science Associates for developing “The Educator’s Word Frequency Guide” (TASA). The TASA test set is constituted by 61,000 pages containing about 10 million word occurrences.

On the HM term set, when PMI-IR is evaluated on the AV-Eng corpus, the described approach outperforms the method defined in [33]: accuracy grows from 78.08% to 87.13%. On the other hand, when applied to the TASA corpus, the method reaches an accuracy of 61.83%, compared with the 78.08% obtained in [33]. The authors, moreover, showed that the NEAR operator, used in query formulation to calculate Pr(t_i, t_j), clearly outperforms the AND operator.

PMI-LSA has been evaluated only on the TASA test set, the smallest one, because its computational complexity prevented its application to the larger sets. On the same test set (TASA) the PMI-LSA method shows significant improvements over PMI-IR, with an increase in accuracy varying from 6 to 9%.

6 www.altavista.com

5.2.3 Kamps et al.

In [38] the authors present a methodology for generating a sentiment-oriented list of adjectives by focusing on the lexical relations between synsets defined in WordNet.

In particular, the authors exploit synonymy relationships between synsets to generate a graph representation of the connections between adjectives. The graph is used to evaluate a semantic distance between terms called geodesic distance. The geodesic distance d(t_i, t_j) is defined as the length of the shortest path connecting term t_i with term t_j. If two terms are not connected, their geodesic distance is ∞. Figure 5.2 represents a subset of the nodes and edges generated by connecting synsets sharing a synonymy relation.

Figure 5.2: A subset of the nodes and edges constituting the WordNet graph analyzed in [38]. [The figure reproduces three panels from [38]: part of the WordNet database from the viewpoint of the adjective ‘good’, the MPLs to the adjectives ‘good’ and ‘bad’, and the values assigned by the EVA function.]

The orientation O(t) of a synset representing the adjective t is calculated as

$$O(t) = \frac{d(t, \text{bad}) - d(t, \text{good})}{d(\text{bad}, \text{good})} \tag{5.3}$$

with d(“bad”, “good”) = 4, O(“bad”) = −1 and O(“good”) = 1.

An adjective t is classified as positive if O(t) > 0; otherwise it is classified as negative. The method has been assessed on a small subset of the term set defined in [84] (TL), constituted by 663 adjectives; this limitation is justified by the fact that the method can be applied only to adjectives connected to the two seed terms. The accuracy measured on the reduced TL term set is about 67.32%, which is clearly lower than that obtained by the previously described approaches.
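The geodesic distance and Equation 5.3 can be sketched in Python with a breadth-first search over an undirected synonymy graph. The graph below is a hypothetical chain with placeholder nodes, built only so that d(“bad”, “good”) = 4 as stated in the text; it is not real WordNet data. Returning None for unconnected terms is a choice made for this sketch.

```python
from collections import deque
import math


def geodesic(graph, source, target):
    """Length of the shortest path between two nodes; math.inf if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return math.inf


def kamps_orientation(graph, term, good="good", bad="bad"):
    """Eq. 5.3; returns None when the term is not connected to both seeds."""
    d_bad = geodesic(graph, term, bad)
    d_good = geodesic(graph, term, good)
    if math.isinf(d_bad) or math.isinf(d_good):
        return None
    return (d_bad - d_good) / geodesic(graph, bad, good)


# Hypothetical synonymy chain (placeholder nodes w1..w3, not WordNet data).
GRAPH = {
    "good": ["w1"],
    "w1": ["good", "w2"],
    "w2": ["w1", "w3"],
    "w3": ["w2", "bad"],
    "bad": ["w3"],
    "isolated": [],
}

for term in ("good", "w1", "bad", "isolated"):
    print(term, kamps_orientation(GRAPH, term))
```

The None returned for the isolated node reflects the limitation noted above: the measure is defined only for adjectives connected to the two seed terms.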

5.2.4 Takamura et al.

A distinctive approach to the problem of determining the orientation of a set of terms has been presented in [78] for the Japanese language. The authors model the problem using the spin model, a physical model based on a set of interacting electrons. Each electron has a specific spin direction, with value +1 or −1; each electron propagates its spin direction to its neighbors iteratively, until the system reaches a stable state.

The spin model represents each term as an electron and the term orientation as its spin direction. The neighborhood matrix representing interactions between electrons e_{t_i} and e_{t_j} is defined by looking at the occurrences of term t_j in the gloss of term t_i and vice versa. The underlying assumption is that terms are used more frequently in the glosses of other terms having similar orientation.

The spin model is iteratively updated until the “minimum energy” configuration is reached; the observed accuracy is about 70%. The method, however, presents several issues, including computational complexity, and, unlike the previously described methods, does not provide any value describing the strength of the term orientation.

5.2.5 Esuli et al.

A significant contribution to the problem of determining the orientation of a set of terms has been provided by Esuli et al. in [25, 26, 27], where the development of the SentiWordNet resource is described. More specifically, the authors focus on determining both the subjectivity and the orientation of WordNet synsets by comparing the similarity of the vector representations of the glosses used to describe each synset. Each synset is characterized by three attributes, Obj(t), Pos(t) and Neg(t), describing, respectively, the objective, positive and negative orientation of the synset.

The assumption underlying the work of Esuli et al. is similar to the one inspiring Takamura’s work: synsets sharing the same orientation or the same subjectivity will probably have glosses with a higher similarity.

In order to assess the orientation of a synset, its gloss is represented as a vector whose elements are the TF-IDF values of the terms appearing in the gloss. The vectors, whose elements are filtered using a Mutual Information filtering function, are used to train two binary classifiers: one aimed at discriminating between objective and subjective synsets and a second aimed at distinguishing between negative and positive synsets.

The term training sets have been automatically generated by expanding a starting set of terms, labelled as positive or negative, provided as input. Expansion is achieved by navigating the relationships connecting WordNet synsets, including synonymy, direct antonymy, indirect antonymy, hypernymy and hyponymy (not limited exclusively to synonymy as in [38]).

For each term, the vector representation of the gloss assigned to its synset is generated; the vector representations of both the positive and the negative collected terms are used to train the binary classifier aimed at distinguishing between negative and positive synsets.

Two different seed lists have been exploited in the experimental activity: the set of 14 terms used by Turney in [84] and the pair of terms {good, bad} used by Kamps in [38]; no significant differences in accuracy have been identified when the two expanded sets are used for training the classifier. Different learning methods have been exploited, including Naïve Bayesian classifiers, Support Vector Machines (SVM), PrTFIDF and the Rocchio classifier.

The trained classifiers have been tested on the HM, TL and KA term sets; on the HM test set the best results are obtained with SVM (87.38% accuracy). On the TL test set the best results are obtained with the PrTFIDF learner (83.09%), while on the KA test set the best results are obtained with SVM (88.05%). The measured accuracy represents a significant improvement with respect to the previously described work.

Moreover, the proposed methodology has been applied to a resource other than WordNet: the Merriam-Webster on-line dictionary. Results show a best accuracy of 83.71%, 79.78% and 85.44% on the HM, TL and KA test sets, respectively. This result shows that the proposed approach can be used effectively on other existing lexical resources as well. Figure 5.3 shows the classification of the different synsets assigned to the term “efficient” provided by the SentiWordNet resource.

5.3 Determining the polarity orientation

In this Section we introduce our novel approach for determining the orientation of a set of Italian adjectives starting from a set of manually classified seeds and a lexical resource (e.g. dictionary, lexicon) with a graph-like structure.

Figure 5.3: The classification of the term “efficient” provided by SentiWordNet.

Our approach is based on the assessment of each of the shortest paths connecting a term of the lexical resource with one of the terms constituting the seed list; we are interested in evaluating whether two proposed metrics exploiting the shortest path computation can be used to describe semantic similarity between terms.

Our work is mainly inspired by both Kamps et al. [38] and Turney et al. [84]; in particular, our approach extends the geodesic distance model proposed by Kamps by considering several kinds of relationships connecting terms and, moreover, by introducing the notion of decay of the contribution provided by each seed term in determining the orientation of a term. Our model includes in the graph representation of the lexical resource several relationships between terms, such as synonymy, antonymy, weak synonymy and weak antonymy7.

The shortest path ts = {t, . . . , s} between the term t and the seed term s can include different kinds of edges, each representing a different relationship between terms. Each edge, according to its type, is used to evaluate the overall contribution provided by the seed s to the orientation of term t. We assume that synonymy relationships preserve the orientation of the contribution provided by the seed s to term t, while antonymy relationships tend to provide term t with a contribution that is opposite to the orientation of seed s.

The function decay(i, j) is used to calculate the orientation and the strength of the contribution of the edge v ∈ ts connecting node i with node j to the overall contribution of ts, and is defined as:

$$\mathrm{decay}(i, j) = \begin{cases} +1 & \text{if nodes } i \text{ and } j \text{ are connected by a synonymy relationship} \\ -1 & \text{if nodes } i \text{ and } j \text{ are connected by an antonymy relationship} \\ +0.8 & \text{if nodes } i \text{ and } j \text{ are connected by a weak synonymy relationship} \\ -0.8 & \text{if nodes } i \text{ and } j \text{ are connected by a weak antonymy relationship} \end{cases} \tag{5.4}$$

7 Also known as the “see-also” relationship, which connects terms that are similar but not properly synonymous or antonymous.

The function decay(i, j) is based on a novel assumption, not considered by Kamps et al.: weak relationships connecting two terms, like the see-also relation, provide a lower contribution than both synonymy and antonymy. If the path ts contains edges representing weak relationships, the overall contribution of the path should be less significant than that of paths constituted only by synonymy edges. The value 0.8 has been assigned as decay to edges representing weak relationships; each such edge reduces the contribution of orientation provided by the seed term to 80% of its value.

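The function d(ts) evaluates the overall decay over the path ts = {t, . . . , s} connecting term t with seed s; the overall decay is defined as:

$$d(ts) = \prod_{i=1}^{|ts|} \mathrm{decay}(i, i+1) \tag{5.5}$$

where the nodes of the path are numbered consecutively starting from t and |ts| is the number of edges in the path.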

d(ts) = 0 when there is no path connecting term t with the seed s; in this case the seed s does not provide any contribution to the orientation of the term t. Otherwise, d(ts) is the product of the function decay(i, j) applied to each edge constituting the path ts. For example, given a path ts including two edges representing weak synonymy relationships and two edges representing synonymy relationships, the overall decay is 0.64. This means that the seed s contributes to the orientation of t only 64% of its orientation value (defined in the range [−1, 1]). On the other hand, if the shortest path ts includes one edge representing a synonymy relationship and one edge representing an antonymy relationship, the contribution of s to the orientation of t is the opposite of its orientation value, with d(ts) = −1.
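The two worked examples above can be checked with a small Python sketch of the decay product. The short edge labels and the convention of passing None for a missing path are hypothetical choices made for this sketch; the weights are those of Equation 5.4.

```python
from math import prod

# Edge weights of Eq. 5.4; the short keys are hypothetical labels.
DECAY = {"syn": 1.0, "ant": -1.0, "weak_syn": 0.8, "weak_ant": -0.8}


def path_decay(edge_types):
    """Overall decay d(ts) for the edge types along a path.

    Pass None when no path connects t with s, which yields 0.
    """
    if edge_types is None:
        return 0.0
    return prod(DECAY[e] for e in edge_types)


# Two weak synonymy edges and two synonymy edges: about 0.64.
print(path_decay(["weak_syn", "syn", "weak_syn", "syn"]))
# One synonymy edge and one antonymy edge: the orientation is reversed.
print(path_decay(["syn", "ant"]))
# No path: the seed gives no contribution.
print(path_decay(None))
```

Because the overall decay is a plain product, the order in which the edge types appear along the path does not affect the result.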

Semantic similarity between a term t and the list of positive seed terms is defined as:

$$m^{+}(t) = \frac{\sum_{s \in S^{+}} O(s) \times d(ts)}{\sum_{s \in S^{+}} O(s)} \tag{5.6}$$

where O(s) represents the polarity manually assigned to each term constituting the seed list S+. Semantic similarity between a term t and the list of negative seed terms is defined, similarly, as:

$$m^{-}(t) = \frac{\sum_{s \in S^{-}} O(s) \times d(ts)}{\sum_{s \in S^{-}} O(s)} \tag{5.7}$$

The orientation of a term t with respect to the list of positive and negative seed terms S+ ∪ S− can be calculated as follows:

$$O(t) = m^{+}(t) + m^{-}(t) \tag{5.8}$$

A term t is classified as positive if O(t) > 0; otherwise it is classified as negative. A second algorithm has been proposed as a variant of the function O(t) defined in Equation 5.8, in order to provide a different and simpler representation of the decay model. The same decay value is assigned to every relationship: the contribution provided by the seed s to the term t is reduced by the same amount regardless of the kind of edges constituting the path ts. The decay value k has been empirically set to 0.8; moreover, the decay is not applied to terms which are directly connected to a seed term. Ovariant(t) is defined as:

$$O_{variant}(t) = \frac{\sum_{s \in S^{+}} O(s) \times \mathrm{sign}(d(ts)) \times k^{|ts|-1}}{\sum_{s \in S^{+}} O(s)} + \frac{\sum_{s \in S^{-}} O(s) \times \mathrm{sign}(d(ts)) \times k^{|ts|-1}}{\sum_{s \in S^{-}} O(s)} \tag{5.9}$$

where |ts| is the length of the shortest path connecting the term t with the seed s and the function sign(d(ts)) accounts for switches in polarity due to edges connecting oppositely oriented terms.

Figure 5.4: The term polarity evaluation process.
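The two orientation measures can be sketched in Python as follows, taking the per-seed decay d(ts) and path length |ts| as given. One assumption deserves attention: the sketch divides by the sum of the absolute seed polarities. With O(s) = −1 for negative seeds, the denominator of Equation 5.7 taken literally is negative and would cancel the sign of O(s) in the numerator, so the absolute value is used here to let negative seeds contribute negatively. All names and example values are hypothetical.

```python
def normalised_sum(seeds, contribution):
    """Sum O(s) * contribution(s) over a seed list, divided by the sum of |O(s)|.

    Assumption of this sketch: the absolute value in the denominator.
    """
    total = sum(o * contribution(s) for s, o in seeds.items())
    return total / sum(abs(o) for o in seeds.values())


def orientation(pos_seeds, neg_seeds, decay):
    """O(t) = m+(t) + m-(t); decay maps each reachable seed to d(ts)."""
    def d(seed):
        return decay.get(seed, 0.0)  # unreachable seeds contribute 0

    return normalised_sum(pos_seeds, d) + normalised_sum(neg_seeds, d)


def variant_orientation(pos_seeds, neg_seeds, decay, length, k=0.8):
    """O_variant(t): uniform decay k for every edge beyond the first."""
    def contribution(seed):
        d = decay.get(seed, 0.0)
        if d == 0.0:
            return 0.0
        sign = 1.0 if d > 0 else -1.0
        return sign * k ** (length[seed] - 1)

    return (normalised_sum(pos_seeds, contribution)
            + normalised_sum(neg_seeds, contribution))


# Hypothetical example: two seeds per list, one negative seed unreachable.
POS = {"buono": 1.0, "bello": 1.0}
NEG = {"cattivo": -1.0, "brutto": -1.0}
DECAYS = {"buono": 1.0, "bello": 0.8, "cattivo": -1.0}
LENGTHS = {"buono": 1, "bello": 2, "cattivo": 2}

print(orientation(POS, NEG, DECAYS))
print(variant_orientation(POS, NEG, DECAYS, LENGTHS))
```

In the example the term is a synonym of the positive seeds and is reached from cattivo through an antonymy edge, so every reachable seed pushes its orientation in the positive direction.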

The process we followed to determine adjective orientation, represented by the scheme in Figure 5.4, consists of the following steps:

1. definition of a set S+ of one or more positive seed terms and a set S− of one or more negative seed terms, which will be used to determine the orientation of term t. In our model we limit the terms of S+ and S− to adjectives, as proposed by both Turney and Kamps in their respective works. In particular, we initialize our seed lists by translating into Italian the 14 adjectives proposed by Turney in [84]. Our seed lists are constituted by the following adjectives:

S+ = {buono, bello, eccellente, positivo, fortunato, corretto, superiore}

S− = {cattivo, brutto, povero, negativo, sfortunato, errato, inferiore}

2. filtering of the input lexical resource in order to consider only adjectives;

3. generation of a graph representation of the input lexical resource, by representing each adjective as a vertex and each synonymy, antonymy, weak synonymy and weak antonymy relationship as an edge connecting two adjectives;

4. for each vertex t, calculation of the set of shortest paths P_t = {ts_1, ts_2, . . . , ts_n} connecting t with each seed term in S+ ∪ S−, respectively. For each seed term a breadth-first search (BFS) is performed. Edges are explored in the following order: synonymy, antonymy, weak synonymy and weak antonymy;

5. for each vertex t, calculation of the value of both O(t) and Ovariant(t) according to the set of shortest paths P_t evaluated at the previous step for term t. Terms which are not connected to any seed term are excluded from the output resource.
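Steps 3 and 4 can be sketched in Python as a breadth-first search over a graph with typed, undirected edges. The data layout, the short edge labels and the example edges are hypothetical and are not taken from either dictionary. The search starts from the seed rather than from each term, which yields the same path lengths on an undirected graph.

```python
from collections import defaultdict, deque

# Exploration order stated in step 4; the short labels are hypothetical.
EDGE_ORDER = ("syn", "ant", "weak_syn", "weak_ant")


def add_edge(graph, a, b, edge_type):
    """Store an undirected, typed edge between adjectives a and b."""
    graph[a][edge_type].append(b)
    graph[b][edge_type].append(a)


def shortest_paths_from(graph, seed):
    """Breadth-first search from a seed.

    Returns, for every reachable adjective, the list of edge types along
    one shortest path between the seed and that adjective.
    """
    paths = {seed: []}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for edge_type in EDGE_ORDER:
            for neighbour in graph[node][edge_type]:
                if neighbour not in paths:
                    paths[neighbour] = paths[node] + [edge_type]
                    queue.append(neighbour)
    return paths


# Hypothetical edges, for illustration only.
graph = defaultdict(lambda: defaultdict(list))
add_edge(graph, "buono", "ottimo", "syn")
add_edge(graph, "buono", "cattivo", "ant")
add_edge(graph, "ottimo", "valido", "weak_syn")
add_edge(graph, "isolato", "solitario", "syn")

paths = shortest_paths_from(graph, "buono")
print(paths)
```

Adjectives missing from the returned dictionary are not connected to the seed; a term missing for every seed would be excluded from the output resource, as in step 5.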

5.4 Experiments

Two existing dictionaries of Italian terms have been used to generate the sentiment-oriented lexical resources. Both dictionaries have been chosen for their availability and for the presence of semantic relations between lemmas, which can be used to build a graph-based representation of the dictionary.

5.4.1 The OpenOffice dictionary

The OpenOffice dictionary (D1) for the Italian language8 used in our experimental activity is constituted by 25372 lemmas, including 8941 adjectives. Each lemma consists of a lemmatized form, a short definition, a set of usage examples and its Part-Of-Speech tag.

A synonymy relation relates each lemma of the dictionary to other lemmas; by using synonymy relations as edges and lemmas as nodes, the OpenOffice dictionary can be represented as a graph and used as input to our sentiment annotation algorithms. The set of nodes and edges constituting the graph representation has been extracted from the OpenOffice dictionary raw file by means of a set of regular expressions.

Extracted data have been filtered in order to remove lemmas which do not represent adjectives.

8http://it.openoffice.org/linguistico/thesaurus.html

Figure 5.5: Data provided by the SinonimiMaster dictionary for the term efficiente (efficient).

5.4.2 The SinonimiMaster dictionary

The SinonimiMaster dictionary (D2) is a freely available dictionary9 constituted by 53949 lemmas, including 8888 adjectives. Each lemma has a Part-Of-Speech tag representing its linguistic function; this attribute has been used during the experimental activity to filter out terms which are not adjectives. Moreover, lemmas are connected to each other by means of four different relationships: synonymy, antonymy, weak synonymy and weak antonymy. The SinonimiMaster dictionary can therefore be easily represented as a graph structure.

Data harvesting and parsing have been performed by means of an ad-hoc crawling module, written in Java, based on a set of XPath extraction patterns. Figure 5.5 shows an example of the data provided by the SinonimiMaster dictionary for the term efficiente (efficient).

5.4.3 Test set

In order to assess the coverage of the generated lexical resources, a group (L1) of 248 orientation-bearing adjectives (118 positive, 130 negative) has been manually collected. Eight users have been asked to provide a list of adjectives that could be used to express a positive or negative attitude. 53 adjectives (L1+) (28 positive and 25 negative) have been provided by two or more annotators; users did not classify any of the 53 shared adjectives as members of different classes. The list of adjectives with two or more occurrences in the L1 collection is reported in Table 5.1.

Moreover, a second collection (L2) of 280 orientation-bearing adjectives, randomly extracted from D1 ∪ D2, has been generated; L2 is aimed at evaluating the accuracy of the proposed algorithms for determining the term orientation. Four annotators have been involved in the development of the L2 collection; each adjective can be classified as positive, negative or neutral by the human annotators. Agreement between annotators reaches 82.7% over the L2 collection (adjectives labelled with the same class by all four annotators).

9 http://www.homolaicus.com/linguaggi/sinonimi/

The disagreement between annotators is mostly due to adjectives which have been classified as neutral: for example, the adjective sovraumano (superhuman) has been classified by 2 annotators as positive and by 2 annotators as neutral. Only one adjective, invertito (inverted), has been classified as both positive and negative by the human annotators. In order to overcome disagreement between annotators, only adjectives labelled with the same class c_i by at least 3 annotators are considered: 14 adjectives having an ambiguous classification have been removed from the L2 test set.
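The filtering rule and the agreement figure described above can be sketched in Python as follows. Only the vote split for sovraumano comes from the text; the other two entries are hypothetical placeholders, and the function names are introduced here for illustration.

```python
from collections import Counter


def consensus(labels, min_votes=3):
    """Class chosen by at least min_votes annotators, or None if there is none."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


def full_agreement_rate(annotations):
    """Fraction of adjectives given the same class by every annotator."""
    unanimous = sum(1 for labels in annotations.values() if len(set(labels)) == 1)
    return unanimous / len(annotations)


ANNOTATIONS = {
    # Vote split reported in the text.
    "sovraumano": ["positive", "positive", "neutral", "neutral"],
    # Hypothetical placeholder entries.
    "esempio_a": ["positive", "positive", "positive", "neutral"],
    "esempio_b": ["negative", "negative", "negative", "negative"],
}

kept = {}
for adjective, labels in ANNOTATIONS.items():
    label = consensus(labels)
    if label is not None:  # ambiguous adjectives are dropped
        kept[adjective] = label

print(kept)
print(full_agreement_rate(ANNOTATIONS))
```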

5.4.4 Seed sets and parameters

As discussed in Section 5.3, in the experimental activity we used the following seed sets, which are the Italian translations of the adjectives originally used by Turney et al. in [84]:

S+ = {buono, bello, eccellente, positivo, fortunato, corretto, superiore}

S− = {cattivo, brutto, povero, negativo, sfortunato, errato, inferiore}

The prior orientation O(s) of each seed, used by our algorithms to compute the orientation of every other adjective in the input lexical resource, is set to 1 ∀s ∈ S+ and −1 ∀s ∈ S−.

5.4.5 Results

The experimental activity required several different lexical resources to be generated, one for each combination of input lexical resource, classification algorithm and parameters (seed sets, prior orientation of seed terms, decay values used by the decay(i, j) function, etc.).

In particular, we tried several alternatives, including different seed sets, as also done by Esuli et al., and decay values; in this section only the results obtained by the best performing combinations are discussed.

An orientation value has been assigned by our algorithm to 6908 adjectives of the OpenOffice dictionary (77.26% of the original set of adjectives included in it). 4341 (62.84%) adjectives have been classified as positive, while 2567 (37.16%) have been classified as negative. Table 5.2 reports the list of positive and negative adjectives with the highest orientation value O(t); seed terms are not included.

| Adjective+ | Occ. | Adjective− | Occ. |
|---|---|---|---|
| bello | 7 | brutto | 6 |
| simpatico | 7 | cattivo | 6 |
| buono | 5 | antipatico | 5 |
| amabile | 4 | negativo | 5 |
| emozionante | 4 | irascibile | 4 |
| gioioso | 4 | triste | 4 |
| gradevole | 4 | disastroso | 3 |
| interessante | 4 | fetido | 3 |
| ottimo | 4 | incapace | 3 |
| tranquillo | 4 | odioso | 3 |
| allegro | 3 | orribile | 3 |
| divertente | 3 | pesante | 3 |
| dolce | 3 | sbagliato | 3 |
| giusto | 3 | spaventoso | 3 |
| perfetto | 3 | arrogante | 2 |
| piacevole | 3 | diffidente | 2 |
| sensibile | 3 | errato | 2 |
| soddisfacente | 3 | imperfetto | 2 |
| amichevole | 2 | inaffidabile | 2 |
| disponibile | 2 | isterico | 2 |
| eccellente | 2 | noioso | 2 |
| elegante | 2 | presuntuoso | 2 |
| fantastico | 2 | scontroso | 2 |
| funzionale | 2 | violento | 2 |
| intelligente | 2 | volgare | 2 |
| luminoso | 2 | | |
| positivo | 2 | | |
| superlativo | 2 | | |

Table 5.1: Adjectives provided by users in L1 with two or more occurrences.


Table 5.2: Positive and negative adjectives with the highest orientation value O(t) generated from the OpenOffice dictionary.

Adjective+       Polarity   Adjective−     Polarity
inappuntabile    0,734      inopportuno    -0,69651132
favorevole       0,727      inadatto       -0,692204375
esatto           0,707      inabile        -0,691089402
dabbene          0,705      disgustoso     -0,687652757
eccellente       0,687      nauseabondo    -0,665446446
delizioso        0,686      inadeguato     -0,662615889
benevole         0,684      incapace       -0,662291163
giusto           0,678      inidoneo       -0,661069463
costruttivo      0,677      malaugurato    -0,65419271
vantaggioso      0,675      turbolento     -0,65419271
sereno           0,675      iniquo         -0,648001769
prelibato        0,672      infame         -0,644739477
appropriato      0,662      scellerato     -0,644288016

Similarly, 6615 adjectives included in the SinonimiMaster dictionary (74,42% of the original set) have been automatically analysed and classified according to their orientation O(t): 4221 (66,84%) adjectives have been classified as positive, while 2194 (33,16%) adjectives have been classified as negative. Table 5.3 reports the list of positive and negative adjectives with the highest orientation value O(t); in this table too, seed terms are not included.

Adjectives that are not connected to at least one seed term, positive or negative, cannot be classified; the number of unconnected adjectives is similar for both lexical resources, even when several kinds of relationships are exploited while building the graph representation, as in the SinonimiMaster dictionary.

The generated lexical resources have been assessed with respect to the L1 test set; the results in Table 5.4 show an accuracy of 83,87% on the whole set of adjectives constituting the L1 test set for the lexical resource built starting from the OpenOffice dictionary.

Moreover, when evaluated with respect to both the L1 and L1+ test sets, the resource based on the SinonimiMaster dictionary shows an accuracy between 2% and 4% lower than the best results. The lower accuracy of the SinonimiMaster dictionary is made even more evident by its higher coverage.

Another interesting result emerging from the L1 test set is that most of the misclassifications can be attributed to negatively oriented adjectives; more specifically, mean accuracy is about 9% lower on negative adjectives than on positive adjectives. This phenomenon is common to both lexical resources; the SinonimiMaster resource, in particular, is affected by this issue, showing a mean difference of 12% between the classification accuracy on positive and on negative adjectives.

Table 5.3: Positive and negative adjectives with the highest orientation value O(t) generated from the SinonimiMaster dictionary.

Adjective+      Polarity      Adjective−      Polarity
stimabile       0,629307968   schifoso        -0,57693726
salutare        0,610773366   sfavorevole     -0,573704163
splendido       0,610773366   storto          -0,557707135
squisito        0,610773366   nauseabondo     -0,545980246
eccellente      0,610773366   inidoneo        -0,545980246
valido          0,606472043   inabile         -0,545980246
divertente      0,589220222   inefficiente    -0,545980246
benfatto        0,589220222   sbagliato       -0,543614796
salubre         0,58510436    maldestro       -0,533311282
composto        0,58510436    incompetente    -0,533311282
provetto        0,58510436    minoritario     -0,533120505
perfetto        0,581308764   inadeguato      -0,524360257
onorevole       0,576038086   disonesto       -0,524360257
fondato         0,574604961   spiacevole      -0,524360257
simpatico       0,570849226   bieco           -0,524360257
ricco           0,570849226   avverso         -0,524360257
vincente        0,570849226   disgustoso      -0,520181487
plausibile      0,570849226   pasticcione     -0,51175105
pacioso         0,570849226   infausto        -0,503409488
agiata          0,570849226   perturbato      -0,503409488
delizioso       0,567600234   ostile          -0,503409488
prelibato       0,564359578   orribile        -0,502740269
scrupoloso      0,564359578   altezza         -0,502740269
egregio         0,561371086   riprovevole     -0,502125232
valevole        0,555613104   infame          -0,502125232
utilizzabile    0,555613104   sporco          -0,502125232
gustoso         0,555613104   impreparato     -0,497640807
appetitoso      0,555613104   imbranato       -0,497640807
opportuno       0,552484647   sleale          -0,481824328
idoneo          0,552484647   indegno         -0,481548622

Table 5.4: Coverage and accuracy of both generated sentiment-classified lexical resources with respect to test set L1.

Dictionary       Coverage L1+   Accuracy L1+   Coverage L1   Accuracy L1
OpenOffice       98,11%         94,34%         96,37%        83,87%
SinonimiMaster   100%           92,45%         96,37%        80,24%
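The two measures reported in Table 5.4 can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: coverage is the fraction of test adjectives that received an orientation value, and accuracy is measured over the whole test set, so that uncovered adjectives count as errors. The function name evaluate_lexicon and the toy data are hypothetical.

```python
def evaluate_lexicon(lexicon, test_set):
    """Return (coverage, accuracy) of a polarity lexicon on a labelled test set.

    lexicon:  dict mapping adjective -> orientation value O(t); the sign is the polarity
    test_set: dict mapping adjective -> manual label, +1 (positive) or -1 (negative)
    """
    if not test_set:
        raise ValueError("test_set must not be empty")
    covered = [term for term in test_set if term in lexicon]
    correct = [term for term in covered
               if (lexicon[term] > 0) == (test_set[term] > 0)]
    coverage = len(covered) / len(test_set)
    # Assumption: the denominator is the whole test set, not only the covered terms.
    accuracy = len(correct) / len(test_set)
    return coverage, accuracy


# Toy data for illustration only; the value for "pesante" is invented.
toy_lexicon = {"favorevole": 0.727, "inadatto": -0.692, "giusto": 0.678, "pesante": 0.1}
toy_test = {"favorevole": 1, "inadatto": -1, "pesante": -1, "fetido": -1}

coverage, accuracy = evaluate_lexicon(toy_lexicon, toy_test)
print(f"coverage={coverage:.2%} accuracy={accuracy:.2%}")
```

In the toy data three of the four test adjectives are covered and two of them are classified with the correct sign.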

Furthermore, the generated lexical resources have been evaluated with respect to the test set L2 of manually labelled adjectives, constituted by 266 terms. Table 5.5 reports the results we obtained by comparing the L2 test set with both lexical resources, classified according to O(t) and Ovariant(t) respectively.

The accuracy measured for both classification functions on both lexical resources presents similar values, ranging from 89,29% to 91,07% in the best case, when function O(t) is used to determine the orientation of the SinonimiMaster dictionary. In fact, the results obtained do not allow us to determine which of the proposed functions for determining the orientation of a term performs better on the L2 test set. Moreover, the analysis of the results reported in Table 5.5 does not confirm which lexical resource performs better than the other: while the OpenOffice dictionary obtained a better performance on L1 and L1+, the SinonimiMaster dictionary performed better on the L2 test set, with a margin of more than 2%.

Although the observed accuracy cannot be properly compared to previous experiences described for the English language, and our test sets are limited to less than 20% of the generated resources, the results suggest that our proposed methodology works well on the adjectives of the Italian language. Average accuracy, in particular, is comparable to the results presented by both Esuli and Turney for the English language.

Another interesting result emerging from the experimental activity is that the synonymy relationship provides most of the connections between terms required to represent and simulate our model. In fact, the accuracy evaluated on the OpenOffice resource, whose graph is built exclusively on the synonymy relationship, does not show significant losses with respect to the SinonimiMaster resource, where four different relationships have been exploited.

5.5 OvOP analysis based on sentiment oriented terms

In this Section we introduce a methodology devoted to OvOP analysis focused on the sets of labelled adjectives which have been generated during the experimental activity described in Section 5.4. In particular, we are interested in comparing the accuracy achieved by our approach, based on linguistic resources, with that of the machine learning methodologies described in Chapters 3 and 4.

Table 5.5: Accuracy of generated sentiment-classified lexical resources with respect to test set L2.

Dictionary                       Accuracy L2
O(t) on OpenOffice               89,29%
Ovariant(t) on OpenOffice        88,92%
O(t) on SinonimiMaster           91,07%
Ovariant(t) on SinonimiMaster    90,86%

The methodology we propose builds on the following assumption: the OvOP of a document d is the algebraic sum of the prior subjectivity status of each adjective appearing in the document. We assume, in other terms, that labelled adjectives provide most of the orientation of a document: by identifying labelled adjectives, the OvOP of a document can be determined.

This assumption is, in fact, similar to the idea exploited by Turney in [82]: document OvOP can be evaluated as the average semantic orientation (or OP) of the sentences constituting the document which contain, with a higher probability, orientation clues. Sentences are identified and extracted from a given document by means of a set of extraction rules based on Part-Of-Speech tagging; for example, rule JJ+NN is used to extract each sentence containing an adjective followed by a noun. For each extracted sentence, polarity orientation is evaluated by using the PMI-IR equation.

In our approach we exploited a similar extraction task: given a document, we are interested in extracting the sentences that convey, with a high probability, its orientation. However, unlike Turney's approach, we are able to provide an evaluation of prior subjectivity status for adjectives only; verbs and nouns do not provide any prior contribution to the OvOP classification process. Therefore we decided to limit our extraction rules to the set of Part-Of-Speech tags reported in Table 5.6.

Extraction rules are aimed not only at extracting the adjectives appearing in the document but also at contextualizing them with respect to two different kinds of valence shifters at sentence level: negations and intensifiers, described in detail in [66].

We assume, in fact, that such valence shifters, even if limited to the sentence level, could improve the effectiveness of the OvOP analysis task by providing a contextualized score for each adjective in the document, even when it is used inside a negation or paired with an adverb that intensifies its prior subjectivity status. In order to simplify the problem of applying valence shifters to the right adjectives inside complex sentences, we assume that each negation or intensifier can update only the prior subjectivity value of the first following adjective.

Table 5.6: Extraction rules used for OvOP analysis.

Rule   Word 1            Word 2            Word 3            Word 4            Example
R1     Adjective (ADJ)   -                 -                 -                 buono (good)
R2     Adverb (ADV)      Adjective (ADJ)   -                 -                 molto buono (very good)
R3     Negation (NEG)    Adjective (ADJ)   -                 -                 non buono (not good)
R4     Negation (NEG)    Verb (VRB)        Adjective (ADJ)   -                 non è buono (it is not good)
R5     Negation (NEG)    Adverb (ADV)      Adjective (ADJ)   -                 non molto buono (not very good)
R6     Negation (NEG)    Verb (VRB)        Adverb (ADV)      Adjective (ADJ)   non è molto buono (it is not very good)

A list (L3) of 70 adverbs, which are used as intensifiers for the adjectives of the Italian language, has been collected; two human annotators independently analysed the list of adverbs in order to assign each of them to one of the following two classes:

1. intensifiers increasing the strength of the following adjective but not altering its orientation; this class includes terms like molto (very) and veramente (truly). When an adjective appears in the same sentence as this kind of intensifier, its strength is doubled;

2. intensifiers reducing the strength of the following adjective but not altering its orientation; this class includes terms like poco (little) and malamente (badly). When an adjective appears in the same sentence as this kind of intensifier, its strength is halved.

For each sentence s extracted from an input document d by means of the extraction rules defined in Table 5.6, an Opinion Polarity score is calculated as:

$$ OP(s) = O(adj_s) \times Int(s) \times Neg(s) \qquad (5.10) $$

with O(adj_s) being the prior polarity of the adjective adj_s appearing in s, Int(s) the effect provided by the intensifier and Neg(s) the switch in orientation due to the presence of a negation. If no intensifier i ∈ L3 appears in s, Int(s) = 1; similarly, if no negation appears in s, Neg(s) = 1, which means that the orientation of adj_s is unchanged.
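Equation 5.10 can be illustrated with a few lines of code. The sketch below assumes that Int(s) is 2 for a strengthening intensifier and 0.5 for a weakening one (strength doubled or halved, as described for the two classes of L3) and that Neg(s) is -1 when a negation is present; the value -1 is an assumption, since the text only says that a negation switches the orientation. The names op_score and INTENSIFIER_EFFECT are illustrative.

```python
# Multipliers for the two intensifier classes of L3 (strength doubled or halved).
INTENSIFIER_EFFECT = {"strengthen": 2.0, "weaken": 0.5}


def op_score(prior_polarity, intensifier_class=None, negated=False):
    """Compute OP(s) = O(adj_s) * Int(s) * Neg(s) for one extracted sentence."""
    # Int(s) = 1 when no intensifier from L3 appears in the sentence.
    int_s = INTENSIFIER_EFFECT[intensifier_class] if intensifier_class else 1.0
    # Assumption: a negation flips the sign, i.e. Neg(s) = -1; otherwise Neg(s) = 1.
    neg_s = -1.0 if negated else 1.0
    return prior_polarity * int_s * neg_s


# "buono" is a positive seed, so O(buono) = 1.
print(op_score(1.0))                         # buono
print(op_score(1.0, "strengthen"))           # molto buono
print(op_score(1.0, "weaken"))               # poco buono
print(op_score(1.0, negated=True))           # non buono
```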

The OvOP score is then calculated as the sum of the OP scores of the extracted sentences constituting the document, as follows:

$$ OvOP(d) = \sum_{s \in S_d} OP(s) \qquad (5.11) $$

where S_d is the set of sentences extracted from document d using the previously defined set of extraction rules. A document is classified as positive if OvOP(d) > 0 or as negative otherwise.
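The aggregation of Equation 5.11 and the decision rule can be sketched as follows, assuming the OP scores of the extracted sentences have already been computed. The function names ovop and classify_document are illustrative.

```python
def ovop(sentence_scores):
    """OvOP(d): sum of the OP scores of the sentences extracted from a document."""
    return sum(sentence_scores)


def classify_document(sentence_scores):
    """Positive if OvOP(d) > 0, negative otherwise (a score of 0 counts as negative)."""
    return "positive" if ovop(sentence_scores) > 0 else "negative"


# Illustrative OP scores for three extracted sentences.
scores = [2.0, -1.0, 0.5]
print(ovop(scores), classify_document(scores))
```

Note that, under this rule, a document from which no sentence is extracted has OvOP(d) = 0 and is therefore labelled as negative.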

Our methodology for determining the OvOP of a given document requires the following steps to be performed:

1. Part-Of-Speech tagging: the input document is transformed, by means of the POS tagger specialized on the Italian language and integrated into the SENT-IT framework, into an ordered list of triples {term, POS tag, lemmatized form}, where each triple represents a specific term or punctuation mark appearing in the document.

2. Sentence splitting and filtering: extraction rules are applied to the list of triples in order to extract the sentences which represent, according to our assumption, the most polarity bearing contents of the input document. Each sentence identified by the set of extraction rules is constituted by a sequence of up to four pairs {(POS tag1, lemmatized form1), . . . , (POS tag4, lemmatized form4)}, where each pair represents the lemmatized form of a term appearing in the sentence and its POS tag.

3. Valence shifters evaluation: for each sentence s the functions Neg(s) and Int(s) are calculated. The list L3 is used as a lookup table to determine the effect induced by the adverbs appearing in s on the adjectives.

4. OP computation: the function OP(s) is computed for each sentence s by using one of the automatically generated lexical resources described in Section 5.4.
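Step 2 above can be sketched as a pattern matcher over the POS-tagged triples. This is a hedged illustration, not the SENT-IT implementation: it assumes that rules are tried longest first at each position and that matched spans do not overlap. The tags ADJ, ADV, NEG and VRB come from Table 5.6, while DET, NOUN and CONJ, the function name extract_sentences and the example input are hypothetical.

```python
# Extraction rules of Table 5.6, ordered so that longer patterns are tried first.
RULES = {
    "R6": ("NEG", "VRB", "ADV", "ADJ"),
    "R5": ("NEG", "ADV", "ADJ"),
    "R4": ("NEG", "VRB", "ADJ"),
    "R3": ("NEG", "ADJ"),
    "R2": ("ADV", "ADJ"),
    "R1": ("ADJ",),
}


def extract_sentences(tagged):
    """tagged: ordered list of (term, POS tag, lemmatized form) triples.

    Returns a list of (rule name, [(POS tag, lemmatized form), ...]) matches.
    """
    matches = []
    i = 0
    while i < len(tagged):
        for name, pattern in RULES.items():  # dicts keep insertion order
            window = tagged[i:i + len(pattern)]
            if tuple(pos for _, pos, _ in window) == pattern:
                matches.append((name, [(pos, lemma) for _, pos, lemma in window]))
                i += len(pattern)  # assumption: matches do not overlap
                break
        else:
            i += 1  # no rule starts at this position
    return matches


tagged = [
    ("Il", "DET", "il"), ("film", "NOUN", "film"),
    ("non", "NEG", "non"), ("è", "VRB", "essere"),
    ("molto", "ADV", "molto"), ("buono", "ADJ", "buono"),
    ("ma", "CONJ", "ma"), ("divertente", "ADJ", "divertente"),
]
matches = extract_sentences(tagged)
for rule, pairs in matches:
    print(rule, pairs)
```

Each match ends with the adjective whose prior orientation is looked up in the lexical resource, while the preceding pairs provide the negation and the intensifier used in step 3.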

Figure 5.6 shows a diagram representing the steps of our proposed methodology for OvOP evaluation using automatically generated lexical resources.

5.6 Experiments

5.6.1 Test set

The proposed methodology has been tested on a test set constituted by 7856 pre-classified documents, randomly extracted from the test sets used in Chapters 3 and 4 and equally distributed between positive and negative documents. The documents constituting the test set cover the whole set of domains analysed in Chapters 3 and 4; OvOP analysis based on pre-classified adjectives is, in fact, a domain independent task, and a heterogeneous test set is required in order to properly evaluate its classification accuracy.

Figure 5.6: The OvOP analysis process.

In particular, we assume that orientation is provided only by adjectives, whose prior subjectivity status is known, and by valence shifters, domain independent entities which act as "contextualizers" for adjectives: in fact, no domain-dependent features are considered in OvOP analysis.

5.6.2 Results

Several experiments were required in order to evaluate the performance achieved by the proposed methodology on the corpus of documents used as test set, for each list of classified adjectives generated by the algorithm described in Section 5.4.

Table 5.7: Accuracy of lexical resource based OvOP analysis.

Resource                         ADJ      NEG+ADJ   ADV+ADJ   NEG+ADV+ADJ
O(t) on OpenOffice               62,88%   64,71%    64,9%     64,8%
Ovariant(t) on OpenOffice        62,9%    64,23%    64,47%    63,9%
O(t) on SinonimiMaster           63,37%   65,63%    65,25%    65,22%
Ovariant(t) on SinonimiMaster    63,52%   66,94%    65,34%    66,21%

Table 5.7 shows the accuracy obtained for each generated lexical resource when the set of extraction rules used to identify polarity bearing sentences is changed. Accuracy is relatively low when the OP of sentences is calculated only on the prior subjectivity status of the adjectives appearing in them (rule R1); more specifically, when valence shifters are not considered in determining the OvOP of the test documents, the accuracy varies between 62,88% (using function O(t) on the adjectives of the OpenOffice dictionary) and 63,52% (using function Ovariant(t) on the adjectives of the SinonimiMaster dictionary). Moreover, the difference in observed accuracy between the resources based on the two dictionaries is similar for both the functions used to determine the orientation of adjectives.

Table 5.7 shows an average 2,3% increase in accuracy when the rules aimed at detecting negations (R3, R4) are used in addition to rule R1. This trend is common across the experiments on our four lexical resources; the best accuracy, 66,94%, has been observed when the SinonimiMaster resource, labelled according to the Ovariant(t) function, is used. The difference in accuracy between the two functions used to determine the orientation of adjectives is higher for the SinonimiMaster resource (1,69%) than for the OpenOffice resource (0,48%).


Applying the extraction rule aimed at identifying adverbs (R2) in addition to rule R1 improves results, with an average 1,85% increase in accuracy with respect to the base case involving only rule R1.

When the whole set of extraction rules is applied, in order to identify both negations and intensifiers as valence shifters for the prior subjectivity status provided by adjectives, the measured accuracy is higher than in the base case, varying between 63,9% and 66,21%. However, except when function Ovariant(t) is used on the adjectives of the SinonimiMaster dictionary, the average accuracy is lower than in the previously described scenarios, in which the effects of negations and intensifiers were not considered simultaneously.

We think that this loss in accuracy could be caused by two different issues: the assumption that negations and intensifiers apply only to the first following adjective is too simplistic and, at the same time, the valence shifting effect provided by the adverbs in L3 should not be limited to just two classes but should cover a wider range of possible values.

Moreover, in our experimental activity we did not focus on the presence of linguistic elements, as described by Polanyi in [66], which could act, at the same time, as both negations and intensifiers. Another issue which requires deeper investigation is related to the effect that arises when both a negation and an intensifier are applied to the same adjective; the sentence Non è molto buono (It is not very good) could, in fact, be perceived in two different ways: slightly positive (negation and intensifier are used to reduce the strength of the polarity conveyed by the term buono (good)) or slightly negative (negation and intensifier are used to reduce the strength of the polarity conveyed by the term buono (good) and, at the same time, to alter its orientation). In our approach, instead, this sentence is seen as strongly negatively oriented, because the intensifier doubles the prior subjectivity status of the term buono (good) while the negation switches its orientation from positive to negative.
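As a worked instance of Equation 5.10 for this sentence, recall that buono is a positive seed, so O(buono) = 1, and that molto belongs to the class of intensifiers that double the strength of the adjective, so Int(s) = 2. Assuming that a negation is encoded as Neg(s) = -1 (the text only states that it switches the orientation), the score is:

```latex
\[
OP(s) = O(\text{buono}) \times Int(s) \times Neg(s) = 1 \times 2 \times (-1) = -2
\]
```

This is twice the magnitude of the score obtained for non buono (not good), which under the same assumption is -1, whereas a human reader would probably rank non molto buono as the milder of the two judgments.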

Results show that, in the OvOP analysis process, the lexical resource generated by applying function Ovariant(t) to the adjectives of the SinonimiMaster dictionary performs better than any other resource with every set of extraction rules that has been evaluated, as clearly shown by the graph in Figure 5.7.

With respect to the results described in Chapters 3 and 4, obtained from machine learning methods applied to OvOP analysis and classification, the results in Table 5.7 show a significant loss in accuracy; in fact, by comparing the best results obtained in each experiment, we observed a loss of 22,06% in accuracy with respect to our domain dependent classifier trained on the movie domain and a loss of 17,06% with respect to the accuracy provided by the domain independent meta-classifier.

We think that such differences could be explained, in part, by the particular genre of the documents which have been used for testing the proposed methodology: product reviews. In particular, the use of domain dependent lexicons, which is a common characteristic across the whole set of collected reviews, even from heterogeneous domains, does not fit well with the assumption that most of the polarity is conveyed by the adjectives appearing in a document. More specifically, by analyzing the list of the most significant features (in terms of Information Gain) which have been identified for OvOP classification in the domain of movie reviews, reported in Table 3.4, it emerges that 50% of the stems used as features are nouns (proper and common), verbs and adverbs. Similar rates have been identified for each domain of product reviews that has been collected and analyzed.

Figure 5.7: Accuracy of OvOP analysis.

For example, the sentence Il computer dispone di moltissima RAM (The computer has a huge amount of RAM), which clearly conveys a positive orientation expressed by the reviewer towards the product, is useless in our approach, because no classified adjective is present in it. The sentence will not be recognized by the extraction rules defined in Table 5.6 and, consequently, does not contribute to determining the OvOP of the document containing it.

5.7 Conclusions

In this Chapter we addressed the problem of determining the orientation of a list of adjectives of the Italian language. We have tried to solve the problem by proposing a novel unsupervised approach, based on shortest paths and decay, applied to two different lexical resources. Four collections of opinionated adjectives have been produced as a result of the experimental activity and tested with respect to three different collections of adjectives, which have been manually classified as positive or negative. The average accuracy we have obtained, even if not directly comparable to similar research experiences for the English language, is higher than 80%. This result suggests that our methodology could be used effectively on the adjectives of the Italian language. Moreover, a new methodology, aimed at determining the OvOP of a given document, has been proposed; in particular, this new methodology differs from the machine learning approaches exploited in Chapters 3 and 4 because it is based exclusively on the prior subjectivity status of the adjectives constituting the generated resources. In order to improve the effectiveness of our approach, a limited set of valence shifters, including negations and intensifiers, has been handled in order to contextualize the polarity expressed by an adjective with respect to the specific sentence in which it appears. Results show that the accuracy of our novel methodology for OvOP analysis is significantly lower than that of the approaches based on feature representation and machine learning. The loss in accuracy has been estimated, in the worst case, at 22,06%.


Chapter 6

Conclusions

In this thesis we have investigated the issue of defining and evaluating novel methodologies for Sentiment Analysis focused on the Italian language. Our work represents the first attempt to solve this particular kind of problem for the Italian language and our results, at the time of writing, cannot be compared with any other research work carried out on the same language.

The SENT-IT framework has been developed in order to properly support and simplify the experimental activity described in this thesis and, at the same time, to provide a platform which could be used for further improvements in the future. The SENT-IT framework comprises a set of tools, specifically developed or integrated, devoted to both linguistic analysis of the Italian language and machine learning. The WEKA library has been integrated in order to provide the SENT-IT framework with the machine learning tools described in Chapters 3 and 4.

In Chapter 3 we have proposed a supervised algorithm, based on several different learning methods including Naïve Bayesian and SVM classifiers, with a view to determining the Overall Opinion Polarity of a product review. In particular, we identified a set of document representation features, partly borrowed from the literature for the English language, aimed at properly representing the OvOP of a given document. Information Gain has been exploited as the feature selection criterion in order to improve the accuracy of the classification activity and, at the same time, to reduce the complexity, in terms of both time and space, of the overall process. Results described in Chapter 3 show that our proposed approach works well on reviews from a single domain, with an accuracy, in the best case, of 89%.

In Chapter 4 we investigated the problem of domain independent OvOP calculation, by implementing and testing a meta-classifier based on ensemble methodologies. By using the techniques developed in Chapter 3 and the features provided by the SENT-IT framework, we trained four different domain dependent OvOP classifiers, each of them achieving an accuracy higher than 80%. The results of the classification task performed by the domain dependent classifiers are used to train a meta-classifier, aimed at determining which domain dependent classifier is best suited when OvOP analysis must be performed on a document from an unknown domain. Results described in Chapter 4 show that our proposed approach works well across different domains; more specifically, the proposed meta-classification approach outperforms the general purpose OvOP classifier trained on the whole set of corpora.

Four different corpora constituted by product reviews from different domains have been collected in order to perform both the training and the testing of the proposed methodologies. Such corpora could be used, in the future, as a Gold Standard for determining the accuracy of Sentiment Analysis methodologies and algorithms.

In Chapter 5 we have focused on the problem of automatically determining the prior orientation of a given set of adjectives. More specifically, in our experimental activity we have investigated two different lexical resources where adjectives are connected to each other by means of one or more semantic relations, such as synonymy and antonymy. Such relationships have been used in order to build a graph representation of the input lexical resources. Two different algorithms have been proposed, both based on the evaluation of the shortest paths connecting an adjective with a set of seed terms and aimed at determining the polarity of an adjective as a function of its semantic distance from the seed terms. Four different groups of oriented adjectives have been collected and assessed. The accuracy of the classification approach is high, even if no clear evidence of which algorithm performs best arises from the collected results.

In Chapter 5, moreover, we have proposed a novel domain independent OvOP analysis methodology based on the prior subjectivity status of the automatically classified adjectives. The methodology has been tested on a large set of documents, randomly selected from the set of corpora collected in Chapter 4. Results have shown that the accuracy is significantly lower than that of both the domain dependent and the domain independent classifiers based on machine learning which have been developed in Chapters 3 and 4.

The work presented in this thesis represents a first methodological approach to Sentiment Analysis, and more specifically to OvOP analysis, for the Italian language. The SENT-IT framework, thanks to its modularity and flexibility, could be used in the future to further investigate new methodologies and resources for Sentiment Analysis. Moreover, we plan to test, in the next few months, the integration of the classifiers generated by our experimental activity based on the SENT-IT framework with a web interface or an existing product review website.


Appendix A

Publications

List of publications that have arisen out of this PhD research:

1. P. Casoto, C. Tasso. An Hybrid Approach for Improving Word Sense Disambiguation and Text Clustering. In Proceedings of the 2nd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 29-30 January 2007, pp. 105-110.

2. P. Casoto, A. Dattolo, P. Omero, N. Pudota, C. Tasso. A New Machine Learning Based Approach for Sentiment Classification of Italian documents. In Proceedings of the 3rd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 24-25 January 2008, pp. 77-82.

3. N. Pudota, P. Casoto, A. Dattolo, P. Omero, C. Tasso. Towards Bridging the Gap between Personalization and Information Extraction. In Proceedings of the 3rd Italian Research Conference on Digital Library Management Systems, Padua, Italy, 24-25 January 2008, pp. 33-40.

4. P. Casoto, A. Dattolo, F. Ferrara, P. Omero, N. Pudota, C. Tasso. Generating and sharing personal information spaces. In Proceedings of Adaptive Hypermedia and Adaptive Web-Based Systems: Adaptation for the Social Web Workshop, Hannover, Germany, 2008, pp. 14-23.

5. P. Casoto, A. Dattolo, C. Tasso. Sentiment Classification for the Italian Language: a Case Study on Movie Reviews. In Journal of Internet Technology, Vol 9(4), ISSN 1607-9264.

6. A. Baruzzo, P. Casoto. A Flexible Service-Oriented Digital Platform for e-Content Management in Cultural Heritage. In Proceedings of IABC Workshop - Intelligenza Artificiale nei Beni Culturali, Cagliari, Italy, 11 September 2008, pp. 38-45.


7. A. Baruzzo, P. Casoto, P. Challapalli, A. Dattolo. An Intelligent Service Oriented Approach for Improving Information Access in Cultural Heritage. In Proceedings of Information Access in Cultural Heritage Workshop - ECDL 2008, Aarhus, Denmark, 18 September 2008, ISBN 978-90-813489-1-1.

8. A. Baruzzo, P. Casoto, A. Dattolo, C. Tasso. Handling Evolution in Digital Libraries. In Proceedings of the 5th Italian Research Conference on Digital Library Management Systems, Padua, Italy, 29-30 January 2009, pp. 34-50.

9. A. Baruzzo, P. Casoto, A. Dattolo. A Conceptual Model for Digital Libraries Evolution. In Proceedings of the 5th Web Information Systems and Technologies, WEBIST 2009, Lisboa, Portugal, 23-26 March 2009, pp. 299-304, ISBN 978-989-8111-81-4.

10. A. Baruzzo, P. Casoto, P. Challapalli, A. Dattolo, N. Pudota, C. Tasso. Toward Semantic Digital Libraries: Exploiting Web 2.0 and Semantic Services in Cultural Heritage. In Journal of Digital Information, Vol 10(6), ISSN 1368-7506.

11. P. Casoto, A. Dattolo, P. Omero, N. Pudota, C. Tasso. Accessing, Analyzing, and Extracting Information from User Generated Contents. In Handbook of Research on Web 2.0, 3.0, and X.0: Technologies, Business, and Social Applications, edited by San Murugesan, IGI Global (Information Science Reference), USA, 2010, ISBN 978-160-5663-84-5, ISBN10 1605663840.


Bibliography

[1] Ahmed Abbasi, Hsinchun Chen, and Arab Salem. Sentiment analysis in multi-ple languages: Feature selection for opinion classification in web forums. ACMTrans. Inf. Syst., 26:12:1–12:34, June 2008.

[2] Nate Agrin. Introduction: Developing a flexible sentiment analysis techniquefor multiple domains, 2006.

[3] Alina Andreevskaia and Sabine Bergler. Mining WordNet for a fuzzy senti-ment: Sentiment tag extraction from WordNet glosses. In Proceedings of theEuropean Chapter of the Association for Computational Linguistics (EACL),2006.

[4] Giuseppe Attardi and Maria Simi. Blog mining through opinionated words. InProceedings of TREC 2006, the Fifteenth Text Retrieval Conference, Gaithers-burg , US, 2006. NIST.

[5] Anthony Aue and Michael Gamon. Customizing sentiment classifiers to newdomains: a case study. In Submitted to RANLP-05, the International Con-ference on Recent Advances in Natural Language Processing, Borovets, BG,2005.

[6] Andrea Baruzzo, Paolo Casoto, Antonina Dattolo, and Carlo Tasso. A concep-tual model for digital libraries evolution. In Joaquim Filipe and Jose Cordeiro,editors, WEBIST, pages 299–304. INSTICC Press, 2009.

[7] Mikhail Bautin, Lohit Vijayarenu, and Steven Skiena. International sentiment analysis for news and blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.

[8] Steven Bethard, Hong Yu, Ashley Thornton, Vasileios Hatzivassiloglou, and Dan Jurafsky. Automatic extraction of opinion propositions and their holders. In James G. Shanahan, Janyce Wiebe, and Yan Qu, editors, Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, Stanford, US, 2004.

[9] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of ACL-07, the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, Prague, CZ, June 2007. Association for Computational Linguistics.

[10] John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 2006.

[11] Eric Brill. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput. Linguist., 21:543–565, December 1995.

[12] Luís Cabral and Ali Hortaçsu. The dynamics of seller reputation: Theory and evidence from eBay. Working paper, downloaded version revised in March, 2006.

[13] P. Casoto, A. Dattolo, P. Omero, N. Pudota, and C. Tasso. Sentiment classification for the Italian language. In Proceedings of the 2008 4th Italian Research Conference on Digital Library Systems, IRCDL 2008, Padua, Italy, January 24-25, 2008.

[14] P. Casoto, A. Dattolo, and C. Tasso. Sentiment classification for the Italian language: a case study on movie reviews. Journal of Internet Technology, (Intelligent Agent and Knowledge Mining):365–373, 2008.

[15] Hamish Cunningham. A definition and short history of language engineering. Nat. Lang. Eng., 5:1–16, March 1999.

[16] Sanjiv R. Das and Mike Y. Chen. Yahoo! for Amazon: Sentiment parsing from small talk on the Web. In Proceedings of EFA 2001, European Finance Association Annual Conference, Barcelona, ES, 2001.

[17] Antonina Dattolo and Flaminia L. Luccio. Formalizing a model to represent and visualize concept spaces in e-learning environments. In Jose Cordeiro, Joaquim Filipe, and Slimane Hammoudi, editors, WEBIST (1), pages 339–346. INSTICC Press, 2008.

[18] Antonina Dattolo and Flaminia L. Luccio. Visualizing personalized views in virtual museum tours. In Proc. of the Conference on Human System Interaction, May 25-27, Krakow, Poland, 2008.

[19] Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW-03, 12th International Conference on the World Wide Web, pages 519–528, Budapest, HU, 2003. ACM Press.

[20] Thomas G. Dietterich. Machine-learning research – four current directions. AI Magazine, 18:97–136, 1997.

[21] Ted Dunning. Accurate methods for the statistics of surprise and coincidence. Comput. Linguist., 19:61–74, March 1993.

[22] Koji Eguchi and Victor Lavrenko. Sentiment retrieval using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 345–354, 2006.

[23] Paul Ekman. Emotion in the Human Face. Cambridge University Press, second edition, 1982.

[24] Charlotta Engström. Topic dependence in sentiment classification. Master’s thesis, University of Cambridge, 2004.

[25] Andrea Esuli and Fabrizio Sebastiani. Determining the semantic orientation of terms through gloss classification. In Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management, pages 617–624, Bremen, DE, 2005. ACM Press.

[26] Andrea Esuli and Fabrizio Sebastiani. Determining term subjectivity and term orientation for opinion mining. In Proceedings of EACL-06, the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 193–200, Trento, IT, 2006.

[27] Andrea Esuli and Fabrizio Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC-06, the 5th Conference on Language Resources and Evaluation, Genova, IT, 2006.

[28] Andrea Esuli and Fabrizio Sebastiani. Pageranking WordNet synsets: An application to opinion mining. In Proceedings of ACL-07, the 45th Annual Meeting of the Association of Computational Linguistics, pages 424–431, Prague, CZ, June 2007. Association for Computational Linguistics.

[29] Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Methods for domain-independent information extraction from the web: an experimental comparison. In Proceedings of the 19th National Conference on Artificial Intelligence, AAAI’04, pages 391–398. AAAI Press, 2004.

[30] Aidan Finn, Nicholas Kushmerick, and Barry Smyth. Genre classification and domain transfer for information filtering. In Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval, number 2291 in Lecture Notes in Computer Science, pages 353–362, Glasgow, 2002.

[31] Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Krüpl, and Bernhard Pollak. Towards domain-independent information extraction from web tables. In Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pages 71–80, New York, NY, USA, 2007. ACM.

[32] Gregory Grefenstette, Yan Qu, James G. Shanahan, and David A. Evans. Coupling niche browsers and affect analysis for an opinion mining application. In Proceedings of RIAO-04, Avignon, FR, 2004.

[33] Vasileios Hatzivassiloglou and Kathleen R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of ACL-97, 35th Annual Meeting of the Association for Computational Linguistics, pages 174–181, Madrid, ES, 1997. Association for Computational Linguistics.

[34] Vasileios Hatzivassiloglou and Janyce Wiebe. Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of the International Conference on Computational Linguistics (COLING), 2000.

[35] Yi Hu, Jianyong Duan, Xiaoming Chen, Bingzhen Pei, and Ruzhan Lu. A new method for sentiment classification in text retrieval. In Robert Dale, Kam-Fai Wong, Jian Su, and Oi Yee Kwong, editors, IJCNLP, volume 3651 of Lecture Notes in Computer Science, pages 1–9. Springer, 2005.

[36] Nitin Jindal and Bing Liu. Identifying comparative sentences in text documents. In Proceedings of SIGIR-06, the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 244–251, Seattle, US, 2006. ACM Press.

[37] Nitin Jindal and Bing Liu. Mining comparative sentences and relations. In Proceedings of AAAI-06, the 21st National Conference on Artificial Intelligence, Boston, US, 2006. AAAI Press.

[38] Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten de Rijke. Using WordNet to measure semantic orientation of adjectives. In LREC, 2004.

[39] Hiroshi Kanayama and Tetsuya Nasukawa. Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 355–363, Sydney, Australia, July 2006. Association for Computational Linguistics.

[40] Soo-Min Kim and Eduard Hovy. Determining the sentiment of opinions. In Proceedings of COLING-04, the Conference on Computational Linguistics, Geneva, CH, 2004.

[41] Soo-Min Kim and Eduard Hovy. Automatic identification of pro and con reasons in online reviews. In Proceedings of the COLING/ACL Main Conference Poster Sessions, pages 483–490, 2006.

[42] Soo-Min Kim and Eduard Hovy. Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, AUS, 2006.

[43] Soo-Min Kim and Eduard Hovy. Identifying and analyzing judgment opinions. In Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), 2006.

[44] Soo-Min Kim and Eduard Hovy. Crystal: Analyzing predictive opinions on the web. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

[45] Moshe Koppel and Itai Shtrimberg. Good news or bad news? Let the market decide. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, Stanford, US, 2004.

[46] Namhee Kwon, Stuart Shulman, and Eduard Hovy. Multidimensional text analysis for eRulemaking. In Proceedings of Digital Government Research (dg.o), 2006.

[47] Michael Laver, Kenneth Benoit, and John Garry. Extracting policy positions from political texts using words as data. American Political Science Review, 97(2):311–331, 2003.

[48] Hugo Liu, Henry Lieberman, and Ted Selker. A model of textual affect sensing using real-world knowledge. In Proceedings of Intelligent User Interfaces (IUI), pages 125–132, 2003.

[49] Hugo Liu, Ted Selker, and Henry Lieberman. Visualizing the affective structure of a text document. In CHI ’03 Extended Abstracts on Human Factors in Computing Systems, CHI ’03, pages 740–741, New York, NY, USA, 2003. ACM.

[50] Andrew McCallum. Information extraction: Distilling structured data from unstructured text. Queue, 3:48–57, November 2005.

[51] Michael J. McGuffin and m. c. schraefel. A comparison of hyperstructures: Zzstructures, mSpaces, and polyarchies. In Proceedings of 15th ACM Conference on Hypertext and Hypermedia, pages 153–162, August 2004.

[52] Rada Mihalcea, Carmen Banea, and Janyce Wiebe. Learning multilingual subjective language via cross-lingual projections. In Proceedings of the Association for Computational Linguistics (ACL), pages 976–983, Prague, Czech Republic, June 2007.

[53] G. Mishne. Experiments with mood classification in blog posts. In 1st Workshop on Stylistic Analysis of Text for Information Access, 2005.

[54] Gilad Mishne and Natalie Glance. Predicting movie sales from blogger sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs, Stanford, US, 2006.

[55] Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining product reputations on the Web. In Proceedings of KDD-02, 8th ACM International Conference on Knowledge Discovery and Data Mining, pages 341–349, Edmonton, CA, 2002. ACM Press.

[56] Tony Mullen and Robert Malouf. A preliminary investigation into sentiment analysis of informal political discourse. In AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pages 159–162, 2006.

[57] Tony Mullen and Robert Malouf. Taking sides: User classification for informal online political discourse. Internet Research, 18:177–190, 2008.

[58] Theodor Holm Nelson. A cosmology for a different computer universe: Data model, mechanisms, virtual machine and visualization infrastructure. J. Digit. Inf., 5(1), 2004.

[59] Sara Owsley, Sanjay Sood, and Kristian J. Hammond. Domain specific affective classification of documents. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs, Stanford, US, 2006.

[60] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL-04, 42nd Meeting of the Association for Computational Linguistics, pages 271–278, Barcelona, ES, 2004. Association for Computational Linguistics.

[61] Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of ACL-05, 43rd Meeting of the Association for Computational Linguistics, pages 115–124, Ann Arbor, US, 2005. Association for Computational Linguistics.

[62] Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, 2008.

[63] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP-02, the Conference on Empirical Methods in Natural Language Processing, pages 79–86, Philadelphia, US, 2002. Association for Computational Linguistics.

[64] Scott Piao, Sophia Ananiadou, Yoshimasa Tsuruoka, Yutaka Sasaki, and John McNaught. Mining opinion polarity relations of citations. In International Workshop on Computational Semantics (IWCS), pages 366–371, 2007. Short paper.

[65] John C. Platt. Fast training of support vector machines using sequential minimal optimization, pages 185–208. MIT Press, Cambridge, MA, USA, 1999.

[66] Livia Polanyi and Annie Zaenen. Contextual lexical valence shifters. In Yan Qu, James Shanahan, and Janyce Wiebe, editors, Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. AAAI Press, 2004. AAAI technical report SS-04-07.

[67] M. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.

[68] Nirmala Pudota, Paolo Casoto, Antonina Dattolo, Paolo Omero, and Carlo Tasso. Towards bridging the gap between personalization and information extraction. In Maristella Agosti, Floriana Esposito, and Costantino Thanos, editors, IRCDL, pages 33–40. DELOS: an Association for Digital Libraries, 2008.

[69] Jonathon Read. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of ACL-05, 43rd Meeting of the Association for Computational Linguistics, Ann Arbor, US, 2005. Association for Computational Linguistics.

[70] Ellen Riloff, Janyce Wiebe, and William Phillips. Exploiting subjectivity classification to improve information extraction. In Proceedings of AAAI-05, the 20th National Conference on Artificial Intelligence, pages 1106–1111, Pittsburgh, US, 2005. AAAI Press.

[71] Franco Salvetti, Stephen Lewis, and Christoph Reichenbach. Impact of lexical filtering on overall opinion polarity identification. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, Stanford, US, 2004.

[72] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Comput. Surv., 34:1–47, March 2002.

[73] Yohei Seki, Koji Eguchi, and Noriko Kando. Analysis of multi-document viewpoint summarization using multi-dimensional genres. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, pages 142–145, 2004.

[74] Swapna Somasundaran, Josef Ruppenhofer, and Janyce Wiebe. Detecting arguing and sentiment in meetings. In Proceedings of the SIGdial Workshop on Discourse and Dialogue, 2007.

[75] Swapna Somasundaran, Theresa Wilson, Janyce Wiebe, and Veselin Stoyanov. QA with attitude: Exploiting opinion type analysis for improving question answering in on-line discussions and the news. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007.

[76] Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie. The General Inquirer: A Computer Approach to Content Analysis. MIT Press, 1966.

[77] Veselin Stoyanov, Claire Cardie, and Janyce Wiebe. Multi-perspective question answering using the OpQA corpus. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 923–930, Vancouver, British Columbia, Canada, October 2005. Association for Computational Linguistics.

[78] Hiroya Takamura, Takashi Inui, and Manabu Okumura. Extracting emotional polarity of words using spin model. In Proceedings of ACL-05, 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, US, 2005. Association for Computational Linguistics.

[79] Hiroya Takamura, Takashi Inui, and Manabu Okumura. Latent variable models for semantic orientations of phrases. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.

[80] Matt Thomas, Bo Pang, and Lillian Lee. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 327–335, 2006.

[81] Ljupčo Todorovski and Sašo Džeroski. Combining classifiers with meta decision trees. Mach. Learn., 50:223–249, March 2003.

[82] Peter Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of ACL-02, 40th Annual Meeting of the Association for Computational Linguistics, pages 417–424, Philadelphia, US, 2002. Association for Computational Linguistics.

[83] Peter D. Turney and Michael L. Littman. Unsupervised learning of semantic orientation from a hundred-billion-word corpus. CoRR, cs.LG/0212012, 2002.

[84] Peter D. Turney and Michael L. Littman. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4):315–346, 2003.

[85] W. T. Tutte. Graph Theory. Foreword by Crispin St. J. A. Nash-Williams. Cambridge University Press, Cambridge and New York, NY, USA, 1984.

[86] Alessandro Valitutti, Carlo Strapparava, and Oliviero Stock. Developing affective lexical resources. PsychNology Journal, 2(1):61–83, 2004.

[87] Casey Whitelaw, Navendu Garg, and Shlomo Argamon. Using appraisal taxonomies for sentiment analysis. In Proceedings of MCLC-05, the 2nd Midwest Computational Linguistic Colloquium, Columbus, US, 2005.

[88] Casey Whitelaw, Navendu Garg, and Shlomo Argamon. Using appraisal taxonomies for sentiment analysis. In Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management, Bremen, DE, 2005.

[89] Janyce Wiebe and Ellen Riloff. Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of CICLing-05, International Conference on Intelligent Text Processing and Computational Linguistics, volume 3406 of Lecture Notes in Computer Science, pages 475–486, Mexico City, MX, 2005. Springer-Verlag.

[90] Janyce Wiebe and Theresa Wilson. Learning to disambiguate potentially subjective expressions. In Proceedings of the 6th CoNLL, pages 112–118, Taipei, TW, 2002.

[91] Janyce M. Wiebe, Theresa Wilson, Rebecca F. Bruce, Matthew Bell, and Melanie Martin. Learning subjective language. Computational Linguistics, 30(3):277–308, 2004.

[92] Janyce M. Wiebe, Theresa Wilson, and Matthew Bell. Identifying collocations for recognizing opinions. In Proceedings of the ACL/EACL Workshop on Collocation, Toulouse, FR, 2001.

[93] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technologies Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Vancouver, CA, 2005.

[94] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of AAAI-04, 21st Conference of the American Association for Artificial Intelligence, pages 761–769, San Jose, US, 2004. AAAI Press / The MIT Press.

[95] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, June 2005.

[96] Hui Yang, Luo Si, and Jamie Callan. Knowledge transfer and opinion detection in the TREC2006 blog track. In Proceedings of TREC, 2006.

[97] Jeonghee Yi, Tetsuya Nasukawa, Razvan Bunescu, and Wayne Niblack. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of ICDM-03, the 3rd IEEE International Conference on Data Mining, pages 427–434, Melbourne, US, 2003. IEEE Computer Society.

[98] Hong Yu and Vasileios Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Michael Collins and Mark Steedman, editors, Proceedings of EMNLP-03, 8th Conference on Empirical Methods in Natural Language Processing, pages 129–136, Sapporo, JP, 2003.

[99] Taras Zagibalov. Kinds of features for Chinese opinionated information retrieval. In Proceedings of the 45th Annual Meeting of the ACL: Student Research Workshop, ACL ’07, pages 37–42, Morristown, NJ, USA, 2007. Association for Computational Linguistics.

[100] Taras Zagibalov and John Carroll. Automatic seed word selection for unsupervised sentiment classification of Chinese text. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING ’08, pages 1073–1080, Morristown, NJ, USA, 2008. Association for Computational Linguistics.