Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown

Presenter: Gabriel Nicolae

Introduction

Orientation/polarity = direction of deviation from the norm

Nearly synonymous

simple vs. simplistic

Antonyms

hot vs. cold

Introduction (cont.)

In linguistic constructs such as conjunctions, the choice of arguments and the choice of connective are mutually constrained.

The tax proposal was

  simple and well-received
  simplistic but well-received
  *simplistic and well-received (anomalous)

by the public.

Exceptions

Goals

Automatically identify antonyms

Distinguish near synonyms

How? By retrieving semantic orientation information, using indirect information collected from a large corpus

Why?
  Dictionaries and similar sources (thesauri, WordNet) do not explicitly include semantic orientation information
  They lack links between antonyms and synonyms when these depend on the domain of the discourse

Overview of their approach

Correlation between indicators and semantic orientation

Direct indicators: affixes (in-, un-), mostly negative
  exceptions: independent, unbiased

Indirect indicators: conjunctions
  conjoined adjectives are usually of the same orientation for most connectives
  the situation is reversed for but

From corpus: fair and legitimate; corrupt and brutal

vs.

Semantically anomalous: fair and brutal; corrupt and legitimate

General algorithm

1. Extract conjunctions of adjectives and morphological relations

2. Label each two conjoined adjectives as being of the same or different orientation using a log-linear regression model

3. Separate adjectives into two subsets of different orientation using a clustering algorithm

4. The group with the higher average frequency is labeled as positive

Data collection

Corpus: 21-million-word 1987 Wall Street Journal

Training data: a set of adjectives with predetermined (hand-annotated) orientation labels (+ or -)
  1,336 adjectives (657 +, 679 -)

The training set was validated by four other people
  500 adjectives: 89.15% agreement

Test data: 15,048 conjunction tokens
  9,296 distinct pairs of conjoined adjectives (types)

Data collection (cont.)

Each conjunction token is classified according to three variables:
  conjunction used: and, or, but, either-or, neither-nor
  type of modification: attributive, predicative, appositive, resultative
  number of the modified noun: singular, plural
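The three-variable classification above can be sketched as a small record type plus a per-pair tally. This is only an illustration: the class name `ConjunctionToken`, the function `category_counts`, and the sample tokens are hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConjunctionToken:
    """One observed conjunction of two adjectives (hypothetical record layout)."""
    adj1: str
    adj2: str
    conjunction: str   # and, or, but, either-or, neither-nor
    modification: str  # attributive, predicative, appositive, resultative
    number: str        # singular, plural


def category_counts(tokens):
    """Tally, for each unordered adjective pair, how often it occurs in each category."""
    counts = {}
    for t in tokens:
        pair = frozenset((t.adj1, t.adj2))
        category = (t.conjunction, t.modification, t.number)
        counts.setdefault(pair, Counter())[category] += 1
    return counts


# Invented sample tokens, for illustration only.
tokens = [
    ConjunctionToken("fair", "legitimate", "and", "attributive", "singular"),
    ConjunctionToken("legitimate", "fair", "and", "attributive", "singular"),
    ConjunctionToken("simplistic", "well-received", "but", "predicative", "singular"),
]
counts = category_counts(tokens)
```

Using an unordered pair as the key means "fair and legitimate" and "legitimate and fair" count as the same pair type.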

Validation of the conjunction hypothesis

Results
  Their conjunction hypothesis is validated overall and for almost all individual cases
  There are small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes)
  Conjoined antonyms appear far more frequently than expected by chance in conjunctions other than but

Prediction of link type

Baseline 1: always guessing that a link is of the same orientation type => 77.84% accuracy

Baseline 2: Baseline 1 + but exhibits the opposite pattern => 80.82% accuracy

Morphological relationships:
  Adjectives related in form almost always have different semantic orientations
  Highly accurate (97.06%), but applies only to 1,336 labeled adjectives (891,780 possible pairs)
  E.g. adequate-inadequate, thoughtful-thoughtless

Baseline 1 + Morphology => 78.86% accuracy

Baseline 2 + Morphology => 81.75% accuracy
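The rule-based predictors above (Baseline 2 plus the morphology rule) can be sketched as follows. The prefix list and the -ful/-less check are illustrative assumptions, not the paper's actual morphological analysis, and the function names are hypothetical.

```python
# Illustrative negating prefixes; the slides mention only in- and un-.
NEGATING_PREFIXES = ("un", "in", "im", "il", "ir", "dis")


def morphologically_related(a, b):
    """True if one adjective is the other plus a negating prefix,
    or if the two differ only by a -ful / -less suffix swap."""
    for prefix in NEGATING_PREFIXES:
        if a == prefix + b or b == prefix + a:
            return True
    for s1, s2 in (("ful", "less"), ("less", "ful")):
        if a.endswith(s1) and b.endswith(s2) and a[:-len(s1)] == b[:-len(s2)]:
            return True
    return False


def predict_link(a, b, conjunction):
    """Baseline 2 + Morphology: guess 'same' unless the adjectives are
    related in form or the connective is 'but'."""
    if morphologically_related(a, b):
        return "different"
    if conjunction == "but":
        return "different"
    return "same"
```

For example, `predict_link("adequate", "inadequate", "and")` gives `"different"` through the morphology rule, while `predict_link("fair", "legitimate", "and")` falls through to the default `"same"`.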

Prediction of link type (cont.)

Log-linear regression model

$$ y = \frac{e^{w^{T}x}}{1 + e^{w^{T}x}} $$

x: the vector of the observed counts in the various conjunction categories

w: the vector of weights to be learned

y: the response of the system

Using the method of iterative stepwise refinement they selected 9 predictor variables from all 90 possible predictor variables.

Small improvement: 80.97% accuracy (82.05% accuracy using Morphology), but now each prediction is rated between 0 and 1
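A minimal sketch of how the response y might be computed from a weight vector and a count vector, assuming the standard logistic form e^z / (1 + e^z) with z = w^T x. The function name `link_score` and the weights used in the example are hypothetical; the learned weights are not given in the slides.

```python
import math


def link_score(weights, counts):
    """Logistic response y = e^z / (1 + e^z), where z = w^T x.

    Computed in a numerically stable way for large |z|.
    """
    z = sum(w * x for w, x in zip(weights, counts))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and counts, for illustration only.
y = link_score([0.8, -1.2], [3, 1])
```

Whatever the weights, the result lies strictly between 0 and 1, which is what allows each prediction to be rated rather than just labeled.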

Clustering

Input: a graph of adjectives connected by dissimilarity links
  Small dissimilarity value => same-orientation link
  High dissimilarity value => different-orientation link

Method used: apply an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm

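Idea: find the partition P = {C_1, C_2} such that the objective function Φ is minimized

$$ \Phi(P) = \sum_{i=1}^{2} \left( \frac{1}{|C_i|} \sum_{x,y \in C_i,\; x \neq y} d(x,y) \right) $$

where d(x,y) is the dissimilarity between adjectives x and y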
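The objective above, together with a simple exchange-style search, can be sketched as below. This is a first-improvement variant written for illustration, not necessarily the exact procedure used in the paper. The starting partition, the function names, and the dissimilarity values are all assumptions.

```python
from itertools import combinations


def objective(partition, d):
    """Phi(P): for each cluster, sum pairwise dissimilarities and divide by cluster size.

    Each unordered pair is counted once; counting ordered pairs would only
    rescale Phi by a constant and would not change the minimizer.
    """
    total = 0.0
    for cluster in partition:
        if not cluster:
            continue
        pair_sum = sum(d[frozenset(p)] for p in combinations(sorted(cluster), 2))
        total += pair_sum / len(cluster)
    return total


def exchange(items, d):
    """Move one adjective at a time to the other cluster whenever that lowers Phi."""
    c1, c2 = set(items[::2]), set(items[1::2])  # arbitrary starting partition (assumption)
    best = objective((c1, c2), d)
    improved = True
    while improved:
        improved = False
        for x in items:
            src, dst = (c1, c2) if x in c1 else (c2, c1)
            src.remove(x)
            dst.add(x)
            score = objective((c1, c2), d)
            if score < best - 1e-12:
                best, improved = score, True
            else:  # undo the move
                dst.remove(x)
                src.add(x)
    return c1, c2, best


# Invented dissimilarities over a fully connected toy graph.
items = ["fair", "legitimate", "corrupt", "brutal"]
d = {frozenset(p): 0.9 for p in combinations(items, 2)}
d[frozenset(("fair", "legitimate"))] = 0.1
d[frozenset(("corrupt", "brutal"))] = 0.1
c1, c2, best = exchange(items, d)
```

On this toy graph the search separates {fair, legitimate} from {corrupt, brutal}. Like any local search, it can stop at a local minimum on harder inputs.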

Labeling the clusters as + or -

In oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the more frequent one about 81% of the time

Unmarked => positive orientation almost always

So, label as positive the group that has the higher average word frequency.
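The labeling rule can be sketched as a comparison of mean corpus frequencies. The function name `label_clusters` and the frequency counts below are invented for illustration; real counts would come from the corpus.

```python
def label_clusters(c1, c2, freq):
    """Label the cluster with the higher average word frequency as positive."""
    def mean_freq(cluster):
        return sum(freq[w] for w in cluster) / len(cluster)

    if mean_freq(c1) >= mean_freq(c2):
        return {"positive": set(c1), "negative": set(c2)}
    return {"positive": set(c2), "negative": set(c1)}


# Invented frequency counts, for illustration only.
freq = {"fair": 120, "legitimate": 80, "corrupt": 40, "brutal": 30}
labels = label_clusters({"corrupt", "brutal"}, {"fair", "legitimate"}, freq)
```

Here the second cluster has mean frequency 100 against 35 for the first, so it receives the positive label.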

Graph connectivity and performance

They tested how graph connectivity affects the overall performance
