MTech Seminar Presentation [IIT-Bombay]


Upload: sagar-ahire

Post on 26-Jan-2015

DESCRIPTION

Seminar presentation made by me for the topic of 'Resources for Sentiment Analysis' at IIT Bombay. Includes a set of bonus slides for additional information which was not actually presented.

TRANSCRIPT

Page 1: MTech Seminar Presentation [IIT-Bombay]

Resources for Sentiment Analysis

Seminar Presentation

Sagar Ahire

133050073

IIT Bombay

02 May, 2014

Sagar Ahire (IIT Bombay) Sentiment Resources 02 May, 2014 1 / 48

Roadmap

1 Introduction

2 Sentiwordnet

3 SO-CAL

4 Wordnet-Affect

5 Indian-Language Sentiwordnets

6 Conclusions

Introduction

Overview

An overview of today’s presentation:

This presentation covers lexical resources for sentiment analysis.

Four resources are covered, each using a different approach for representation and creation:

Sentiwordnet, created automatically, with 3 graded scores per synset

SO-CAL, created manually, with a graded score per word

Wordnet-Affect, created semi-automatically, with affect information for each synset

Indian-Language Sentiwordnet, created by projecting the English Sentiwordnet

Sentiment Analysis

Sentiment Analysis: Determining the opinion expressed in a text

Approaches:

Classifier-based

Lexicon-based

Why Lexicon-based Approach?

The classifier-based approach has the following drawbacks:

Domain Specificity (Example: Movie reviews mentioning ‘writer’, ‘plot’, etc.) [Bro01]

Lack of Context (Example: ‘good’ vs ‘not good’ vs ‘not very good’)

The lexicon-based approach aims at solving these problems.

Sentiment Lexicons

A sentiment lexicon is a sentiment database for language units of the form (lexical unit, sentiment).

Choices for lexical unit:

Word

Word sense

Phrase, etc.

Choices for sentiment:

Fixed categorization into ‘positive’ and ‘negative’

Graded sets like ‘strongly positive’, ‘mildly positive’, ‘neutral’, ‘mildly negative’, ‘strongly negative’

Score in an interval like [0, 1] or [−1, +1]
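
The (lexical unit, sentiment) form can be illustrated with a small sketch. It assumes the simplest choices listed above: the lexical unit is a word and the sentiment is a score in [−1, +1]. The entries, their values and the helper name `score_text` are invented for illustration and do not come from any of the resources discussed here.

```python
from typing import Dict, Iterable

# Hypothetical word-level lexicon with scores in [-1, +1].
# The entries and values are made up for illustration only.
LEXICON: Dict[str, float] = {
    "good": 0.6,
    "great": 0.9,
    "bad": -0.6,
    "terrible": -0.9,
}


def score_text(tokens: Iterable[str], lexicon: Dict[str, float]) -> float:
    """Average the scores of the tokens found in the lexicon (0.0 if none)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


review_score = score_text("a great film with a bad ending".split(), LEXICON)
```

Averaging is only one possible way to combine entries; a word-sense or phrase lexicon would need a different lookup key but could keep the same overall shape.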

Approaches for Creation

Manual

Automatic

Sentiwordnet

Introduction to Sentiwordnet

Sentiwordnet [ES06] is an automatically generated sentiment lexicon made using Wordnet. Its salient features are:

High coverage

Support for graded sentiment labels

Support for both sentiment classification and subjectivity detection

Structure of Sentiwordnet

Sentiwordnet = Wordnet + Sentiment Information.

Each synset s is given three sentiment scores:

Positive score Pos(s)

Negative score Neg(s)

Objective score Obj(s)

Pos(s) + Neg(s) + Obj(s) = 1

Example Synset

beautiful: Pos = 0.75, Neg = 0.00, Obj = 0.25 (source: http://sentiwordnet.isti.cnr.it/search.php?q=beautiful)
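
Because the three scores sum to 1, it is enough to store two of them and derive the third. The sketch below shows this for the example synset. The class name `SentiSynset` and its fields are hypothetical and are not the file format or API of Sentiwordnet itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentiSynset:
    """Hypothetical container for one synset's sentiment scores."""
    name: str
    pos: float
    neg: float

    def __post_init__(self) -> None:
        # Enforce Pos(s) + Neg(s) + Obj(s) = 1 with all three scores >= 0.
        if self.pos < 0 or self.neg < 0 or self.pos + self.neg > 1:
            raise ValueError("scores must be non-negative and sum to at most 1")

    @property
    def obj(self) -> float:
        # The objective score is whatever is left after Pos and Neg.
        return 1.0 - self.pos - self.neg


beautiful = SentiSynset("beautiful", pos=0.75, neg=0.00)
```

Here `beautiful.obj` gives the remaining 0.25, matching the example above.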

Creation Steps

The top-level steps in the algorithm to create Sentiwordnet are as follows:

1 Selection of seed set

2 Expansion using Wordnet’s semantic relations

3 Training of a team of ternary classifiers

4 Classification of each Wordnet synset using the classifiers
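
Steps 1 and 2 can be sketched as label propagation over a relation graph. This is a minimal sketch under stated assumptions, not the procedure from [ES06]: the toy graph, the rule that synonym-like relations keep the label while antonymy flips it, and the `max_steps` parameter are all chosen here for illustration. Steps 3 and 4 (training and applying the classifiers) are not shown.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy relation graph (hypothetical): each edge is (neighbour, keeps_polarity).
# Synonym-like relations keep the label; antonymy flips it.
RELATIONS: Dict[str, List[Tuple[str, bool]]] = {
    "good": [("nice", True), ("bad", False)],
    "nice": [("pleasant", True)],
    "bad": [("awful", True)],
}


def expand_seeds(
    seeds: Dict[str, str],
    relations: Dict[str, List[Tuple[str, bool]]],
    max_steps: int,
) -> Dict[str, str]:
    """Breadth-first label propagation from the seed set for max_steps hops."""
    flip = {"positive": "negative", "negative": "positive"}
    labels = dict(seeds)
    frontier = deque((term, 0) for term in seeds)
    while frontier:
        term, depth = frontier.popleft()
        if depth == max_steps:
            continue
        for neighbour, keeps in relations.get(term, []):
            if neighbour in labels:
                continue  # the first label to reach a term wins in this sketch
            labels[neighbour] = labels[term] if keeps else flip[labels[term]]
            frontier.append((neighbour, depth + 1))
    return labels


training_labels = expand_seeds({"good": "positive"}, RELATIONS, max_steps=2)
```

The expanded set would then serve as training data for the classifiers in step 3.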

SO-CAL

Introduction to SO-CAL

SO-CAL is a system that uses a manually-constructed lexicon. Its salient features are:

Highly detailed lexicon

Graded sentiment label

Low coverage, but high accuracy

Features Used

SO-CAL classifies words into various features and treats each feature differently in the lexicon. They are:

Adjectives

Nouns, Verbs, Adverbs and Multiwords

Intensifiers and Downtoners

Negation

Irrealis Blocking

Structure of SO-CAL

Sentiment scoring:

Words are scored in [−5,+5]

Intensifiers and negation further act upon these scores

Examples

good: +3

monstrosity: −5

masterpiece: +5
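
How intensifiers and negation might act on a word score can be sketched as follows. The word scores are the three examples above. Everything else is an assumption made for illustration: the intensifier percentages, the idea of shifting a negated score by a constant rather than flipping its sign, and the names `INTENSIFIERS`, `NEGATION_SHIFT` and `phrase_score`.

```python
from typing import Dict, List

# Word scores taken from the examples above.
WORD_SCORES: Dict[str, int] = {"good": 3, "monstrosity": -5, "masterpiece": 5}

# Assumed modifier values, chosen only for illustration.
INTENSIFIERS: Dict[str, float] = {"very": 0.25, "slightly": -0.5}
NEGATORS = {"not"}
NEGATION_SHIFT = 4.0


def phrase_score(tokens: List[str]) -> float:
    """Score a short phrase whose last token is the sentiment word."""
    *modifiers, head = tokens
    score = float(WORD_SCORES[head])
    negated = False
    # Apply modifiers starting with the one nearest the head word.
    for mod in reversed(modifiers):
        if mod in INTENSIFIERS:
            score *= 1.0 + INTENSIFIERS[mod]
        elif mod in NEGATORS:
            negated = not negated
    if negated:
        # Shift toward the opposite polarity instead of flipping the sign.
        score += -NEGATION_SHIFT if score > 0 else NEGATION_SHIFT
    return score


scores = {
    "good": phrase_score(["good"]),
    "very good": phrase_score(["very", "good"]),
    "not good": phrase_score(["not", "good"]),
    "not very good": phrase_score(["not", "very", "good"]),
}
```

Under these assumed values, ‘good’, ‘not good’ and ‘not very good’ receive three different scores, which is the kind of context sensitivity the lexicon-based approach aims for.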

Wordnet-Affect

Introduction to Wordnet-Affect

Wordnet-Affect [SV04] is a semi-automatically generated sentiment lexicon made using Wordnet. It associates affective information with each synset. Its salient features are:

Highly detailed

Ability to handle sentiment differently depending on emotion

Structure of Wordnet-Affect

Wordnet-Affect = Wordnet + Affect Information.

Affect is represented using the following:

An a-label which represents the emotion,

The valency which indicates the sentiment.

The a-label is a tree of emotions starting at a root node with each leaf node corresponding to a synset.

The valency can be any of positive, negative, neutral or ambiguous.

root
    mental-state
        cognitive-state
        affective-state
            mood
            emotion
                positive-emotion
                    joy
                        elation
                    love
                        worship
                negative-emotion
                    sadness
                        melancholy
                    shame
                        embarrassment
                . . .
        . . .
    physical-state
    . . .
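
The part of the a-label hierarchy shown above can be held as a parent map and walked upward. This is a minimal sketch: it covers only the nodes named in the tree (elided branches are left out), the exact placement of some nodes is inferred from the slide layout, and the helper names `path_to_root` and `is_a` are hypothetical.

```python
from typing import Dict, List, Optional

# Parent links for the nodes shown in the tree; elided branches are omitted.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "mental-state": "root",
    "physical-state": "root",
    "cognitive-state": "mental-state",
    "affective-state": "mental-state",
    "mood": "affective-state",
    "emotion": "affective-state",
    "positive-emotion": "emotion",
    "negative-emotion": "emotion",
    "joy": "positive-emotion",
    "love": "positive-emotion",
    "sadness": "negative-emotion",
    "shame": "negative-emotion",
    "elation": "joy",
    "worship": "love",
    "melancholy": "sadness",
    "embarrassment": "shame",
}


def path_to_root(label: str) -> List[str]:
    """Return the a-label followed by all of its ancestors up to the root."""
    path = [label]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def is_a(label: str, ancestor: str) -> bool:
    """True if `ancestor` is the label itself or lies on its path to the root."""
    return ancestor in path_to_root(label)


elation_path = path_to_root("elation")
```

A check such as `is_a("worship", "positive-emotion")` is one way a system could treat sentiment differently depending on the emotion involved.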

Creation Steps

Wordnet-Affect was created using the following steps:

Manual creation of initial resource

Automatic expansion using Wordnet relations

Indian-Language Sentiwordnets

Introduction to Indian-Language Sentiwordnets

Indian-language Sentiwordnets can be created using Wordnet projection [JRB10]. This approach has the following salient features:

Easy to create once backing resources are available

No duplication of effort

Use of tried-and-tested representations

Creation Steps

The process of projecting a Sentiwordnet has the following steps:

Fetch a synset from the English Sentiwordnet.

Find the corresponding Hindi synset using Indowordnet.

Assign sentiment scores from English synset to Hindi synset.
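
The three projection steps amount to a join between the English scores and a table of cross-lingual synset links. The sketch below assumes such a table is available as a plain mapping. All identifiers, the link itself and the function name `project` are invented for illustration and do not reflect the actual Indowordnet data format.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float]  # (Pos, Neg, Obj)

# Hypothetical data: the synset identifiers and the link are invented.
ENGLISH_SWN: Dict[str, Scores] = {
    "en:beautiful.a.01": (0.75, 0.00, 0.25),
    "en:table.n.01": (0.00, 0.00, 1.00),
}
ENGLISH_TO_HINDI: Dict[str, str] = {
    "en:beautiful.a.01": "hi:1234",
}


def project(
    english_swn: Dict[str, Scores], links: Dict[str, str]
) -> Dict[str, Scores]:
    """Copy each English synset's scores to its linked target-language synset."""
    projected: Dict[str, Scores] = {}
    for en_id, scores in english_swn.items():
        hi_id = links.get(en_id)
        if hi_id is not None:  # synsets without a link are skipped
            projected[hi_id] = scores
    return projected


HINDI_SWN = project(ENGLISH_SWN, ENGLISH_TO_HINDI)
```

In this sketch, coverage of the projected resource is bounded by the number of linked synsets, since unlinked English synsets contribute nothing.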

Conclusions

A Comparison of the Resources

Criterion         SWN         SO-CAL    WN-Affect       IL-SWN
Sentiment         3 x [0, 1]  [−5, +5]  Affect          3 x [0, 1]
Lexical Unit      Synset      Word      Synset          Synset
Backing Resource  Wordnet     None      Wordnet         SWN + Indowordnet
Creation          Automatic   Manual    Semi-automatic  Projection
No. of Entries    117,000     5,000     900             16,000

Concluding Remarks

To conclude, there are three choices in making a sentiment lexicon:

Creation Approach: Manual, Automatic, Semi-Automatic or Projection

Lexical Unit: Word, Synset or Higher Representations

Sentiment: Labels, Graded Scores or Affect Information

Concluding Remarks: Creation Approach

Manual Approach           Automatic Approach
High annotation accuracy  Low annotation accuracy
High time investment      Low time investment
More details supported    Fewer details supported

Concluding Remarks: Lexical Unit

Word                                   Synset
Unreliable for polysemous words        Reliable for polysemous words
No pre-processing required             Requires word sense disambiguation (WSD)
Projection is comparatively difficult  Projection is comparatively easy

Concluding Remarks: Sentiment

Graded scores have been shown to be better than mere labels in general.

Moreover, a graded score resource can always be converted to a label-based resource.

Affect information can help in specialized circumstances.
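
The conversion from graded scores to labels can be sketched as a simple thresholding step. The function name `label_from_score` and the `neutral_band` parameter are assumptions made here. For a resource with three scores per synset, the difference Pos − Neg is used as the single graded score, which is one possible choice among several.

```python
def label_from_score(score: float, neutral_band: float = 0.0) -> str:
    """Map a graded score to a label; scores within the band count as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# A score in [-5, +5], as used for single words.
masterpiece_label = label_from_score(5)

# A (Pos, Neg) pair reduced to one number before labelling.
beautiful_label = label_from_score(0.75 - 0.00)
```

The reverse direction is not possible: once only the label is stored, the strength of the sentiment cannot be recovered.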

Future Work

Possible directions in the future:

Automatic resources for higher-level lexical units like phrases, trees, etc.

Manual resources for synsets

Manual lexicons for Indian languages

Techniques for building dynamic resources to incorporate ‘netspeak’ and other slang

References

Julian Brooke, A semantic approach to automatic text sentiment analysis, M.A. thesis, Stanford University, 2001.

Andrea Esuli and Fabrizio Sebastiani, SentiWordNet: A publicly available lexical resource for opinion mining, Proceedings of the 5th Conference on Language Resources and Evaluation (LREC-06), 2006, pp. 417–422.

Andrea Esuli, Automatic generation of lexical resources for opinion mining: Models, algorithms and applications, Ph.D. thesis, Università di Pisa, 2008.

Christiane Fellbaum, Wordnet: An electronic lexical database, A Bradford Book, 1998.

Vasileios Hatzivassiloglou and Kathleen R. McKeown, Predicting the semantic orientation of adjectives, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 1997, pp. 174–181.

Aditya Joshi, Balamurali A R, and Pushpak Bhattacharyya, A fall-back strategy for sentiment analysis in Hindi: a case study, Proceedings of ICON 2010: 8th International Conference on Natural Language Processing, Macmillan Publishers, India, 2010.

Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten de Rijke, Using wordnet to measure semantic orientations of adjectives, Proceedings of LREC-04, 4th International Conference on Language Resources and Evaluation, 2004, pp. 1115–1118.

Ellen Riloff and Janyce Wiebe, Learning extraction patterns for subjective expressions, Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2003, pp. 105–112.

Carlo Strapparava and Alessandro Valitutti, WordNet-Affect: an affective extension of WordNet, Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-04), 2004, pp. 1083–1086.

Peter D. Turney and Michael L. Littman, Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems 21 (2003), no. 4, 315–346.

Additional Slides Wordnet

Wordnet

Wordnet [Fel98] is a lexical database organized by word sense. The fundamental unit of storage is called a synset.

An Example Synset

brilliant, superb [a]: of surpassing excellence
“a brilliant performance”; “a superb actor”

[a] URL: http://wordnetweb.princeton.edu/perl/webwn?s=brilliant

Semantic Relations in Wordnet

Wordnet synsets are linked to each other by relations called semantic relations. Some of them are:

Antonymy

Meronymy

Hypernymy

Hyponymy

Similar to, etc.

These relations are helpful in creating the training set for classifying synsets to create Sentiwordnet.
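
As a rough illustration of how such relations can be stored and queried, the sketch below keeps labelled links between synsets in a dictionary. The synset names, the relation labels and the helper functions `add_relation` and `related` are assumptions made for this example; they are not real Wordnet data and not the Wordnet API.

```python
from collections import defaultdict

# Toy relation store: (synset, relation) -> set of related synsets.
# All names and links here are illustrative, not taken from Wordnet itself.
relations = defaultdict(set)

def add_relation(source, relation, target):
    """Record a directed, labelled link between two synsets."""
    relations[(source, relation)].add(target)

def related(synset, relation):
    """Return the synsets reachable from `synset` through one `relation` link."""
    return sorted(relations.get((synset, relation), set()))

add_relation("good", "antonym", "bad")
add_relation("bad", "antonym", "good")
add_relation("good", "similar_to", "superb")
add_relation("dog", "hypernym", "canine")
add_relation("canine", "hyponym", "dog")

print(related("good", "antonym"))
print(related("dog", "hypernym"))
```

A lookup for a relation that was never recorded simply returns an empty list.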

Additional Slides Background

Sentiment Classification

Early work on automatically detecting the sentiment of a word led to today’s lexicons. This included:

Use of conjunction-separated adjectives [HM97]

PMI-based Extraction using Web Queries [TL03]

Graph Expansion using Wordnet [KMMdR04]

Classification using Wordnet Glosses [Esu08]
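
To make the PMI-based idea [TL03] a little more concrete, here is a minimal sketch: a word's semantic orientation is its PMI with positive seed words minus its PMI with negative seed words. The seed words, the counts and the function names below are hypothetical toy values chosen for the example; the original work estimated the counts from web query hits, and zero counts would need smoothing that is not shown here.

```python
import math

def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) computed from raw counts."""
    return math.log2((joint * total) / (count_a * count_b))

def so_pmi(word, pos_seeds, neg_seeds, counts, cooc, total):
    """Semantic orientation: association with positive seeds minus negative seeds."""
    pos = sum(pmi(cooc[(word, s)], counts[word], counts[s], total) for s in pos_seeds)
    neg = sum(pmi(cooc[(word, s)], counts[word], counts[s], total) for s in neg_seeds)
    return pos - neg

# Hypothetical counts over a toy collection of 1000 documents.
TOTAL = 1000
counts = {"superb": 10, "excellent": 20, "poor": 20}
cooc = {("superb", "excellent"): 8, ("superb", "poor"): 1}

orientation = so_pmi("superb", ["excellent"], ["poor"], counts, cooc, TOTAL)
print(orientation)  # positive value: "superb" leans towards the positive seed
```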

Subjectivity Detection

Work that identifies whether a term is indeed subjective is necessary to filter out objective words from sentiment classification. This includes:

Adapting Wordnet Glosses to Subjectivity Detection [Esu08]

Bootstrapping Subjective Expressions from a Corpus [RW03]

Additional Slides Structure of SO-CAL

Adjectives

Adjectives were collected from a 500-document corpus and annotated with a sentiment score from −5 to +5.

Examples

good: +3
sleazy: −3

Nouns, Verbs, Adverbs, Multiwords

This was extended to other parts of speech and multiword expressions, for a total of about 5,000 words.

Examples

monstrosity: −5
masterpiece: +5
inspire: +2
funny: +2 vs. act funny: −1
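
The "funny" vs. "act funny" example suggests that a multiword entry should take precedence over the single words it contains. The sketch below shows one way to do that with a greedy longest-match lookup. The lexicon holds only the example entries from this slide, and the matching strategy and the function name `score_tokens` are assumptions for illustration, not a description of how SO-CAL itself performs the lookup.

```python
# Tiny lexicon keyed by token tuples so that multiword entries fit naturally.
LEXICON = {
    ("monstrosity",): -5,
    ("masterpiece",): 5,
    ("inspire",): 2,
    ("funny",): 2,
    ("act", "funny"): -1,
}
MAX_LEN = max(len(key) for key in LEXICON)

def score_tokens(tokens):
    """Return (expression, score) pairs, preferring the longest match at each position."""
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                hits.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return hits

print(score_tokens("they act funny".split()))
print(score_tokens("a funny masterpiece".split()))
```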

Intensifiers and Downtoners

Intensifiers are words that increase sentiment intensity, while downtoners are words that reduce it. Examples are extraordinarily (an intensifier) and somewhat (a downtoner).

Intensifiers and downtoners are modeled as percentage modifiers.

Examples

slightly: −50%
extraordinarily: +50%
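
A minimal sketch of the percentage model, using the two modifiers above and the adjective scores good: +3 and sleazy: −3 from the earlier slide. The function name `apply_modifier` and the dictionary layout are assumptions for this example.

```python
# Percentage modifiers expressed as fractions of the base score.
MODIFIERS = {"slightly": -0.50, "extraordinarily": 0.50}

def apply_modifier(score, modifier):
    """Scale a word's sentiment score by the modifier's percentage."""
    return score * (1.0 + MODIFIERS[modifier])

print(apply_modifier(3, "slightly"))         # slightly good
print(apply_modifier(3, "extraordinarily"))  # extraordinarily good
print(apply_modifier(-3, "slightly"))        # slightly sleazy
```

Because the modifier scales the score, it keeps the polarity of the word and only changes its strength.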

Negation

Negation is modeled as a numeric shift of value 4 towards the opposite sentiment.

Examples

good: +3 ⇒ not good: −1
atrocious: −5 ⇒ not atrocious: −1
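
The shift can be sketched in a few lines. The function name `negate` and the handling of a zero score are assumptions for this example; the shift value of 4 comes from the slide.

```python
NEGATION_SHIFT = 4

def negate(score):
    """Shift a score by a fixed amount towards the opposite polarity."""
    if score > 0:
        return score - NEGATION_SHIFT
    if score < 0:
        return score + NEGATION_SHIFT
    return score  # a neutral score is left unchanged (assumption)

print(negate(3))   # not good
print(negate(-5))  # not atrocious
```

Note the contrast with simply flipping the sign: a flip would turn "not atrocious" into +5, whereas the shift yields a mildly negative −1.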

Irrealis Blocking

An irrealis marker is a word that indicates that the sentiment may not be reliable because the event hasn’t actually happened. For example, ‘would’, ‘expect’, ‘if’, quotation marks, etc.

Sentences with irrealis markers are ignored for sentiment analysis.
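
As described on this slide, irrealis blocking can be sketched as a sentence-level filter. The marker list contains only the examples given above, and the tokenisation and the function names `is_blocked` and `keep_realis` are simplifying assumptions for illustration.

```python
import re

# Only the example markers from the slide; a real list would be longer.
IRREALIS_MARKERS = {"would", "expect", "if"}

def is_blocked(sentence):
    """True if the sentence contains an irrealis marker or quotation marks."""
    if '"' in sentence:
        return True
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in IRREALIS_MARKERS for token in tokens)

def keep_realis(sentences):
    """Drop sentences whose sentiment is blocked by an irrealis marker."""
    return [s for s in sentences if not is_blocked(s)]

review = [
    "The plot is good.",
    "I would expect it to be good.",
    "If only it were good.",
]
print(keep_realis(review))
```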

Additional Slides Sentiwordnet Creation

Seed Set

Two seed sets are created:

Lp for positive synsets

Ln for negative synsets

Each synset representation consists of:

The terms

The definition

The sample phrases

Explicit indication of negation
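
One way to picture such a representation is as a bag of tokens built from the terms, the definition and the sample phrases, with negated words marked explicitly. The sketch below is an assumption about how this could look: the `NOT_` prefix, the list of negation words and the function names are illustrative choices, not the exact scheme used to build Sentiwordnet.

```python
NEGATION_WORDS = {"not", "no", "never"}

def mark_negation(tokens):
    """Fold a negation word into the token that follows it."""
    marked = []
    negate_next = False
    for token in tokens:
        if token in NEGATION_WORDS:
            negate_next = True
            continue
        marked.append("NOT_" + token if negate_next else token)
        negate_next = False
    return marked

def represent(terms, definition, examples):
    """Build the token list for one synset from its three textual parts."""
    text = " ".join(list(terms) + [definition] + list(examples))
    return mark_negation(text.lower().split())

tokens = represent(
    ["brilliant", "superb"],
    "of surpassing excellence",
    ["a brilliant performance"],
)
print(tokens)
print(mark_negation("not good".split()))
```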

Wordnet Expansion

Relations of Wordnet used for expansion:

Direct antonymy

Similarity

Derived from

Pertains to

Attribute

Also see
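
The expansion step can be sketched as follows, assuming that direct antonymy moves a synset to the opposite seed set while the other relations keep polarity. The toy edges, the relation labels and the function name `expand` are hypothetical and stand in for real Wordnet links.

```python
# Assumption: these relations preserve polarity, antonymy reverses it.
SAME_POLARITY = {"similar_to", "derived_from", "pertains_to", "attribute", "also_see"}
OPPOSITE_POLARITY = {"antonym"}

def expand(pos, neg, edges, iterations):
    """Grow the positive and negative seed sets along Wordnet-style relations."""
    pos, neg = set(pos), set(neg)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for source, relation, target in edges:
            if relation in SAME_POLARITY:
                if source in pos:
                    new_pos.add(target)
                if source in neg:
                    new_neg.add(target)
            elif relation in OPPOSITE_POLARITY:
                if source in pos:
                    new_neg.add(target)
                if source in neg:
                    new_pos.add(target)
        pos, neg = new_pos, new_neg
    return pos, neg

# Toy graph of (source, relation, target) triples.
EDGES = [
    ("good", "similar_to", "superb"),
    ("good", "antonym", "bad"),
    ("bad", "similar_to", "awful"),
    ("superb", "also_see", "brilliant"),
]

print(expand({"good"}, {"bad"}, EDGES, 1))
print(expand({"good"}, {"bad"}, EDGES, 2))
```

Each iteration only follows links from synsets already in a set, so more iterations reach synsets further from the seeds.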

Classifiers

8 classifiers were created differing in:

Number of iterations of expansion (0, 2, 4, 6)

Learning algorithm (SVM, Rocchio)
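
The count of 8 follows from crossing the 4 iteration settings with the 2 learning algorithms. A small sketch of that grid, where the dictionary keys are names chosen for this example:

```python
from itertools import product

ITERATIONS = (0, 2, 4, 6)
LEARNERS = ("SVM", "Rocchio")

# One configuration per (iterations, learner) pair.
configs = [
    {"iterations": k, "learner": learner}
    for k, learner in product(ITERATIONS, LEARNERS)
]

for config in configs:
    print(config)
```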

Classifiers

Each ternary classifier is a combination of 2 binary classifiers:

Positive vs. Not Positive

Negative vs. Not Negative

The results are combined as:

                Positive     Not Positive
Negative        Objective    Negative
Not Negative    Positive     Objective
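
The combination rule can be written as a small function. The name `combine` and the boolean inputs are assumptions for this sketch; the mapping itself follows the table on this slide.

```python
def combine(is_positive, is_negative):
    """Merge the two binary decisions into one ternary label."""
    if is_positive and not is_negative:
        return "Positive"
    if is_negative and not is_positive:
        return "Negative"
    # Both classifiers fire, or neither does: treat the synset as objective.
    return "Objective"

for is_positive in (True, False):
    for is_negative in (True, False):
        print(is_positive, is_negative, combine(is_positive, is_negative))
```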
