Post on 25-Aug-2020

Automated Classification of Book Blurbs According to the Emotional Tags of the Social Network Zazie

V. FRANZONI, V. POGGIONI AND F. ZOLLO

DIPARTIMENTO DI MATEMATICA E INFORMATICA

UNIVERSITÀ DEGLI STUDI DI PERUGIA

Zazie

Zazie is an Italian social network for book readers that introduces a new dimension of book characterization: emotional icon tagging.

V. Franzoni, V. Poggioni and F. Zollo, Dipartimento di Matematica e Informatica, Università degli Studi di Perugia. Automated Classification of Book Blurbs According to the Emotional Tags of the Social Network Zazie.

Zazie was created by Digit-Pub with Marco Ghezzi and Barbara Sgarzi on the model of Anobii, with the introduction of emotional icon tags.

lightning

read again

Zazie’s Mood Icons

smile

sad

love

angry

think

cry

Automated classification of books

Which information can we use for a supervised learning approach?

Always present:
• Title
• Author
• Publisher
• Pages
• Blurb

The Idea

Automated emotional classification according to Zazie.

The need arises from the large number of books that users have not yet tagged, with the goal of enabling an emotion-driven search.

Lexical analysis of the book blurbs.

The Approach

Correlation between the characteristics of a book blurb and the emotional icons associated with the book by the users.

Book blurbs can contain relevant emotional information.

Blurbs are written to attract the reader, emphasizing some aspects of the book through emotional terms.

System Architecture

Zazie DB → Filtering → Filtered DB → Blurb analysis → Dataset → Classifiers → Classification Model

Filtering:
• Book filtering
• Grouping
• Tag filtering
• Mood filtering

Blurb analysis:
• Preprocessing:
  - Stop words
  - Tokenization
  - Lemmatization
• Emotion extraction

Classifiers:
• J48
• BFTree
• …

Zazie Database

38374 records, each associating a tag from the MOOD set with a book.

8 fields:

(user_id, book_isbn, mood)

(book_isbn, title, pages, publisher, author, blurb)
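The two field groups above can be read as two record types that share book_isbn. The following is a minimal sketch of that reading; the class names and the field types are assumptions, since the slide only lists field names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MoodTag:
    """One tagging event: a user assigns a mood icon to a book."""
    user_id: int      # type is an assumption
    book_isbn: str
    mood: str


@dataclass(frozen=True)
class Book:
    """Book metadata, keyed by ISBN."""
    book_isbn: str
    title: str
    pages: int        # type is an assumption
    publisher: str
    author: str
    blurb: str


def distinct_fields():
    """Field names across both record types (book_isbn is shared)."""
    return {f.name for f in fields(MoodTag)} | {f.name for f in fields(Book)}
```

Counting book_isbn once gives the 8 distinct fields mentioned on the slide.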

DB filtering

Book filtering: keep the books most tagged by the community

Grouping: count the tags for each (book_isbn, mood) pair

Tag filtering: keep the books with predominant moods (based on the standard deviation)

Mood filtering: keep the emotional moods angry, cry, love, sad, smile, think
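The four filtering steps above could be sketched as follows. This is only an illustration under stated assumptions: the thresholds min_tags and k, the exact predominance test (top count at least k standard deviations above the mean count), and applying the mood filter to the predominant mood are all hypothetical choices, not details given in the slides.

```python
from collections import Counter, defaultdict
from statistics import pstdev

EMOTIONAL_MOODS = {"angry", "cry", "love", "sad", "smile", "think"}


def filter_tags(tags, min_tags=3, k=1.0):
    """Map each retained book_isbn to its predominant emotional mood.

    tags: iterable of (user_id, book_isbn, mood) triples.
    min_tags and k are hypothetical thresholds.
    """
    # Grouping: count how often each mood was assigned to each book.
    counts = defaultdict(Counter)
    for _user, isbn, mood in tags:
        counts[isbn][mood] += 1

    selected = {}
    for isbn, moods in counts.items():
        # Book filtering: keep only books tagged often enough.
        if sum(moods.values()) < min_tags:
            continue
        # Tag filtering: the top mood must stand out from the others.
        values = list(moods.values())
        top_mood, top_count = moods.most_common(1)[0]
        if len(values) > 1:
            mean = sum(values) / len(values)
            spread = pstdev(values)
            if spread == 0 or top_count - mean < k * spread:
                continue
        # Mood filtering: keep only the six emotional moods.
        if top_mood in EMOTIONAL_MOODS:
            selected[isbn] = top_mood
    return selected
```

A book whose moods are evenly split is dropped by the tag filter, and a book whose predominant tag is not one of the six emotional moods is dropped by the mood filter.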

Book filtering

Distribution of the records with respect to the MOODS, after the book filtering step.

Tag filtering

Filtered Database

Distribution of records, with respect to selected MOODs at the end of the filtering steps.

[Chart: share of records per selected MOOD; the surviving slice labels are 40%, 37%, 9% and 6%]

300 records

High variance

Unbalanced distribution

Preliminary dataset: a new dataset is under testing

Preprocessing

Normalization of the DB:

1. Stop-word deletion, e.g. articles and prepositions

2. Tokenization, ignoring punctuation marks and digits

3. Lemmatization using Morph-it!, reducing noise due to variations such as singular or plural, masculine or feminine, etc.

All lemmata are kept in case of ambiguity.
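The preprocessing steps above can be sketched as below. The stop-word set and the lemma table are tiny hypothetical stand-ins for a real Italian stop-word list and for the Morph-it! lexicon, and the sketch tokenizes before removing stop words, simply because the lookup needs tokens.

```python
import re

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"il", "la", "di", "e", "un", "una"}
LEMMAS = {
    "amori": ["amore"],
    "porta": ["porta", "portare"],  # ambiguous form: noun or verb
    "tristi": ["triste"],
}


def preprocess(blurb):
    """Return the list of lemmata found in a blurb."""
    # Tokenization: keep runs of letters only, dropping punctuation and digits.
    tokens = re.findall(r"[^\W\d_]+", blurb.lower())
    lemmas = []
    for token in tokens:
        # Stop-word deletion.
        if token in STOP_WORDS:
            continue
        # Lemmatization: keep every candidate lemma when the form is ambiguous;
        # unknown forms are passed through unchanged.
        lemmas.extend(LEMMAS.get(token, [token]))
    return lemmas
```

For the ambiguous form "porta", both lemmata are kept, mirroring the rule stated on the slide.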

Emotion extraction

Synset retrieval (WordNet/MultiWordNet) for each lemma.

Exploitation of the affective domain WordNet-Affect to associate an emotion with each synset.

Terms which don’t convey emotional information are filtered out.

Multiple occurrences of the same emotion are counted.
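The counting step above can be illustrated with a small sketch. The LEMMA_EMOTIONS table is a hypothetical shortcut that collapses the two lookups (lemma to synset, synset to WordNet-Affect emotion) into one dictionary; its entries are invented for the example.

```python
from collections import Counter

# Hypothetical lemma -> emotion labels, standing in for the
# MultiWordNet synset lookup followed by the WordNet-Affect lookup.
LEMMA_EMOTIONS = {
    "amore": ["love"],
    "paura": ["negative-fear"],
    "ansia": ["anxiety"],
    "gioia": ["joy"],
}


def extract_emotions(lemmas):
    """Count how many times each emotion is evoked by the lemmata."""
    counts = Counter()
    for lemma in lemmas:
        # Lemmata without an emotional label contribute nothing.
        for emotion in LEMMA_EMOTIONS.get(lemma, []):
            counts[emotion] += 1
    return counts
```

The result has the same emotion[#occ] shape used in the slides' example.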

WordNet-Affect

Emotional hierarchy of WordNet-Affect (296 nodes)

…too fine-grained!

WordNet-Affect

Third level of the emotional hierarchy of WordNet-Affect (32 nodes)

Emotion reduction

Two techniques were implemented:

1. Reduction to the third level of the WordNet-Affect hierarchy

2. Reduction to an extended set of Ekman's model of emotions: anger, disgust, fear, happiness, sadness, surprise, plus neutral and ambiguous
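The first technique, reduction to a fixed level of the hierarchy, can be sketched by walking from a node up to the root. The PARENT table is a toy fragment: the nodes below level three are illustrative and not copied from WordNet-Affect, and counting the root as level one is an assumption.

```python
from collections import Counter

# Toy child -> parent fragment (hypothetical, for illustration only).
PARENT = {
    "positive-emotion": "emotion",
    "negative-emotion": "emotion",
    "joy": "positive-emotion",
    "love": "positive-emotion",
    "negative-fear": "negative-emotion",
    "elation": "joy",
    "exhilaration": "elation",
    "panic": "negative-fear",
}


def path_to_root(node):
    """Return the path from the root down to the given node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def reduce_to_level(node, level=3):
    """Replace a node by its ancestor at the given level (root = level 1)."""
    path = path_to_root(node)
    # Nodes already above the target level are returned unchanged.
    return path[level - 1] if len(path) >= level else node


def reduce_counts(counts, level=3):
    """Merge emotion counts after reducing every emotion to the given level."""
    reduced = Counter()
    for emotion, n in counts.items():
        reduced[reduce_to_level(emotion, level)] += n
    return reduced
```

With this fragment, counts for "elation" and "exhilaration" are merged into "joy".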

Emotion extraction

Example:

Emotions extracted from the blurb of the book «The Count of Monte Cristo» by Alexandre Dumas (emotion[#occ]).

anxiety[3], enthusiasm[1], love[1], affection[1], joy[1], negative-fear[2], general-dislike[1]

Dataset

Selected features for book representation:

• Author (nominal attribute)
• Emotions extracted from the blurb (32 or 8 numerical attributes)
• Mood (nominal attribute): class attribute

The publisher and pages attributes were discarded because they describe a specific edition rather than the book itself.
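A dataset row built from the features above could look like the following sketch, shown for the 8-attribute variant. The attribute order, the function name and the example values are assumptions made for illustration.

```python
# The extended Ekman set named in the slides, in an assumed column order.
EKMAN_EXTENDED = [
    "anger", "disgust", "fear", "happiness",
    "sadness", "surprise", "neutral", "ambiguous",
]


def make_instance(author, emotion_counts, mood, emotions=EKMAN_EXTENDED):
    """Build one row: author, one count per emotion, then the class label."""
    # Emotions that never occur in the blurb get a count of zero.
    return [author] + [emotion_counts.get(e, 0) for e in emotions] + [mood]


# Hypothetical example values, not taken from the Zazie data.
row = make_instance("Author X", {"fear": 2, "happiness": 1}, "think")
```

Passing a list of 32 emotion names as `emotions` gives the third-level variant with the same code.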

Classification model building

Multiclass classification model: each book is associated with one of the 6 selected moods: angry, cry, love, sad, smile, think.

Classification models were built with the Weka software, using different machine learning algorithms:

• Decision trees
• Decision rules
• Bayesian classifiers
• Random forest

Classification model evaluation

Models were evaluated with ten-fold cross-validation. In particular, algorithms based on decision trees without pruning showed the best results for accuracy, precision and recall.

Decision trees also give a more readable model.
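For readers unfamiliar with ten-fold cross-validation, the splitting scheme can be sketched in a few lines. This is a generic illustration, not Weka's implementation: the shuffling, the seed and the lack of stratification are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th one is used for training.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# With the 300 records of the preliminary dataset, each fold holds 30 records.
splits = list(k_fold_indices(300))
```

Each record is used for testing exactly once and for training in the other nine rounds.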

Notation for the evaluation measures:

TP: total number of correct classifications

N: number of instances

NC: number of classes

For each class i:

TP_i: true positives

FP_i: false positives

FN_i: false negatives
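The formulas that used this notation did not survive in the transcript. A common reading, assumed here, is accuracy = TP / N together with precision and recall averaged over the NC classes (macro-averaging). The sketch below computes the measures under that assumption; the function name and the zero fallback for empty denominators are illustrative choices.

```python
def evaluate(true_labels, predicted_labels):
    """Compute accuracy and macro-averaged precision and recall."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    n = len(pairs)                             # N: number of instances
    tp_total = sum(t == p for t, p in pairs)   # TP: correct classifications

    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # TP_i
        fp = sum(t != c and p == c for t, p in pairs)  # FP_i
        fn = sum(t == c and p != c for t, p in pairs)  # FN_i
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    nc = len(classes)                          # NC: number of classes
    return {
        "accuracy": tp_total / n,
        "precision": sum(precisions) / nc,
        "recall": sum(recalls) / nc,
    }
```

Macro-averaging weights every mood equally, which matters on an unbalanced distribution such as the one reported for the filtered dataset.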

Experiments

Best accuracy levels obtained with J48 and BFTree.

Classification results with respect to selected emotional MOODs

Conclusions

Experiments are encouraging, considering ongoing improvements.

The blurb is confirmed to be a good source of emotional information about a book, suitable for sentiment analysis and emotion recognition.

Zazie directly provides an emotional model of classes: no manual annotation is needed.

Further developments

• Dataset improvement in both preprocessing/filtering (use of web-based proximity measures) and emotion extraction, with an ontology-driven approach that uses the ArsEmotica ontology
• Binary classification
• Feedback process from Zazie’s side
• Extension to multilabel classification

Questions and comments

poggioni@dmi.unipg.it

valentina.franzoni@dmi.unipg.it

fabiana.zollo@gmail.com
