The Italian Hate Map: semantic content analytics for social good

Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

I-CiTies 2015: 2015 CINI Annual Workshop on ICT for Smart Cities and Communities

Palermo (Italy) - October 29-30, 2015

Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis. The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy), 30.10.2015

The Italian Hate Map

Inspired by the Hate Map built by Humboldt University (http://users.humboldt.edu/mstephens/hate/hate_map.html).

Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights.

Insight: aggregate raw people-based data in order to analyze complex phenomena.

Not a new idea: map of cholera in London, 1854 (red = cholera cases, blue = water).

Research Question: Is it possible to extract and process social media content to detect intolerant posts on social networks and identify the most at-risk areas of Italy?

CrowdPulse: a framework for real-time semantic analysis of social streams

CrowdPulse features

- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization

CrowdPulse workflow

CrowdPulse Step 1: Social Data Extraction

Extraction is configured through a source and a set of heuristics.

Heuristics: Content, User, Geo, Content+Geo, Page, Group

Examples: #icities2015, #www2015, #democrats, #traffic, #earthquake (hashtags); @barack_obama, @comunepalermo, @comunefi (users)

We only extract public content.

Use Case (CrowdPulse settings)

Heuristics: Twitter content

- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism
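The content heuristic can be pictured as a plain keyword filter over incoming tweets. The following is a minimal sketch under stated assumptions, not the CrowdPulse implementation: the `SEED_TERMS` table and the `match_dimensions` helper are hypothetical, only the term "nano" appears in the slides, its dimension is assumed, and the other entries are placeholders because the actual 76 terms are not listed.

```python
import re

# Hypothetical seed-term table: only "nano" comes from the slides; its
# dimension and the other entries are placeholders for illustration.
SEED_TERMS = {
    "nano": "disability",
    "placeholder_racism_term": "racism",
    "placeholder_homophobia_term": "homophobia",
}

TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def match_dimensions(text):
    """Return the set of intolerance dimensions whose seed terms occur in text."""
    tokens = {tok.lower() for tok in TOKEN_RE.findall(text)}
    return {dim for term, dim in SEED_TERMS.items() if term in tokens}


tweets = [
    "E' inutile, il mio nano non segnerà mai",
    "Ho comprato un iPod nano",
    "Che bella giornata a Palermo",
]
# Keep every tweet that contains at least one seed term.
extracted = [t for t in tweets if match_dimensions(t)]
```

Both tweets containing "nano" pass this filter regardless of what they are about, which is exactly the kind of noise a purely keyword-based extraction produces.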

Use Case (CrowdPulse settings)

Extracted content (seed term: nano/midget):

- Tweet about an Italian ministry
- Tweet about iPod nano
- Tweet about an Italian football player

Many non-intolerant Tweets are extracted!

Use Case (CrowdPulse settings)

Sentiment Analysis and Semantic Tagging of the content

Semantic Tagging: Motivations

Keyword-based representation introduces a lot of noise in the analysis: does "nano" refer to a midget or to an iPod nano?

Semantic Tagging: Motivations

"È inutile, il mio nano non segnerà mai" ("It's no use, my midget will never score")

Intolerant or not intolerant?

CrowdPulse Step 2: Semantic Tagging

Solution: semantic processing of extracted content

Algorithms:

• Entity Linking algorithms: TagMe (1), DBpedia Spotlight (2)
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text

(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
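To make the idea of disambiguation concrete, here is a toy sketch that picks the sense of an ambiguous mention by counting context-word overlap. It is an illustration only and does not reproduce the services linked above: the `SENSES` inventory, the entity labels and the `link_entity` helper are all hypothetical.

```python
# Hypothetical sense inventory for the ambiguous surface form "nano".
# Real entity linkers rely on a large knowledge base instead.
SENSES = {
    "nano": {
        "iPod_Nano": {"ipod", "apple", "musica", "comprato"},
        "Dwarfism": {"basso", "altezza", "persona"},
    }
}


def link_entity(mention, text):
    """Pick the sense whose context words overlap most with the text.

    Returns None when the mention is unknown or no sense has any overlap.
    """
    candidates = SENSES.get(mention.lower())
    if not candidates:
        return None
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    best, best_score = None, 0
    for entity, context in candidates.items():
        score = len(words & context)
        if score > best_score:
            best, best_score = entity, score
    return best
```

A tweet whose mention resolves to the product sense can then be discarded as non-intolerant, while an unresolved mention (`None`) would need a further check.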

Use Case (CrowdPulse settings)

Non-intolerant Tweets are detected and filtered out.

CrowdPulse Step 3: Sentiment Analysis

Sentiment Analysis: Motivations

Is this content conveying any opinion?

This is a crucial issue if people-based findings have to be generated.

Sentiment Analysis: Definition

"It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (*)

(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.

We concentrated on the polarity detection task.

Use Case (CrowdPulse settings)

Tweets with positive or neutral sentiment are detected and filtered out.
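The polarity filter can be sketched with a tiny lexicon-based classifier. The slides do not say which polarity detection technique CrowdPulse uses, so the lexicon approach, the `LEXICON` entries and the helper names below are assumptions chosen only to illustrate the filtering step.

```python
# Hypothetical mini-lexicon; real polarity lexicons are far larger.
LEXICON = {
    "bello": 1, "ottimo": 1, "grazie": 1,
    "schifo": -1, "odio": -1, "inutile": -1,
}


def polarity(text):
    """Classify text as 'positive', 'negative' or 'neutral' by summing word scores."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keep_negative(tweets):
    """Keep only tweets with negative polarity, as in the use case."""
    return [t for t in tweets if polarity(t) == "negative"]
```

Only the tweets classified as negative move on to the next step; everything positive or neutral is dropped.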


CrowdPulse Step 4: Processing

Use Case (CrowdPulse settings)

We have to build a map, so we only need geotagged content.

Definition of heuristics to increase the number of geotagged Tweets.
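The slides do not describe the geotagging heuristics themselves. One plausible heuristic, shown here purely as an assumption, is to fall back on the free-text location of the author's profile when a tweet carries no coordinates. The `GAZETTEER`, the field names `coordinates` and `user_location`, and the `resolve_location` helper are hypothetical.

```python
# Hypothetical gazetteer mapping place names to (latitude, longitude).
GAZETTEER = {
    "palermo": (38.1157, 13.3615),
    "bari": (41.1171, 16.8719),
    "roma": (41.9028, 12.4964),
}


def resolve_location(tweet):
    """Return (lat, lon) for a tweet dict, or None if no location can be inferred.

    Order: explicit coordinates first, then the free-text profile location.
    The field names 'coordinates' and 'user_location' are assumptions.
    """
    coords = tweet.get("coordinates")
    if coords is not None:
        return tuple(coords)
    profile = (tweet.get("user_location") or "").lower()
    for place, latlon in GAZETTEER.items():
        if place in profile:
            return latlon
    return None
```

A fallback of this kind trades precision for coverage: a profile location says where the author claims to live, not where the tweet was written.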

Use Case: The Italian Hate Map

Dimension        #Tweets     #Geo     %Geo
Homophobia       110,774     8,501    7.66%
Racism           154,170     1,940    1.24%
Violence         1,102,494   28,886   2.62%
Disability       479,654     3,410    0.75%
Anti-Semitism    6,000       1,150    18.03%
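The %Geo column is the share of geotagged tweets in each dimension. As a small worked example, the sketch below recomputes that share from the reported counts; the names `COUNTS` and `geo_share` are introduced only for illustration. The recomputed values may not match the reported percentages exactly, for instance if some of the counts on the slide were rounded.

```python
# Counts copied from the table above: (total tweets, geotagged tweets).
COUNTS = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}


def geo_share(total, geo):
    """Percentage of geotagged tweets, rounded to two decimals."""
    return round(100.0 * geo / total, 2)


shares = {dim: geo_share(total, geo) for dim, (total, geo) in COUNTS.items()}
total_tweets = sum(total for total, _ in COUNTS.values())
total_geo = sum(geo for _, geo in COUNTS.values())
```

Summing the rows gives 1,853,092 tweets overall, of which 43,887 are geotagged.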

CrowdPulse Step 4: Data Visualization

Use Case (CrowdPulse output)

Hate maps (based on OpenStreetMap): Violence against women, Disability, Racism, Homophobia
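A heat map needs the geotagged tweets to be aggregated by area before they are drawn. The slides only state that the maps are based on OpenStreetMap, so the fixed-size grid below is an assumed aggregation scheme, and `grid_cell`, `heat_counts` and the 0.5 degree cell size are hypothetical choices.

```python
from collections import Counter


def grid_cell(lat, lon, cell_size=0.5):
    """Map a coordinate to the integer index of its grid cell (cell_size in degrees)."""
    return (int(lat // cell_size), int(lon // cell_size))


def heat_counts(points, cell_size=0.5):
    """Count geotagged tweets per grid cell; higher counts mean 'hotter' areas."""
    return Counter(grid_cell(lat, lon, cell_size) for lat, lon in points)


# Two points near Palermo and one near Rome (illustrative coordinates).
points = [(38.11, 13.36), (38.12, 13.37), (41.9, 12.5)]
counts = heat_counts(points)
```

The per-cell counts can then be rendered as a colored overlay on any map layer; smaller cells give finer detail but sparser counts.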

Conclusions

1. Crowdsourcing-based approach: social content containing the seed terms is extracted and processed in real time.
2. Semantic Processing is exploited to delete non-intolerant Tweets.
3. Sentiment Analysis is used to filter out Tweets with irony.
4. Analytics Console is used to build real-time hate maps.

Almost 2,000,000 pieces of social content extracted and analyzed.

Lessons Learned

Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined some guidelines to tackle and prevent intolerant behaviors.

These guidelines were freely distributed to public administrations in early 2015.

Definition of a framework for real-time semantic content analysis

- Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization
- Use Case: The Italian Hate Map

Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.

Questions?

Cataldo Musto, PhD
cataldo.musto@uniba.it, @cataldomusto
http://www.di.uniba.it/~swap