Event Detection via LDA for the MediaEval2012 SED Task

Thursday, 4 October 2012. MediaEval2012 Social Event Detection Task. Konstantinos N. Vavliakis, Fani A. Tzima, Pericles A. Mitkas. Event Detection via LDA for the MediaEval2012 SED Task. Information Technologies Institute, Centre for Research and Technology - Hellas; Electrical and Computer Engineering Department, Aristotle University of Thessaloniki; Intelligent Systems and Software Engineering Labgroup, http://issel.ee.auth.gr

Upload: mediaeval2012

Post on 02-Dec-2014


TRANSCRIPT

Page 1: Event Detection via LDA for the MediaEval2012 SED Task

Thursday, 4 October 2012

MediaEval2012 Social Event Detection Task

Konstantinos N. Vavliakis, Fani A. Tzima, Pericles A. Mitkas

Event Detection via LDA for the MediaEval2012 SED Task

Information Technologies Institute, Centre for Research and Technology - Hellas

Electrical and Computer Engineering Department, Aristotle University of Thessaloniki

Intelligent Systems and Software Engineering Labgroup

http://issel.ee.auth.gr

Page 2: Event Detection via LDA for the MediaEval2012 SED Task


Social Event Detection at MediaEval 2012

Goal: Discover social events

3 Challenges:

1. Find technical events in Germany

2. Find all soccer events in Hamburg (Germany) and Madrid (Spain)

3. Find demonstration and protest events of the Indignados movement in Madrid

Page 3: Event Detection via LDA for the MediaEval2012 SED Task


Methodology

Pre-processing

Clean text (remove stop words/HTML tags)

Translate (using Google Translate)

Stemming (Porter stemmer)

City classifier (tf-idf for each city)

Topic Identification

Identify topics (per city, using LDA)

Select relevant topics

Manually create topics

Event Detection

Identify events (by detecting peaks)

Event Optimization

Merge events (of consecutive days)

Split events (by location)

Page 4: Event Detection via LDA for the MediaEval2012 SED Task


Preprocessing

Clean text by removing html tags and stop words

Translate non-English words

Perform stemming using the Porter Stemmer

E.g.:

Title | Cleaned Title | English Title | Stemmed

i-wall | wall | wall | wall

2009...Pallasso trist // Sad Clown | pallasso trist sad clown | clown sad sad clown | clown sad sad clown

Conjunt Monumental de Sant Pere de Terrassa | conjunt monumental sant pere terrassa | set monumental sant pere terrassa | set monument sant pere terrassa

Seagull in the port | seagull port | seagull port | seagul port

Winter doesn't affect the small land of the gnomes - 9/365 | winter doesn affect small land gnomes | winter doesn affect small land gnomes | winter doesn affect small land gnome

Jan-09 | january | january | januari

Tidy chaos - 3/365 | tidy chaos | tidy chaos | tidi chao
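The cleaning step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `clean_title`, the tiny `STOP_WORDS` set and the rule that drops one-letter tokens are hypothetical choices made to reproduce the "Cleaned Title" column, not the authors' code. Translation and Porter stemming are left out because they depend on external services and libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a full one.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "de"}

def clean_title(title):
    """Lowercase a title and drop HTML tags, digits, punctuation and stop words."""
    text = re.sub(r"<[^>]+>", " ", title)             # strip HTML tags
    tokens = re.findall(r"[^\W\d_]+", text.lower())   # keep runs of letters only
    # Assumption: one-letter leftovers such as the "t" of "doesn't" are dropped.
    kept = [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]
    return " ".join(kept)

for raw in ("i-wall", "Seagull in the port", "<b>Tidy</b> chaos - 3/365"):
    print(raw, "->", clean_title(raw))
```

With these assumptions the sample titles from the table come out as "wall", "seagull port" and "tidy chaos".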

Page 5: Event Detection via LDA for the MediaEval2012 SED Task


City Classification

5 cities

TF-IDF values of the terms for each city

Photos classified according to the maximum aggregated TF-IDF value

Users:

Users cannot be in more than 2 cities on the same day

User statistics

Results:

4149 unclassified photos

Very good results for city classification, excellent at country level
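One way to read the TF-IDF classifier is sketched below. It assumes that each city is treated as a single document built from its already labelled photos, that a photo's score for a city is the sum of the TF-IDF weights of its terms, and that a photo with no positive score stays unclassified. The function names and the toy data are hypothetical, and the per-user constraint is not modelled.

```python
import math
from collections import Counter

def city_tfidf(city_docs):
    """city_docs maps a city name to a list of token lists (one per photo)."""
    term_counts = {city: Counter(t for doc in docs for t in doc)
                   for city, docs in city_docs.items()}
    n_cities = len(term_counts)
    # Number of cities in which each term occurs at least once.
    cities_with_term = Counter(t for counts in term_counts.values() for t in counts)
    weights = {}
    for city, counts in term_counts.items():
        total = sum(counts.values())
        weights[city] = {t: (n / total) * math.log(n_cities / cities_with_term[t])
                         for t, n in counts.items()}
    return weights

def classify_photo(tokens, weights):
    """Return the city with the highest summed TF-IDF, or None if all are zero."""
    scores = {city: sum(w.get(t, 0.0) for t in tokens)
              for city, w in weights.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Toy training data (hypothetical).
training = {
    "madrid": [["sol", "puerta"], ["sol", "acampada", "photo"]],
    "hamburg": [["hafen", "port"], ["hsv", "stadion", "photo"]],
}
weights = city_tfidf(training)
print(classify_photo(["sol", "photo"], weights))
```

Terms that occur in every city, such as "photo" here, get a weight of zero, so they cannot decide the classification.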

Page 6: Event Detection via LDA for the MediaEval2012 SED Task


Topic Identification

Photos of a City

Extract topics using LDA with Gibbs sampling

Select relevant topics

Manually create topics

Examples of LDA topics:

Concept | Participation in Topic

sol | 0.1544

spanish | 0.1116

revolution | 0.1050

acampada | 0.0983

puerta | 0.0262

mayo | 0.0243

manifestación | 0.0217

…
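For readers unfamiliar with the technique, a minimal collapsed Gibbs sampler for LDA is sketched here in plain Python. The hyperparameters `alpha` and `beta`, the iteration count and the toy documents are assumptions for illustration; the slides do not state which implementation or settings were used.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]           # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_words * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimates: word participation per topic, topic share per document.
    phi = [{vocab[v]: (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
            for v in range(n_words)} for t in range(num_topics)]
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
              for t in range(num_topics)] for d, doc in enumerate(docs)]
    return phi, theta

docs = [["sol", "acampada", "puerta"], ["sol", "revolution", "spanish"],
        ["stadion", "hsv", "soccer"], ["soccer", "hsv", "goal"]]
phi, theta = lda_gibbs(docs, num_topics=2)
for topic in phi:
    print(sorted(topic.items(), key=lambda kv: -kv[1])[:3])
```

Here `phi` plays the role of the "Participation in Topic" values above, and `theta` gives, for each photo, the share of every topic.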

Page 7: Event Detection via LDA for the MediaEval2012 SED Task


Topic Selection

Photos of a City

Extract topics using LDA with Gibbs sampling

Select relevant topics

Manually create topics

Each photo belongs to many topics

Select photos containing “indignados” or “acampa” and sum their values per topic

E.g.:

PhotoID | Topic | Participation in Topic

5776147261 | 7 | 0.72

5776147261 | 14 | 0.12

5776147261 | 21 | 0.08

5776147261 | 6 | 0.02

5776147261 | 25 | 0.01

…

Topic | Sum

18 | 456.58

49 | 223.47

0 | 27.13

1 | 24.17

22 | 23.39

…
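The keyword-based ranking of topics can be sketched as follows. The function name `rank_topics`, the substring match on lowercased text and the photo texts are assumptions; only the participation values for photo 5776147261 are taken from the example above.

```python
from collections import defaultdict

def rank_topics(photo_text, photo_topics, keywords):
    """Sum topic participation over photos whose text contains any keyword."""
    sums = defaultdict(float)
    for photo_id, text in photo_text.items():
        if any(k in text.lower() for k in keywords):
            for topic, weight in photo_topics.get(photo_id, {}).items():
                sums[topic] += weight
    # Highest sum first: the top entries are the candidate relevant topics.
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)

# Photo texts are hypothetical; the weights for 5776147261 come from the slide.
photo_text = {"5776147261": "acampada sol indignados", "1": "seagull port"}
photo_topics = {
    "5776147261": {7: 0.72, 14: 0.12, 21: 0.08, 6: 0.02, 25: 0.01},
    "1": {3: 0.9, 7: 0.1},
}
for topic, total in rank_topics(photo_text, photo_topics, ["indignados", "acampa"]):
    print(topic, round(total, 2))
```

How many of the top-ranked topics to keep is a separate decision that the slides do not spell out.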

Page 8: Event Detection via LDA for the MediaEval2012 SED Task


Event Detection & Optimization

Event Detection

Find photos of selected topics

Count photos per day

If the count is higher than a threshold, add them to a new event

Event Optimization

Merge events happening on consecutive days

Split events by geolocation distance
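A rough sketch of these two stages follows, assuming each photo of a selected topic carries a date and, optionally, a (latitude, longitude) pair. The threshold, the distance limit `max_km` and the greedy grouping against the first photo of each group are hypothetical choices; the slides do not give the actual values or the splitting rule.

```python
from collections import Counter
from datetime import date, timedelta
from math import asin, cos, radians, sin, sqrt

def detect_events(photo_days, threshold):
    """Keep days with more photos than the threshold; merge consecutive days."""
    counts = Counter(photo_days)
    peak_days = sorted(day for day, n in counts.items() if n > threshold)
    events = []
    for day in peak_days:
        if events and day - events[-1][-1] == timedelta(days=1):
            events[-1].append(day)    # extends the event of the previous day
        else:
            events.append([day])      # starts a new event
    return events

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def split_by_location(points, max_km):
    """Greedy split: a photo joins the first group whose first photo is near."""
    groups = []
    for p in points:
        for g in groups:
            if haversine_km(p, g[0]) <= max_km:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

days = ([date(2011, 5, 15)] * 3 + [date(2011, 5, 16)] * 4
        + [date(2011, 5, 18)] + [date(2011, 5, 20)] * 3)
print(detect_events(days, threshold=2))
print(split_by_location([(40.4168, -3.7038), (40.4170, -3.7040), (53.5511, 9.9937)], max_km=5))
```

In this toy run the photos of 15 and 16 May merge into one event, 20 May forms a second one, and the three coordinates split into one group near Madrid and one in Hamburg.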

Page 9: Event Detection via LDA for the MediaEval2012 SED Task


Results - C1: Technical events in Germany

[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis from 0 to 100]

Run labels: Selected/Total Topics 2/50, 6/50 and 8/50; ManualTopic (two runs)

Bar values, in extraction order: 80.98, 40.52, 35.85, 76.29, 63.35, 31.1, 26.26, 25.31, 84.58, 0.16, 0.724, 0.578, 94.9, 50.98

Page 10: Event Detection via LDA for the MediaEval2012 SED Task


Results – C2: Soccer Events in Hamburg/Madrid

[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis from 0 to 100]

Run labels: Selected/Total Topics 1/50, 1/100 and 1/100; ManualTopic (two runs)

Bar values, in extraction order: 75.72, 86.67, 91.21, 88.18, 88.18, 77.67, 81.78, 84, 90.76, 0.768, 0.85, 0.847, 93.49, 93.49

Page 11: Event Detection via LDA for the MediaEval2012 SED Task


Results – C3: Protest Events of Indignados

[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis from 0 to 100]

Run labels: Selected/Total Topics 5/100, 5/100 and 3/50; ManualTopic (two runs)

Bar values, in extraction order: 88.53, 90.76, 86.59, 88.91, 88.91, 84.29, 86.11, 85.38, 89.83, 0.33, 73.8, 0.347, 90.78, 90.78

Page 12: Event Detection via LDA for the MediaEval2012 SED Task


Conclusions

Effective and generalized methodology

The selection of topics is the key

Topics created by LDA give results close to those of manual topics

Really good precision

Stemming may improve (slightly) the results

Problems in “vague” topics

Page 13: Event Detection via LDA for the MediaEval2012 SED Task


Relevant and Future Work

Automatically detect all events from a dataset using detected topics

Dynamic merging of topics

The concept of an important event is socially defined -> Personalized detection

Page 14: Event Detection via LDA for the MediaEval2012 SED Task


Thank You!

Email:

[email protected]