Mention-anomaly-based Event Detection and Tracking in Twitter
Adrien Guille & Cécile Favre
ERIC Lab, University of Lyon 2, France
IEEE/ACM ASONAM 2014, Beijing, China
August 20, 2014

What is Twitter & why study it?

Twitter: micro-blogging service
- 140-character messages

Ever-growing number of Twitter users
- Pro: timely source of information
- Con: information overload

How can we use Twitter for automated event detection and tracking?

Related Work

Idea: spot bursty patterns

Term-weighting-based approaches
- Peaky Topics [Shamma11], Trending Score [Benhardus13]
- Possible ambiguity, lack of context

Topic-modeling-based approaches
- On-line LDA [Lau12], ET-LDA [Yuheng12]
- Lack of scalability

Clustering-based approaches
- EDCoW [Weng11], TwEvent [Li12], ET [Parikh13]
- Noisy event descriptions

Issues & Proposal

Shortcomings of existing methods
- Event duration is a fixed parameter
- Only the textual content of tweets is considered

We propose a novel approach and method that
- Dynamically estimates the duration of each event
- Exploits the social aspect of tweet streams through mentions

Proposed Method

Problem Formulation

Input
- Corpus C containing N tweets, partitioned into n time-slices
- Vocabularies V and V@

Output
- The k most impactful events

Event: a bursty topic and a value Mag translating its magnitude of impact

Bursty topic: a time interval I, a main term t, and a set S of weighted related terms
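The two definitions above can be pictured as plain data structures. The following is only a sketch: the class names, field names and the `top_k` helper are hypothetical, and representing the interval I as a pair of inclusive time-slice indices is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BurstyTopic:
    # I: (first, last) time-slice index, assumed inclusive
    interval: Tuple[int, int]
    # t: the main term
    main_term: str
    # S: related terms mapped to their weights
    related_terms: Dict[str, float] = field(default_factory=dict)


@dataclass
class Event:
    topic: BurstyTopic
    # Mag: magnitude of impact
    magnitude: float


def top_k(events: List[Event], k: int) -> List[Event]:
    """Return the k events with the largest magnitude of impact."""
    return sorted(events, key=lambda e: e.magnitude, reverse=True)[:k]
```

Sorting by Mag and truncating is the simplest reading of "the k most impactful events"; it ignores the redundancy control that the method applies in its second phase.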

Overview of the proposed method

Two-phase flow
1. Analyse the mention frequency of each word in V@ to detect events (Mag, I, t, Ø)
2. Select related words and generate the final list of the k most impactful events while controlling redundancy

MABED, Mention-Anomaly-Based Event Detection

Proposed Method: Phase 1

Detecting Events with Mention Anomaly

Computing the anomaly at a point i for word t
- Requires computing the expected volume of tweets containing at least one mention and t, at i
- The volume is modelled with a normal distribution, from which the expectation and then the anomaly are derived

Measuring the magnitude of impact
- Integrating the anomaly over the time interval I
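The formulas from this slide did not survive in the transcript, so the sketch below rests on stated assumptions rather than on the authors' exact definitions. It assumes the expected count in slice i is the number of mention-bearing tweets in that slice times the overall share of mention-bearing tweets that contain t, that the anomaly is the observed count minus this expectation, and that "integrating" over a discrete interval is a sum. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def anomaly_series(mentions_with_t: Sequence[int],
                   mentions_total: Sequence[int]) -> List[float]:
    """Per-slice anomaly of word t (assumed: observed minus expected).

    mentions_with_t[i]: tweets in slice i with at least one mention and word t
    mentions_total[i]:  tweets in slice i with at least one mention
    """
    total = sum(mentions_total)
    # Assumed: overall probability that a mention-bearing tweet contains t.
    p_t = sum(mentions_with_t) / total if total else 0.0
    return [observed - n_i * p_t
            for observed, n_i in zip(mentions_with_t, mentions_total)]


def magnitude(anomaly: Sequence[float], a: int, b: int) -> float:
    """Discrete integral of the anomaly over the interval [a, b], inclusive."""
    return sum(anomaly[a:b + 1])
```

For example, with 10 mention-bearing tweets per slice and counts `[1, 1, 7, 1]` for t, the expectation is 2.5 per slice, so only the third slice has a positive anomaly.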

Detecting Events with Mention Anomaly

For each word t in V@
- Solve a "Maximum Contiguous Subsequence Sum" type of problem

Eventually, each event is described by
- A main word t
- A period of time I
- The magnitude of its impact Mag
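A maximum contiguous subsequence sum can be found in linear time with Kadane's algorithm. The sketch below applies it to a per-slice anomaly sequence for one word and returns the best sum together with the interval that achieves it; the function name is hypothetical, and whether the authors use this exact algorithm is an assumption.

```python
from typing import Sequence, Tuple


def max_contiguous_subsequence_sum(values: Sequence[float]) -> Tuple[float, int, int]:
    """Kadane's algorithm: return (best_sum, start, end), indices inclusive."""
    if not values:
        raise ValueError("values must not be empty")
    best_sum = current_sum = values[0]
    best_start = best_end = current_start = 0
    for i in range(1, len(values)):
        if current_sum < 0:
            # A negative running sum can only hurt: restart at slice i.
            current_sum = values[i]
            current_start = i
        else:
            current_sum += values[i]
        if current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, i
    return best_sum, best_start, best_end
```

Read against the slide, the returned sum plays the role of Mag and the returned index pair plays the role of I for the word under consideration.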

Detecting Events with Mention Anomaly

Example

Proposed Method: Phase 2

Selecting Words Describing Events

Identifying candidate words
- Set of p words that co-occur the most with t during I

Selecting the most relevant words
- Measure the similarity between the frequency of each candidate word and that of the main word [Erdem12]
- Apply a threshold θ
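The two steps above can be sketched as follows. This is not the authors' implementation: the similarity used here is the plain Pearson correlation, swapped in as a stand-in for the coefficient of [Erdem12], and every function and parameter name is hypothetical. Tweets are assumed to be pre-tokenized lists of words restricted to the interval I.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence


def candidate_words(tweets_in_interval: Sequence[Sequence[str]],
                    main_word: str, p: int) -> List[str]:
    """Return the p words that co-occur most often with main_word."""
    counts: Counter = Counter()
    for tokens in tweets_in_interval:
        unique = set(tokens)
        if main_word in unique:
            counts.update(unique - {main_word})
    return [word for word, _ in counts.most_common(p)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_related(main_series: Sequence[float],
                   candidate_series: Dict[str, Sequence[float]],
                   theta: float) -> Dict[str, float]:
    """Keep candidates whose frequency series is similar enough to the main word's."""
    selected = {}
    for word, series in candidate_series.items():
        score = pearson(main_series, series)
        if score >= theta:
            selected[word] = score
    return selected
```

The per-slice frequency series passed to `select_related` would cover the interval I only, so that a word counts as related when it rises and falls together with the main word during the event.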

Selecting Words Describing Events

Example

Generating the List of Top k Events

Event graph & redundancy graph

Detecting duplicated events
- Connectivity of main terms in the event graph
- Overlap between intervals, threshold σ

Merging duplicated events
- Identifying connected components in the redundancy graph
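One possible reading of this slide is sketched below, under several assumptions that the slide does not confirm: two events count as duplicates when each event's main term appears among the other's related words and their intervals overlap by more than σ, the overlap is measured as the shared length divided by the shorter interval's length, and connected components are found with a small union-find. The `DetectedEvent` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DetectedEvent:
    main_term: str
    start: int  # first time-slice of I, inclusive
    end: int    # last time-slice of I, inclusive
    magnitude: float
    related: Dict[str, float] = field(default_factory=dict)


def overlap_coefficient(a: DetectedEvent, b: DetectedEvent) -> float:
    """Shared length divided by the shorter interval's length (assumed measure)."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return shared / min(a.end - a.start + 1, b.end - b.start + 1)


def are_redundant(a: DetectedEvent, b: DetectedEvent, sigma: float) -> bool:
    """Assumed rule: main terms describe each other and intervals overlap enough."""
    linked = a.main_term in b.related and b.main_term in a.related
    return linked and overlap_coefficient(a, b) > sigma


def merge_duplicates(events: List[DetectedEvent],
                     sigma: float) -> List[List[DetectedEvent]]:
    """Group events by connected component of the redundancy graph."""
    parent = list(range(len(events)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if are_redundant(events[i], events[j], sigma):
                parent[find(j)] = find(i)

    groups: Dict[int, List[DetectedEvent]] = {}
    for i, event in enumerate(events):
        groups.setdefault(find(i), []).append(event)
    return list(groups.values())
```

Each returned group corresponds to one merged event; how the merged description and magnitude are derived from the group members is left open here.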

Generating the List of Top k Events

Example

Evaluation

Experimental Setup

Corpora
- C(en): 1,437,126 tweets published in November 2009
- C(fr): 2,086,136 tweets published in March 2012

Baselines for comparison
- Trending Score (TS) [Benhardus13] and ET [Parikh13]
- α-MABED

Parameter setting
- (α-)MABED: 30-min time-slices, p=10, θ=0.7, σ=0.5
- Trending Score, ET: 1-day time-slices

Evaluation Metrics

Manual annotation
- Two human annotators judge the significance of the top 40 events detected by each method (κ = 0.72)

Precision
- Significant events / All detected events

Recall
- Distinct significant events / All detected events

DERate [Li12]
- Duplicated events / Significant events
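The three ratios above are straightforward to compute from annotated detections. In this sketch the input format is an assumption: each detected event is a pair of a significance flag and a label naming the real-world event it refers to, and a duplicated event is taken to be a significant detection whose label was already seen. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def evaluation_metrics(detections: List[Tuple[bool, Optional[str]]]) -> Dict[str, float]:
    """Compute precision, recall and DERate as defined on the slide.

    detections: one (is_significant, real_event_label) pair per detected event.
    """
    total = len(detections)
    significant_labels = [label for is_sig, label in detections if is_sig]
    significant = len(significant_labels)
    distinct = len(set(significant_labels))
    duplicated = significant - distinct
    return {
        "precision": significant / total if total else 0.0,
        "recall": distinct / total if total else 0.0,
        "derate": duplicated / significant if significant else 0.0,
    }
```

For instance, four detections of which three are significant and two of those refer to the same real-world event give a precision of 0.75, a recall of 0.5 and a DERate of 1/3.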

Quantitative Evaluation

Performance of the five methods on the two corpora

Quantitative Evaluation

Impact of σ on MABED

Qualitative Evaluation

Improved readability

Excerpt of the list of events detected in C(en) by MABED

Qualitative Evaluation

Improved temporal precision & reduced redundancy

Importance of dynamically estimating event duration
- Politics-related events tend to be discussed longer [Romero11]

Implementation

Included in the open-source social media data mining tool SONDY [Guille13]

http://mediamining.univ-lyon2.fr/people/guille/mabed.php

Time-oriented Interface

Impact-oriented Interface

Topic-oriented Interface

Conclusion & Future Work

Proposed a novel approach and method for detecting events in Twitter

Verified hypothesis
- Considering mentions helps detect significant events
- Experimental results on two different datasets demonstrate the accuracy and the robustness of the proposed method

Future work
- More features to model discussions between users

References

[Shamma11] D. A. Shamma, L. Kennedy, and E. F. Churchill, “Peaks and persistence: modeling the shape of microblog conversations,” in CSCW, 2011

[Benhardus13] J. Benhardus and J. Kalita, “Streaming trend detection in twitter,” IJWBC, vol. 9, no. 1, 2013

[Lau12] J. H. Lau, N. Collier, and T. Baldwin, “On-line trend analysis with topic models: #twitter trends detection topic model online,” in COLING, 2012

[Yuheng12] H. Yuheng, J. Ajita, D. S. Dorée, and W. Fei, “What were the tweets about? topical associations between public events and twitter feeds,” in ICWSM, 2012

[Weng11] J. Weng and B.-S. Lee, “Event detection in twitter,” in ICWSM, 2011

[Li12] C. Li, A. Sun, and A. Datta, “Twevent: Segment-based event detection from tweets,” in CIKM, 2012

[Parikh13] R. Parikh and K. Karlapalem, “Et: events from tweets,” in companion WWW, 2013

[Erdem12] O. Erdem, E. Ceyhan, and Y. Varli, “A new correlation coefficient for bivariate time-series data,” in MAF, 2012

[Guille13] A. Guille, C. Favre, H. Hacid, and D. Zighed, “Sondy: An open source platform for social dynamics mining and analysis,” in SIGMOD, 2013