event clusters detection on flickr images using a suffix-tree structure

Event Cluster Detection on Flickr Images using a Suffix-Tree Structure. Massimiliano Ruocco and Heri Ramampiaro, Dept. of Computer and Information Science, Norwegian University of Science and Technology, [email protected]. IEEE ISM2010.

Upload: massimiliano-ruocco

Post on 23-Jan-2015

TRANSCRIPT

Page 1: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

1

Event Cluster Detection on Flickr Images using a Suffix-Tree Structure

Massimiliano Ruocco and Heri Ramampiaro

Dept. of Computer and Information Science, Norwegian University of Science and Technology

[email protected]

Massimiliano Ruocco – Event Cluster Detection on Flickr Images using a Suffix-Tree Structure – IEEE ISM2010

Page 3: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Outline

1. Introduction
   1. Problem Statement
   2. Related Works
   3. Contributions
2. Proposed approach
   1. Problem definition
   2. Preliminary
   3. Algorithm Overview
3. Evaluation
4. Conclusions

Page 7: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Statement

Event Detection

- Event detection has its origin in the TDT (Topic Detection and Tracking) project (1)
- Objective: aggregate stories over time into a single event topic
- Event: something happening in a certain place at a certain time [Yang, Pierce, Carbonell 1999]

(1) http://projects.ldc.upenn.edu/TDT/

Page 11: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Statement

Event Detection

- Most previous work focuses on time-tagged document streams and can be classified as:
  - Retrospective detection: discover unidentified events in a collection of news [Yang et al. 1998]
  - Online detection: detect events in real time from a stream of news [Brants et al. 2003]

Page 15: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Statement

Web Photo-Sharing Apps – New Needs

Huge Amount of Pictures

Time          User   Location       Tags
26 Oct 2010   RMax   26:12, 23:14   Roma, Sky, Bridge
…

New Needs

- Knowledge Extraction
- Browse
- Retrieve

Page 18: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Statement

Challenges

- Event detection on tagged pictures from photo-sharing apps
- Web-scale environment
- Use of contextual information
- Noisy annotation

Page 22: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Related Works

- Event Clustering (visual/temporal information) [Loui, Savakis 2002]
  - Albuming user photo collections
  - Not scalable to large datasets
  - Limited to user photo collections
  - No location information
- Event/Place Semantic Identification (temporal information) [Rattenbury et al. 2007]
  - Extraction of event and place semantics for tags assigned to Flickr photos
  - Scale-Structure Identification (SSI) method to analyze the tag usage distribution
  - SSI is limited for large datasets
  - Location information is not considered
- Event Tag Detection (spatial/temporal information) [Chen, Roy 2009]
  - Detects event tags from Flickr photos
  - Like [Rattenbury et al. 2007], uses the SSI method to analyze the tag usage distribution
  - SSI is used over the spatial and temporal distributions simultaneously

Page 27: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Definition

Hypothesis

- Something happening in a certain place at a certain time [Yang, Pierce, Carbonell 1999]
- Something happening in a certain place at a certain time with a certain tag
- Event cluster $e_k \Rightarrow \{t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall I_i, I_j \in e_k\}$ (not the opposite!)

Page 30: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Definition

Hypothesis – Landmark clusters

[Figure: axes "time" and "Location"; tag "colosseo"; markers g and dt; labels "Event Cluster ek" and "Landmark Clusters"]

Event cluster $e_k \Rightarrow \{t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall I_i, I_j \in e_k\}$ (not the opposite!)

Page 36: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Definition

Hypothesis – Event clusters

[Figure: axes "time" and "Location"; tag "applepies"; markers g and dt; labels "Event Cluster ek", "Landmark Clusters" and "Event Clusters"]

Event cluster $e_k \Rightarrow \{t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall I_i, I_j \in e_k\}$ (the opposite is true!)

Page 38: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Problem Definition

New Formulation

[Figure: two "time"/"Location" panels for the tag "applepies" at location g, one with the interval dt marked, labelled $S_{dgt}$ and $S_{gt}$ and joined by "="; label "Event Clusters"]

Event cluster $e_k$: $\{(dt, g, t) : S_{dgt} = S_{gt}\}$ with $S_{dgt} = e_k$

Page 47: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Preliminary

Suffix-Tree Clustering [Zamir 1998]

- Suffix-tree based
- Mainly used in text (web) document clustering
- Three-step process:
  1. Document cleaning
  2. Base cluster identification
  3. Base cluster merging
- Incremental clustering
- Cluster labels inferred from the tree structure
- Phrase-based model
- Snippet-tolerant
- Overlapping clusters

Page 51: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Preliminary

Suffix-Tree

- Given a string S, its suffix tree is a compact trie containing all the suffixes of S:
  - Rooted directed tree
  - Each internal node other than the root has at least two children
  - Each edge leaving a particular node is labelled with a non-empty substring of S
- Suffix-tree construction runs in linear time, O(n) [Ukkonen 1995]

Example: for S = Papua, the proper suffixes are ‘apua’, ‘pua’, ‘ua’, ‘a’.
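The suffix idea can be illustrated with a minimal Python sketch. It builds a naive, uncompacted suffix trie in O(n²) time, which is not the compact suffix tree of the slide and not Ukkonen's linear-time construction; the function names `suffixes`, `build_suffix_trie` and `contains` are hypothetical helpers chosen for illustration.

```python
def suffixes(s):
    """Return all non-empty suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


def build_suffix_trie(s):
    """Naive suffix trie as nested dicts (one node per character).

    The key None marks the end of a suffix. A real suffix tree would
    additionally collapse single-child chains into one labelled edge.
    """
    root = {}
    for suffix in suffixes(s):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def contains(trie, pattern):
    """Every substring of s is a prefix of some suffix, so walk from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("papua")
print(suffixes("papua"))
print(contains(trie, "apu"), contains(trie, "uap"))
```

The nested-dict form is chosen only for readability; it uses quadratic space in the worst case, which is the cost the compact representation avoids.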

Page 53: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Algorithm Overview

[Figure: processing pipeline]

1. Input: images $I_i = (T, g, dt)$
2. Data cleaning
3. Data extension
4. Suffix tree construction
5. Event cluster extraction
6. Event cluster merge
7. Output: clusters such as {…, Primary, Party, Election, Campaign} and {…, Concert, Music, John, …}

Page 60: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Algorithm Overview

Data Cleaning and Extension

- Cleaning: $I_i = (T, g, dt) \rightarrow I_i' = (T', g, dt)$
  - Stopword removal (with extended vocabulary) + stemming
- Extension: $I_i' = (T', g, dt) \rightarrow I_i'' = (T'', g, dt)$
  - Spatial and temporal information are encoded in the annotation set T
  - $T'' = \{t''_1, \dots, t''_l\}$ with $t''_i = [s_1(dt) + s_2(g) + t_i]$, where $s_1$ and $s_2$ are encoding functions from date and location to string
  - $s_1$ and $s_2$ define the granularity in space (geographical grid) and time

Example, using s1(26/10/2010) and s2(43.777864, 11.249029):

T’:  acmm2010, florence, multimedia
T’’: 26Oct2010 43.77:11.24 acmm2010
     26Oct2010 43.77:11.24 florence
     26Oct2010 43.77:11.24 multimedia
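A minimal Python sketch of the extension step, reproducing the example with s1(26/10/2010) and s2(43.777864, 11.249029). The slides do not give the implementation of the encoding functions, so the truncation to two decimals, the space-separated concatenation and the helper name `extend_tags` are assumptions inferred from the example strings.

```python
import math
from datetime import date

# Fixed English abbreviations so the output does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def s1(d):
    """Encode a date at one-day granularity, e.g. 26Oct2010 (assumed format)."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


def s2(lat, lon, decimals=2):
    """Encode a location as a grid cell by truncating both coordinates.

    decimals=2 corresponds to a coordinate precision of 0.01 (assumption:
    the cell is named after its lower bound, i.e. floor, not rounding).
    """
    scale = 10 ** decimals

    def cell(value):
        return f"{math.floor(value * scale) / scale:.{decimals}f}"

    return f"{cell(lat)}:{cell(lon)}"


def extend_tags(tags, d, lat, lon):
    """Prefix every cleaned tag with the encoded date and location."""
    prefix = f"{s1(d)} {s2(lat, lon)}"
    return [f"{prefix} {tag}" for tag in tags]


extended = extend_tags(["acmm2010", "florence", "multimedia"],
                       date(2010, 10, 26), 43.777864, 11.249029)
for line in extended:
    print(line)
```

Changing `decimals` (or swapping `s1` for a coarser date encoding) is how the granularity in space and time would be varied in this sketch.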

Page 68: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Algorithm Overview

ST Construction and Event Extraction

- Image $I_i''$: document snippet
- Extract candidate event clusters $\Psi_l$: $\Psi_l([s_1(dt) + s_2(g) + t_i])$
- Extract $\Psi'_l([s_2(g) + t_i])$
- Compare $\Psi_l$ and $\Psi'_l$
- If $\Psi_l = \Psi'_l$, then $\Psi_l([s_1(dt) + s_2(g) + t_i])$ is an event cluster
- Label inferred from the structure

[Figure: nodes $\Psi_l$ and $\Psi'_l$]

Recall: $I_i'' = (T'', g, dt)$, $T'' = \{t''_1, \dots, t''_l\}$, $t''_i = [s_1(dt) + s_2(g) + t_i]$

Event cluster $e_k$: $\{(dt, g, t) : S_{dgt} = S_{gt}\}$ with $S_{dgt} = e_k$
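The comparison rule can be sketched in Python as follows. This sketch swaps the suffix tree for plain dictionaries, so it only illustrates the test "images sharing date, location and tag" versus "images sharing location and tag"; the function name, the tuple layout and the toy images are hypothetical.

```python
from collections import defaultdict


def extract_event_clusters(images):
    """Return the (dt, g, t) groups whose image set equals the (g, t) group.

    images: iterable of (image_id, tags, g, dt), where g and dt are the
    already encoded location and date strings.
    """
    by_dgt = defaultdict(set)  # images sharing date, location and tag
    by_gt = defaultdict(set)   # images sharing location and tag only
    for image_id, tags, g, dt in images:
        for t in tags:
            by_dgt[(dt, g, t)].add(image_id)
            by_gt[(g, t)].add(image_id)
    return {
        key: ids
        for key, ids in by_dgt.items()
        if ids == by_gt[(key[1], key[2])]
    }


# Toy data: "colosseo" is used on two different days at the same place,
# "applepies" only on one day.
images = [
    (1, ["colosseo"], "41.89:12.49", "26Oct2010"),
    (2, ["colosseo"], "41.89:12.49", "03Nov2010"),
    (3, ["applepies"], "41.89:12.49", "26Oct2010"),
    (4, ["applepies"], "41.89:12.49", "26Oct2010"),
]
events = extract_event_clusters(images)
print(events)
```

Note that this sketch applies no minimum cluster size, so a tag used by a single image would also pass the test; any such filter would be an additional design choice.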

Page 72: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Algorithm Overview

Extraction and Merge

- Extracted event clusters: $\{e_1, \dots, e_n\}$
- Merge semantically similar clusters, using the similarity

  $\theta(e_i, e_j) = \frac{|e_i \cap e_j|}{\min(|e_i|, |e_j|)}$

[Figure: nodes $\Psi_l$ and $\Psi'_l$]
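A minimal Python sketch of the merge step, reading θ as the size of the intersection divided by the size of the smaller cluster. The slides give neither a threshold nor a merge order, so the value 0.5 and the greedy single-pass strategy below are assumptions made for illustration.

```python
def theta(a, b):
    """Overlap of two non-empty clusters relative to the smaller one."""
    return len(a & b) / min(len(a), len(b))


def merge_clusters(clusters, threshold=0.5):
    """Greedy merge: each new cluster absorbs every kept cluster it overlaps.

    threshold is a hypothetical parameter, not a value from the slides.
    """
    merged = []
    for cluster in clusters:
        cluster = set(cluster)
        kept = []
        for other in merged:
            if theta(cluster, other) > threshold:
                cluster |= other   # similar enough: fold into the new cluster
            else:
                kept.append(other)
        merged = kept + [cluster]
    return merged


result = merge_clusters([{1, 2, 3}, {2, 3, 4}, {7, 8}])
print(result)
```

Dividing by the smaller cluster means a small cluster fully contained in a large one scores 1.0 and is always merged.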

Page 74: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Evaluation - Dataset

- Dataset collected from Flickr
- Only geo-tagged pictures
- 12 June 2008 – 11 June 2010 (729 days)
- San Francisco area
- #Images ~ 350K, #Tags ~ 3M

Page 80: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Evaluation - Measure

-  List of ranked clusters: {e1, e2, …}
-  Ranking according to cluster size: |ei|
-  Drawback: lack of ground truth, so recall cannot be measured

Top-K Precision: |Rk| / K

Rk: relevant clusters among the first K returned

Top-20 (K=20)
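The measure above can be sketched in a few lines of Python. The sketch assumes clusters are sets of photo identifiers and that relevance is supplied by a caller-provided judgment function; `top_k_precision` and `is_relevant` are hypothetical names, since the slides do not say how relevance was assessed.

```python
def top_k_precision(clusters, is_relevant, k=20):
    """Rank clusters by size (largest first) and return |R_k| / k.

    is_relevant is a hypothetical callback standing in for a manual
    relevance judgment of each cluster.
    """
    ranked = sorted(clusters, key=len, reverse=True)
    relevant_in_top_k = [c for c in ranked[:k] if is_relevant(c)]
    return len(relevant_in_top_k) / k


if __name__ == "__main__":
    clusters = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9}, {10}]
    # Toy judgment: treat every cluster except the two-photo one as relevant.
    print(top_k_precision(clusters, lambda c: len(c) != 2, k=3))
```

With K = 20, a run in which 15 of the top 20 clusters are judged relevant yields 15 / 20 = 75%.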

Page 81: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Evaluation

-  Experiments at different granularities in time and space
-  Time:

Granularity  1 day        1 week
Example      2008Oct12    2008:43

-  Space:

Latitude Precision   Longitude Precision   Square Size (Meters)
0.01                 0.01                  1000m x 1000m
0.005                0.005                 500m x 500m
0.002                0.002                 200m x 200m
0.001                0.001                 100m x 100m
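One way to read these granularities is as a discretization that maps every photo to a space cell and a time label. The Python sketch below is an illustration under stated assumptions: cells are obtained by flooring the coordinates at the chosen precision, and weeks follow ISO numbering. The slides specify neither choice, and `space_cell`, `time_cell` and `MONTHS` are hypothetical names.

```python
import math
from datetime import date

# Fixed month abbreviations so the labels do not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def space_cell(lat, lon, precision=0.01):
    """Return the integer grid cell containing (lat, lon).

    Flooring at the given precision is an assumed discretization.
    """
    return (math.floor(lat / precision), math.floor(lon / precision))


def time_cell(day, granularity="day"):
    """Return a day label such as '2008Oct12' or a 'year:week' label.

    ISO week numbering is an assumption; the slides do not state
    which week convention was used.
    """
    if granularity == "day":
        return f"{day.year}{MONTHS[day.month - 1]}{day.day:02d}"
    if granularity == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}:{iso_week}"
    raise ValueError(f"unknown granularity: {granularity}")


if __name__ == "__main__":
    print(space_cell(37.7749, -122.4194, precision=0.01))
    print(time_cell(date(2008, 10, 12), "day"))
    print(time_cell(date(2008, 10, 12), "week"))
```

Photos that share both a space cell and a time label fall into the same space-time bucket, so a coarser precision or a weekly label groups more photos together.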

Page 82: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Evaluation - Results

Square size  100 m                     200 m                     500 m                     1000 m
Time         1 Day        1 Week       1 Day        1 Week       1 Day        1 Week       1 Day        1 Week
#Clusters    #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.  #Ev.  Prec.
1            1     100%   1     100%   1     100%   1     100%   1     100%   1     100%   1     100%   1     100%
2            2     100%   2     100%   2     100%   2     100%   2     100%   2     100%   2     100%   1     50%
3            3     100%   3     100%   3     100%   3     100%   3     100%   3     100%   3     100%   2     67%
20           15    75%    14    70%    15    75%    14    70%    14    70%    13    65%    13    65%    14    70%

Page 83: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Evaluation - Results

[Figure: chart with axis label "Top-20 precision"]

Page 91: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Conclusion

-  Novel algorithm for event cluster extraction:
   -  From a large amount of Flickr images
   -  Multi-user photo collection
   -  Incremental clustering algorithm
-  Extension of STC, previously used only to cluster text documents
-  Based on a suffix tree (construction in O(n))
-  Automatic annotation of clusters
-  Noise reduction in the tags using an extended vocabulary for stopword removal
-  Spatial and temporal information considered
-  Analysis of different granularities of time and space

Page 93: Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Thanks (谢谢) for your attention!

QUESTIONS?

http://www.idi.ntnu.no/~ruocco/