
Page 1: Image Source [1] - Fraunhoferkontext.fraunhofer.de/haenelt/kurs/Referate/ElHindi_Sarfarpor_DocClustering.pdf · Ramin Safarpour, Muhammad El-Hindi Image Source [2] Future Work Demo

Image Source [1]


Image Sources [1], [2]


DocClustering Extension

26.01.2015

Hauptseminar Information Retrieval

Ramin Safarpour, Muhammad El-Hindi Image Source [2]

Motivation

Concept & Tools

Challenges

Demo

Future Work

Image Source [1]

Labeling a document is easy…

How about MANY?

Image Sources [1], [3]

Clustering: reveals inherent structure

Image Source [1]

Another piece to the puzzle: MediaWiki

Image Source [1]


MediaWiki architecture

- Core (content processing, users, caching, DB), extended via Hooks
- API (WebAPI, client libraries), extended via API
- Front End (user interface), extended via Special Pages

http://www.mediawiki.org/wiki/MediaWiki

IR Model

Documents D: D1, D2, D3, D4, D5, …
  -> Pre-processing
  -> Transformation
Representation vectors V: V1, V2, V3, V4, V5, …
  -> Clustering
Clusters Ci ⊆ D: {D1, D2}, {D3, D4}, {D5, …}, {D6, …}

DocClustering Extension

- MediaWiki Layer: Core (Hooks), API (API), Front End (Special Pages)
- Extension Layer: Preprocessor, Feature-Creator, Cluster-Pages, Special-Pages (classes)
- Utils Layer: Php-NLP-Tools, DB-Connector

“NlpTools is a library for natural language processing written in php.” (php-nlp-tools.com)

Components: Tokenize, Stem, Stop-Words, Document-Representation, Clustering, Analyze

Loose Coupling

http://php-nlp-tools.com/


Pre-Processing Stack

- Special characters / tags
- Tokenization
- Normalization
- Stemming
- Stopwords

Image Source [1]
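The pre-processing steps listed above could be chained roughly as follows. This is only an illustrative Python sketch, not the extension's code (which builds on NlpTools in PHP): the stop-word list, the `crude_stem` suffix stripper, all function names, and the order in which stop-word removal and stemming are applied are assumptions made for the example.

```python
import re

# Hypothetical, tiny stop-word list; a real list would be language-specific.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "and", "for"}

def strip_markup(text):
    """Remove HTML-like tags and reduce [[Target|label]] wiki links to their label."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    return text

def tokenize(text):
    """Split the text into runs of word characters."""
    return re.findall(r"\w+", text)

def normalize(tokens):
    """Lower-case every token."""
    return [t.lower() for t in tokens]

def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Markup removal -> tokenization -> normalization -> stop words -> stemming."""
    tokens = normalize(tokenize(strip_markup(text)))
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [crude_stem(t) for t in tokens]
```

Each step is a plain function, so a step can be swapped (for example, a proper stemmer instead of `crude_stem`) without touching the others.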


Feature Creation

Term Occurrence: $occur_{w,d} = freq_{w,d}$

Term Frequency: $tf_{w,d} = \frac{freq_{w,d}}{\sum_{w' \in d} freq_{w',d}}$

TF-IDF: $tfidf_{w,d} = tf_{w,d} \cdot idf_w$ with $idf_w = \log \frac{N}{n_w}$, where $N$ is the number of documents and $n_w$ the number of documents containing $w$.

Word        Occu(w, doc1)  Tf(w, doc1)  Tfidf(w, doc1)  |  Occu(w, doc2)  Tf(w, doc2)  Tfidf(w, doc2)
Player      5              5/9          5/9*log(2)      |  0              0/10         0
Munich      3              3/9          0               |  5              5/10         0
Train       1              1/9          0               |  2              2/10         0
Parliament  0              0            0               |  3              3/10         3/10*log(2)
∑           9              1                            |  10             1

Document 3?
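The table values can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the second column group is treated as a second document (named `doc2` here), the logarithm is taken to be the natural logarithm (the slides do not state the base), and all function names are made up for the example.

```python
import math
from fractions import Fraction

# Word counts of the two example documents from the table.
DOCS = {
    "doc1": {"Player": 5, "Munich": 3, "Train": 1, "Parliament": 0},
    "doc2": {"Player": 0, "Munich": 5, "Train": 2, "Parliament": 3},
}

def term_frequency(counts):
    """tf = freq / sum of all frequencies in the document (exact fractions)."""
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in counts.items()}

def inverse_document_frequency(docs):
    """idf_w = log(N / n_w); n_w counts the documents that contain w."""
    n_docs = len(docs)
    vocabulary = {w for counts in docs.values() for w in counts}
    idf = {}
    for w in vocabulary:
        n_w = sum(1 for counts in docs.values() if counts.get(w, 0) > 0)
        idf[w] = math.log(n_docs / n_w) if n_w else 0.0
    return idf

def tfidf(docs):
    """tfidf = tf * idf for every word of every document."""
    idf = inverse_document_frequency(docs)
    return {
        name: {w: float(tf) * idf[w] for w, tf in term_frequency(counts).items()}
        for name, counts in docs.items()
    }

weights = tfidf(DOCS)
```

Because `idf` depends on the whole collection, adding a third document changes every TF-IDF value, while occurrence and term frequency of the existing documents stay the same.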

Stored Features

Word        Occu(w, doc1)  Tf(w, doc1)  Tfidf(w, doc1)  |  Occu(w, doc2)  Tf(w, doc2)  Tfidf(w, doc2)
Player      5              5/9          ?               |  0              0/10         ?
Munich      3              3/9          ?               |  5              5/10         ?
Train       1              1/9          ?               |  2              2/10         ?
Parliament  0              0            ?               |  3              3/10         ?
∑           9              1                            |  10             1

Image Source [1]


How many???


DBSCAN – Clustering
Density-Based Spatial Clustering of Applications with Noise

Recipe
- ε = size of search area
- minPoints = min. #neighbors, e.g. 4
- Repeat for all unvisited points

Point types
- Core point
- Border point
- Noise point
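The DBSCAN recipe above can be sketched in plain Python. This is a minimal illustration, not the extension's implementation: it assumes Euclidean distance, counts a point as its own neighbor when comparing against `min_points` (as in the original DBSCAN formulation), and uses a brute-force neighbor search; the function names are invented for the example.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_points):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    labels = [None] * len(points)            # None means "unvisited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:      # not a core point
            labels[i] = NOISE                # may still become a border point later
            continue
        cluster += 1                         # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:           # reachable from a core point: border point
                labels[j] = cluster
            if labels[j] is not None:        # already assigned to some cluster
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:   # j is a core point too: keep expanding
                queue.extend(j_neighbors)
    return labels
```

The number of clusters is not an input; it follows from `eps` and `min_points`, which is why parameter selection matters so much for this algorithm.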


Challenges

- Clustering
- Transformation
- Preprocessing

Image Source [4]

Clustering – the wrong choice?

DBSCAN
- Pro: varying #clusters, handles noise
- Con: not accurate -> high-dimensional data, varying density

Clique
- Pro: varying #clusters, handles noise
- Con: computationally expensive -> backtracking

K-Means
- Pro: field-tested, handles huge data
- Con: fixed #clusters, sensitive to noise

Parameter selection
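For comparison, K-Means can be sketched with Lloyd's algorithm. This is a minimal Python illustration under assumptions: Euclidean distance, and a naive, deterministic initialization that takes the first `k` points as centroids (practical implementations usually pick random or k-means++ seeds); the function name and parameters are made up for the example.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]   # naive init: first k points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: every point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: every centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep the centroid of an empty cluster
        if new_labels == labels and new_centroids == centroids:
            break                                    # converged: nothing changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

The fixed `k` argument is exactly the "fixed #clusters" drawback listed above, and because every point must be assigned to some centroid, outliers pull centroids toward them.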

Transformation – the right representation?

Term frequency normalized by document length:
$tf_{w,d} = \frac{freq_{w,d}}{\sum_{w' \in d} freq_{w',d}}$

vs. term frequency normalized by the most frequent term:
$tf_{w,d} = \frac{freq_{w,d}}{\max_{w'} \{freq_{w',d}\}}$

TF-IDF based on occurrence:
$tfidf_{w,d} = occur_{w,d} \cdot idf_w$

vs. TF-IDF based on term frequency:
$tfidf_{w,d} = tf_{w,d} \cdot idf_w$

Example values on the slide: 1/300 and 1/30 vs. 30 and 1

It depends.
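The two term-frequency normalizations can be compared directly. The following Python sketch uses exact fractions and the word counts of the first example document from the Feature Creation table; the function names are assumptions made for the example.

```python
from fractions import Fraction

def tf_by_length(counts):
    """freq divided by the sum of all frequencies in the document."""
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in counts.items()}

def tf_by_max(counts):
    """freq divided by the frequency of the most frequent term in the document."""
    peak = max(counts.values())
    return {w: Fraction(c, peak) for w, c in counts.items()}

counts = {"Player": 5, "Munich": 3, "Train": 1}
by_length = tf_by_length(counts)   # values sum to 1
by_max = tf_by_max(counts)         # most frequent term gets 1
```

With length normalization the values of a document sum to 1, so long documents get small weights; with max normalization the most frequent term always gets 1, regardless of document length.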

Preprocessing – the best corpus?

- Corpus size
- Language
- Semantics

- Named entities: “FC Bayern München”
- Dates: 1990, 1991, …
- POS – filter nouns
- Tags: [[Link]]
- Infoboxes
- Headings

DEMO

Image Source [1]


Future Work – On-The-Fly-Clustering

- Guesstimate (sampling)
- Use headings only
- Optimize code
- External framework (Spark)

Image Source [5]

Future Work – Sub-Clusters

Sports -> Team sports -> Soccer -> Indoor soccer -> Futsal

Image Source [6], "Photo: Lachlan Fearnley"

Future Work –

- Ball sports
- Soccer clubs
- City
- University
- Ship

Image Sources [3]

Summary

MediaWiki Extension
- Hook: preprocess documents
- API: cluster
- SpecialPage: show
- Works partially

WikiClustering
- DBSCAN failed
- k-Means works -> needs improvement
- By all means not trivial!

Questions?

Image Source [1]

Image References / Attributions

• [1] PowerPoint ClipArt Search, http://insertmedia.office.microsoft.com/
• [2] http://commons.wikimedia.org/wiki/Category:High-resolution_official_Wikimedia_logos
• [3] http://pixabay.com/en/tag-price-yellow-blank-308409/
• [4] Oceancetaceen, Goldene Leiter des Forums in Duisburg, http://commons.wikimedia.org/wiki/File:Goldene_Leiter.JPG
• [5] http://mrg.bz/DUEfWy
• [6] L. Fearnley, Russian Dolls, http://commons.wikimedia.org/wiki/File:Russian_Dolls.jpg

Links and Literature

• http://www.mediawiki.org/wiki/MediaWiki
• http://php-nlp-tools.com/
• Pang-Ning Tan et al. (2005). Introduction to Data Mining, Chapter 8 “Cluster Analysis”. Addison-Wesley.
• Martin Ester et al. (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise". In: Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231.
• Gabriel Valiente (2002). Algorithms on Trees and Graphs. Berlin / Heidelberg / New York: Springer-Verlag.