a polynomial time matching algorithm of ordered tree patterns having height-constrained variables

A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height- Constrained Variables Kazuhide Aikou 1 , Yusuke Suzuki 1,2 , Takayoshi Shoudai 1 , Tomoyuki Uchida 2 , Tetsuhiro Miyahara 2 1. Department of Informatics, Kyushu University, Japan 2. Faculty of Information Sciences, Hiroshima City University, Japan


Post on 02-Jan-2016



TRANSCRIPT

Page 1: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Kazuhide Aikou1, Yusuke Suzuki1,2, Takayoshi Shoudai1,

Tomoyuki Uchida2, Tetsuhiro Miyahara2

1. Department of Informatics, Kyushu University, Japan

2. Faculty of Information Sciences, Hiroshima City University, Japan

Page 2: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Contents

1. Backgrounds and Motivations

2. Preliminaries

- Ordered Term Trees

- Height-Constrained Variables

3. A Matching Algorithm of Ordered Term Trees having Height-Constrained Variables

4. Conclusions and Future Works

Page 3: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Backgrounds

Increase of tree-structured data (Web documents, HTML/XML, etc.)

Discovery of tree-structured patterns common to tree-structured data

Application: knowledge discovery from Web documents

<Salesperiod>
  <Quarter>Winter1998</Quarter>
  <Design>
    <Designnumber>C365</Designnumber>
    <Description>North Star Polo</Description>
    <Unitssold>35500</Unitssold>
  </Design>
</Salesperiod>

[Figure: the XML data above drawn as a tree (<Salesperiod> with <Quarter> and <Design>; values Winter1998, C365, North Star Polo, 35500), and an ordered term tree over HTML tags (<HTML>, <Head>, <Body>, <Title>, <Table>, Text_university).]

Our works:

• COLT for term trees

• Web mining systems using learning algorithms for term trees

Page 4: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Ordered trees express semi-structured data (HTML, XML, etc.).

<HTML>

  <HEAD>text1</HEAD>

  <BODY>

   <DIV>text2</DIV>

   <FONT>text3</FONT>

   <FONT>text4</FONT>

  </BODY>

</HTML>

HTML Data

[Figure: the HTML data above in the Object Exchange Model, with TAG and TEXT entries and the children of each vertex numbered in order, and the corresponding ordered tree with edge labels (<HTML>, <HEAD>, <BODY>, <DIV>, <FONT>, <FONT>, text1 to text4).]

Preliminaries

Ordered Trees with Edge Labels

Page 5: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Ordered Term Trees with Multi-Child Port Variables

Ordered tree patterns with internal structured variables

An ordered term tree t = (V, E, H)

V: a vertex set

E: an edge set

H: a variable set

x, y, ...: variable labels

A variable can be substituted with an arbitrary ordered tree.

Single-child port variables: variables with exactly one child port

Multi-child port variables: variables with at least one child port

[Figure: an ordered term tree with vertices u1 to u8 and two variables h1 and h2, carrying the variable labels x and y. The parent port and the child port of h1, and the parent port and the child ports of h2, are marked.]

Page 6: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Substitutions

[Figure: an ordered tree T1, an ordered tree T2, an ordered term tree t with variables labeled x and y, and a new ordered tree T obtained by replacing the variables of t with T1 and T2.]

Replacements of the variables with T1 and T2:

• Identify the root of T1 with the parent port. Identify the two leaves with the two child ports.

• Identify the root of T2 with the parent port. Choose one of the leaves of T2 and identify it with the child port.

Page 7: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: a substitution for the variables x and y, under which a linear ordered term tree matches an ordered tree.]

Linear Ordered Term Trees:

• All variables have mutually distinct variable labels.

• All variable replacements are decided independently.

Page 8: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Matching Problem for Linear Ordered Term Trees with Multi-Child Port Variables

INPUT: T: an ordered tree; t: a linear ordered term tree with multi-child port variables.

PROBLEM: Does t match T?

This matching problem is computed in O(nN) time, where n is the number of vertices in t and N is the number of vertices in T [Suzuki et al., ILP 02].

Page 9: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

<HTML>

<HEAD>text1</HEAD>

<BODY>

<DIV>text2</DIV>

<FONT>text3</FONT>

<FONT>text4</FONT>

</BODY>

</HTML>

An HTML file

[Figure: the HTML file above drawn as an ordered tree with the children of each vertex numbered in order; the height of the tree is marked.]

Observation: Most ordered trees obtained from HTML files have low height.

Page 10: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

A tree of large height is rare. A long branch therefore becomes a feature of a tree.

[Figure: scatter plot of the relationship between the size of the tree representing an HTML file and its height. Horizontal axis: Size = the number of vertices in a tree (0 to 2000). Vertical axis: Height (0 to 40).]

Page 11: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

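Height-constrained single-child port variables

A height-constrained variable is labeled with a pair (i, j), where 0 < i ≤ j:

i: the trunk length

j: the height

Trunk length: the path length between the root and the leaf which are identified with the ports.

[Figure: a variable labeled (i, j) and a tree labeled (i’, j’), with the trunk length and the height marked.]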
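As a minimal sketch of these two quantities, the Python below computes the trunk length and the height of a candidate replacement tree and checks them against a label (i, j). It assumes that i is a lower bound on the trunk length and j an upper bound on the height; this reading is suggested by the phrase "lowest trunk lengths" on a later slide but is not stated here. The dictionary encoding of trees and the names `height`, `trunk_length` and `satisfies` are hypothetical.

```python
from typing import Dict, List, Optional

Tree = Dict[str, List[str]]  # vertex -> ordered list of children


def height(tree: Tree, root: str) -> int:
    """Number of edges on the longest downward path starting at root."""
    kids = tree.get(root, [])
    return 0 if not kids else 1 + max(height(tree, c) for c in kids)


def trunk_length(tree: Tree, root: str, port_leaf: str) -> Optional[int]:
    """Number of edges from root to the leaf identified with the child port."""
    if root == port_leaf:
        return 0
    for c in tree.get(root, []):
        d = trunk_length(tree, c, port_leaf)
        if d is not None:
            return d + 1
    return None  # port_leaf is not below root


def satisfies(tree: Tree, root: str, port_leaf: str, i: int, j: int) -> bool:
    """Assumed reading of (i, j): trunk length >= i and height <= j."""
    trunk = trunk_length(tree, root, port_leaf)
    return trunk is not None and trunk >= i and height(tree, root) <= j


# Hypothetical replacement tree: r has children a and b, a has the child c.
g = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}
print(trunk_length(g, "r", "c"), height(g, "r"))
print(satisfies(g, "r", "c", 2, 2), satisfies(g, "r", "c", 3, 4))
```

With the leaf c as the child port, this tree has trunk length 2 and height 2, so under the assumed reading it fits a (2,2)-variable but not a (3,4)-variable.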

Page 12: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

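Example.

[Figure: an ordered term tree t with two variables labeled (2,2) and (2,4), and an ordered tree T. One candidate replacement is marked O.K. and another is marked N.G.]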

Page 13: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

MATCHING PROBLEM for Linear Ordered Term Trees with Height-Constrained Single-Child Port Variables

INPUT: A linear ordered term tree t (in the figure, with variables labeled (1,2) and (4,6)) and an ordered tree T.

PROBLEM: Does t match T?

Page 14: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Main Theorem

MATCHING PROBLEM for Linear Ordered Term Trees with Height-Constrained Single-Child Port Variables is computed in O(N max{nDmax, S}) time, where

n: the number of vertices of t,

N: the number of vertices of T,

S: the total amount of the lowest trunk lengths of all variables of t,

Dmax: the maximum number of children of a vertex of T.

Page 15: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

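Sub Term Tree and Subtree

[Figure: a linear ordered term tree t with variables labeled (4,6), (1,1) and (1,2), in which the sub term tree t[u’] at a vertex u’ is marked, and an ordered tree T in which T[u] and T[u]-T[v] are marked.]

T[u]: u and all descendants of u.

T[u]-T[v]: u and all descendants of u which are not proper descendants of v.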
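The difference T[u]-T[v] can be illustrated with a short Python sketch. It collects u and all descendants of u except the proper descendants of v, and measures the height of that part. The dictionary encoding and the names `region` and `region_height` are hypothetical, and the example tree is made up.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]  # vertex -> ordered list of children


def region(tree: Tree, u: str, v: str) -> List[str]:
    """Vertices of T[u]-T[v]: u and its descendants, minus proper descendants of v."""
    found = []
    stack = [u]
    while stack:
        w = stack.pop()
        found.append(w)
        if w != v:  # v itself is kept, but nothing below v is visited
            stack.extend(tree.get(w, []))
    return sorted(found)


def region_height(tree: Tree, u: str, v: str) -> int:
    """Height (in edges) of T[u]-T[v], treating v as a leaf."""
    if u == v:
        return 0
    kids = tree.get(u, [])
    return 0 if not kids else 1 + max(region_height(tree, c, v) for c in kids)


# Hypothetical tree: u -> a, b; a -> v; v -> x, y; x -> z.
T = {"u": ["a", "b"], "a": ["v"], "v": ["x", "y"], "x": ["z"],
     "b": [], "y": [], "z": []}
print(region(T, "u", "v"))
print(region_height(T, "u", "v"))
```

In this made-up tree the part T[u]-T[v] consists of a, b, u and v and has height 2, although T[u] itself has height 4.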

Page 16: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Idea: Corresponding Sets CS(u)

t = (Vt, Et, Ht): a term tree; T = (VT, ET): a tree.

CS(u) ⊆ Vt × N × N: a corresponding set of a vertex u ∈ VT.

(v’,i,j) ∈ CS(u) shows that there is a descendant v of u such that

(1) t[v’] matches T[v],

(2) the length between u and v is i (if i < i’-1), and

(3) the height of T[u]-T[v] is j.

[Figure: in t, the vertex v’ is the child port of a variable labeled (i’,j’); in T, the vertex v is a descendant of u at distance i, T[u]-T[v] has height j, and t[v’] matches T[v].]

Page 17: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: a vertex v’ of t and a vertex u of T such that t[v’] matches T[u].]

Therefore, (v’,0,0) ∈ CS(u) if and only if t[v’] matches T[u].

(the root of t,0,0) ∈ CS(the root of T) if and only if t matches T.

Page 18: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Algorithm Matching(t, T)

  Initialization;   (step 1)
  while there is an unmarked vertex u of T do begin
    Mark u;
    VID-Inheriting(u);   (step 2)
    C-Set-Attaching(u)   (step 3)
  end

Page 19: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables


Page 20: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Initialization: Vertex Identifiers

[Figure: a linear ordered term tree t with variables labeled (1,2) and (2,2), whose vertices carry the vertex identifiers 1 to 9.]

Vertex identifiers are assigned in breadth-first search order.

The children of an internal vertex have consecutive vertex identifiers. This saves computation time of main processes.
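A minimal Python sketch of this numbering: a breadth-first traversal hands out identifiers 1, 2, 3, ... so that siblings receive consecutive values. The dictionary encoding, the vertex names and the function name `assign_vertex_ids` are hypothetical, and the tree shape is only a guess at the one drawn on the slide.

```python
from collections import deque
from typing import Dict, List


def assign_vertex_ids(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Number the vertices 1, 2, 3, ... in breadth-first search order."""
    ids: Dict[str, int] = {}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        ids[v] = len(ids) + 1
        # Children are enqueued together, left to right, so they are
        # dequeued one after another and get consecutive identifiers.
        queue.extend(children.get(v, []))
    return ids


# Hypothetical term-tree shape with nine vertices.
pattern = {"r": ["a", "b"], "a": ["c", "d", "e"], "b": ["f"], "f": ["g", "h"]}
ids = assign_vertex_ids(pattern, "r")
print(ids)
```

Because the identifiers of the children of a vertex form a contiguous range, a range can be described by its first and last identifier instead of an explicit list.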

Page 21: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Compute the corresponding set of each vertex from leaves to the root.

[Figure: a term tree t with vertex identifiers 1 to 9 and variables labeled (1,2) and (3,6), and an ordered tree T with vertices A to Q.]

Initialization: For all leaves u of T,

  Mark u;
  CS(u) := {(u’,0,0) | u’ is a leaf of t};
  height(u) := 0;

Example: for each leaf u of T (D, F, H, J, K, L, M, Q),

  CS(u) = {(4,0,0), (6,0,0), (7,0,0), (8,0,0), (9,0,0)}, height(u) = 0.
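The initialization step can be sketched in a few lines of Python. The sketch only covers what the slide states: every leaf of T is marked, gets height 0, and receives one triple (u’,0,0) per leaf u’ of t. The dictionary encoding, the function name `initialize` and the small example tree are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, int, int]  # (vertex identifier of t, length, height)


def initialize(tree_children: Dict[str, List[str]], pattern_leaves: Set[int]):
    """Mark all leaves of T and attach their corresponding sets and heights."""
    cs: Dict[str, Set[Triple]] = {}
    height: Dict[str, int] = {}
    marked: Set[str] = set()
    for u, kids in tree_children.items():
        if not kids:  # u is a leaf of T
            marked.add(u)
            cs[u] = {(leaf_id, 0, 0) for leaf_id in pattern_leaves}
            height[u] = 0
    return cs, height, marked


# Hypothetical tree T and hypothetical leaf identifiers of t.
tree_children = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}
cs, height, marked = initialize(tree_children, {4, 6})
print(sorted(marked), sorted(cs["D"]))
```

The internal vertices A and C stay unmarked here; the main loop would process them later, after their children.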

Page 22: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables


Page 23: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

VID-Inheriting (1/3): Let v’ be the child port of an (i,j)-height-constrained variable. For an internal vertex u of a tree, if there is an element (v’,i’,j’) in the CS of a child of u, add (v’, min{i’+1, i-1}, *) to CS(u).

If i’ = i-1 then the parent of u can match the parent port u’.

Example (in t, vertex 7 is the child port and vertex 3 the parent port of a (3,6)-variable):

(7,0,0) ∈ CS(Q)

(7,0,0) ∈ CS(J)

Add (7,1,1) to CS(P)

Add (7,2,2) to CS(O)

Add (7,2,3) to CS(N)

Add (7,2,4) to CS(I)

N can be matched with vertex 3.

[Figure: the variable (i,j) between the parent port u’ and the child port v’ in t, and the vertices C, I, N, O, P, Q and J in T.]

Page 24: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

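VID-Inheriting (2/3): Case: At least two children have (v’,i’,*) for a vertex v’ and an integer i’.

Example (in t, vertex 7 is the child port of a (4,6)-variable with parent port 3; in T, a has the children b and c):

(7,1,1) ∈ CS(b), height(b) = 4

(7,1,3) ∈ CS(c), height(c) = 3

The candidates for CS(a) are (7,2,4) and (7,2,5). Choose the smallest height: (7,2,4) ∈ CS(a).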

Page 25: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

VID-Inheriting (3/3): Case: A child has (v’,i’,*) and another child has (v’,i’’,*) for distinct integers i’ and i’’.

Example (in t, vertex 7 is the child port of a (4,6)-variable with parent port 3; in T, a has the children b and c):

(7,1,3) ∈ CS(b), height(b) = 4

(7,2,2) ∈ CS(c), height(c) = 3

(7,2,4), (7,3,5) ∈ CS(a)

Add all triplets to CS(u) (at most i triplets).

• CS(a) contains at most S triplets.

• Then the total time complexity of Inheriting of a vertex a is O(S ma), where ma is the number of the children of a.
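The three VID-Inheriting cases can be put together in a small Python sketch. The slides write the new height only as "*"; the rule used below, max(j’+1, height of the tallest sibling + 1), is an assumption inferred from the two worked examples, and it reproduces them. Pruning against the upper bound j of the variable is left out, and the function name `vid_inherit` and its argument layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, int, int]  # (child port v', length, height)


def vid_inherit(
    children_cs: List[Set[Triple]],   # CS of each child of u, left to right
    children_height: List[int],       # height of each child of u
    min_trunk: Dict[int, int],        # child port v' -> i of its (i, j)-variable
) -> Set[Triple]:
    """Triples that u inherits from its children (sketch, no pruning by j)."""
    best: Dict[Tuple[int, int], int] = {}
    for k, cs in enumerate(children_cs):
        # Height that the siblings of child k contribute, seen from u (assumed rule).
        sibling = max(
            (h + 1 for m, h in enumerate(children_height) if m != k),
            default=0,
        )
        for v, i_prev, j_prev in cs:
            if v not in min_trunk:    # only child ports of variables are inherited
                continue
            i_new = min(i_prev + 1, min_trunk[v] - 1)
            j_new = max(j_prev + 1, sibling)
            key = (v, i_new)
            # Case 2/3: same (v', i') reached twice, keep the smallest height.
            if key not in best or j_new < best[key]:
                best[key] = j_new
    return {(v, i, j) for (v, i), j in best.items()}


# Slide examples: children b and c with heights 4 and 3, a (4,6)-variable at port 7.
print(vid_inherit([{(7, 1, 1)}, {(7, 1, 3)}], [4, 3], {7: 4}))
print(sorted(vid_inherit([{(7, 1, 3)}, {(7, 2, 2)}], [4, 3], {7: 4})))
```

For the first input the sketch keeps only (7,2,4), and for the second it returns (7,2,4) and (7,3,5), in agreement with the slide examples. The sibling maximum is recomputed for every child for brevity, so this version does not try to meet the O(S ma) bound.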

Page 26: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables


Page 27: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

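C-Set-Attaching (Small Examples)

[Figure: two small examples, each with a term tree t (vertex 2 above the vertices 4, 5, 6) and an ordered tree T rooted at B; a variable labeled (1,2) appears in t, and the second T contains the additional vertices E and G.]

First example:

(4,0,0) ∈ CS(D), (5,0,0) ∈ CS(F), (6,0,0) ∈ CS(H)

(2,0,0) should be added to CS(B).

Second example:

(4,0,0) ∈ CS(D), (5,0,0) ∈ CS(G), (6,0,0) ∈ CS(H)

height(F) = 2, height(E) = 1

(5,0,0) ∈ CS(G) covers [E,G].

(2,0,0) is added to CS(B).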

Page 28: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

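[Figure: two further small examples, each with a term tree t (vertex 2 above the vertices 4, 5, 6, with a variable labeled (1,2)) and an ordered tree T rooted at B with the vertices D, E, F, G, H.]

First example:

(4,0,0) ∈ CS(D), (5,1,1) ∈ CS(F), (6,0,0) ∈ CS(H)

height(G) = 2, height(E) = 1

(5,1,1) ∈ CS(F) covers [E,G].

(2,0,0) is added to CS(B).

Second example:

(4,0,0) ∈ CS(D), (5,1,1) ∈ CS(F), (6,0,0) ∈ CS(H)

height(G) = 2, height(E) = 3

(5,1,1) ∈ CS(F) covers [F,G] but cannot cover E.

(2,0,0) may not be added to CS(B).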

Page 29: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

C-Set-Attaching (A Big Example)

[Figure: an ordered term tree t with vertices 1 to 11 and four variables labeled (4,8), (3,4), (5,5) and (4,7).]

Page 30: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: an ordered tree in which the vertex O has the children A to N. Each child is annotated with its height and its corresponding set.]

Vertex   height   CS
A        9        (1,0,0)
B        5        (2,0,0), (4,0,0)
C        4        (5,0,0)
D        5        (3,3,4), (6,0,0)
E        3        (3,3,3)
F        2        (1,0,0), (4,0,0), (7,2,3)
G        5        (2,0,0), (4,0,0), (5,0,0), (8,4,4)
H        6        (5,0,0), (6,0,0), (8,4,4), (9,0,0)
I        5        (3,3,5), (6,0,0)
J        7        (7,2,3), (10,3,3)
K        1        φ
L        9        (4,0,0), (8,4,4)
M        4        (5,0,0), (9,0,0)
N        4        (6,0,0), (10,3,4)

Page 31: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

First, we prepare a virtual table for a new graph. Rows and columns represent vertices of T and t, respectively.

[Figure: an empty table with rows A to N and columns 1 to 10.]

Page 32: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: an excerpt of the ordered tree (the children D to I of O with their heights and corresponding sets) and of the ordered term tree (vertex 11 and vertex 7, joined by a variable labeled (3,4)), next to rows E to I and column 7 of the table.]

(7,2,3) ∈ CS(F) covers [E,F].

Add a vertex labeled with [E,F] to F7 in the table.

Page 33: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: an excerpt of the ordered tree (the children D to I of O with their heights and corresponding sets) and of the ordered term tree (vertex 11 with the vertices 7 and 8, joined by variables labeled (3,4) and (5,5)), next to rows E to I and columns 7 and 8 of the table, which already holds [E,F] at F7.]

(8,4,4) ∈ CS(G) covers [E,G].

Add a vertex labeled with [E,G] to G8 in the table.

Page 34: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: an excerpt of the ordered tree (the children D to I of O with their heights and corresponding sets) and of the ordered term tree (vertex 11 with the vertices 7 and 8, joined by variables labeled (3,4) and (5,5)), next to rows E to I and columns 7 and 8 of the table, which holds [E,F] at F7 and [E,G] at G8.]

(8,4,4) ∈ CS(H) covers [H,H].

Add a vertex labeled with [H,H] to H8 in the table.

Add a directed edge from [E,F] at F7 to [E,G] at G8, because two consecutive variables cover all vertices from E to G.
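The covered ranges in this example can be reproduced with a short Python sketch. The covering rule used here is an assumption inferred from the example values, not stated on the slides: a triple (v’,i’,j’) found at a child of O, for a variable with height bound j, extends over neighbouring children as long as height(sibling) + 1 ≤ j, provided j’ + 1 ≤ j. The trunk-length condition is not checked, and the name `cover_interval` is hypothetical. The heights are those given for the children A to N of O in the example.

```python
from typing import List, Optional, Tuple

NAMES = "ABCDEFGHIJKLMN"
HEIGHTS = [9, 5, 4, 5, 3, 2, 5, 6, 5, 7, 1, 9, 4, 4]  # height(A) ... height(N)


def cover_interval(
    heights: List[int], p: int, j_prev: int, j_max: int
) -> Optional[Tuple[int, int]]:
    """Largest range of children around child p that fits below height j_max."""
    if j_prev + 1 > j_max:  # the trunk side itself is already too tall
        return None
    lo = p
    while lo > 0 and heights[lo - 1] + 1 <= j_max:
        lo -= 1
    hi = p
    while hi + 1 < len(heights) and heights[hi + 1] + 1 <= j_max:
        hi += 1
    return lo, hi


def label(interval: Tuple[int, int]) -> str:
    return "[%s,%s]" % (NAMES[interval[0]], NAMES[interval[1]])


print(label(cover_interval(HEIGHTS, NAMES.index("F"), 3, 4)))  # (7,2,3) at F, variable (3,4)
print(label(cover_interval(HEIGHTS, NAMES.index("G"), 4, 5)))  # (8,4,4) at G, variable (5,5)
print(label(cover_interval(HEIGHTS, NAMES.index("H"), 4, 5)))  # (8,4,4) at H, variable (5,5)
```

Under this assumed rule the three calls give [E,F], [E,G] and [H,H], the same ranges that the slides attach to F7, G8 and H8.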

Page 35: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

[Figure: the completed table (rows A to N, columns 1 to 10) holding vertices labeled [B,K], [E,F], [E,G], [H,H], [J,K], [K,N] and [M,N], together with directed edges and the two extra vertices vstart and vgoal.]

• If there is a directed path from vstart to vgoal, (11,0,0) is added to CS(O).

• The total time complexity of C-Set-Attaching of a vertex u of T and a vertex u’ of t is O(mu² m’u’), where mu and m’u’ are the numbers of the children of u and u’, respectively.
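The path search can be sketched in Python as a left-to-right reachability check. This is one reading of the table-and-path idea, not the authors' code. It assumes that each child of u’ comes with a list of candidates (p, lo, hi): the child of u it is attached to and the range of children it may cover. A candidate of the next child of u’ is reachable when it is attached further right and the two ranges leave no gap. The tuple layout and the name `can_attach` are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[int, int, int]  # (p, lo, hi): attached at child p, may cover lo..hi


def can_attach(num_children: int, candidates: List[List[Candidate]]) -> bool:
    """True if the children of u' can cover all children of u, in order."""
    if not candidates:
        return num_children == 0
    # Candidates of the first child of u' that reach the leftmost child of u
    # (the edges leaving vstart).
    frontier = [(p, hi) for p, lo, hi in candidates[0] if lo == 0]
    for options in candidates[1:]:
        frontier = [
            (p, hi)
            for p, lo, hi in options
            # Directed edge: attached further right, and no uncovered child between.
            if any(q < p and lo <= reach + 1 for q, reach in frontier)
        ]
    # Edges entering vgoal: the last child of u' reaches the rightmost child of u.
    return any(hi == num_children - 1 for _, hi in frontier)


# Children D, E, F, G, H of B are numbered 0..4; the children 4, 5, 6 of vertex 2
# are attached at D, F and H, as in the small examples.
covers_e_to_g = [[(0, 0, 0)], [(2, 1, 3)], [(4, 4, 4)]]   # middle range [E,G]
covers_f_to_g = [[(0, 0, 0)], [(2, 2, 3)], [(4, 4, 4)]]   # middle range [F,G]
print(can_attach(5, covers_e_to_g), can_attach(5, covers_f_to_g))
```

The first call succeeds and the second fails because E stays uncovered, which mirrors the two small examples where (2,0,0) is or is not added to CS(B). With at most mu candidates per child of u’, the nested scan stays within the order of mu² m’u’ steps.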

Page 36: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Total Time Complexity

VID-Inheriting(u): O(S mu)

C-Set-Attaching(u): O(mu² m’u’)

mu: the number of children of a vertex u of T,

m’u’: the number of children of a vertex u’ of t.

Total: O(N max{n Dmax, S})

n: the number of vertices of t,

N: the number of vertices of T,

S: the total amount of the lowest trunk lengths of all variables of t,

Dmax: the maximum number of children of a vertex of T.
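One way to see how the two per-vertex bounds add up to the total is the following worked sum. It is a sketch, assuming in the worst case that C-Set-Attaching is run for every pair of a vertex u of T and a vertex u’ of t, and using only the facts that the child counts of a tree sum to less than its number of vertices and that each m_u is at most D_max.

```latex
\begin{align*}
\sum_{u \in V_T} O(S\, m_u)
  &= O\Bigl(S \sum_{u \in V_T} m_u\Bigr) = O(SN), \\
\sum_{u \in V_T} \sum_{u' \in V_t} O\bigl(m_u^{2}\, m'_{u'}\bigr)
  &= O\Bigl(\sum_{u \in V_T} m_u^{2} \cdot \sum_{u' \in V_t} m'_{u'}\Bigr)
   = O\Bigl(D_{\max} \sum_{u \in V_T} m_u \cdot n\Bigr) = O(n N D_{\max}), \\
O(SN) + O(n N D_{\max})
  &= O\bigl(N \max\{n D_{\max},\, S\}\bigr).
\end{align*}
```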

Page 37: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Conclusions

• An O(N max{n Dmax, S}) Time Matching Algorithm for Ordered Term Trees with Height-Constrained Variables.

• [Our Related Works] Polynomial-Time Learning Algorithms for Ordered Term Trees with Height-Constrained Variables [Suzuki et al., PRICAI'04], [Matsumoto and Shoudai, ALT'04].

Future Works

• An Efficient Matching Algorithm for Ordered Term Trees with Height-Constrained Multi-Child Port Variables.

• Polynomial-Time Learning Algorithms for Ordered Term Trees with Height-Constrained Multi-Child Port Variables.

Page 38: A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Thank you for your attention.