Post on 21-Dec-2015
![Page 1: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/1.jpg)
Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge
Z.S. Peng
H.F. Ting
![Page 2: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/2.jpg)
The Forest Edit Distance
![Page 3: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/3.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F.
[Figure: example forests E (nodes 1 to 4) and F (nodes 1 to 7) with labeled nodes.]
![Page 4: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/4.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F.
Example: relabel (3,5).
[Figure: forests E and F, illustrating relabel (3,5).]
![Page 5: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/5.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F.
Cost of the operation: $\delta(3,5)$
[Figure: forests E and F, illustrating relabel (3,5).]
![Page 6: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/6.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E.
[Figure: example forests E (nodes 1 to 4) and F (nodes 1 to 7) with labeled nodes.]
![Page 7: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/7.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E.
Example: delete (2,-).
[Figure: forests E and F before node 2 of E is deleted.]
![Page 8: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/8.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E.
Example: delete (2,-).
[Figure: E after node 2 has been deleted (nodes 1, 3 and 4 remain); F is unchanged.]
![Page 9: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/9.jpg)
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E.
Cost of the operation: $\delta(2,-)$
[Figure: E after node 2 has been deleted (nodes 1, 3 and 4 remain); F is unchanged.]
![Page 10: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/10.jpg)
Edit distance of two ordered, labelled forests
Edit operations between E and F: delete node j from F.
Cost of the operation: $\delta(-,j)$
[Figure: example forests E (nodes 1 to 4) and F (nodes 1 to 7) with labeled nodes.]
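The relabel and delete operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a forest is a tuple of `(label, children)` pairs, nodes are addressed by their post-order number, and deleting a node lets its children take its place (the usual convention for tree edit operations). The helper names `relabel`, `delete` and `_edit`, and the example forest, are hypothetical and not taken from the slides.

```python
def _edit(forest, target, counter, relabel_to=None):
    """Rebuild `forest`, editing the node whose post-order number is `target`."""
    out = []
    for label, children in forest:
        new_children = _edit(children, target, counter, relabel_to)
        counter[0] += 1                    # post-order: a node is numbered after its children
        if counter[0] != target:
            out.append((label, new_children))
        elif relabel_to is not None:
            out.append((relabel_to, new_children))
        else:
            out.extend(new_children)       # delete: the children take the node's place
    return tuple(out)


def relabel(forest, i, new_label):
    """Relabel node i (post-order number) with new_label."""
    return _edit(forest, i, [0], new_label)


def delete(forest, i):
    """Delete node i (post-order number); its children are attached to its parent."""
    return _edit(forest, i, [0])


# Hypothetical forest: a(f(h), m), numbered h=1, f=2, m=3, a=4 in post-order.
E = (("a", (("f", (("h", ()),)), ("m", ()))),)
print(delete(E, 2))        # node f removed, h becomes a child of a
print(relabel(E, 3, "y"))  # node m relabeled to y
```

Because the functions return new tuples, the original forest is never modified, which makes it easy to compare a forest before and after an operation.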
![Page 11: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/11.jpg)
Edit distance of two ordered, labelled forests
The edit distance $\delta(E,F)$ between E and F is the minimum total cost of edit operations that transform E to E' and F to F' such that E' = F'.
[Figure: forests E and F, and below them the forests E' and F' obtained by the edit operations.]
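As a rough illustration of this definition, the following sketch computes the edit distance of two small forests by plain memoized recursion on the rightmost roots. It assumes unit costs (1 for a deletion on either side, 1 for a relabel between different labels, 0 between equal labels) and the nested-tuple representation `(label, children)`; both are assumptions made for the example, and this naive recursion is not the algorithm presented in the talk.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(E, F):
    """Unit-cost edit distance between two ordered, labeled forests."""
    if not E:
        return size(F)                     # delete every remaining node of F
    if not F:
        return size(E)                     # delete every remaining node of E
    (v_label, v_children), (w_label, w_children) = E[-1], F[-1]
    # Delete the rightmost root of E; its children take its place.
    delete_v = forest_distance(E[:-1] + v_children, F) + 1
    # Delete the rightmost root of F.
    delete_w = forest_distance(E, F[:-1] + w_children) + 1
    # Keep both roots and pair them, relabeling if the labels differ.
    match = (forest_distance(v_children, w_children)
             + forest_distance(E[:-1], F[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete_v, delete_w, match)


E = (("a", (("b", ()), ("c", ()))),)       # a(b, c)
F = (("a", (("c", ()),)),)                 # a(c)
print(forest_distance(E, F))               # deleting b from E suffices
```

The recursion is only meant for small inputs; the bounds quoted later in the talk refer to carefully organized dynamic programs.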
![Page 12: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/12.jpg)
Edit distance of two ordered, labelled forests
The edit distance $\delta(E,F)$ between E and F is the minimum total cost of edit operations that transform E to E' and F to F' such that E' = F'.
[Figure: forests E and F, and below them the forests E' and F' obtained by the edit operations.]
![Page 13: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/13.jpg)
Edit distance of two ordered, labelled forests
The guided edit distance $\delta(E,F,G)$ between E and F with respect to a third forest G is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F' and E' includes G as a subforest.
[Figure: forests E and F, their edited versions, and the guiding forest G (nodes 1 to 3).]
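The condition "includes G as a subforest" can be made concrete with a small check. The sketch below assumes that a forest H includes G when G can be obtained from H by deleting nodes (children of a deleted node take its place); this reading, the function name `includes`, and the tuple representation are assumptions for illustration only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def includes(H, G):
    """True if G can be obtained from H by deleting nodes (order and labels preserved)."""
    if not G:
        return True                        # everything left in H can be deleted
    if not H:
        return False                       # nothing left to match G against
    (v_label, v_children), (w_label, w_children) = H[-1], G[-1]
    # Option 1: delete the rightmost root of H.
    if includes(H[:-1] + v_children, G):
        return True
    # Option 2: keep it; it must then correspond to the rightmost root of G.
    return (v_label == w_label
            and includes(v_children, w_children)
            and includes(H[:-1], G[:-1]))


H = (("a", (("b", ()), ("c", (("d", ()),)))),)   # a(b, c(d))
print(includes(H, (("a", (("d", ()),)),)))        # a(d): delete b and c
print(includes(H, (("d", ()), ("b", ()))))        # d before b contradicts the order in H
```

With such a predicate, the guided distance can be read as the ordinary edit distance restricted to common forests E' = F' that pass the check.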
![Page 14: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/14.jpg)
Application 1: RNA comparisons
[Figure: Cherry small circular viroid-like RNA (GI:2347024) between base 287 and base 337. The hammerhead motif of the RNA is printed in bold.]
![Page 15: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/15.jpg)
Application 2: Comparing XML documents
XML documents that share the same Document Type Definition (DTD) should be aligned with respect to this DTD to get more accurate results.
![Page 16: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/16.jpg)
The algorithms
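$\delta(E,F)$:
Tai 1979: $O(|E||F|\,d(E)^2\,d(F)^2)$
Zhang and Shasha 1989: $O(|E||F|\,\min\{L(E),d(E)\}\,\min\{L(F),d(F)\})$
Klein 1998: $O(|E|^2|F|\log|F|)$
$\delta(E,F,G)$:
This paper: $O(|E||F||G|\,\min\{L(E),d(E)\}\,\min\{L(F),d(F)\}\,L(G)^2)$
Here $L(X)$ is the number of leaves of $X$ and $d(X)$ is its depth.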
![Page 17: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/17.jpg)
Special Cases
[Figure: example forests with node labels a, b, c and f.]
![Page 18: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/18.jpg)
Special Cases
[Figure: example forests with node labels a, b, c and f.]
Constrained Longest Common Subsequence
Constrained Sequence Alignment
![Page 19: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/19.jpg)
The algorithms
Constrained Longest Common Subsequence, Tsai 2003: $O(|e||f||g|)$
Constrained Sequence Alignment, Chin et al.: $O(|e||f||g|)$
This paper: $O(|E||F||G|\,\min\{L(E),d(E)\}\,\min\{L(F),d(F)\}\,L(G)^2)$
Since G has one leaf, the time becomes $O(|E||F||G|)$
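For the sequence special case, a three-dimensional dynamic program with an $O(|e||f||g|)$ table is the standard approach. The sketch below computes the length of a longest common subsequence of `e` and `f` that contains `g` as a subsequence; it follows the well-known recurrence for this problem rather than anything shown on the slides, and the function name is hypothetical.

```python
def constrained_lcs_length(e, f, g):
    """Length of the longest common subsequence of e and f containing g, or None."""
    NEG = float("-inf")
    n, m, r = len(e), len(f), len(g)
    # L[k][i][j]: best length for prefixes e[:i], f[:j] with g[:k] embedded.
    L = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for i in range(n + 1):
        L[0][i][0] = 0                     # empty constraint, empty f prefix
    for j in range(m + 1):
        L[0][0][j] = 0                     # empty constraint, empty e prefix
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(L[k][i - 1][j], L[k][i][j - 1])
                if e[i - 1] == f[j - 1]:
                    # Match e[i-1] with f[j-1] without consuming the constraint.
                    best = max(best, L[k][i - 1][j - 1] + 1)
                    if k > 0 and e[i - 1] == g[k - 1]:
                        # The match also covers the k-th constraint symbol.
                        best = max(best, L[k - 1][i - 1][j - 1] + 1)
                L[k][i][j] = best
    return None if L[r][n][m] == NEG else L[r][n][m]


print(constrained_lcs_length("abcbdab", "bdcaba", "d"))
print(constrained_lcs_length("ab", "ba", "ab"))   # no common subsequence contains "ab"
```

The three nested loops over the prefixes of `e`, `f` and `g` give the cubic product bound quoted for the sequence problems.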
![Page 20: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/20.jpg)
Our algorithm for computing (E,F,G)
Dynamic Programming
![Page 21: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/21.jpg)
The sub-problems
Post-order numbering (naming) of the nodes
[Figure: a forest whose 23 nodes are numbered 1 to 23 in post-order.]
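Post-order numbering is easy to compute with one recursive pass. The following sketch assumes the forest is given as nested `(label, children)` tuples and returns, for each post-order number, the node label and the number of its parent; the function name and the example forest are illustrative assumptions.

```python
def postorder(forest):
    """Number the nodes of a forest in post-order (1-based).

    Returns (labels, parent): labels[n] is the label of node n and
    parent[n] is the number of its parent, or None for a root.
    """
    labels, parent = {}, {}

    def visit(trees):
        roots = []
        for label, children in trees:
            child_ids = visit(children)    # children are numbered first
            node = len(labels) + 1
            labels[node] = label
            parent[node] = None            # overwritten below if a parent exists
            for c in child_ids:
                parent[c] = node
            roots.append(node)
        return roots

    visit(forest)
    return labels, parent


E = (("a", (("f", (("h", ()),)), ("m", ()))),)   # hypothetical forest a(f(h), m)
labels, parent = postorder(E)
print(labels)   # {1: 'h', 2: 'f', 3: 'm', 4: 'a'}
print(parent)   # {1: 2, 2: 4, 3: 4, 4: None}
```

A useful property of this numbering is that every node has a larger number than all of its descendants.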
![Page 22: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/22.jpg)
The sub-problems
$E_{i..i'}$: a "consecutive" sub-forest
[Figure: a forest whose 23 nodes are numbered 1 to 23 in post-order.]
![Page 23: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/23.jpg)
The sub-problems
$E_{i..i'}$: a "consecutive" sub-forest
Example: $E_{4..21}$
[Figure: a forest whose 23 nodes are numbered 1 to 23 in post-order, with the sub-forest $E_{4..21}$ marked.]
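A consecutive sub-forest can be extracted directly from a post-order parent map. The sketch below assumes that $E_{i..i'}$ is the forest induced by the nodes numbered i to i', and that `parent` maps each post-order number to its parent's number (or `None`); both the reading and the function name are assumptions for illustration.

```python
def consecutive_subforest(parent, lo, hi):
    """Parent map of the sub-forest induced by post-order nodes lo..hi.

    In post-order every ancestor has a larger number than its descendants,
    so a node whose parent lies beyond `hi` has no ancestor in the range
    and becomes a root of the sub-forest.
    """
    sub = {}
    for v in range(lo, hi + 1):
        p = parent[v]
        sub[v] = p if p is not None and p <= hi else None
    return sub


# Hypothetical forest a(f(h), m) with h=1, f=2, m=3, a=4.
parent = {1: 2, 2: 4, 3: 4, 4: None}
sub = consecutive_subforest(parent, 1, 3)
print(sub)                                      # {1: 2, 2: None, 3: None}
print([v for v, p in sub.items() if p is None]) # roots of the sub-forest: [2, 3]
```

Removing the root 4 leaves the two trees rooted at 2 and 3, which is why a range of consecutive post-order numbers always describes a forest.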
![Page 24: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/24.jpg)
The sub-problems
$\delta(E_{i..i'}, F_{j..j'}, G_{k..k'})$
[Figure: forests E, F and G with nodes numbered in post-order.]
![Page 25: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/25.jpg)
The sub-problems
$\delta(E_{2..7}, F_{4..7}, G_{2..3})$
[Figure: forests E, F and G with nodes numbered in post-order; the sub-forests $E_{2..7}$, $F_{4..7}$ and $G_{2..3}$ are marked.]
![Page 26: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/26.jpg)
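$\delta(E_{s(i_0)..i}, F_{s(j_0)..j}, G_{k..k'})$ is equal to the minimum of the following:
1. $\delta(E_{s(i_0)..i-1}, F_{s(j_0)..j}, G_{k..k'}) + \delta(i,-)$
2. $\delta(E_{s(i_0)..i}, F_{s(j_0)..j-1}, G_{k..k'}) + \delta(-,j)$
3. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..k'}) + \delta(E[i], F[j])$
4. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(p)-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(p)..k'}) + \delta(i,j)$
5. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(k')-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(k')..k'-1}) + \delta(i,j)$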
![Page 27: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/27.jpg)
1. $\delta(E_{s(i_0)..i-1}, F_{s(j_0)..j}, G_{k..k'}) + \delta(i,-)$
[Figure: forests E, F and G with nodes numbered in post-order, illustrating case 1.]
![Page 28: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/28.jpg)
![Page 29: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/29.jpg)
2. $\delta(E_{s(i_0)..i}, F_{s(j_0)..j-1}, G_{k..k'}) + \delta(-,j)$
[Figure: forests E, F and G with nodes numbered in post-order, illustrating case 2.]
![Page 30: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/30.jpg)
3. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..k'}) + \delta(E[i], F[j])$
[Figure: forests E, F and G with nodes numbered in post-order, illustrating case 3.]
![Page 31: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/31.jpg)
![Page 32: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/32.jpg)
4. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(p)-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(p)..k'}) + \delta(i,j)$
[Figure: forests E, F and G with nodes numbered in post-order, illustrating case 4.]
![Page 33: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/33.jpg)
![Page 34: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/34.jpg)
5. $\delta(E_{s(i_0)..s(i)-1}, F_{s(j_0)..s(j)-1}, G_{k..s(k')-1}) + \delta(E_{s(i)..i-1}, F_{s(j)..j-1}, G_{s(k')..k'-1}) + \delta(i,j)$
[Figure: forests E, F and G with nodes numbered in post-order, illustrating case 5.]
![Page 35: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/35.jpg)
![Page 36: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/36.jpg)
![Page 37: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/37.jpg)
The order for solving the sub-problems
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf then find $\delta(E_{1..i}, F_{1..j}, G_{k..(k+h-1)})$
![Page 38: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/38.jpg)
The time complexity
$O(|E|^2|F|^2|G|\,|L(G)|^2)$
![Page 39: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/39.jpg)
Sparsify the dynamic program
using a clever trick of Zhang and Shasha
![Page 40: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/40.jpg)
Key-root: a node that is the root or has a left sibling.
[Figure: forests E, F and G with nodes numbered in post-order and the key-roots marked.]
![Page 41: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/41.jpg)
Key-root: a node that is the root or has a left sibling.
[Figure: forests E, F and G with nodes numbered in post-order and the key-roots marked.]
No. of key-roots ≤ no. of leaves
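The key-root definition and the counting claim can be checked on small examples. The sketch below assumes nested `(label, children)` tuples, post-order numbering, and that every tree root of a forest counts as a key-root; the function name and the example forest are hypothetical.

```python
def key_roots_and_leaves(forest):
    """Return (key_roots, leaves) as lists of post-order numbers."""
    key_roots, leaves, counter = [], [], [0]

    def visit(trees, top_level):
        for position, (label, children) in enumerate(trees):
            visit(children, False)
            counter[0] += 1                # post-order number of this node
            if not children:
                leaves.append(counter[0])
            # A key-root is a root, or a node with a left sibling.
            if top_level or position > 0:
                key_roots.append(counter[0])

    visit(forest, True)
    return key_roots, leaves


E = (("a", (("f", (("h", ()),)), ("m", ()))),)   # a(f(h), m): h=1, f=2, m=3, a=4
key_roots, leaves = key_roots_and_leaves(E)
print(key_roots, leaves)                          # [3, 4] [1, 3]
assert len(key_roots) <= len(leaves)
```

Each key-root has its own leftmost leaf descendant, which is one way to see why key-roots can never outnumber leaves.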
![Page 42: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/42.jpg)
To compute $\delta(E,F,G) = \delta(E_{1..|E|}, F_{1..|F|}, G_{1..|G|})$:
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf
          find $\delta(E_{1..i}, F_{1..j}, G_{k..(k+h-1)})$
![Page 43: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/43.jpg)
To compute $\delta(E,F,G) = \delta(E_{1..|E|}, F_{1..|F|}, G_{1..|G|})$:
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf and i and j are key-roots
          find $\delta(E_{1..i}, F_{1..j}, G_{k..(k+h-1)})$
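To get a feel for what the key-root test saves, one can simply count how many loop iterations reach the "find" step with and without it. The sketch below mirrors the loop nest above; it only counts iterations and does not compute any distance, and the sizes, leaf set and key-root sets passed in are made-up inputs.

```python
def count_find_steps(n_e, n_f, n_g, g_leaves, e_key_roots=None, f_key_roots=None):
    """Count iterations of the loop nest that reach the 'find' step.

    If key-root sets are given, only pairs (i, j) of key-roots are counted,
    as in the sparsified loop; otherwise all pairs are counted.
    """
    count = 0
    for i in range(1, n_e + 1):
        if e_key_roots is not None and i not in e_key_roots:
            continue
        for j in range(1, n_f + 1):
            if f_key_roots is not None and j not in f_key_roots:
                continue
            for h in range(1, n_g + 1):
                for k in range(1, n_g - h + 2):
                    if k in g_leaves:      # the G sub-forest must start at a leaf
                        count += 1
    return count


# Hypothetical sizes: |E| = |F| = 4, |G| = 3, leaves of G are nodes 1 and 2.
print(count_find_steps(4, 4, 3, {1, 2}))                  # all (i, j) pairs
print(count_find_steps(4, 4, 3, {1, 2}, {3, 4}, {3, 4}))  # key-root pairs only
```

The count drops by the ratio of key-root pairs to all node pairs; the work done inside each "find" step is a separate matter that this sketch does not model.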
![Page 44: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/44.jpg)
The new running time
$O(|E||F||G|\,\min\{L(E),d(E)\}\,\min\{L(F),d(F)\}\,L(G)^2)$
![Page 45: Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d6b5503460f94a49ba1/html5/thumbnails/45.jpg)
Thank you