gSpan algorithm
![Page 1: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/1.jpg)
gSpan: Graph-Based Substructure Pattern Mining
- Algorithm -
Presented by: Sadik Mussah
University of Vermont, CS 332 – Data Mining
![Page 2: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/2.jpg)
Outline
• Background
• Problem Definition
• Authors' Contribution
• Concepts Behind gSpan
• Experimental Results
• Conclusion
![Page 3: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/3.jpg)
Background
• Frequent subgraph mining is an extension of existing frequent pattern mining algorithms.
• A major challenge is counting how many instances of a pattern occur in the dataset.
• Counting instances is easy for sets but subtle for graphs.
• Graph isomorphism problem
![Page 4: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/4.jpg)
Background
Theorem
Given two graphs G and G′ (G prime), G is isomorphic to G′ iff min(G) = min(G′), where min(·) denotes the minimum DFS code of a graph.
05/01/23 Sadik Mussah
![Page 5: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/5.jpg)
Background
[Figure: two isomorphic graphs, (a) G1 = (V1, E1, L1) and (b) G2 = (V2, E2, L2), each with five vertices labeled X, W, U, Y, V, and (c) their mapping function.]
Two graphs are isomorphic if one can find a one-to-one mapping from the nodes of the first graph to the nodes of the second graph such that edges and the labels on nodes and edges are preserved.
(c) Mapping function:
f(V1.1) = V2.2
f(V1.2) = V2.5
f(V1.3) = V2.3
f(V1.4) = V2.4
f(V1.5) = V2.1
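The definition of isomorphism can be checked mechanically once a candidate mapping is given. The following is a minimal sketch, not taken from the slides: the function name `is_label_preserving_isomorphism`, the dictionary-based graph representation, and the three-vertex example graphs are assumptions made for illustration, since the edges of the pictured graphs did not survive in this transcript.

```python
def is_label_preserving_isomorphism(vlabels1, edges1, vlabels2, edges2, f):
    """Check that f maps graph 1 onto graph 2 while preserving all labels.

    vlabels: dict vertex -> vertex label
    edges:   dict (u, v) -> edge label, undirected, no multiple edges
    f:       dict vertex of graph 1 -> vertex of graph 2
    """
    # f must be one-to-one and onto.
    if set(f) != set(vlabels1) or len(set(f.values())) != len(f):
        return False
    if set(f.values()) != set(vlabels2):
        return False
    # Vertex labels must be preserved.
    if any(vlabels1[v] != vlabels2[f[v]] for v in vlabels1):
        return False
    # Every edge, with its label, must map onto an edge with the same label.
    mapped = {frozenset((f[u], f[v])): lab for (u, v), lab in edges1.items()}
    target = {frozenset(e): lab for e, lab in edges2.items()}
    return mapped == target


# Hypothetical example: the path X -a- Y -b- Z, stored under two numberings.
g1_v = {1: "X", 2: "Y", 3: "Z"}
g1_e = {(1, 2): "a", (2, 3): "b"}
g2_v = {1: "Z", 2: "X", 3: "Y"}
g2_e = {(2, 3): "a", (3, 1): "b"}

good = {1: 2, 2: 3, 3: 1}   # sends X to X, Y to Y, Z to Z
bad = {1: 2, 2: 1, 3: 3}    # sends Y to a vertex labeled Z

print(is_label_preserving_isomorphism(g1_v, g1_e, g2_v, g2_e, good))  # True
print(is_label_preserving_isomorphism(g1_v, g1_e, g2_v, g2_e, bad))   # False
```

Verifying one given mapping is cheap; the hard part is finding a mapping among all possible ones.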
![Page 6: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/6.jpg)
Problem: Finding Frequent Subgraphs
• Problem setting: similar to finding frequent itemsets for association rule discovery
• Input: database of graph transactions
  • Undirected simple graphs (no multiple edges)
  • Each graph transaction has labeled edges/vertices
  • Transactions may not be connected
  • Minimum support threshold
• Output: frequent subgraphs that satisfy the support threshold, where each frequent subgraph is connected
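To make the problem statement concrete, here is a small brute-force sketch of support counting: the support of a pattern is the number of graph transactions that contain it as a subgraph with matching labels. This is not how gSpan counts support; it tries every injective vertex assignment and is only usable on tiny graphs. The names `make_graph`, `contains`, and `support`, and the three-transaction database, are hypothetical.

```python
from itertools import permutations


def make_graph(vlabels, edge_list):
    """vlabels: dict vertex -> label; edge_list: (u, v, edge_label) triples."""
    return vlabels, {frozenset((u, v)): lab for u, v, lab in edge_list}


def contains(graph, pattern):
    """True if pattern occurs in graph as a label-preserving subgraph."""
    gv, ge = graph
    pv, pe = pattern
    pnodes = list(pv)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(gv, len(pnodes)):
        f = dict(zip(pnodes, image))
        if any(pv[p] != gv[f[p]] for p in pnodes):
            continue
        # Each pattern edge must exist in the graph with the same label.
        if all(ge.get(frozenset(f[x] for x in e)) == lab
               for e, lab in pe.items()):
            return True
    return False


def support(database, pattern):
    """Number of transactions that contain the pattern at least once."""
    return sum(contains(g, pattern) for g in database)


# Hypothetical database of three graph transactions.
db = [
    make_graph({1: "X", 2: "Y", 3: "Z"}, [(1, 2, "a"), (2, 3, "b")]),
    make_graph({1: "X", 2: "Y"}, [(1, 2, "a")]),
    make_graph({1: "X", 2: "Z"}, [(1, 2, "b")]),
]
xy = make_graph({"p": "X", "q": "Y"}, [("p", "q", "a")])
yz = make_graph({"p": "Y", "q": "Z"}, [("p", "q", "b")])

min_support = 2
print(support(db, xy), support(db, xy) >= min_support)  # 2 True
print(support(db, yz), support(db, yz) >= min_support)  # 1 False
```

Each transaction counts at most once, no matter how many times the pattern occurs inside it.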
![Page 7: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/7.jpg)
Finding Frequent Subgraphs
![Page 8: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/8.jpg)
Authors' Contribution
• Representing graphs as strings (like TreeMiner)
• No candidate generation!
• “It combines the growing and checking of frequent subgraphs into one procedure, thus accelerates the mining process.”
• Really fast; still a standard baseline system that most rivals compare their systems to.
![Page 9: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/9.jpg)
Concepts Behind gSpan
• The idea is to produce depth-first search (DFS) codes for the edges of each graph.
• Edges are sorted according to the lexicographic order of their codes.
• Yan and Han proved that graph isomorphism can be tested for two graphs annotated with DFS codes.
• Starting with small graph patterns containing one edge, patterns are expanded systematically by the DFS search.
• Employs the anti-monotonic property of graph frequency (a graph can never be more frequent than any of its subgraphs).
![Page 10: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/10.jpg)
Lexicographic Ordering in Graphs
• It can tell us the order of two graphs.
• The design can help us build a similar hierarchy.
• The design should guarantee easy growing from one level to the next lower level and easy rolling up from a lower level to a higher level.
• It may be difficult to have a design in which no two nodes in this tree are the same in the graph case.
• It can tell us whether a graph has already been discovered.
• Most importantly, if a graph has been discovered, all its children nodes in the hierarchy must have been discovered.
![Page 11: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/11.jpg)
Lexicographic Ordering in Graphs
[Figure: search hierarchy with levels labeled 1-edge, 2-edge, 3-edge, ...]
![Page 12: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/12.jpg)
DFS Code and Minimum DFS Code
• We use a 5-tuple (vi, vj, L(vi), L(vj), L(vi,vj)) to represent an edge. (It may be redundant, but it is much easier to understand.)
• Turn a graph into a sequence whose basic element is the 5-tuple. Form the sequence in this order:
  • To extend to one new node, add the forward edge that connects one node in the old graph with this new node.
  • Add all backward edges that connect this new node to other nodes in the old graph.
  • Repeat this procedure.
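The construction procedure can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not code from the slides: the function name `dfs_code`, the vertex identifiers "A" to "E", and the adjacency-list layout are hypothetical. The example graph is the one used in this deck (vertex labels x, y, x, z, z), and the order of each adjacency list is chosen so that the traversal visits the vertices in the order v0, v1, v2, v3, v4.

```python
def dfs_code(vlabel, adj, start):
    """Build one DFS code as a list of (i, j, L(vi), L(vj), L(vi,vj)) tuples.

    vlabel: dict vertex -> label
    adj:    dict vertex -> list of (neighbor, edge_label); the list order
            decides which neighbor the traversal explores first
    """
    index = {}   # vertex -> discovery index
    code = []

    def visit(v, parent):
        index[v] = len(index)
        # Backward edges: from the new vertex to earlier vertices other
        # than the parent, in increasing order of discovery index.
        back = sorted((index[w], vlabel[w], el) for w, el in adj[v]
                      if w in index and w != parent and w != v)
        for j, lw, el in back:
            code.append((index[v], j, vlabel[v], lw, el))
        # Forward edges: one per neighbor that is still undiscovered.
        for w, el in adj[v]:
            if w not in index:
                code.append((index[v], len(index), vlabel[v], vlabel[w], el))
                visit(w, v)

    visit(start, None)
    return code


vlabel = {"A": "x", "B": "y", "C": "x", "D": "z", "E": "z"}
adj = {
    "A": [("B", "a"), ("C", "a")],
    "B": [("A", "a"), ("C", "b"), ("D", "b"), ("E", "d")],
    "C": [("B", "b"), ("A", "a"), ("D", "c")],
    "D": [("C", "c"), ("B", "b")],
    "E": [("B", "d")],
}

for edge in dfs_code(vlabel, adj, "A"):
    print(edge)
# (0, 1, 'x', 'y', 'a')
# (1, 2, 'y', 'x', 'b')
# (2, 0, 'x', 'x', 'a')
# (2, 3, 'x', 'z', 'c')
# (3, 1, 'z', 'y', 'b')
# (1, 4, 'y', 'z', 'd')
```

A different start vertex or a different adjacency order yields a different code for the same graph, which is why a single canonical (minimum) code is needed.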
![Page 13: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/13.jpg)
DFS code
[Figure: example graph with vertices v0 (X), v1 (Y), v2 (X), v3 (Z), v4 (Z) and edge labels a, b, c, d, shown being built up one edge at a time.]
e0: (0,1,x,y,a)
e1: (1,2,y,x,b)
e2: (2,0,x,x,a)
e3: (2,3,x,z,c)
e4: (3,1,z,y,b)
e5: (1,4,y,z,d)
![Page 14: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/14.jpg)
DFS Code and Minimum DFS Code
Depth First Tree And Forward/Backward Edge Set
![Page 15: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/15.jpg)
Minimum DFS code
Each graph may have many DFS codes (why?). The lexicographically smallest one is its minimum DFS code.
Edge no.  (B)          (C)          (D)
0         (0,1,x,y,a)  (0,1,y,x,a)  (0,1,x,x,a)
1         (1,2,y,x,b)  (1,2,x,x,a)  (1,2,x,y,b)
2         (2,0,x,x,a)  (2,0,x,y,b)  (2,0,y,x,a)
3         (2,3,x,z,c)  (2,3,x,z,c)  (2,3,y,z,b)
4         (3,1,z,y,b)  (3,0,z,y,b)  (3,1,z,x,c)
5         (1,4,y,z,d)  (0,4,y,z,d)  (2,4,y,z,d)
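One way to see why a graph has many DFS codes is to enumerate them: every choice of start vertex and every choice of which undiscovered neighbor to explore next gives a valid depth-first traversal. The sketch below does this by brute force for a small connected graph and then picks the smallest code as a canonical form. Two assumptions are worth stating loudly. First, all names (`build_adj`, `all_dfs_codes`, `canonical_code`) and the vertex identifiers are hypothetical. Second, the minimum is taken under Python's plain tuple order, a stand-in for the DFS lexicographic order defined by Yan and Han, so the code it selects may differ from gSpan's minimum DFS code. Any fixed total order still gives the same guarantee: two graphs are isomorphic exactly when their smallest codes are equal.

```python
def build_adj(vlabel, edges):
    """Undirected adjacency lists from (u, w, edge_label) triples."""
    adj = {v: [] for v in vlabel}
    for u, w, el in edges:
        adj[u].append((w, el))
        adj[w].append((u, el))
    return adj


def all_dfs_codes(vlabel, adj):
    """Yield every DFS code of a small connected labeled graph."""
    n = len(vlabel)

    def extend(stack, index, code):
        if len(index) == n:      # all vertices discovered, all edges emitted
            yield tuple(code)
            return
        # Backtrack to the deepest vertex with an undiscovered neighbor.
        stack = list(stack)
        while stack and all(w in index for w, _ in adj[stack[-1]]):
            stack.pop()
        if not stack:            # graph is not connected: no full code
            return
        v = stack[-1]
        for w, el in adj[v]:
            if w in index:
                continue
            idx = dict(index)
            idx[w] = len(idx)
            # Forward edge to the new vertex, then its backward edges.
            new_code = code + [(idx[v], idx[w], vlabel[v], vlabel[w], el)]
            back = sorted((idx[u], vlabel[u], eb) for u, eb in adj[w]
                          if u in index and u != v)
            new_code += [(idx[w], j, vlabel[w], lu, eb) for j, lu, eb in back]
            yield from extend(stack + [w], idx, new_code)

    for s in vlabel:
        yield from extend([s], {s: 0}, [])


def canonical_code(vlabel, adj):
    """Smallest code under plain tuple order (assumes a connected graph)."""
    return min(all_dfs_codes(vlabel, adj))


vlabel = {"A": "x", "B": "y", "C": "x", "D": "z", "E": "z"}
edges = [("A", "B", "a"), ("B", "C", "b"), ("C", "A", "a"),
         ("C", "D", "c"), ("D", "B", "b"), ("B", "E", "d")]
codes = set(all_dfs_codes(vlabel, build_adj(vlabel, edges)))
print(len(codes), "distinct DFS codes for one graph")

# The same graph with renamed vertices and reordered edges.
rename = {"A": 3, "B": 1, "C": 5, "D": 2, "E": 4}
vlabel2 = {rename[v]: lab for v, lab in vlabel.items()}
edges2 = [(rename[u], rename[w], el) for u, w, el in reversed(edges)]
same = (canonical_code(vlabel, build_adj(vlabel, edges))
        == canonical_code(vlabel2, build_adj(vlabel2, edges2)))
print(same)  # True
```

The enumeration is exponential in general; it is meant to illustrate the definition, not to replace gSpan's incremental minimality check.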
![Page 16: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/16.jpg)
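Graph Parent and Its Children
[Figure: a parent graph with vertex labels X, Y, X, Z, Z and edge labels a, b, c, a; candidate child extensions are marked "?".]
Given a DFS code c0 = (e0, e1, …, en), if c1 = (e0, e1, …, en, ex) and c0 < c1, then c0 is c1's parent and c1 is c0's child.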
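The parent/child relation is a one-edge prefix relation between codes, which a short sketch can make explicit. The function name `is_child` is hypothetical, and the sketch only checks the prefix condition; it leaves the ordering test c0 < c1 aside.

```python
def is_child(c0, c1):
    """True if DFS code c1 is c0 followed by exactly one more edge."""
    return len(c1) == len(c0) + 1 and list(c1[:len(c0)]) == list(c0)


c0 = [(0, 1, "x", "y", "a"), (1, 2, "y", "x", "b")]
c1 = c0 + [(2, 0, "x", "x", "a")]          # one backward edge added
c2 = c1 + [(2, 3, "x", "z", "c")]          # a grandchild of c0

print(is_child(c0, c1))  # True
print(is_child(c0, c2))  # False: two edges longer
print(is_child(c1, c0))  # False: wrong direction
```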
![Page 17: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/17.jpg)
Theorem
• 1. Given two graphs G0 and G1, G0 is isomorphic to G1 iff min_dfs_code(G0) = min_dfs_code(G1).
• 2. The DFS code tree covers all graphs, although some tree nodes may represent the same graph.
• 3. Given a node in the DFS code tree, if its DFS code is not the minimum DFS code of its graph, pruning this node and all its descendants won't change the “covering”.
![Page 18: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/18.jpg)
DFS Code Tree
[Figure: DFS code tree with levels labeled 1-edge, 2-edge, 3-edge, ...; one branch is marked "pruned".]
![Page 19: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/19.jpg)
FSG: two substructure patterns and their potential candidates.
![Page 20: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/20.jpg)
AGM: two substructures joined by two chains
![Page 21: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/21.jpg)
Algorithm
![Page 22: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/22.jpg)
Algorithm
![Page 23: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/23.jpg)
Algorithm: AprioriGraph
![Page 24: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/24.jpg)
ALGORITHM: gSpan
05/01/23 Xifeng Yan
![Page 25: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/25.jpg)
Experimental Result
![Page 26: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/26.jpg)
Experimental Result
![Page 27: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/27.jpg)
Conclusion
• No candidate generation and no false test
• Space saving from depth-first search
• Good performance: using a “memory pool” and one major counting improvement, it seems the performance could be improved about 5 times more (but this needs more testing).
![Page 28: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/28.jpg)
Questions
Q1) What two major costs of Apriori-like frequent substructure mining algorithms did gSpan aim to reduce or avoid?
Answer:
1) The creation of size k+1 candidate subgraphs from size k frequent subgraphs is more complicated and costly than standard Apriori large-itemset generation.
2) Pruning false positives is an expensive process, because the subgraph isomorphism problem is NP-complete.
![Page 29: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/29.jpg)
Security Graph 3D Visualization
• https://www.youtube.com/watch?v=JsEm-CDj4qM
![Page 30: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/30.jpg)
Questions (cont.)
• Q2) Which DFS tree does the DFS code below belong to?
![Page 31: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/31.jpg)
[Figure: DFS tree on vertices v0 to v4 with vertex labels Y, x, x, z, z and edge labels a, a, b, b, c, d.]
Answer: tree (c)
![Page 32: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/32.jpg)
Questions
• Q3) What does gSpan compare when testing for isomorphism between two graphs, and why?
• Answer: gSpan compares the minimum DFS codes of the two graphs. Given two graphs G and G′, G is isomorphic to G′ iff min(G) = min(G′). This theorem lets a simple string comparison stand in for a more complicated graph comparison. If two nodes of the DFS code tree contain the same graph but have different DFS codes, we can prune the sub-branch of the rightmost of the two nodes, the one whose code is not the minimum. This greatly decreases the problem size.
![Page 33: gSpan algorithm](https://reader035.vdocuments.mx/reader035/viewer/2022062412/589f34251a28ab4d568b6be1/html5/thumbnails/33.jpg)
Questions?