Survey on Frequent Pattern Mining on Graph Data - Slides
TRANSCRIPT
![Page 1: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/1.jpg)
Sriskandarajah Suhothayan
Kasun Gajasinghe
Isuru Loku Narangoda
Subash Chaturanga
![Page 2: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/2.jpg)
Outline
Introduction
Basic principles
Solution patterns
![Page 3: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/3.jpg)
Introduction
Graphs can be seen everywhere.
In computer science, a graph is viewed as an abstract data structure that represents relationships among data.
![Page 4: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/4.jpg)
Graph based data mining
Graph based data mining is the task of finding useful and understandable patterns in a graph representation of data.
The main subject area of graph based data mining is identifying frequently occurring subgraph patterns.
![Page 5: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/5.jpg)
Approaches
In the recent past, significant work has been done in this subject area to develop algorithms that mine graph data efficiently.
In this paper we discuss several well-known algorithms under the following categories.
Mathematical Graph Theory Based Approaches
Greedy Search Based Approaches
Inductive Logic Programming Approach
Inductive Database Based Approaches
![Page 6: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/6.jpg)
Applications
Bioinformatics
– mining biochemical structures
– finding biologically conserved subnetworks
Chemical compound analysis
Web browsing pattern analysis
Intrusion network analysis
Mining communication networks
![Page 7: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/7.jpg)
Basic Principles
Subgraph categories
– general subgraphs
– induced subgraphs
– connected subgraphs
Subgraph Isomorphism Problem
This asks whether there exists a one-to-one mapping from the vertices of one graph to vertices of another graph that preserves the edges.
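The subgraph isomorphism test described above can be sketched as a brute-force search. This is only an illustration under assumptions that are not in the slides: graphs are undirected and unlabelled, they are given as vertex lists and edge lists, and the function name `is_subgraph_isomorphic` is hypothetical. It checks the "general subgraph" case and tries every one-to-one assignment, so it is only practical for very small patterns.

```python
from itertools import permutations

def is_subgraph_isomorphic(pattern_vertices, pattern_edges, graph_vertices, graph_edges):
    """Return True if the pattern maps one-to-one into the graph so that
    every pattern edge lands on a graph edge (general subgraph matching)."""
    graph_edge_set = {frozenset(e) for e in graph_edges}
    pattern_vertices = list(pattern_vertices)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(graph_vertices, len(pattern_vertices)):
        mapping = dict(zip(pattern_vertices, image))
        if all(frozenset((mapping[u], mapping[v])) in graph_edge_set
               for u, v in pattern_edges):
            return True
    return False

# A triangle pattern and two candidate graphs (illustrative data).
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
square_with_diagonal = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])
path = ([1, 2, 3], [(1, 2), (2, 3)])

print(is_subgraph_isomorphic(*triangle, *square_with_diagonal))
print(is_subgraph_isomorphic(*triangle, *path))
```

The exhaustive loop over assignments is what makes the problem expensive in general, which is why the mining algorithms later in the deck try to avoid or limit these tests.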
![Page 8: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/8.jpg)
Basic Principles
Graph Invariants
Quantities that characterize the topological structure of a graph
– number of vertices
– degree of each vertex (the number of edges connected to the vertex)
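A minimal sketch of computing such invariants, assuming an undirected graph given as a vertex list and an edge list (the representation and the names `graph_invariants` and `invariants_match` are assumptions for illustration). Equal invariants are a necessary condition for two graphs to be isomorphic, not a sufficient one, so they are useful as a cheap filter before a full isomorphism test.

```python
def graph_invariants(vertices, edges):
    """Compute simple invariants of an undirected graph."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "num_vertices": len(degree),
        "num_edges": len(edges),
        # Sorted so the result does not depend on vertex names or order.
        "degree_sequence": sorted(degree.values(), reverse=True),
    }

def invariants_match(graph_a, graph_b):
    """Cheap filter: graphs with different invariants cannot be isomorphic."""
    return graph_invariants(*graph_a) == graph_invariants(*graph_b)

path = (["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = (["x", "y", "z"], [("x", "y"), ("y", "z"), ("z", "x")])
print(graph_invariants(*path))
print(invariants_match(path, triangle))
```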
![Page 9: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/9.jpg)
Solution Approaches
Categorization
Completeness
– complete search
– heuristic search
Subgraph isomorphism matching problem
– direct
– indirect (solves the subgraph similarity problem)
![Page 10: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/10.jpg)
Solution Approaches
Greedy search
Inductive logic programming (ILP)
Inductive database
Complete level-wise search
Support Vector Machine (SVM)
![Page 11: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/11.jpg)
Greedy search
The conventional solution
Categorized into Depth-First Search (DFS) and Breadth-First Search (BFS)
Beam search
The disadvantage: as the search proceeds, it prunes the branches that do not fit within the maximum branch number limit
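The pruning behaviour can be illustrated with a generic beam search. This is a sketch, not the search used by any specific system in the survey: the state space, the `expand` and `score` callbacks, and the toy score table are all assumptions made up for the example.

```python
import heapq

def beam_search(start, expand, score, beam_width, max_steps):
    """Keep only the `beam_width` best states at each step; return the best state seen."""
    beam = [start]
    best = start
    for _ in range(max_steps):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Branches beyond the beam width are pruned and never revisited.
        beam = heapq.nlargest(beam_width, candidates, key=score)
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

# Toy search space: strings over {a, b} up to length 2, with a made-up score table
# in which the best state "ba" hides behind a poorly scoring first step "b".
SCORES = {"": 0, "a": 1, "b": 0, "aa": 1, "ab": 1, "ba": 5, "bb": 0}

def expand(state):
    return [state + "a", state + "b"] if len(state) < 2 else []

def score(state):
    return SCORES[state]

narrow = beam_search("", expand, score, beam_width=1, max_steps=5)
wide = beam_search("", expand, score, beam_width=2, max_steps=5)
print(narrow, score(narrow))
print(wide, score(wide))
```

With a beam width of 1 the branch through "b" is pruned at the first step, so the high-scoring state "ba" is never reached; a width of 2 keeps it alive. This is the disadvantage named on the slide.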
![Page 12: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/12.jpg)
Inductive logic programming (ILP)
Induction?
A combination of 'abduction' (guessing), which selects some hypotheses, and 'justification', which seeks to justify the observed facts using those hypotheses.
![Page 13: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/13.jpg)
Inductive logic programming (ILP)
positive examples + negative examples + background knowledge => hypothesis
Background knowledge is used to
– control the search process (prune some search paths)
– introduce predetermined subgraph patterns
ILP can fall into any of the four categories
![Page 14: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/14.jpg)
Inductive database
Subgraphs and relations among subgraphs are pre-generated and stored in an inductive database
Advantage: fast operation, as the basic patterns are already generated
Disadvantage: large amount of computation and memory utilization
![Page 15: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/15.jpg)
Complete level-wise search
It is complete and direct
Here the data are not sets of items
Rather, they are graphs combining a vertex set V(G) and an edge set E(G), which include topological information.
An extended form of the Apriori algorithm is used
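The level-wise idea can be sketched under one strong simplifying assumption that is not in the slides: vertex labels are unique within each graph, so a graph is just a set of labelled edges and "pattern occurs in graph" reduces to set inclusion. Real graph miners cannot assume this and need a subgraph isomorphism test at the counting step. The names `edge` and `levelwise_frequent_edge_sets` are hypothetical.

```python
from itertools import combinations

def edge(u, v):
    """Undirected edge between two vertex labels."""
    return frozenset((u, v))

def levelwise_frequent_edge_sets(graphs, min_support):
    """Apriori-style search: level k holds frequent patterns with k edges."""
    def support(pattern):
        # Number of graphs that contain every edge of the pattern.
        return sum(1 for g in graphs if pattern <= g)

    all_edges = set().union(*graphs)
    level = {frozenset([e]) for e in all_edges}
    level = {p for p in level if support(p) >= min_support}
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(pattern)
        next_size = len(next(iter(level))) + 1
        # Join step: merge two level-k patterns that differ in exactly one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == next_size}
        # Prune step: every k-edge sub-pattern must itself be frequent.
        level = {c for c in candidates
                 if all(c - {e} in level for e in c)
                 and support(c) >= min_support}
    return frequent

graphs = [
    {edge("A", "B"), edge("B", "C"), edge("C", "A")},
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B")},
]
for pattern, count in levelwise_frequent_edge_sets(graphs, min_support=2).items():
    print(sorted(sorted(e) for e in pattern), count)
```

The sketch also does not require patterns to be connected; it only shows the join, prune, and count cycle that the graph algorithms inherit from Apriori.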
![Page 16: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/16.jpg)
Support Vector Machine (SVM)
Used for classification and regression analysis
A non-probabilistic binary linear classifier
SVM is a heuristic search and an indirect method in terms of the subgraph isomorphism problem.
![Page 17: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/17.jpg)
Categorization
Mathematical Graph Theory Based Approaches
Greedy Search Based Approaches
Inductive Logic Programming Approach
Inductive Database Based Approaches
Kernel Function Based Approaches
![Page 18: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/18.jpg)
Greedy Search Based Approaches
Use heuristics to evaluate the solution.
Two major works:
– SUBDUE
– GBI
![Page 19: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/19.jpg)
Graph Based Induction (GBI)
Has two methods: one for chunking and the other for extracting patterns.
Can arrive at locally minimal solutions, using pairwise chunking at each step with an opportunistic beam search.
Able to reconstruct the original graph as and when needed.
The advantage of GBI is that it can handle both directed and undirected labelled graphs, even with closed paths, including closed edges.
Uses an empirical graph size definition; this limits continuous compression of the graph, so the graph never becomes a single vertex.
Extracts substructures and constructs a classifier.
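One pairwise chunking step can be sketched as follows. This is a simplified illustration, not GBI itself: it assumes a directed labelled graph without self-loops, picks the pair purely by frequency, does no beam search, and does not record what would be needed to reconstruct the original graph. The function name `chunk_most_frequent_pair` and the graph representation are assumptions.

```python
from collections import Counter

def chunk_most_frequent_pair(labels, edges):
    """labels: dict vertex -> label; edges: list of directed (u, v) pairs.
    Merge non-overlapping occurrences of the most frequent connected label pair.
    Returns (new_labels, new_edges, chosen_pair)."""
    pair_counts = Counter((labels[u], labels[v]) for u, v in edges)
    if not pair_counts:
        return dict(labels), list(edges), None
    chosen_pair, _ = pair_counts.most_common(1)[0]

    merged_into = {}            # original vertex -> chunk vertex that absorbed it
    new_labels = dict(labels)
    for u, v in edges:
        is_match = (labels[u], labels[v]) == chosen_pair
        if is_match and u not in merged_into and v not in merged_into:
            chunk = ("chunk", u, v)
            merged_into[u] = merged_into[v] = chunk
            new_labels[chunk] = chosen_pair   # the chunk is labelled by the pair
            del new_labels[u], new_labels[v]

    # Redirect remaining edges to the chunks; edges inside a chunk disappear.
    new_edges = []
    for u, v in edges:
        a, b = merged_into.get(u, u), merged_into.get(v, v)
        if a != b:
            new_edges.append((a, b))
    return new_labels, new_edges, chosen_pair

labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (3, 4), (2, 5)]
new_labels, new_edges, pair = chunk_most_frequent_pair(labels, edges)
print(pair)
print(new_labels)
print(new_edges)
```

Repeating this step compresses the graph further at each round, and the chunk labels that accumulate are the extracted patterns.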
![Page 20: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/20.jpg)
SUBDUE
A graph-based relational learning system
Compresses the graphs based on the Minimum Description Length (MDL) principle
Does not face high computational complexity (uses a computationally constrained beam search)
May miss some optimal subgraphs
Produces a smaller number of highly interesting patterns, rather than generating a large number of patterns from which the interesting ones must then be identified.
Runtime much larger than gSpan and FSG: non-linear in the dataset size (because of its implementation of the graph isomorphism test)
![Page 21: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/21.jpg)
Mathematical Approaches
Apriori-based methods
– AGM
– FSG
Pattern Growth methods
– gSpan
![Page 22: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/22.jpg)
Apriori-based Approach
AGM
– Used to mine “frequent induced subgraphs”
– Works with both directed and undirected graphs
– Importantly, this algorithm is not limited to connected graphs. It also supports disconnected graphs.
![Page 23: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/23.jpg)
AGM
Breadth-first search.
Create new candidates for level k+1 by joining two graphs at level k.
AGM generates new graphs by adding a new node:
And then proceeds as per Apriori...
![Page 24: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/24.jpg)
FSG
– FSG works better on graph data sets with more edge and vertex labels
– This is an optimized version of AGM with added techniques for efficiency.
– FSG increases the efficiency of the candidate generation of frequent subgraphs by introducing the Transaction ID (TID) method.
– Uses efficient candidate subgraph generation algorithms.
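The TID idea can be sketched as follows: each frequent subgraph keeps the set of transaction (graph) IDs that contain it, and a larger candidate is only tested against the intersection of its parents' TID sets. This is an illustration under assumptions, not FSG's code: the function name `supporting_tids`, the `contains` callback standing in for the subgraph isomorphism test, and the edge-set transactions are all made up for the example, and at least one parent TID list is assumed.

```python
def supporting_tids(candidate, parent_tid_lists, transactions, contains, min_support):
    """Return the IDs of transactions that contain `candidate`.

    Only transactions in the intersection of the parents' TID lists can contain
    the candidate, so the expensive `contains` test is limited to those.
    """
    possible = set.intersection(*(set(tids) for tids in parent_tid_lists))
    if len(possible) < min_support:
        return set()   # cannot be frequent; skip every containment test
    return {tid for tid in possible if contains(transactions[tid], candidate)}

# Toy data: each transaction is a set of edge names (assumes unique vertex labels,
# so containment is plain set inclusion instead of an isomorphism test).
transactions = {
    0: {"ab", "bc", "ca"},
    1: {"ab", "bc"},
    2: {"ab"},
    3: {"bc"},
}
calls = []

def contains(graph, pattern):
    calls.append(pattern)          # count how many containment tests are run
    return pattern <= graph

tids_ab = {0, 1, 2}                # TID list of the frequent pattern {"ab"}
tids_bc = {0, 1, 3}                # TID list of the frequent pattern {"bc"}
tids = supporting_tids({"ab", "bc"}, [tids_ab, tids_bc], transactions, contains, min_support=2)
print(sorted(tids), len(calls))
```

In this toy run only two of the four transactions are tested, because transactions 2 and 3 are missing from one of the parent TID lists.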
![Page 25: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/25.jpg)
FSG
– FSG is Apriori-based and therefore uses a level-wise algorithm
– Faces two challenges:
• candidate generation: the generation of size-(k+1) subgraph candidates is more complicated and costly than for itemsets
• pruning false positives: the subgraph isomorphism test is an NP-complete problem
![Page 26: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/26.jpg)
gSpan
– Uses Depth-First Search (DFS)
– Can be used to find frequent subgraphs one by one, from small to large ones.
– Advantages
• No candidate generation and no false-positive pruning
• Better saving of space by DFS.
Pattern growth method
![Page 27: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/27.jpg)
[Figure: a graph dataset with graphs (A), (B), (C), and its frequent patterns (1), (2); minimum support is 2]
![Page 28: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/28.jpg)
Three further approaches to mining graph based data:
Inductive Logic Programming approach
Inductive database approach
Kernel function based approach
![Page 29: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/29.jpg)
ILP approach
ILP systems construct a predictive model for a given data set by searching a large space of candidate hypotheses.
WARMR – proposed in 1998. A combination of Apriori-like level-wise search and the ILP method, but it has a high computational complexity.
FARMER – proposed in 2001. Runs two orders of magnitude faster than WARMR.
![Page 30: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/30.jpg)
Inductive DB approach
Databases that are capable of handling patterns within data. Quite different from typical databases.
Uses an interactive querying process to mine data in these databases.
MolFea is an effort related to this area. It mines linear fragments in chemical compounds and has better computational efficiency.
It also performs a complete search of the paths in graph data.
![Page 31: Survey on Frequent Pattern Mining on Graph Data - Slides](https://reader035.vdocuments.mx/reader035/viewer/2022081403/554e8781b4c90573338b47d3/html5/thumbnails/31.jpg)
Kernel Function based approach
This “kernel” function basically defines a similarity between two graphs
The paper covers two efforts based on this approach, which classify graphs into binary classes using an SVM (Support Vector Machine).