8/2/2019 Graph Sampling
1/16
GRAPH SAMPLING
Ori Raz
November 2009
-
Abstract
Graph isomorphism is a well-known problem
One of the latest ideas in the area is graph sampling, or feature selection
We will try to find a correlation between the graphlet distribution and the regression/classification tasks
Dataset: 3000 SAT problem formulas
-
Graph representations
3 ways to represent a graph:
- Clause Graph
  Each vertex represents a clause, with an edge between two clauses when they share a negated literal.
- Variable Clause Graph
  Each vertex represents a clause or a variable (vertices = clauses + variables), with an edge when a variable occurs in a clause. The graph is bipartite.
- Variable Graph
  Each vertex represents a variable, with an edge when two variables occur in the same clause.
-
Graph Sampling
Graph Random walk
2 methods: Random and MCMC
- Random: For each sample we choose all vertices randomly
- MCMC: Each sample is obtained by replacing a random vertex of the previous sample with a new vertex
Total of 630K samples for each graph
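The two sampling methods can be sketched as follows. This is a minimal illustration, not the code used for the talk: the function names are hypothetical, and the slide does not say whether samples must be connected or whether an MCMC move can be rejected, so the sketch assumes neither.

```python
import random


def random_sample(vertices, k, rng):
    """Random method: draw all k vertices of a sample afresh."""
    return frozenset(rng.sample(vertices, k))


def mcmc_step(sample, vertices, rng):
    """MCMC method: replace one random vertex of the sample with a new vertex.

    Assumption: every proposed replacement is accepted.
    """
    removed = rng.choice(sorted(sample))
    candidates = [v for v in vertices if v not in sample]
    added = rng.choice(candidates)
    return frozenset((sample - {removed}) | {added})


def sample_graphlets(vertices, k, n_samples, method, seed=0):
    """Return n_samples vertex sets of size k (requires k < number of vertices)."""
    rng = random.Random(seed)
    vertices = list(vertices)
    current = random_sample(vertices, k, rng)
    samples = []
    for _ in range(n_samples):
        samples.append(current)
        if method == "random":
            current = random_sample(vertices, k, rng)
        else:
            current = mcmc_step(current, vertices, rng)
    return samples
```

With the MCMC method, consecutive samples differ in exactly one vertex, so they are strongly correlated; with the Random method each sample is independent of the previous one.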
-
Graph representations - example
F= (x1 or x2 or x3) and (x1 or x2 or x5) and (x3 or x4 or x5) and (x2 or x3 or x5)
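The three representations can be sketched for this formula as below. The function names are hypothetical, literals are encoded as signed integers (a negative number is a negated variable), and the clause-graph rule assumes that "share a negated literal" means one clause contains a literal and the other contains its negation.

```python
from itertools import combinations

# The formula as transcribed on this slide; every literal is positive.
F = [(1, 2, 3), (1, 2, 5), (3, 4, 5), (2, 3, 5)]


def variable_graph(clauses):
    """Edge between two variables that occur in the same clause."""
    edges = set()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        edges.update(combinations(variables, 2))
    return edges


def variable_clause_graph(clauses):
    """Bipartite graph: edge (clause, variable) when the variable occurs in the clause."""
    return {(f"c{i}", f"x{abs(lit)}")
            for i, clause in enumerate(clauses)
            for lit in clause}


def clause_graph(clauses):
    """Edge between two clauses when one holds a literal and the other its negation.

    This reading of the slide's definition is an assumption.
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(clauses), 2):
        if any(-lit in b for lit in a):
            edges.add((i, j))
    return edges


print(len(variable_graph(F)))         # number of variable-graph edges
print(len(variable_clause_graph(F)))  # number of clause-variable edges
print(len(clause_graph(F)))           # 0, since no literal is negated
```

Under the assumed clause-graph rule, the formula as transcribed yields no clause-graph edges because it contains no negated literals.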
-
Algorithm
- For each formula: create representations
- For each created file: get samples
- For each sampling file: execute Nauty
- Summarize all results in svm format
- For each svm file: create learn and target files
- For each learn file: generate learn-strong-features svm files
- For each learn-strong-features file: use svmLight to learn the file, use svmLight to classify
- For each result file: find correlation with original result
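The "summarize all results in svm format" step can be sketched as below. It assumes each sample has already been reduced to a canonical label, and that each formula becomes one line of the form "<target> <index>:<value> ..." with feature indices in ascending order. The function names, the feature_index mapping and the target value are hypothetical.

```python
from collections import Counter


def graphlet_distribution(canonical_labels):
    """Relative frequency of each canonical graphlet label among the samples."""
    counts = Counter(canonical_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def to_svmlight_line(target, distribution, feature_index):
    """Format one formula as '<target> <index>:<value> ...', indices ascending."""
    pairs = sorted((feature_index[label], value)
                   for label, value in distribution.items())
    features = " ".join(f"{idx}:{value:.6g}" for idx, value in pairs)
    return f"{target} {features}"


# Hypothetical example: four samples falling into three graphlet classes.
labels = ["A", "A", "B", "C"]
feature_index = {"A": 1, "B": 2, "C": 3}
print(to_svmlight_line(1, graphlet_distribution(labels), feature_index))
```

Graphlets that never occur in a formula are simply omitted from its line, which keeps the files sparse.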
-
Results - Regression
MCMC vs. Random
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000) for MCMC and RANDOM sampling. Panels: UNSAT VARIABLE CLAUSE GRAPH, UNSAT VARIABLE GRAPH, SAT VARIABLE CLAUSE GRAPH]
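The slides do not state which correlation measure is plotted. Assuming it is the Pearson correlation between the predicted and the original values, it can be computed as in this sketch (the function name is hypothetical):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the predictions rise and fall with the original results, a value near 0 means no linear relationship.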
-
Results - Regression
Representation Comparison
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000), one panel each for MCMC and RANDOM sampling. Series: UNSAT VARIABLE CLAUSE, UNSAT VARIABLE GRAPH, SAT VARIABLE GRAPH]
-
Results - Regression
Effect of graphlet size
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000). Panels: UNSAT VARIABLE CLAUSE GRAPH, SAT VARIABLE GRAPH. Series: graphlets of 4, 4+5, 4+5+6, 4+5+6+7, 4+5+6+7+8 and 4+5+6+7+8+9 vertices]
-
Results - Classification
MCMC vs. Random
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000) for MCMC and RANDOM sampling. Panels: CLAUSE GRAPH, VARIABLE GRAPH]
-
Results - Classification
Representation Comparison
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000), one panel each for MCMC and RANDOM sampling. Series: CLAUSE GRAPH, VARIABLE CLAUSE, VARIABLE GRAPH]
-
Results - Classification
Effect of graphlet size
[Figures: correlation (y-axis) against number of features (x-axis: 20, 40, 100, 200, 400, 1000). Panels: VARIABLE CLAUSE GRAPH, VARIABLE GRAPH. Series: graphlets of 4, 4+5, 4+5+6, 4+5+6+7, 4+5+6+7+8 and 4+5+6+7+8+9 vertices]
-
Summary
Regression problem - We found a high and solid correlation (reaching 0.5)
Classification problem - We found a good correlation (reaching 0.1)
Representation methods - Both Variable-Clause and Variable-Graph gave good results. Variable-Clause used the least space
Sampling methods - The Random method gave better results for the SAT formulas
Graphlet size - In most cases, increasing the graphlet size gave a higher correlation
-
Thank you
Questions?
-
Nauty
Isomorphism testing program
We use it to create a canonical representation for each sample
Very fast and efficient
Written in C
http://cs.anu.edu.au/people/bdm/nauty/
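To illustrate what a canonical representation gives us, the sketch below computes one by brute force over all vertex orderings. This is a stand-in for illustration only, not Nauty's algorithm or API, and the function name is hypothetical. It is practical only for very small graphs: a 9-vertex graphlet already needs 362,880 orderings.

```python
from itertools import permutations


def canonical_form(vertices, edges):
    """Smallest relabelled edge list over all vertex orderings (brute force)."""
    vertices = list(vertices)
    best = None
    for order in permutations(vertices):
        relabel = {v: i for i, v in enumerate(order)}
        key = tuple(sorted(tuple(sorted((relabel[a], relabel[b])))
                           for a, b in edges))
        if best is None or key < best:
            best = key
    return (len(vertices), best)


# Two differently labelled 3-vertex paths get the same canonical form.
path_1 = canonical_form("abc", [("a", "b"), ("b", "c")])
path_2 = canonical_form([7, 8, 9], [(8, 7), (7, 9)])
triangle = canonical_form("abc", [("a", "b"), ("b", "c"), ("a", "c")])
print(path_1 == path_2, path_1 == triangle)
```

Two samples are isomorphic exactly when their canonical forms are equal, so the canonical form can serve directly as the feature label when counting graphlets.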
-
SVMLight
Support Vector Machine implementation for pattern recognition
Very easy to use
Written in C
http://svmlight.joachims.org/
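The learn and classify steps call SVMlight's two command-line programs, svm_learn and svm_classify. The sketch below only builds the argument lists. The helper names and file names are hypothetical, and the options actually used for the talk are not given in the slides.

```python
def svm_learn_command(train_file, model_file, regression=False):
    """Arguments for: svm_learn -z {c|r} <train_file> <model_file>."""
    mode = "r" if regression else "c"  # c = classification, r = regression
    return ["svm_learn", "-z", mode, train_file, model_file]


def svm_classify_command(example_file, model_file, output_file):
    """Arguments for: svm_classify <example_file> <model_file> <output_file>."""
    return ["svm_classify", example_file, model_file, output_file]


# Hypothetical file names, for illustration only.
print(svm_learn_command("learn.svm", "model.dat", regression=True))
print(svm_classify_command("target.svm", "model.dat", "predictions.txt"))
```

Each list can be passed to subprocess.run(cmd, check=True) once the SVMlight binaries are on the PATH.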