graph sampling

Upload: ori-raz

Post on 06-Apr-2018


  • 8/2/2019 Graph Sampling

    1/16

    GRAPH SAMPLING

    Ori Raz

    November 2009


    Abstract

    Isomorphism between graphs is a known problem

    One of the latest ideas in the area is graph sampling, or feature selection

    We will try to find a correlation between the graphlet distribution and the regression/classification tasks

    Dataset: 3000 SAT problem formulas


    Graph representations

    3 ways to represent a graph:

    - Clause Graph: Each vertex represents a clause, with an edge between two clauses when they share a negated literal.

    - Variable Clause Graph: Each vertex represents a clause or a variable (vertices = clauses + variables), with an edge when a variable occurs in a clause. The graph is bipartite.

    - Variable Graph: Each vertex represents a variable, with an edge when two variables occur in the same clause.
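The three representations can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk. It assumes clauses are lists of signed integers (a negative number stands for a negated variable), and it reads "share a negated literal" as "one clause contains a literal whose negation appears in the other". The function names and the small formula are hypothetical.

```python
from itertools import combinations

def clause_graph(clauses):
    """Vertices are clause indices; edge (i, j) when clause i holds a
    literal whose negation appears in clause j (assumed interpretation)."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(clauses), 2):
        if any(-lit in b for lit in a):
            edges.add((i, j))
    return edges

def variable_clause_graph(clauses):
    """Bipartite graph: edge between variable x<v> and clause c<i>
    when the variable occurs (with either sign) in the clause."""
    return {(f"x{abs(lit)}", f"c{i}")
            for i, clause in enumerate(clauses)
            for lit in clause}

def variable_graph(clauses):
    """Vertices are variables; edge when two variables share a clause."""
    edges = set()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        edges.update(combinations(variables, 2))
    return edges

# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or x3)
formula = [[1, -2], [2, 3], [-1, 3]]
print(clause_graph(formula))
print(variable_clause_graph(formula))
print(variable_graph(formula))
```

The variable clause graph has one edge per literal occurrence, so its size grows linearly with the formula, while the other two graphs can grow quadratically in the number of clauses or variables.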


    Graph Sampling

    Graph Random walk

    2 methods: Random and MCMC

    - Random: For each sample we choose all vertices randomly

    - MCMC: Each new sample replaces a random vertex of the previous sample with a new vertex

    Total of 630K samples for each graph
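The two sampling methods can be sketched as follows. This is a simplified illustration under stated assumptions: a sample is a set of k vertices, the Random method draws the whole set uniformly, and the MCMC method swaps one vertex of the previous sample for a vertex outside it. The slides do not describe an acceptance rule (for example, requiring the sample to stay connected), so none is included. All names are hypothetical.

```python
import random

def random_sample(vertices, k, rng):
    """Random method: draw k distinct vertices uniformly at random."""
    return set(rng.sample(vertices, k))

def mcmc_step(sample, vertices, rng):
    """MCMC method (simplified): replace one randomly chosen vertex of the
    current sample with a randomly chosen vertex outside the sample."""
    outside = [v for v in vertices if v not in sample]
    removed = rng.choice(sorted(sample))
    added = rng.choice(outside)
    return (sample - {removed}) | {added}

def induced_edges(sample, edges):
    """Edges of the graph whose endpoints both lie in the sample."""
    return {(u, v) for (u, v) in edges if u in sample and v in sample}

rng = random.Random(0)
vertices = list(range(10))
first = random_sample(vertices, 4, rng)
second = mcmc_step(first, vertices, rng)
print(first, second)
```

Consecutive MCMC samples differ in exactly one vertex, so they are strongly correlated with each other, whereas Random samples are independent draws.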


    Graph representations - example

    F= (x1 or x2 or x3) and (x1 or x2 or x5) and (x3 or x4 or x5) and (x2 or x3 or x5)


    Algorithm

    - For each formula: create representations

    - For each created file: get samples

    - For each sampling file: execute Nauty

    - Summarize all results in svm format

    - For each svm file: create learn and target files

    - For each learn file: generate learn-strong-features svm files

    - For each learn-strong-features file: use svmLight to learn the file, then use svmLight to classify

    - For each result file: find correlation with original result
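The step that summarizes results in svm format can be sketched as below. This is an illustration, not the talk's code. It assumes each sample has already been reduced to a canonical label, that the feature value is the relative frequency of that label, and that a fixed label-to-index mapping exists. The labels, the mapping, and the function names are hypothetical. The output follows the SVMlight input convention of a target value followed by `index:value` pairs with indices in increasing order, starting at 1.

```python
from collections import Counter

def graphlet_distribution(canonical_forms):
    """Turn a list of canonical graphlet labels into relative frequencies."""
    counts = Counter(canonical_forms)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

def to_svmlight_line(target, distribution, feature_index):
    """Format one example as '<target> <index>:<value> ...',
    with feature indices sorted in increasing order."""
    pairs = sorted((feature_index[form], value)
                   for form, value in distribution.items())
    features = " ".join(f"{idx}:{val:.6g}" for idx, val in pairs)
    return f"{target} {features}"

# Hypothetical canonical labels for four samples of one graph
samples = ["path", "star", "path", "cycle"]
feature_index = {"cycle": 1, "path": 2, "star": 3}
line = to_svmlight_line(1, graphlet_distribution(samples), feature_index)
print(line)
```

For regression the target would be a real value such as a solver runtime, and for classification it would be +1 or -1.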


    Results - Regression

    MCMC Vs. Random

    [Figure: three plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000), each comparing MCMC and RANDOM sampling. Panels: UNSAT VARIABLE CLAUSE GRAPH, UNSAT VARIABLE GRAPH, SAT VARIABLE CLAUSE GRAPH.]


    Results - Regression

    Representation Comparison

    [Figure: two plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000), one for MCMC and one for RANDOM sampling. Series in each plot: UNSAT VARIABLE CLAUSE, UNSAT VARIABLE GRAPH, SAT VARIABLE GRAPH.]


    Results - Regression

    Effect of graphlet size

    [Figure: two plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000). Panels: UNSAT VARIABLE CLAUSE GRAPH, SAT VARIABLE GRAPH. Series in each plot: 4 VERTICES, 4+5 VERTICES, 4+5+6 VERTICES, 4+5+6+7 VERTICES, 4+5+6+7+8 VERTICES, 4+5+6+7+8+9 VERTICES.]


    Results - Classification

    MCMC Vs. Random

    [Figure: two plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000), each comparing MCMC and RANDOM sampling. Panels: CLAUSE GRAPH, VARIABLE GRAPH.]


    Results - Classification

    Representation Comparison

    [Figure: two plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000), one for MCMC and one for RANDOM sampling. Series in each plot: CLAUSE GRAPH, VARIABLE CLAUSE, VARIABLE GRAPH.]


    Results - Classification

    Effect of graphlet size

    [Figure: two plots of Correlation against Number Of Features (20, 40, 100, 200, 400, 1000). Panels: VARIABLE CLAUSE GRAPH, VARIABLE GRAPH. Series in each plot: 4 VERTICES, 4+5 VERTICES, 4+5+6 VERTICES, 4+5+6+7 VERTICES, 4+5+6+7+8 VERTICES, 4+5+6+7+8+9 VERTICES.]


    Summary

    Regression problem - We found a high and consistent correlation (reaching 0.5)

    Classification problem - We found a weaker but positive correlation (reaching 0.1)

    Representation methods - Both Variable-Clause and Variable-Graph gave good results. Variable-Clause used the least space

    Sampling methods - The Random method gave better results for the SAT formulas

    Graphlet size - In most cases, increasing the graphlet size gave a higher correlation
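The correlation figures quoted above compare predicted values with the original results. The slides do not say which correlation coefficient was used, so the sketch below assumes the Pearson coefficient. The function name and the sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predicted and original values
predicted = [1.0, 2.0, 3.0, 4.0]
original = [1.5, 1.0, 3.5, 3.0]
print(pearson(predicted, original))
```

A value near 1 means the predictions rise and fall with the original results, a value near 0 means there is no linear relationship, and negative values mean the two move in opposite directions.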


    Thank you

    Questions?


    Nauty

    Isomorphism testing program

    We use it to create a canonical representation for each sample

    Very fast and efficient

    Written in C

    http://cs.anu.edu.au/people/bdm/nauty/
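To show what a canonical representation gives us, here is a brute-force stand-in written in Python. It is not nauty's algorithm and does not call nauty. It simply tries every relabeling of the sample's vertices and keeps the smallest edge list, so two samples get the same result exactly when they are isomorphic. The function name is hypothetical.

```python
from itertools import permutations

def canonical_form(vertices, edges):
    """Brute-force canonical form: the vertex count plus the smallest
    sorted edge list over all relabelings of the vertices to 0..k-1."""
    vertices = sorted(vertices)
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        candidate = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or candidate < best:
            best = candidate
    return (len(vertices), best)

# Two differently labeled 3-vertex paths and one triangle
path_a = canonical_form("abc", [("a", "b"), ("b", "c")])
path_b = canonical_form("xyz", [("x", "z"), ("z", "y")])
triangle = canonical_form("abc", [("a", "b"), ("b", "c"), ("a", "c")])
print(path_a, path_b, triangle)
```

This approach tries k! relabelings for a k-vertex sample, which is already 362,880 for 9 vertices, so it is only practical for illustration. A dedicated tool is needed at the scale of hundreds of thousands of samples per graph.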


    SVMLight

    Implementation of Support Vector Machines for pattern recognition

    Very easy to use

    Written in C

    http://svmlight.joachims.org/