Testing Forest-Isomorphism in the Adjacency List Model

Mitsuru Kusumoto†, Yuichi Yoshida†*

† : Preferred Infrastructure, Inc.

* : National Institute of Informatics.

Overview

Given two forests G and H, determine whether G ≅ H or G and H are far from being isomorphic, by looking at only very small parts of G and H.

Outline

Introduction

Property testing

Problem setting

Our algorithms


Introduction

Property Testing

We want to solve decision problems as efficiently as possible.

Example : Graph connectivity

Standard setting : BFS is enough. → Θ(n) time.

Property testing : Check if G is connected or G is far from being connected. → O(1) time!?
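To make the connectivity example concrete, here is a minimal sketch in the spirit of known connectivity testers for sparse graphs. It is an illustration, not the algorithm from this talk: the names `small_component` and `looks_connected` and the parameters `samples` and `limit` are hypothetical, and a real tester would derive them from ε (and a degree bound). The idea is that a graph far from connected has many small components, so a few random starting points and a truncated BFS are likely to find one.

```python
import random
from collections import deque

def small_component(adj, start, limit):
    """Truncated BFS: True if the component of `start` has fewer than `limit` vertices."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                if len(seen) >= limit:
                    return False  # component is large enough; stop exploring
                queue.append(w)
    return True

def looks_connected(adj, samples, limit, rng=random):
    """Accept every connected graph; reject once a small component is found.

    `samples` and `limit` are illustrative knobs (assumptions of this sketch).
    """
    vertices = list(adj)
    if len(vertices) <= 1:
        return True
    limit = min(limit, len(vertices))  # a connected graph always reaches n vertices
    for _ in range(samples):
        if small_component(adj, rng.choice(vertices), limit):
            return False
    return True
```

The work per sample is bounded by the truncation limit and the degrees seen, not by n, which is where the sublinear running time comes from.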

(Figure: a connected graph and a graph that is not connected.)

Property Testing

A property testing algorithm is a (randomized) algorithm that, with high probability (e.g., ≥ 2/3), decides whether the input satisfies a property P or is far from satisfying P, using sublinear query or time complexity.

Main Interest

What kinds of properties are testable efficiently?

(Figure: connected graphs, graphs close to being connected, and graphs far from being connected. We want to distinguish connected graphs from graphs that are far from being connected.)

Graph Property Testing - Review

The efficiency of property testing algorithms depends on the input models.

Adjacency matrix model

G =
[0 1 0 1 0]
[1 0 1 1 0]
[0 1 0 0 1]
[1 1 0 0 1]
[0 0 1 1 0]

• Input model for dense graphs. [GGR’98]

• Many properties are testable (e.g., connectivity, △-freeness, ...).

• Necessary and sufficient conditions for constant-time testability are known. [Alon+’09]

Adjacency list model

(Figure: a vertex v whose 1st, 2nd, and 3rd edges lead to A, B, and C.)

O(v, 1) = A

O(v, 2) = B

O(v, 3) = C

• Input model for sparse graphs. [GR’02] [KKR’04]

• Many properties are testable (e.g., connectivity, H-minor-freeness).

• But many results assume the bounded-degree condition: degrees of vertices must be bounded by some constant.

Graph Property Testing - Review

What happens if we do not assume the bounded-degree condition?

Only a few efficient algorithms are known.

Many hardness results: △-freeness, k-colorability, etc., require Ω(√n) queries. [A+08, B+08, K+04]

Question : Is it possible to obtain efficient algorithms for fundamental problems without the bounded-degree condition?

Forest-Isomorphism

We focus on forest-isomorphism in the adjacency list model.

Input : Two forests G and H represented by adjacency lists and a proximity parameter ε > 0.

Query Model : We can access G and H via the following queries:

deg(v): returns the degree of vertex v.

adj(v, i): returns the vertex adjacent to v via its i-th edge.

random(): returns a randomly chosen vertex.
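The three queries can be pictured as a thin wrapper around an adjacency list that also counts how many queries have been made. This is only a sketch: the class name `AdjacencyListOracle`, the `queries` counter, the 1-based edge index, and the uniform choice in `random()` are assumptions made for illustration.

```python
import random

class AdjacencyListOracle:
    """Query access to a graph stored as {vertex: [neighbors in list order]}."""

    def __init__(self, adjacency, seed=None):
        self._adj = adjacency
        self._vertices = list(adjacency)
        self._rng = random.Random(seed)
        self.queries = 0  # query complexity is measured by this counter

    def deg(self, v):
        """Degree of vertex v."""
        self.queries += 1
        return len(self._adj[v])

    def adj(self, v, i):
        """Endpoint of the i-th edge of v (i is 1-based in this sketch)."""
        self.queries += 1
        return self._adj[v][i - 1]

    def random(self):
        """A vertex chosen uniformly at random (uniformity is an assumption)."""
        self.queries += 1
        return self._rng.choice(self._vertices)
```

A tester then interacts with G and H only through two such objects, and its cost is the sum of their counters.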


Forest-Isomorphism


Input : Two forests G and H represented by adjacency lists and proximity parameter ε > 0.

ε-Farness : d(G, H) := the minimum number of edge additions and deletions needed to transform G into a graph isomorphic to H (graph edit distance).

For ε > 0, (G, H) are ε-far from being isomorphic ⇔ d(G, H) ≥ εn.

Objective: Decide whether G ≅ H or d(G, H) ≥ εn.
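For very small graphs the distance d(G, H) can be computed by brute force, which helps when checking the definition on examples. The sketch below assumes both graphs live on the vertex set 0..n-1 and tries every bijection, so it runs in exponential time; the function name `edit_distance` is hypothetical, and this is not part of the tester.

```python
from itertools import permutations

def edit_distance(n, edges_g, edges_h):
    """Min number of edge additions/deletions turning G into a graph isomorphic to H.

    Both graphs use vertices 0..n-1. Brute force over all n! relabelings.
    """
    target = {frozenset(e) for e in edges_h}
    best = None
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges_g}
        cost = len(mapped ^ target)  # edges to delete plus edges to add
        if best is None or cost < best:
            best = cost
    return best
```

For example, a path on four vertices and a star on four vertices are at distance 2: one edge is deleted and one is added.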


Forest-Isomorphism


Motivation

The problem is fundamental: forests are a simple structure, and isomorphism is a theoretically important problem.

Isomorphism has been considered in the property testing literature. [AS’05, AS’08, NS’11]


Forest-Isomorphism


Related Work

If there is no restriction on the input, graph isomorphism testing in the adjacency list model requires Ω(√n) queries. [FM’08]

This is good motivation for our focus on forests.

If the input is a bounded-degree hyperfinite graph, then graph isomorphism is constant-time testable. [NS’11]

But if there is no degree bound, testability was unknown.


Our Contribution

Query complexity of testing forest-isomorphism

Upper bound: poly(log n)

Lower bound: Ω(√log n)

Furthermore, we obtained a more general result:

If the input is a forest, every graph property is testable with poly(log n) queries in the adjacency list model.

We use a technique similar to that of [Newman and Sohler’11].

Overview of Our Algorithm

Overview of Our Method

1. Partitioning oracle: We define a procedure that removes a small fraction of the edges to partition the graph into several parts with “good” properties.

(Figure: G and H are each passed through the partitioning oracle.)

2. We check whether each pair of corresponding parts in G and H is isomorphic or far from being isomorphic.

If G and H are far from being isomorphic, there is at least one pair of corresponding parts in G and H that is also far from being isomorphic.

Partitioning Oracle

Partitioning Oracle: Given ε > 0 and access to G, there exist an integer s = s(ε) and a subgraph G’ ⊆ G such that

• |E(G) – E(G’)| ≤ εn / 3, and

• each connected component of G’ is either an s-bounded-degree tree or an s-rooted tree.

s-rooted tree: A tree T in which there exists v ∈ V(T) such that deg(v) ≥ s and each subtree of v has size < s. (We call the vertex v a root.)

s-bounded-degree tree: A tree in which every vertex has degree < s.
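The two component types can be checked directly on a small tree. The sketch below reads "sub-tree" as a connected component of T with the root removed, which is an interpretation of the slide; the names `subtree_sizes` and `classify_tree` are hypothetical, and the code scans the whole tree, so it illustrates the definitions rather than the sublinear oracle.

```python
def subtree_sizes(adj, v):
    """Sizes of the components of the tree after removing vertex v."""
    sizes = []
    for child in adj[v]:
        seen = {v, child}
        stack = [child]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(len(seen) - 1)  # do not count v itself
    return sizes

def classify_tree(adj, s):
    """Return 's-bounded-degree', 's-rooted', or 'neither' for a tree given as a dict."""
    if all(len(nbrs) < s for nbrs in adj.values()):
        return "s-bounded-degree"
    for v, nbrs in adj.items():
        if len(nbrs) >= s and all(size < s for size in subtree_sizes(adj, v)):
            return "s-rooted"
    return "neither"
```

A tree of the third kind is exactly what the oracle must cut further by removing edges.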


Partitioning Oracle


|E(G) – E(G’)| ≤ εn / 3


We can provide query access to G’.

Alive Edge Query: Check if edge (v, i) still exists in G’.

The subgraph G’ is chosen deterministically.

If G ≅ H, then G’ ≅ H’.

(Figure: a vertex v whose 1st, 2nd, and 3rd edges lead to A, B, and C.)

(v, 1) : not alive

(v, 2) : not alive

(v, 3) : alive

Partitioning Oracle


|E(G) – E(G’)| ≤ εn / 3


So…

If d(G, H) = 0, then d(G’, H’) = 0, because G’ and H’ are chosen deterministically.

If d(G, H) ≥ εn, then d(G’, H’) ≥ εn / 3, because we remove at most εn / 3 edges from each of G and H.

Thus, it is enough to consider the partitioned graphs G’ and H’.
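The bound for the far case can be spelled out with a triangle-inequality argument. The derivation below is a sketch that assumes d satisfies the triangle inequality and that removing an edge counts as one edit.

```latex
\begin{align*}
\varepsilon n \;\le\; d(G, H)
  &\le d(G, G') + d(G', H') + d(H', H) \\
  &\le \frac{\varepsilon n}{3} + d(G', H') + \frac{\varepsilon n}{3}.
\end{align*}
Rearranging gives $d(G', H') \ge \varepsilon n / 3$.
```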

Graph Partition

Suppose that G is obtained through the partitioning oracle.

We split G into the following parts for some constants α, γ > 1.

G[0] := s-bounded-degree trees in G

G[1] := s-rooted trees in G with root degrees in [s, αγ)

G[2] := s-rooted trees in G with root degrees in [αγ, αγ²)

G[3] := s-rooted trees in G with root degrees in [αγ², αγ³)

...

There are O(log n) parts.
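The bucketing of s-rooted trees by root degree can be sketched as follows. The function name `part_index` and the sample constants are hypothetical; the sketch assumes αγ > s so that the first interval [s, αγ) is non-empty. Since root degrees are at most n and the interval endpoints grow geometrically, the index is O(log n).

```python
def part_index(root_degree, s, alpha, gamma):
    """Index i >= 1 of the part G[i] whose degree interval contains root_degree.

    Intervals: [s, alpha*gamma), [alpha*gamma, alpha*gamma^2), ...
    """
    if root_degree < s:
        raise ValueError("an s-rooted tree has root degree at least s")
    upper = alpha * gamma  # right endpoint of the interval for G[1]
    index = 1
    while root_degree >= upper:
        upper *= gamma
        index += 1
    return index
```

With s = 4, α = 4, γ = 2 the intervals are [4, 8), [8, 16), [16, 32), and so on.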

Isomorphism between Corresponding Parts

Graph partition is useful in the following sense.

Lemma. d(G, H) ≤ Σ_i d(G[i], H[i]).

Proof. Transforming G[i] into H[i] for each i transforms G into H. □

Corollary. If d(G, H) ≥ εn, then for any β_i > 0 with Σ_i β_i = ε, there exists i such that d(G[i], H[i]) ≥ β_i n. □

Thus, it suffices to check the isomorphism between G[i] and H[i] for each i = 0, 1, 2, ….

We set β_0 = ε/2 and β_1 = β_2 = … = O(ε / log n).
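The corollary follows from the lemma by contradiction. A short derivation, using only the lemma and the condition that the β_i sum to ε:

```latex
Suppose $d(G[i], H[i]) < \beta_i n$ for every $i$. Then
\begin{align*}
d(G, H) \;\le\; \sum_i d(G[i], H[i]) \;<\; \sum_i \beta_i n \;=\; \varepsilon n,
\end{align*}
which contradicts $d(G, H) \ge \varepsilon n$.
```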

Isomorphism between Corresponding Parts

Testing G[i]≅H[i]

For i=0 : We can use a tester for the bounded-degree model [NS’11].

For i≥1 : We develop a new algorithm.

Sketch : We randomly sample root vertices.

For each sampled root vertex, we randomly sample its subtrees and create a histogram of subtrees.

After this, we compute the minimum matching between the histograms in G and H.

This minimum matching turns out to be a good approximation to d(G, H).
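To illustrate what a "histogram of subtrees" can look like, the sketch below encodes each subtree hanging off a root as a canonical parenthesis string (isomorphic rooted subtrees get the same string) and counts the strings. It deliberately simplifies the sketch above: the histogram is exact rather than sampled, and the comparison is a plain L1 distance between histograms, not the minimum matching used in the talk. All names (`canonical`, `subtree_histogram`, `histogram_l1`) are hypothetical.

```python
from collections import Counter

def canonical(children, v):
    """Canonical string of the rooted subtree at v; equal iff the subtrees are isomorphic."""
    return "(" + "".join(sorted(canonical(children, c) for c in children.get(v, []))) + ")"

def subtree_histogram(children, root):
    """Count the isomorphism types of the subtrees hanging off `root`."""
    return Counter(canonical(children, c) for c in children.get(root, []))

def histogram_l1(h1, h2):
    """L1 distance between two histograms (a stand-in for the minimum matching)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
```

Here `children` maps each vertex to the list of its children with respect to the chosen root; leaves may be omitted from the dictionary.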

(Figure: a histogram of subtree shapes with counts 2, 2, 1, ….)

Conclusion

If the input is a forest, every graph property is testable in poly(log n) queries.

Future Work?

Can we obtain similar results for graph classes larger than forests?

Outerplanar graphs, bounded-treewidth graphs, scale-free graphs, …

Query complexity

Upper bound: poly(log n) (actually O(log^(2^poly(1/ε)) n))

Lower bound: Ω(√log n)

Appendix : Lower bound

Lower bound - Overview

1. We construct two distributions over inputs, D1 and D2.

∀(G, H) ∈ D1, G ≅ H

∀(G, H) ∈ D2, d(G, H) ≥ n/8

2. We reduce the isomorphism testing to checking whether two probability distributions are the same or not. This requires Ω(√N) queries.


Lower bound

Let F_k := n / (2^k log n) copies of a star graph with 2^k vertices.

(Note that |F_k| = n / log n for every k.)

(Figure: the star families F_0, F_1, F_2, F_3, …, F_{log n}.)

Lower bound

Construct two distributions D1, D2 :

D1 : G = H = F_0 ∪ F_1 ∪ … ∪ F_{log n}.

D2 : Each F_k is randomly assigned to either G or H so that |V(G)| = |V(H)|.

(Figure: the families F_0, F_1, …, F_{log n}. In D1 both G and H contain every family; in D2 each family appears in exactly one of G and H.)
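One way to picture a draw from D2 is to split the family indices evenly between G and H. This is a sketch under an assumption: because every family has the same number of vertices, giving each side the same number of families is one way to satisfy |V(G)| = |V(H)|, but the slides do not say that this is exactly how the balance is enforced. The name `sample_d2` and the parameter `num_families` are hypothetical.

```python
import random

def sample_d2(num_families, rng=random):
    """Randomly split family indices 0..num_families-1 evenly between G and H."""
    if num_families % 2 != 0:
        raise ValueError("this sketch needs an even number of families")
    indices = list(range(num_families))
    rng.shuffle(indices)
    half = num_families // 2
    # G receives the first half of the shuffled indices, H the rest.
    return sorted(indices[:half]), sorted(indices[half:])
```

A tester that samples vertices at random then sees stars of certain sizes only in G and the others only in H, while under D1 it sees every size in both.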

Lower bound

Because we can perform only random sampling and degree/neighbor queries, checking whether G ≅ H is equivalent to checking whether two probability distributions are the same.

Lemma. We need Ω(√logn) queries to distinguish D1 and D2.

(Figure: probability of observing each of F_0, F_1, F_2, …, F_{log n} by random sampling, shown for G, for H, and for the case G = H.)

Lower bound

Lemma. ∀(G, H) ∈ D2, d(G, H) ≥ n/8

Proof.

Let Φ : V(G) → V(H) be a bijection that achieves the minimum graph edit distance. It holds that

d(G, H) ≥ Σ_{v∈V(G)} |deg(v) – deg(Φ(v))| / 2.

If we restrict v in the sum to the roots of the stars, we obtain d(G, H) ≥ Σ_{k=2,3,4,...} (n / (2^k log n)) ∙ 2^{k-1} / 2 ≥ n/8. □

Thus, the Ω(√log n) lower bound holds.
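The degree-based inequality gives a lower bound that can be evaluated without knowing Φ. The sketch below relies on a standard fact: over all bijections, the sum of absolute differences between two equal-length sequences is minimized by pairing them in sorted order. The function name `degree_lower_bound` is hypothetical, and the result is only a lower bound on d(G, H), not the distance itself.

```python
def degree_lower_bound(degrees_g, degrees_h):
    """Lower bound on d(G, H) from the degree sequences of G and H."""
    if len(degrees_g) != len(degrees_h):
        raise ValueError("G and H must have the same number of vertices")
    # Sorted pairing minimizes the total degree mismatch over all bijections.
    mismatch = sum(abs(a - b) for a, b in zip(sorted(degrees_g), sorted(degrees_h)))
    return mismatch / 2  # each edge edit changes two degrees by one
```

For a star and a path on four vertices the bound is 1, while the true distance is 2, so the bound is valid but not always tight.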
