Testing Forest-Isomorphism in the Adjacency List Model
Mitsuru Kusumoto†, Yuichi Yoshida†*
† : Preferred Infrastructure, Inc.
* : National Institute of Informatics.
1
Overview
Given two forests G and H, determine if G ≅ H or G and H are far from being so by looking at very small parts of G and H.
Outline
Introduction
Property testing
Problem setting
Our algorithms
2 / 21
Introduction
3
Property Testing
We want to solve decision problems as efficiently as possible!
Example : Graph connectivity
Standard setting : BFS is enough. → Θ(n) time.
Property testing : Check if G is connected or G is far from being connected. → O(1) time!?
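As a rough illustration of the O(1)-time claim, the following Python sketch samples a few vertices and runs a truncated BFS from each, rejecting only when it fully explores a component that is smaller than the whole graph. It assumes a known degree bound, and the function name, parameters, and constants are illustrative assumptions for this sketch, not taken from the talk.

```python
import random
from collections import deque

def looks_connected(adj, eps, max_degree, seed=None):
    """Accept every connected graph; reject graphs with many small components.

    adj: dict mapping each vertex to a list of its neighbors.
    The constants 4 and 8 below are illustrative, not tuned.
    """
    rng = random.Random(seed)
    vertices = list(adj)          # stands in for a random() vertex query
    n = len(vertices)
    threshold = int(4 / (eps * max_degree)) + 1   # BFS budget per sample
    samples = int(8 / (eps * max_degree)) + 1     # number of sampled vertices
    for _ in range(samples):
        start = rng.choice(vertices)
        seen = {start}
        queue = deque([start])
        # Truncated BFS: stop once more than `threshold` vertices are seen.
        while queue and len(seen) <= threshold:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        # An empty queue means the whole component of `start` was explored.
        if not queue and len(seen) < n:
            return False          # found a component that is not all of G
    return True
```

The sketch never rejects a connected graph, and the number of BFS steps depends only on eps and max_degree, not on n.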
[Figure: a connected graph and a graph that is not connected.]
4 / 21
Property Testing
A property testing algorithm is a (randomized) algorithm that, with high probability (e.g., ≥ 2/3), decides whether the input satisfies a property P or is far from P, using sublinear query or time complexity.
Main Interest
What kinds of properties are testable efficiently?
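[Figure: connected graphs versus not-connected graphs, the latter split into those close to and those far from being connected. We want to distinguish connected graphs from graphs that are far from being connected.]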
5 / 21
Graph Property Testing - Review
The efficiency of property testing algorithms depends on the input models.
Adjacency matrix model
G = [01010]
    [10110]
    [01001]
    [11001]
    [00110]
• Input model for dense graphs. [GGR’98]
• Many properties are testable. (e.g., connectivity, △-freeness, ...)
• Necessity & sufficiency for constant-time testability are known. [Alon+’09]
Adjacency list model
[Figure: vertex v whose 1st, 2nd, and 3rd edges lead to A, B, and C: O(v, 1) = A, O(v, 2) = B, O(v, 3) = C.]
• Input model for sparse graphs. [GR’02] [KKR’04]
• Many properties are testable. (e.g., connectivity, H-minor-freeness.)
• But many results assume the bounded-degree condition: degrees of vertices must be bounded by some constant.
6 / 21
Graph Property Testing - Review
Only a few efficient algorithms.
Many hardness results: △-freeness, k-colorability, etc., require Ω(√n) queries. [A+08, B+08, K+04]
Question : Is it possible to obtain efficient algorithms for fundamental problems without bounded-degree condition?
What happens if we do not assume the bounded-degree condition?
7 / 21
Forest-Isomorphism
We focus on forest-isomorphism in the adjacency list model.
Input : Two forests G and H represented by adjacency lists and proximity parameter ε > 0.
Query Model : We can access G and H via the following queries:
deg(v): returns the degree of vertex v.
adj(v, i): returns the vertex adjacent to v via its i-th edge.
random(): returns a randomly chosen vertex.
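The three queries above can be mimicked by a small wrapper that also counts how many queries an algorithm spends. This is a minimal sketch: the class name ForestOracle, the 1-based edge index (matching the O(v, 1) = A picture), and uniform sampling in random() are assumptions of the sketch.

```python
import random

class ForestOracle:
    """Query access to a graph stored as {vertex: [neighbors in edge order]}."""

    def __init__(self, adjacency, seed=None):
        self._adj = adjacency
        self._vertices = list(adjacency)
        self._rng = random.Random(seed)
        self.queries = 0              # total number of queries issued so far

    def deg(self, v):
        """Return the degree of vertex v."""
        self.queries += 1
        return len(self._adj[v])

    def adj(self, v, i):
        """Return the endpoint of the i-th edge of v (i is 1-based here)."""
        self.queries += 1
        return self._adj[v][i - 1]

    def random(self):
        """Return a vertex chosen uniformly at random (assumed distribution)."""
        self.queries += 1
        return self._rng.choice(self._vertices)
```

A tester written against this interface can report oracle.queries at the end, which is the quantity the poly(log n) bound refers to.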
8 / 21
Forest-Isomorphism
ε-Farness : d(G, H) := the minimum number of edge additions and deletions needed to transform G into a graph isomorphic to H (graph edit distance).
For ε > 0, G and H are ε-far from being isomorphic ⇔ d(G, H) ≥ εn.
Objective: Determine whether G ≅ H or d(G, H) ≥ εn.
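For reference, the exact question G ≅ H can be decided for forests by reading the whole input and comparing canonical encodings. The sketch below uses the standard AHU-style encoding rooted at tree centers; it runs in at least linear time and is not the sublinear tester of this talk. All function names are assumptions of the sketch.

```python
from collections import deque

def _rooted_code(adj, root):
    """Canonical parenthesis string of the tree containing root, rooted there."""
    parent = {root: None}
    order = [root]
    for v in order:                      # list grows while iterating: BFS order
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    children = {v: [] for v in order}
    code = {}
    for v in reversed(order):            # children are encoded before parents
        code[v] = "(" + "".join(sorted(children[v])) + ")"
        if parent[v] is not None:
            children[parent[v]].append(code[v])
    return code[root]

def _centers(adj, component):
    """Return the one or two center vertices of a tree by stripping leaves."""
    degree = {v: len(adj[v]) for v in component}
    leaves = deque(v for v in component if degree[v] <= 1)
    remaining = len(component)
    while remaining > 2:
        for _ in range(len(leaves)):
            leaf = leaves.popleft()
            remaining -= 1
            for w in adj[leaf]:
                degree[w] -= 1
                if degree[w] == 1:
                    leaves.append(w)
    return list(leaves)

def forest_code(adj):
    """Sorted list of canonical codes, one per connected component."""
    seen, codes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = [start]
        for v in component:              # collect the component of start
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.append(w)
        codes.append(min(_rooted_code(adj, c) for c in _centers(adj, component)))
    return sorted(codes)

def forests_isomorphic(adj_g, adj_h):
    return forest_code(adj_g) == forest_code(adj_h)
```

The tester in this talk aims to answer the relaxed question (isomorphic versus ε-far) without reading the whole input as this sketch does.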
9 / 21
Forest-Isomorphism
Motivation
The problem is fundamental: forests are a simple structure, and isomorphism is a theoretically important problem.
Isomorphism has occasionally been considered in the property testing literature. [AS’05, AS’08, NS’11]
10 / 21
Forest-Isomorphism
Related Work
If there is no restriction on input, graph isomorphism testing in the adjacency list model requires Ω(√n) queries. [FM’08]
Good motivation for our focus on forests.
If the input is a bounded-degree hyperfinite graph, then graph isomorphism is constant-time testable. [NS’11]
But if there is no degree bound, testability was unknown.
11 / 21
Our Contribution
Query complexity of testing forest-isomorphism
Upper bound: poly(log n)
Lower bound: Ω(√log n)
Furthermore, we obtained a more general result:
If the input is a forest, every graph property is testable in poly(log n) queries in the adjacency list model.
We use a technique similar to that of [Newman and Sohler’11].
12 / 21
Overview of Our Algorithm
13
Overview of Our Method
1. Partitioning oracle: We define a procedure that removes a small fraction of the edges to partition the graph into several parts with “good” properties.
[Figure: the partitioning oracle applied to G and to H.]
2. We check whether each pair of corresponding parts in G and H is isomorphic or far from being so.
If G and H are far from being isomorphic, at least one pair of corresponding parts in G and H is also far from being isomorphic.
14 / 21
Partitioning Oracle
Partitioning Oracle: Given ε > 0 and access to G, there exist an integer s = s(ε) and a subgraph G’ ⊆ G such that
|E(G) – E(G’)| ≤ εn / 3, and
each connected component of G’ is either an s-bounded-degree-tree or an s-rooted-tree.
s-rooted-tree: A tree T in which there exists v ∈ V(T) such that deg(v) ≥ s and the size of each subtree hanging from v is less than s. (We call the vertex v a root.)
s-bounded-degree-tree: A tree in which the degree of each vertex is less than s.
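The two definitions can be checked directly on a single tree given in full. The following sketch assumes that "size" means the number of vertices and that the subtrees of a root v are the components left after deleting v; the function names are hypothetical.

```python
def _subtree_size(adj, start, blocked):
    """Number of vertices reachable from start without passing through blocked."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w != blocked and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)

def classify_component(adj, s):
    """Classify one tree given as {vertex: [neighbors]}.

    Returns "s-bounded-degree-tree", "s-rooted-tree", or None if neither fits.
    """
    if all(len(nbrs) < s for nbrs in adj.values()):
        return "s-bounded-degree-tree"
    for v, nbrs in adj.items():
        # v qualifies as a root if it has high degree and only small subtrees.
        if len(nbrs) >= s and all(_subtree_size(adj, c, v) < s for c in nbrs):
            return "s-rooted-tree"
    return None
```

A tree with two adjacent high-degree vertices can fail both tests, which is why the oracle is allowed to delete some edges first.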
15 / 21
Partitioning Oracle
We can provide query access to G’.
Alive Edge Query: Check if edge (v, i) still exists in G’.
The subgraph G’ is chosen deterministically.
If G ≅ H, then G’ ≅ H’.
[Figure: vertex v with neighbors A, B, C on edges 1, 2, 3. (v, 1) : not alive; (v, 2) : not alive; (v, 3) : alive.]
16 / 21
Partitioning Oracle
So…
d(G, H) = 0 ⇒ d(G’, H’) = 0, because G’ and H’ are chosen deterministically.
d(G, H) ≥ εn ⇒ d(G’, H’) ≥ εn / 3, because we remove at most εn / 3 edges from each of G and H.
Thus, it is enough to consider the partitioned graphs G’ and H’.
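The second implication can be spelled out with the triangle inequality for the edit distance d. This is a short worked derivation, assuming that d(G, G’) and d(H, H’) are each at most the number of removed edges, εn / 3:

```latex
\begin{align*}
\varepsilon n \le d(G, H)
  &\le d(G, G') + d(G', H') + d(H', H) \\
  &\le \frac{\varepsilon n}{3} + d(G', H') + \frac{\varepsilon n}{3},
\end{align*}
\text{so } d(G', H') \ge \varepsilon n - \frac{2\varepsilon n}{3} = \frac{\varepsilon n}{3}.
```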
17 / 21
Graph Partition
Suppose that G is obtained through the partitioning oracle.
We split G into the following parts for some constants α,γ>1.
G[0] := s-bounded-degree trees in G
G[1] := s-rooted trees in G with root degrees in [s, αγ)
G[2] := s-rooted trees in G with root degrees in [αγ, αγ^2)
G[3] := s-rooted trees in G with root degrees in [αγ^2, αγ^3)
...
This gives O(log n) parts: G[0], G[1], G[2], ...
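The bucketing of s-rooted trees by root degree can be sketched as follows. The function name part_index is hypothetical, the concrete values of s, α, γ in the usage are made up for illustration, and the sketch assumes s ≤ αγ so that the first interval [s, αγ) is non-empty.

```python
def part_index(root_degree, s, alpha, gamma):
    """Index i >= 1 of the part G[i] that holds an s-rooted tree.

    G[1] covers root degrees in [s, alpha*gamma); for i >= 2, G[i] covers
    [alpha*gamma**(i-1), alpha*gamma**i).
    """
    if root_degree < s:
        raise ValueError("an s-rooted tree has root degree at least s")
    i = 1
    upper = alpha * gamma          # exclusive upper end of the current interval
    while root_degree >= upper:
        upper *= gamma
        i += 1
    return i
```

Since root degrees are below n and the interval ends grow geometrically, the loop runs O(log n) times, matching the O(log n) parts on the slide.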
18 / 21
Isomorphism between Each Part
Graph partition is useful in the following sense.
Lemma. d(G, H) ≤ Σ_i d(G[i], H[i]).
Proof. Transforming G[i] into H[i] for every i transforms G into H. □
Corollary. If d(G, H) ≥ εn, then for any β_i > 0 with Σ_i β_i = ε, there exists i such that d(G[i], H[i]) ≥ β_i n. □
Thus, it suffices to check the isomorphism between G[i] and H[i] for each i = 0, 1, 2, ….
We set β_0 = ε/2 and β_1 = β_2 = … = O(ε / log n).
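The corollary follows from the lemma by contraposition. As a short worked step, suppose every part were closer than its budget:

```latex
\text{If } d(G[i], H[i]) < \beta_i n \text{ for all } i, \text{ then }
d(G, H) \le \sum_i d(G[i], H[i]) < \sum_i \beta_i n = \varepsilon n,
```

which contradicts d(G, H) ≥ εn, so some part must reach its budget.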
19 / 21
Isomorphism between Each Part
Testing G[i]≅H[i]
For i=0 : We can use a tester for the bounded-degree model [NS’11].
For i≥1 : We develop a new algorithm.
Sketch : We randomly sample root vertices.
For each root vertex, we randomly sample its subtrees and create a histogram of subtrees.
After this, we compute the minimum matching between the histograms in G and H.
This minimum matching turns out to be a good approximation to d(G, H).
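The sampling step of the sketch can be illustrated as follows: for one root, sample subtrees, encode each by a canonical string, and count them. The transcript does not specify the matching procedure, so the comparison below uses a plain L1 difference between two histograms as a stand-in, not the minimum matching of the talk. All function names are hypothetical.

```python
import random
from collections import Counter

def rooted_code(adj, v, parent):
    """Canonical parenthesis string of the subtree at v, seen from parent."""
    child_codes = sorted(rooted_code(adj, w, v) for w in adj[v] if w != parent)
    return "(" + "".join(child_codes) + ")"

def subtree_histogram(adj, root, samples, rng):
    """Histogram of subtree shapes over randomly sampled children of root."""
    hist = Counter()
    for _ in range(samples):
        child = rng.choice(adj[root])      # sample a subtree with replacement
        hist[rooted_code(adj, child, root)] += 1
    return hist

def l1_distance(hist_g, hist_h):
    """Stand-in comparison: sum of absolute count differences per shape."""
    keys = set(hist_g) | set(hist_h)
    return sum(abs(hist_g[k] - hist_h[k]) for k in keys)
```

Because every subtree of an s-rooted tree has size below s, each call to rooted_code touches fewer than s vertices.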
[Figure: histogram of sampled subtree shapes with counts 2, 2, 1, …]
20 / 21
Conclusion
If the input is a forest, every graph property is testable in poly(log n) queries.
Future Work?
Can we obtain similar results for graph classes larger than forests?
Outerplanar graphs, bounded-treewidth graphs, scale-free graphs, …
Query complexity
Upper bound poly(log n) (actually O(log^{2^{poly(1/ε)}} n))
Lower bound Ω(√log n)
21 / 21
Appendix : Lower bound
22
Lower bound - Overview
1. We construct two distributions of inputs, D1 and D2.
∀(G, H) ∈ D1, G ≅ H
∀(G, H) ∈ D2, d(G, H) ≥ n/8
2. We reduce isomorphism testing to checking whether two probability distributions are the same. This requires Ω(√N) queries for distributions over N elements.
23 / 21
Lower bound
Let F_k := (n / (2^k log n)) copies of a star graph with 2^k vertices.
(Remark that |F_k| = n / log n.)
[Figure: the star families F_0, F_1, F_2, F_3, …, F_{log n}.]
24 / 21
Lower bound
Construct two distributions D1, D2 :
D1 : G = H = F_0 ∪ F_1 ∪ … ∪ F_{log n}.
D2 : randomly assign each F_k to either G or H so that |V(G)| = |V(H)|.
[Figure: in D2, each of F_0, F_1, …, F_{log n} is placed in either G or H.]
25 / 21
Lower bound
Because we can only perform random sampling and degree/neighbor queries, checking whether G ≅ H is equivalent to checking whether two probability distributions are the same.
Lemma. We need Ω(√log n) queries to distinguish D1 and D2.
[Figure: probability of observing each family F_0, F_1, F_2, …, F_{log n} by random sampling in G and in H; one panel is labeled G = H.]
26 / 21
Lower bound
Lemma. ∀(G, H) ∈ D2, d(G, H) ≥ n/8
Proof.
Let Φ : V(G) → V(H) be a bijection that achieves the minimum graph edit distance. Since each edge addition or deletion changes the degrees of exactly two vertices by one, it holds that
d(G, H) ≥ Σ_{v∈V(G)} |deg(v) – deg(Φ(v))| / 2.
If we restrict v in the sum to the roots of stars, we obtain d(G, H) ≥ Σ_{k=2,3,4,...} (n / (2^k log n)) ∙ 2^{k-1} / 2 ≥ n/8. □
Thus, the Ω(√log n) lower bound holds.
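The arithmetic behind the n/8 bound can be checked term by term. This is a worked sketch, assuming the sum runs over k = 2, …, log n as the construction of the star families suggests, and that log n ≥ 2:

```latex
\frac{n}{2^k \log n} \cdot \frac{2^{k-1}}{2} = \frac{n}{4 \log n}
\quad\Longrightarrow\quad
\sum_{k=2}^{\log n} \frac{n}{4 \log n}
  = \frac{(\log n - 1)\, n}{4 \log n}
  \ge \frac{n}{8}.
```

Each star family therefore contributes the same amount, n / (4 log n), regardless of k.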
27 / 21