
Two papers on GNN theory: How Powerful are GNNs, and Deeper Insights into GCN for Semi-supervised Learning

Presenter: Ji Gao
https://qdata.github.io/deep2Read

June 3, 2019


Outline

1 Review: Graph Neural Networks

2 How powerful are graph neural networks?
   Graph Neural Network
   Graph Isomorphism and Weisfeiler-Lehman Test
   Graph Isomorphism Network (GIN)
   Experiment result

3 Deeper Insights into Graph Convolutional Networks for Semi-supervised Learning
   Issues of GCN
   Algorithm
   Experiment result

4 Reference


Graph Neural Network


How powerful are graph neural networks?

How Powerful are Graph Neural Networks? Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka, ICLR 2019 [XHLJ18]

GNNs are at most as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test.

GNNs with common aggregators cannot learn to distinguish certain graph structures.

The paper proposes an architecture that is provably as powerful as the WL test.


GNN definition

GNN layer
The k-th layer of a GNN is:

a_v^{(k)} = AGGREGATE^{(k)}({ h_u^{(k-1)} : u ∈ N(v) })
h_v^{(k)} = COMBINE^{(k)}( h_v^{(k-1)}, a_v^{(k)} )

GraphSage, max-pooling variant [HYL17]:

a_v^{(k)} = MAX({ ReLU(W · h_u^{(k-1)}) : u ∈ N(v) })
h_v^{(k)} = W · [ h_v^{(k-1)}, a_v^{(k)} ]

GCN [KW16]:

h_v^{(k)} = ReLU( W · MEAN{ h_u^{(k-1)} : u ∈ N(v) ∪ {v} } )
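The aggregators above translate directly into code. Here is a NumPy sketch of the GraphSAGE max-pooling variant; the function name and weight shapes are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def sage_maxpool_layer(adj, H, W_pool, W):
    """One GraphSAGE max-pooling layer (sketch of the formulas above).
    adj: dict node -> iterable of neighbors; H: (n, d) node features;
    W_pool: (d, d_p) pooling weights; W: (d + d_p, d_out) combine weights."""
    n = H.shape[0]
    Z = np.maximum(H @ W_pool, 0.0)                # ReLU(W_pool · h_u) for all u
    out = np.empty((n, W.shape[1]))
    for v in range(n):
        a_v = Z[list(adj[v])].max(axis=0)          # elementwise MAX over N(v)
        out[v] = np.concatenate([H[v], a_v]) @ W   # h_v = W · [h_v, a_v]
    return out
```

The COMBINE step here is concatenation followed by a linear map, matching the GraphSAGE formula; swapping the max for a mean over neighbors recovers a mean-pooling variant.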


Graph classification via GNN

For graph classification, an additional step for a GNN is to combine the embeddings of all the nodes:

h_G = READOUT({ h_v^{(K)} : v ∈ G })

READOUT can be a simple permutation-invariant function such as summation, or a more sophisticated graph-level pooling function.


Graph Isomorphism

Graph isomorphism: graphs G1 = (V1, E1) and G2 = (V2, E2) are isomorphic if there is a permutation p of V1 that maps G1 onto G2: (u, v) ∈ E1 if and only if (p(u), p(v)) ∈ E2.

Verifying graph isomorphism is in NP; its exact complexity is unknown, believed to lie somewhere between P and NP-complete.


Graph Isomorphism Example


Graph Isomorphism Example


Weisfeiler-Lehman Test

The Weisfeiler-Lehman (WL) test is a subtree-based heuristic for the graph isomorphism problem.

Weisfeiler-Lehman Test

Initialize all nodes with their node features, or with the same label 1.

Iteratively:
1 Aggregate the labels of each node and its neighborhood.
2 Hash the aggregated labels into unique new labels.
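The two steps above can be sketched in a few lines of Python; here the "hash" is simply a lookup table over the distinct aggregated signatures (an assumption for illustration; any injective hash would do):

```python
def wl_label_multiset(adj, rounds=3):
    """Run `rounds` of Weisfeiler-Lehman refinement on a graph given as
    a dict node -> set of neighbors, starting from the same label 1 on
    every node. Returns the sorted multiset of final labels."""
    labels = {v: 1 for v in adj}
    for _ in range(rounds):
        # 1. aggregate own label with the sorted multiset of neighbor labels
        sigs = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                for v in adj}
        # 2. "hash" each distinct signature to a compact new label
        table = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        labels = {v: table[sigs[v]] for v in adj}
    return sorted(labels.values())
```

If two graphs yield different label multisets, the WL test declares them non-isomorphic; equal multisets are inconclusive (the test can fail to separate some non-isomorphic graphs).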


Weisfeiler-Lehman Test

From https://www.slideshare.net/pratikshukla11/graph-kernelpdf:


WL Test and GNN

In the GraphSAGE paper, the authors claim that the WL test is a special case of GraphSAGE in which the hashing algorithm is replaced with a trainable aggregation function.

However, as this paper shows, common aggregators such as mean-pooling and max-pooling are weaker than the hash function.


WL Test is stronger than GNN

Lemma: Let G1 and G2 be any two non-isomorphic graphs. If a graph neural network A : G → R^D maps G1 and G2 to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides that G1 and G2 are not isomorphic.

(This assumes the hashing function is sufficiently powerful, i.e., injective.)


How to make GNN powerful?

Theorem
Let A : G → R^d be a GNN. With a sufficient number of GNN layers, A maps any graphs G1 and G2 that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic to different embeddings if the following conditions hold:

a) A aggregates and updates node features iteratively with

h_v^{(k)} = φ( h_v^{(k-1)}, f({ h_u^{(k-1)} : u ∈ N(v) }) ),

where the function f, which operates on multisets, and φ are injective (one-to-one).

b) A's graph-level readout, which operates on the multiset of node features { h_v^{(k)} }, is injective.

Conclusion: the weakness lies in the aggregation function (and the readout function).


How to make GNN powerful?

The idea is simple and straightforward; however, sum-, max-, and mean-pooling are not injective on multisets.

The authors model the aggregator with a neural network:

Graph Isomorphism Network (GIN)
A GIN layer is defined as:

h_v^{(k)} = MLP^{(k)}( (1 + ε^{(k)}) · h_v^{(k-1)} + Σ_{u ∈ N(v)} h_u^{(k-1)} )
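A minimal NumPy sketch of this layer; the MLP is fixed here to two linear layers with a ReLU, and the weight shapes are illustrative assumptions:

```python
import numpy as np

def gin_layer(A, H, W1, W2, eps=0.0):
    """One GIN layer as defined above, with a 2-layer MLP standing in
    for MLP^{(k)}. A: (n, n) adjacency matrix; H: (n, d) node features;
    W1: (d, d_h) and W2: (d_h, d_out) MLP weights."""
    agg = (1.0 + eps) * H + A @ H            # (1+ε)·h_v + Σ_{u∈N(v)} h_u
    return np.maximum(agg @ W1, 0.0) @ W2    # MLP: linear -> ReLU -> linear
```

The sum over neighbors (rather than mean or max) is what lets the layer remain injective on multisets once the MLP is expressive enough.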


GIN Guarantees

Theorem
Assume 𝒳 is countable. There exists a function f : 𝒳 → R^n so that h(X) = Σ_{x ∈ X} f(x) is unique for each multiset X ⊂ 𝒳 of bounded size. Moreover, any multiset function g can be decomposed as g(X) = φ( Σ_{x ∈ X} f(x) ) for some function φ.

Lemma
Assume 𝒳 is countable. There exists a function f : 𝒳 → R^n so that for infinitely many choices of ε, including all irrational numbers, h(c, X) = (1 + ε) · f(c) + Σ_{x ∈ X} f(x) is unique for each pair (c, X), where c ∈ 𝒳 and X ⊂ 𝒳 is a multiset of bounded size. Moreover, any function g over such pairs can be decomposed as g(c, X) = ϕ( (1 + ε) · f(c) + Σ_{x ∈ X} f(x) ) for some function ϕ.

By these two results, GIN is as powerful as the WL test, given a properly learned MLP φ and a proper choice of ε.


Graph classification Readout of GIN

GIN uses a readout that concatenates the node embeddings from every layer:

h_G = CONCAT( READOUT({ h_v^{(k)} : v ∈ G }) | k = 0, 1, ..., K )
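This layer-wise readout can be sketched as follows, using summation as the per-layer READOUT (one simple injective-friendly choice):

```python
import numpy as np

def gin_readout(layer_embeddings):
    """Graph-level readout of the form above: sum node embeddings
    within each layer, then concatenate across layers k = 0..K.
    layer_embeddings: list of (n, d_k) arrays, one per layer."""
    return np.concatenate([H.sum(axis=0) for H in layer_embeddings])
```

Concatenating across layers keeps information from early iterations, which a readout over only the final layer would discard.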


Study on the variants

1-layer perceptrons are not sufficient.

Mean- and max-pooling are not powerful enough (with the counter-examples above).

Mean-pooling works better for tasks with diverse, rarely repeated features; max-pooling is better at capturing general structural relationships.


Experiment

Evaluation uses 9 graph-classification datasets.

Methods in comparison:
  GIN-0: ε fixed to 0
  GIN-ε: ε also updated by gradient during training
  SVM
  DCNN
  DGCNN
  AWL
  GraphSage-Mean
  GCN


Experiment result


Deeper Insights into Graph Convolutional Networks for Semi-supervised Learning

Deeper Insights into Graph Convolutional Networks for Semi-supervised Learning, Qimai Li, Zhichao Han, Xiao-Ming Wu, AAAI 2018 [LHW18]

GCN performance does not increase with the number of layers, which contradicts its design logic.

To overcome this problem while staying with a shallow architecture, the paper uses co-training and self-training to improve performance.


GCN formula

GCN

A GCN layer is defined as:

H^{(k+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(k)} W^{(k)} )

where Ã = A + I: GCN adds a self-loop to every node in the graph.

The hidden vector of a node is updated by a weighted average of itself and its neighbors.

This update is a form of Laplacian smoothing.
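The matrix form of this update is a few lines of NumPy (ReLU stands in for the generic nonlinearity σ; the function name is illustrative):

```python
import numpy as np

def gcn_propagate(A, H, W):
    """One GCN layer matching the formula above:
    H' = ReLU( D_t^{-1/2} A_t D_t^{-1/2} H W ), with A_t = A + I."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)                   # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W, 0.0)
```

Each output row is a degree-normalized average over the node itself and its neighbors, transformed by W, which is exactly the smoothing view the next slide exploits.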


When GCN fails

Generate a 2-dimensional embedding for every vertex of a small dataset.

Repeatedly applying Laplacian smoothing to the graph will cause the features to mix together again.

Theorem: the result of repeated Laplacian smoothing converges to a vector proportional to D^{1/2} 1 θ if the graph has no bipartite components.
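A minimal numeric check of this mixing effect; for simplicity it uses the random-walk form of smoothing, D̃^{-1} Ã, rather than the symmetric one in the theorem (an assumption for illustration; the qualitative behavior is the same):

```python
import numpy as np

# On a connected, non-bipartite graph, repeated Laplacian smoothing
# drives all node features toward a common mixture.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])                       # triangle (non-bipartite)
A_tilde = A + np.eye(3)                               # self-loops
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)      # one smoothing step
H = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])    # distinct node features
for _ in range(50):
    H = P @ H
spread = np.abs(H - H.mean(axis=0)).max()             # ~0: rows are identical
```

After smoothing, the three nodes' features are indistinguishable, which is why stacking many GCN layers hurts classification.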


Issues of GCN

The GCN used in [KW16] has only 2 layers. However, GCN is a localized filter, which means it does not propagate information across the whole graph.

Example: if a node has no labeled node among its near neighbors, GCN will not work for it.

The authors also claim that "GCN relies on an additional large validation set to select the model, as hyperparameters are crucial."


Co-train a GCN with Random Walk Model

Idea: random walks can explore the global graph structure, which is complementary to the local GCN computation.

In particular, use a random walk model to provide extra labels for the GCN to make predictions with.

Calculate the random walk absorption probability matrix.

Algorithm:
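The absorption-probability computation itself is not reproduced on the slide. As a simplified stand-in that captures the intuition (random walks reach well beyond a 2-layer GCN's receptive field), here is plain label-mass propagation with the row-normalized transition matrix; this is an assumption for illustration, not the paper's exact partially-absorbing-walk formulation:

```python
import numpy as np

# Spread label mass from the single labeled node along a path graph,
# then the most confidently reached unlabeled nodes would be added as
# extra training labels for the GCN.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])        # path 0-1-2-3
P = A / A.sum(axis=1, keepdims=True)        # random-walk transition matrix
scores = np.array([1.0, 0.0, 0.0, 0.0])     # only node 0 is labeled
for _ in range(3):
    scores = P @ scores                     # three walk steps
# node 3 now carries label mass despite being 3 hops from the label,
# farther than a 2-layer GCN could see
```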


GCN self-training

Use the trained GCN's most confident predictions on unlabeled nodes as additional labels, then train it again.
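The selection step can be sketched as follows; the per-class count `k` and the selection rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def pick_pseudo_labels(probs, labeled_mask, k=2):
    """Self-training selection (sketch): from the GCN's class
    probabilities, take the k most confident unlabeled nodes per class
    and return them as pseudo-labels to add before retraining.
    probs: (n, c) softmax outputs; labeled_mask: (n,) bool array."""
    n, num_classes = probs.shape
    pseudo = {}
    for cls in range(num_classes):
        conf = probs[:, cls].copy()
        conf[labeled_mask] = -np.inf           # never relabel known nodes
        for v in np.argsort(conf)[::-1][:k]:
            if np.argmax(probs[v]) == cls:     # only if cls is its top class
                pseudo[int(v)] = cls
    return pseudo
```

The returned node-to-class map would be merged into the training labels, and the GCN trained a second time on the enlarged set.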


Experiment

Data: Cora, Citeseer and Pubmed

Methods in comparison:
  Label propagation via random walk (LP)
  ChebyNet
  GCN: with or without a validation set
  Co-training
  Self-training
  Union / intersection of the co-training and self-training label sets


Experiment Result


Experiment result II


Reference I

[HYL17] Will Hamilton, Zhitao Ying, and Jure Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.

[KW16] Thomas N. Kipf and Max Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).

[LHW18] Qimai Li, Zhichao Han, and Xiao-Ming Wu, Deeper insights into graph convolutional networks for semi-supervised learning, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[XHLJ18] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka, How powerful are graph neural networks?, arXiv preprint arXiv:1810.00826 (2018).
