
Page 1: CS249: GRAPH NEURAL NETWORKS

CS249: GRAPH NEURAL NETWORKS

Instructor: Yizhou Sun, [email protected]

January 14, 2021

Graph Basics

Page 2: CS249: GRAPH NEURAL NETWORKS

Content
• Introduction to Graphs
• Spectral analysis
• Shallow Embedding
• Graph Neural Networks

Page 3: CS249: GRAPH NEURAL NETWORKS

Graph, Graph, Everywhere

[Figure: Aspirin; Yeast protein interaction network (from H. Jeong et al., Nature 411, 41 (2001)); Internet; Co-author network]

Page 4: CS249: GRAPH NEURAL NETWORKS

More …

Page 5: CS249: GRAPH NEURAL NETWORKS

Why Graph Mining?
• Graphs are ubiquitous
  • Chemical compounds (cheminformatics)
  • Protein structures, biological pathways/networks (bioinformatics)
  • Program control flow, traffic flow, and workflow analysis
  • XML databases, Web, and social network analysis
• Graphs are a general model
  • Trees, lattices, sequences, and items are degenerate graphs
• Diversity of graphs
  • Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted vs. unweighted, homogeneous vs. heterogeneous
• Complexity of algorithms: many problems are of high complexity

Page 6: CS249: GRAPH NEURAL NETWORKS

Representation of a Graph
• $G = \langle V, E \rangle$
  • $V = \{u_1, \ldots, u_n\}$: node set
  • $E \subseteq V \times V$: edge set
• Adjacency matrix
  • $A = (a_{ij}), \; i, j = 1, \ldots, N$
  • $a_{ij} = 1$ if $\langle u_i, u_j \rangle \in E$
  • $a_{ij} = 0$ if $\langle u_i, u_j \rangle \notin E$
• Undirected graph vs. directed graph
  • $A = A^T$ vs. $A \neq A^T$
• Weighted graph
  • Use W instead of A, where $w_{ij}$ represents the weight of edge $\langle u_i, u_j \rangle$

Page 7: CS249: GRAPH NEURAL NETWORKS

Example

A 3-node graph: Yahoo (y), Amazon (a), M'soft (m)

Adjacency matrix A:

      y  a  m
  y   1  1  0
  a   1  0  1
  m   0  1  0
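
As a quick sanity check, here is a minimal NumPy sketch (an illustration, not part of the original slides) that builds this adjacency matrix and verifies the undirected-graph property $A = A^T$:

```python
import numpy as np

# Adjacency matrix of the toy graph; row/column order: y (Yahoo), a (Amazon), m (M'soft)
A = np.array([
    [1, 1, 0],   # y
    [1, 0, 1],   # a
    [0, 1, 0],   # m
])

# An undirected graph has a symmetric adjacency matrix
print(np.array_equal(A, A.T))  # True
```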

Page 8: CS249: GRAPH NEURAL NETWORKS

Typical Graph Tasks
• Node level
  • Link prediction
  • Node classification
  • Similarity search
  • Community detection
  • Ranking
• Graph level
  • Graph prediction
  • Graph similarity search
  • Frequent pattern mining
  • MCS (maximum common subgraph) detection
  • Clustering
• Node level, graph property
  • Betweenness score prediction
  • Travelling salesman problem
  • Network attack problem
  • Set cover problem
  • Maximum clique detection

Page 9: CS249: GRAPH NEURAL NETWORKS

Graph Techniques
• Approaches before GNNs
  • Heuristics
  • Graph signal processing
  • Graph kernels
  • Graphical models

Page 10: CS249: GRAPH NEURAL NETWORKS

Content
• Introduction to Graphs
• Spectral analysis
• Shallow Embedding
• Graph Neural Networks

Page 11: CS249: GRAPH NEURAL NETWORKS

Spectral Analysis
• Graph Laplacians are key to understanding graphs
  • Unnormalized graph Laplacian
  • Normalized graph Laplacians
• Spectral clustering
  • Leverages eigenvectors of graph Laplacians to conduct clustering on graphs
  • Has a close relationship with graph cuts
• Label propagation
  • Semi-supervised node classification on graphs
  • Also related to graph Laplacians

Page 12: CS249: GRAPH NEURAL NETWORKS

What Are Graph Laplacian Matrices?
• Matrices defined as functions of the graph adjacency or weight matrix
• A tool to study graphs
• There is a field, called spectral graph theory, that studies these matrices

Page 13: CS249: GRAPH NEURAL NETWORKS

Examples of Graph Laplacians
• Given an undirected, weighted graph $G = (V, E)$ with weight matrix $W$
  • $W_{ij} = W_{ji} \geq 0$
  • $n$: total number of nodes
  • Degree of node $v_i \in V$: $d_i = \sum_j w_{ij}$
  • Degree matrix $D$: a diagonal matrix with the degrees on the diagonal, i.e., $D_{ii} = d_i$ and $D_{ij} = 0$ for $i \neq j$
• Three examples of graph Laplacians
  • The unnormalized graph Laplacian: $L = D - W$
  • The normalized graph Laplacians
    • Symmetric: $L_{sym} = D^{-1/2} L D^{-1/2}$
    • Random walk related: $L_{rw} = D^{-1} L$
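
A small NumPy sketch (illustrative; it assumes a symmetric weight matrix with no isolated nodes, so all degrees are positive) computing the three Laplacians:

```python
import numpy as np

def graph_laplacians(W):
    """Return L, L_sym, L_rw for a symmetric weight matrix W (positive degrees)."""
    d = W.sum(axis=1)                         # degrees d_i = sum_j w_ij
    D = np.diag(d)
    L = D - W                                 # unnormalized Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt       # symmetric normalized Laplacian
    L_rw = np.diag(1.0 / d) @ L               # random-walk Laplacian
    return L, L_sym, L_rw

# Path graph on 3 nodes: the smallest eigenvalue of L is 0
W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
L, L_sym, L_rw = graph_laplacians(W)
print(np.linalg.eigvalsh(L))                  # [0., 1., 3.]
```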

Page 14: CS249: GRAPH NEURAL NETWORKS

The Unnormalized Graph Laplacian
• Definition: $L = D - W$
• Properties of $L$
  • For any vector $f \in R^n$: $f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$
  • $L$ is symmetric and positive semi-definite
  • The smallest eigenvalue of $L$ is 0, and the corresponding eigenvector is the constant one vector $\mathbf{1}$
    • $\mathbf{1}$ is the all-ones vector with n dimensions
    • An eigenvector can be scaled by multiplying by a nonzero scalar $\alpha$
  • $L$ has n non-negative, real-valued eigenvalues: $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$

Page 15: CS249: GRAPH NEURAL NETWORKS

Question
• Will self-loops in a graph change $L$?

Page 16: CS249: GRAPH NEURAL NETWORKS

Number of Connected Components
• The multiplicity k of the eigenvalue 0 of $L$ equals the number of connected components
• Consider $k = 1$, i.e., a connected graph, and $f$, the eigenvector corresponding to eigenvalue 0
  • Every element of $f$ has to be equal to every other element (since $f^T L f = 0$ forces $f_i = f_j$ wherever $w_{ij} > 0$)

Page 17: CS249: GRAPH NEURAL NETWORKS

K Connected Components
• The graph can be represented as a block diagonal matrix, and so can the matrix $L$
• Each block $L_i$ has eigenvalue 0 with a corresponding constant one eigenvector
• Hence $L$ has eigenvalue 0 with multiplicity k, with the corresponding eigenvectors being indicator vectors
  • $\mathbf{1}_{A_i}$ is an indicator vector, with ones for the nodes in $A_i$, i.e., the nodes in component i, and zeros elsewhere

Page 18: CS249: GRAPH NEURAL NETWORKS

The Normalized Graph Laplacians
• Symmetric: $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
• Random walk related: $L_{rw} = D^{-1} L = I - D^{-1} W$

Page 19: CS249: GRAPH NEURAL NETWORKS

Properties of $L_{sym}$ and $L_{rw}$
• For any vector $f \in R^n$: $f^T L_{sym} f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left( \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right)^2$
• $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector $v$ ⟺ $\lambda$ is an eigenvalue of $L_{sym}$ with eigenvector $w = D^{1/2} v$
• $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector $v$ ⟺ $\lambda$ and $v$ solve the generalized eigenproblem $L v = \lambda D v$
• 0 is an eigenvalue of $L_{rw}$ with eigenvector $\mathbf{1}$; 0 is an eigenvalue of $L_{sym}$ with eigenvector $D^{1/2} \mathbf{1}$
• $L_{sym}$ and $L_{rw}$ are positive semi-definite and have n non-negative, real-valued eigenvalues

Page 20: CS249: GRAPH NEURAL NETWORKS

Example
• Graph built upon a Gaussian mixture model with four components

Page 21: CS249: GRAPH NEURAL NETWORKS

Spectral Clustering

Page 22: CS249: GRAPH NEURAL NETWORKS

Clustering Graphs and Network Data

• Applications
  • Bipartite graphs, e.g., customers and products, authors and conferences
  • Web search engines, e.g., click-through graphs and Web graphs
  • Social networks, friendship/coauthor graphs

[Figure: Clustering books about politics [Newman, 2006]]

Page 23: CS249: GRAPH NEURAL NETWORKS

Example of Graph Clustering
• Reference: ICDM'09 tutorial by Chris Ding
• Example: clustering Supreme Court justices according to their voting behavior

[Figure: voting-agreement weight matrix W]

Page 24: CS249: GRAPH NEURAL NETWORKS

Example: Continued

Page 25: CS249: GRAPH NEURAL NETWORKS

Spectral Clustering Algorithms
• Goal: cluster the nodes of the graph into k clusters
• Idea: leverage the first k eigenvectors of $L$
• Major steps:
  • Compute the first k eigenvectors, $v_1, v_2, \ldots, v_k$, of $L$
  • Represent each node i by the k values $x_i = (v_{1i}, v_{2i}, \ldots, v_{ki})$
  • Cluster the $x_i$'s using k-means
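
The steps above as a short sketch, assuming NumPy and scikit-learn (which Laplacian variant to use is discussed on the next slide):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Cluster the nodes of a graph with weight matrix W into k clusters."""
    L = np.diag(W.sum(axis=1)) - W           # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    X = vecs[:, :k]                          # row i is the embedding x_i of node i
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)

# Two triangles joined by a single edge split cleanly into 2 clusters
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_clustering(W, 2))             # e.g., [0 0 0 1 1 1]
```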

Page 26: CS249: GRAPH NEURAL NETWORKS

Variants of Spectral Clustering Algorithms
• Normalized spectral clustering according to Shi and Malik (2000)
  • Compute the first k eigenvectors, $v_1, v_2, \ldots, v_k$, of $L_{rw}$
• Normalized spectral clustering according to Ng, Jordan, and Weiss (2002)
  • Compute the first k eigenvectors, $v_1, v_2, \ldots, v_k$, of $L_{sym}$
  • Normalize each $x_i$ to have norm 1

Page 27: CS249: GRAPH NEURAL NETWORKS

Connections to Graph Cuts
• Min-Cut
  • Minimize the number of cut edges needed to partition a graph into 2 disconnected components

Page 28: CS249: GRAPH NEURAL NETWORKS

Objective Function of Min-Cut
• Partition the nodes into sets $A$ and $B$ to minimize $\mathrm{cut}(A, B) = \sum_{i \in A, j \in B} w_{ij}$

Page 29: CS249: GRAPH NEURAL NETWORKS

Algorithm
• Step 1: Calculate the graph Laplacian matrix $L = D - W$
• Step 2: Calculate the second eigenvector q
  • Q: Why the second?
• Step 3: Bisect q (e.g., at 0) to get the two clusters
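
A minimal sketch of the three steps (NumPy assumed); the second eigenvector of L is the Fiedler vector, and thresholding it at 0 yields the bisection:

```python
import numpy as np

def fiedler_bisection(W):
    """Bisect a graph by the sign of the second eigenvector of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W    # Step 1: graph Laplacian
    _, vecs = np.linalg.eigh(L)
    q = vecs[:, 1]                    # Step 2: second eigenvector (Fiedler vector)
    return q >= 0                     # Step 3: bisect at 0
```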

Page 30: CS249: GRAPH NEURAL NETWORKS

Minimum Cut with Constraints


Page 31: CS249: GRAPH NEURAL NETWORKS

Other Objective Functions


Page 32: CS249: GRAPH NEURAL NETWORKS

Label Propagation

Page 33: CS249: GRAPH NEURAL NETWORKS

Label Propagation in the Network
• Given a network where some nodes are labeled, can we classify the unlabeled nodes using link information?
  • E.g., node 12 belongs to Class 1, node 5 belongs to Class 2

[Figure: example network with labeled nodes 12 and 5]

Page 34: CS249: GRAPH NEURAL NETWORKS

Problem Formalization for Label Propagation
• Given n nodes
  • l with labels (e.g., $Y_1, Y_2, \ldots, Y_l$ are known)
  • u without labels (e.g., $Y_{l+1}, Y_{l+2}, \ldots, Y_n$ are unknown)
• $Y$ is the $n \times K$ label matrix
  • K is the number of labels (classes)
  • $Y_{ik}$ denotes the probability of node i belonging to class k
• The weighted adjacency matrix is W
• The probabilistic transition matrix T
  • $T_{ij} = P(j \to i) = \frac{w_{ij}}{\sum_{i'} w_{i'j}}$

Page 35: CS249: GRAPH NEURAL NETWORKS

The Label Propagation Algorithm
• Step 1: Propagate $Y \leftarrow TY$
  • $Y_i = \sum_j T_{ij} Y_j = \sum_j P(j \to i) Y_j$
  • The initialization of Y for the unlabeled nodes is not important
• Step 2: Row-normalize Y
  • The probabilities of each node belonging to the classes sum to 1
• Step 3: Reset the labels of the labeled nodes. Repeat Steps 1-3 until Y converges
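
A compact NumPy sketch of Steps 1-3 (illustrative; it assumes every node has at least one edge so the normalizations are well defined):

```python
import numpy as np

def label_propagation(W, Y0, labeled, n_iter=100):
    """W: weighted adjacency matrix; Y0: n x K matrix with one-hot rows for
    labeled nodes; labeled: boolean mask of labeled nodes."""
    T = W / W.sum(axis=0, keepdims=True)         # T_ij = w_ij / sum_i' w_i'j
    Y = Y0.astype(float).copy()
    for _ in range(n_iter):
        Y = T @ Y                                # Step 1: propagate Y <- TY
        Y = Y / Y.sum(axis=1, keepdims=True)     # Step 2: row-normalize
        Y[labeled] = Y0[labeled]                 # Step 3: reset labeled nodes
    return Y
```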

Page 36: CS249: GRAPH NEURAL NETWORKS

Example: Iter = 0

[Figure: the network with only node 12 (Class 1) and node 5 (Class 2) labeled]

Page 37: CS249: GRAPH NEURAL NETWORKS

Example: Iter = 1

[Figure: label probabilities after one propagation step]

Page 38: CS249: GRAPH NEURAL NETWORKS

Example: Iter = 2

[Figure: label probabilities after two propagation steps]

$Y_6 = P(7 \to 6) Y_7 + P(11 \to 6) Y_{11} + P(10 \to 6) Y_{10} + P(3 \to 6) Y_3 + P(4 \to 6) Y_4 + P(0 \to 6) Y_0 = (3/4, 0) + (0, 3/4) = (3/4, 3/4)$

After normalization, $Y_6 = (\frac{1}{2}, \frac{1}{2})$

Page 39: CS249: GRAPH NEURAL NETWORKS

Other Label Propagation Algorithms
• Energy minimizing and harmonic functions
  • Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, by Xiaojin Zhu et al., ICML'03
  • https://www.aaai.org/Papers/ICML/2003/ICML03-118.pdf
• Graph regularization
  • Learning with Local and Global Consistency, by Denny Zhou et al., NIPS'03
  • http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

Page 40: CS249: GRAPH NEURAL NETWORKS

From an Energy Minimizing Perspective
• Consider a binary classification problem
  • $y \in \{0, 1\}$
• Let $f: V \to R$ map each node to a real number
  • $f(i) = y_i$, if i is labeled
  • $f = \begin{pmatrix} f_l \\ f_u \end{pmatrix}$, stacking the labeled part $f_l$ and the unlabeled part $f_u$
• The energy function
  • $E(f) = \frac{1}{2} \sum_{i,j} w_{ij} (f(i) - f(j))^2$
• Intuition: if two nodes are connected, they should share similar labels

Page 41: CS249: GRAPH NEURAL NETWORKS

Minimizing the Energy Function Results in a Harmonic Function
• Note $E(f) = f^T (D - W) f = f^T L f$!
• Goal: find f such that $E(f)$ is minimized while $f_l$ is fixed
• Solution:
  • $((D - W) f)_u = 0$, i.e., f is harmonic on the unlabeled nodes: each unlabeled $f(i)$ equals the weighted average of its neighbors' values

Page 42: CS249: GRAPH NEURAL NETWORKS

Solve $f_u$
• Consider W as a block matrix: $W = \begin{pmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{pmatrix}$
• Closed-form solution of $f_u$:
  • $f_u = (I - P_{uu})^{-1} P_{ul} f_l$, where $P = D^{-1} W$
• Iterative solution of $f_u$:
  • $f_u = (\sum_{t=0}^{\infty} P_{uu}^t) P_{ul} f_l \;\Rightarrow\; f_u^t = P_{uu} f_u^{t-1} + P_{ul} f_l$
  • As $(I - P_{uu})^{-1} = I + P_{uu} + P_{uu}^2 + \cdots$
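
The closed-form solution as a NumPy sketch (illustrative; it assumes the first l indices correspond to the labeled nodes):

```python
import numpy as np

def harmonic_solution(W, f_l):
    """Compute f_u = (I - P_uu)^{-1} P_ul f_l with P = D^{-1} W.
    The first len(f_l) nodes are assumed to be the labeled ones."""
    l = len(f_l)
    P = W / W.sum(axis=1, keepdims=True)     # P = D^{-1} W (row-stochastic)
    P_uu, P_ul = P[l:, l:], P[l:, :l]
    I = np.eye(P_uu.shape[0])
    return np.linalg.solve(I - P_uu, P_ul @ f_l)
```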

Page 43: CS249: GRAPH NEURAL NETWORKS

From a Graph Regularization Perspective
• Let $F$ be an $n \times K$ matrix with nonnegative entries
  • For node i, pick the label k such that $F_{ik}$ is the maximum among all k
• Let $Y$ be an $n \times K$ matrix with $y_{ik} = 1$ if node i is labeled as class k, and 0 otherwise
• Cost function:
  • $Q(F) = \frac{1}{2} \left( \sum_{i,j} w_{ij} \left\| \frac{F_i}{\sqrt{d_i}} - \frac{F_j}{\sqrt{d_j}} \right\|^2 + \mu \sum_i \| F_i - Y_i \|^2 \right)$
  • Smoothness constraint + fitting constraint

Page 44: CS249: GRAPH NEURAL NETWORKS

Solve F
• Note the first component is related to $\mathrm{trace}(F^T L_{sym} F)$
• Closed-form solution:
  • $F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$
  • where $S = D^{-1/2} W D^{-1/2}$ and $\alpha = \frac{1}{1 + \mu}$ is a value between 0 and 1
• Iterative solution:
  • $F^{t+1} = \alpha S F^t + (1 - \alpha) Y$

Page 45: CS249: GRAPH NEURAL NETWORKS

References
• A Tutorial on Spectral Clustering, by U. von Luxburg
  • http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
• Learning from Labeled and Unlabeled Data with Label Propagation, by Xiaojin Zhu and Zoubin Ghahramani
  • http://www.cs.cmu.edu/~zhuxj/pub/CMU-CALD-02-107.pdf
• Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, by Xiaojin Zhu et al., ICML'03
  • https://www.aaai.org/Papers/ICML/2003/ICML03-118.pdf
• Learning with Local and Global Consistency, by Denny Zhou et al., NIPS'03
  • http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

Page 46: CS249: GRAPH NEURAL NETWORKS

Content
• Introduction to Graphs
• Spectral analysis
• Shallow Embedding
• Graph Neural Networks

Page 47: CS249: GRAPH NEURAL NETWORKS

Representing Nodes and Graphs
• Important for many graph-related tasks
• The discrete nature of graphs makes representation very challenging
• Naïve solutions have limitations:
  • Extremely high-dimensional
  • No global structure information integrated
  • Permutation-variant

Page 48: CS249: GRAPH NEURAL NETWORKS

Even More Challenging for Graph Representation
• Ex.: graphlet-based feature vectors
  • Requires a subgraph isomorphism test: NP-hard

Source: https://haotang1995.github.io/projects/robust_graph_level_representation_learning_using_graph_based_structural_attentional_learning
Source: DOI: 10.1093/bioinformatics/btv130

Page 49: CS249: GRAPH NEURAL NETWORKS

Automatic Representation Learning
• Map each node/graph into a low-dimensional vector
  • $\phi: V \to R^d$ or $\phi: \mathcal{G} \to R^d$
• Earlier methods: shallow node embedding methods inspired by word2vec
  • DeepWalk [Perozzi, KDD'14]
  • LINE [Tang, WWW'15]
  • Node2Vec [Grover, KDD'16]
• $\phi(v) = U^T x_v$, where U is the embedding matrix and $x_v$ is the one-hot encoding vector (Source: DeepWalk)

Page 50: CS249: GRAPH NEURAL NETWORKS

LINE: Large-scale Information Network Embedding

• First-order proximity
  • Assumption: two nodes are similar if they are connected
  • Modeled link probability: $p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^T \vec{u}_j)}$, where $\vec{u}_i$ is the embedding vector for node i
  • Limitation: links are sparse, which is not sufficient

Page 51: CS249: GRAPH NEURAL NETWORKS

Objective function for first-order proximity

• Minimize the KL divergence between the empirical link distribution and the modeled link distribution
  • Empirical distribution: $\hat{p}_1(i, j) = w_{ij} / \sum_{(i',j') \in E} w_{i'j'}$, where $w_{ij}$ is the weight of edge (i, j)
  • Resulting objective (up to constants): $O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$

Page 52: CS249: GRAPH NEURAL NETWORKS

Second-Order Proximity
• Assumption: two nodes are similar if their neighbors are similar
• $\vec{u}_i$: target embedding vector for node i; $\vec{u}_j'$: context embedding vector for node j

Page 53: CS249: GRAPH NEURAL NETWORKS

Objective function for second-order proximity

• Minimize the KL divergence between the empirical link distribution and the modeled link distribution
  • Empirical distribution: $\hat{p}_2(j \mid i) = \frac{w_{ij}}{d_i}$, where $d_i = \sum_k w_{ik}$
  • Objective function: $O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$

Page 54: CS249: GRAPH NEURAL NETWORKS

Negative Sampling for Optimization
• For the objective function derived from second-order proximity
  • For each positive link (i, j), sample K negative links (i, n)
  • An edge with weight w can be considered as w binary edges
• New objective function (per edge):
  • $\log \sigma(\vec{u}_j'^T \vec{u}_i) + \sum_{k=1}^{K} \mathbb{E}_{v_n \sim P_n(v)} [\log \sigma(-\vec{u}_n'^T \vec{u}_i)]$
• Negative distribution: $P_n(v) \propto d_v^{3/4}$

Page 55: CS249: GRAPH NEURAL NETWORKS

Limitations of Shallow Embedding Techniques
• Too many parameters
  • Each node is associated with its own embedding vector, and all of these vectors are parameters
• Not inductive
  • Cannot handle new nodes
• Cannot handle node attributes

Page 56: CS249: GRAPH NEURAL NETWORKS

From shallow embedding to Graph Neural Networks

• The embedding function (encoder) is more complicated
  • Shallow embedding
    • $\phi(v) = U^T x_v$, where U is the embedding matrix and $x_v$ is the one-hot encoding vector
  • Graph neural networks
    • $\phi(v)$ is a neural network depending on the graph structure

Page 57: CS249: GRAPH NEURAL NETWORKS

Knowledge Graph Embedding

Page 58: CS249: GRAPH NEURAL NETWORKS

Knowledge Graph
• What are knowledge graphs?
  • Multi-relational graph data (heterogeneous information networks)
  • Provide structured representations for semantic relationships between real-world entities

A triple (h, r, t) represents a fact, e.g.: (Eiffel Tower, is located in, Paris)

Page 59: CS249: GRAPH NEURAL NETWORKS

Examples of KGs
• General-purpose KGs
• Common-sense KGs & NLP
• Bio & medical KGs
• Product graphs & e-commerce

Page 60: CS249: GRAPH NEURAL NETWORKS

Applications of KGs
● Foundational to knowledge-driven AI systems
● Enable many downstream applications (NLP tasks, QA systems, etc.)
  ● QA & dialogue systems
  ● Computational biology
  ● Natural language processing
  ● Recommendation systems

Page 61: CS249: GRAPH NEURAL NETWORKS

Knowledge Graph Embedding
• Goal: encode entities as low-dimensional vectors and relations as parametric algebraic operations
• Applications:
  • Dialogue agents
  • Question answering
  • Machine comprehension
  • Recommender systems
  • ...

Page 62: CS249: GRAPH NEURAL NETWORKS

Key Idea of KG Embedding Algorithms
• Define a score function for a triple: $f_r(h, t)$
  • Based on the entity and relation representations
• Define a loss function to guide the training
  • E.g., an observed triple should score higher than a negative one

Page 63: CS249: GRAPH NEURAL NETWORKS

Summary of Existing Approaches

[Table: score functions of existing KG embedding models; see source]
Source: Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (ICLR'19)

Page 64: CS249: GRAPH NEURAL NETWORKS

TransE: Score Function
• Relation: translating embedding
• Score function:
  • $f_r(h, t) = -\|h + r - t\| = -d(h + r, t)$

[Figure: the "Capital" relation as a translation: China→Beijing, U.S.→D.C., UK→London]

Bordes et al., Translating embeddings for modeling multi-relational data, NeurIPS 2013

Page 65: CS249: GRAPH NEURAL NETWORKS

TransE: Objective Function
• Margin-based ranking loss
  • $L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'_{(h,r,t)}} [\gamma + d(h + r, t) - d(h' + r, t')]_+$
  • where $S'_{(h,r,t)}$ is the set of corrupted triples and $\gamma$ is the margin
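
A NumPy sketch of the TransE score and the margin-based loss for one positive/negative pair (illustrative; the paper allows either the L1 or L2 norm as d):

```python
import numpy as np

def transe_score(h, r, t):
    """f_r(h, t) = -||h + r - t|| (L2 norm here)."""
    return -np.linalg.norm(h + r - t)

def margin_loss(pos_score, neg_score, gamma=1.0):
    """[gamma + d(h + r, t) - d(h' + r, t')]_+ with d = -score."""
    return max(0.0, gamma - pos_score + neg_score)

# A perfect translation h + r = t scores 0
h, r = np.array([1.0, 0.0]), np.array([0.0, 1.0])
t = h + r
print(transe_score(h, r, t))   # 0.0
```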

Page 66: CS249: GRAPH NEURAL NETWORKS

TransE: Limitations
• One-to-one mapping: $t = \phi_r(h)$
  • Given (h, r), t is unique
  • Given (r, t), h is unique
• Anti-symmetric
  • If r(h,t) holds, then r(t,h) does not
  • Cannot model symmetric relations, e.g., friendship
• Anti-reflexive
  • r(h,h) does not hold
  • Cannot model reflexive relations, e.g., synonym

Page 67: CS249: GRAPH NEURAL NETWORKS

DistMult
• Bilinear score function
  • $f_r(h, t) = h^T M_r t$
  • where $M_r$ is a diagonal matrix with diagonal vector $r$
• A simplification of the neural tensor network (NTN)
• Objective function
  • $L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'_{(h,r,t)}} [\gamma - f_r(h, t) + f_r(h', t')]_+$
• Limitation
  • Can only model symmetric relations, since $f_r(h, t) = f_r(t, h)$

Yang et al., Embedding entities and relations for learning and inference in knowledge bases, ICLR 2015

Page 68: CS249: GRAPH NEURAL NETWORKS

RotatE: Score Function
• Relation: rotation operation in complex space
  • Head and tail entities are in complex vector space, i.e., $h, t \in \mathbb{C}^k$
  • Each relation r is an element-wise rotation from the head entity $h$ to the tail entity $t$, i.e.,
    • $t = h \circ r$, i.e., $t_i = h_i r_i$, where $|r_i| = 1$
    • Equivalently, $r_i = e^{i \theta_{r,i}}$, i.e., rotate $h_i$ by $\theta_{r,i}$
• Score function:
  • $f_r(h, t) = -\|h \circ r - t\|$

Zhiqing Sun, Zhihong Deng, Jian-Yun Nie, and Jian Tang. "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space." ICLR'19.
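
A NumPy sketch of the RotatE score using complex arithmetic (illustrative; the rotation angles $\theta_{r,i}$ are the relation parameters):

```python
import numpy as np

def rotate_score(h, theta_r, t):
    """f_r(h, t) = -||h ∘ r - t|| with r_i = exp(i * theta_{r,i}), so |r_i| = 1.
    h, t: complex vectors; theta_r: real vector of rotation angles."""
    r = np.exp(1j * theta_r)               # unit-modulus complex entries
    return -np.linalg.norm(h * r - t)

# If t is exactly h rotated by theta_r, the score is (near) 0
h = np.array([1 + 1j, 2 - 1j])
theta_r = np.array([0.3, -1.2])
t = h * np.exp(1j * theta_r)
print(rotate_score(h, theta_r, t))         # ~0.0
```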

Page 69: CS249: GRAPH NEURAL NETWORKS

RotatE: Geometric Interpretation
• Consider the 1-d case

Page 70: CS249: GRAPH NEURAL NETWORKS

RotatE: Objective Function
• Smarter negative sampling
  • A negative triple with a higher score is more likely to be sampled
• Cross-entropy loss

Page 71: CS249: GRAPH NEURAL NETWORKS

RotatE: Pros and Cons
• Pros: can model relations with different properties
  • Symmetric: $r_i = +1$ or $-1$
  • Anti-symmetric: $r \circ r \neq \mathbf{1}$
  • Inverse relations: $r_2 = \bar{r}_1$
    • E.g., hypernym is the inverse relation of hyponym
  • Composition relations: $r_3 = r_1 \circ r_2$, i.e., $\theta_3 = \theta_1 + \theta_2$, if $r_j = e^{i \theta_j}$ for j = 1, 2, 3
• Cons:
  • One-to-one mapping
  • Relations are commutative, i.e., $r_1 \circ r_2 = r_2 \circ r_1$
    • This is not always true, e.g., father's wife ≠ wife's father

Page 72: CS249: GRAPH NEURAL NETWORKS

Content
• Introduction to Graphs
• Spectral analysis
• Shallow Embedding
• Graph Neural Networks

Page 73: CS249: GRAPH NEURAL NETWORKS

Notations
• An attributed graph $G = (V, E)$
  • $V$: vertex set
  • $E$: edge set
  • $A$: adjacency matrix
  • $X \in R^{d_0 \times |V|}$: feature matrix for all the nodes
  • $N(v)$: neighbors of node $v$
  • $h_v^l$: representation vector of node $v$ at layer $l$
    • Note $h_v^0 = x_v$
  • $H^l \in R^{d_l \times |V|}$: representation matrix

Page 74: CS249: GRAPH NEURAL NETWORKS

The General Architecture of GNNs
• For a node v at layer t: a function of representations of neighbors and itself from the previous layers
  • Aggregation of neighbors
  • Transformation to a different space
  • Combination of neighbors and the node itself
• Schematically: $h_v^t = f(h_v^{t-1}, \{h_u^{t-1}: u \in N(v)\})$, where $h_v^{t-1}$ is the representation vector from the previous layer for node v, and the $h_u^{t-1}$ are the representation vectors from the previous layer for node v's neighbors

Page 75: CS249: GRAPH NEURAL NETWORKS

Compare with CNN
• Recall CNN: convolution operates on a regular grid
• GNN: extends convolution to irregular graph structure

Page 76: CS249: GRAPH NEURAL NETWORKS

Graph Convolutional Network (GCN)

• Kipf and Welling, ICLR'17
• Layer-wise rule: $H^{k} = \sigma(f(\tilde{A}) H^{k-1} W_k)$, where $\tilde{A} = A + I$
  • f: graph filter; in GCN, $f(\tilde{A}) = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$
• From a node v's perspective: $h_v^k = \sigma(W_k \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{k-1}}{\sqrt{|N(u)| \cdot |N(v)|}})$
• $W_k$: weight matrix at layer k, shared across different nodes

Page 77: CS249: GRAPH NEURAL NETWORKS

A toy example of 2-layer GCN on a 4-node graph

• Computation graph

Page 78: CS249: GRAPH NEURAL NETWORKS

GraphSAGE
• Inductive Representation Learning on Large Graphs
  • William L. Hamilton*, Rex Ying*, Jure Leskovec, NeurIPS'17
• A more general form: $h_v^k = \sigma(W_k \cdot \mathrm{CONCAT}(h_v^{k-1}, \mathrm{AGG}(\{h_u^{k-1}: u \in N(v)\})))$

Page 79: CS249: GRAPH NEURAL NETWORKS

More about AGG
• Mean: $\mathrm{AGG} = \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|}$
• LSTM: $\mathrm{AGG} = \mathrm{LSTM}([h_u^{k-1}, \forall u \in \pi(N(v))])$
  • $\pi(\cdot)$: a random permutation
• Pool: $\mathrm{AGG} = \gamma(\{\mathrm{MLP}(h_u^{k-1}), \forall u \in N(v)\})$
  • $\gamma(\cdot)$: element-wise mean/max pooling of the neighbor set

Page 80: CS249: GRAPH NEURAL NETWORKS

Message-Passing Neural Network
• Gilmer et al., 2017. Neural Message Passing for Quantum Chemistry. ICML.
• A general framework that subsumes most GNNs
  • Can also include edge information
• Two steps
  • Get messages from neighbors at step k: $m_v^k = \sum_{u \in N(v)} M_k(h_v^{k-1}, h_u^{k-1}, e_{uv})$ (e.g., sum or MLP)
  • Update the node latent representation based on the message: $h_v^k = U_k(h_v^{k-1}, m_v^k)$ (e.g., LSTM, GRU)
• A special case: GGNN, Li et al., Gated graph sequence neural networks, ICLR 2015

Page 81: CS249: GRAPH NEURAL NETWORKS

Graph Attention Network (GAT)
• How to decide the importance of neighbors?
  • GCN: a predefined weight
  • Others: no differentiation
• GAT: decide the weights using learnable attention
  • Velickovic et al., 2018. Graph Attention Networks. ICLR.

Page 82: CS249: GRAPH NEURAL NETWORKS

The Attention Mechanism
• Potentially many possible designs; e.g., GAT computes $\alpha_{uv} = \mathrm{softmax}_{u \in N(v)}(\mathrm{LeakyReLU}(a^T [W h_u \,\|\, W h_v]))$

Page 83: CS249: GRAPH NEURAL NETWORKS

Heterogeneous Graph Transformer (HGT)

• How to handle heterogeneous types of nodes and relations?
  • Introduce different weight matrices for different types of nodes and relations
  • Introduce different attention weight matrices for different types of nodes and relations

[Figure: a target node t attended by v1 (Author, via Write) and v2 (Paper, via Cite)]

Hu et al., Heterogeneous Graph Transformer, WWW'20

Page 84: CS249: GRAPH NEURAL NETWORKS

Meta-Relation-Based Parameterization
• Introduce node- and edge-dependent parameterization
  • Leverage the meta relation <source node type, edge type, target node type> to parameterize the attention and message-passing weights
  • W<Author, Write, Paper> = W_Author · W_Write · W_Paper
  • W<Paper, Cite, Paper> = W_Paper · W_Cite · W_Paper

Page 85: CS249: GRAPH NEURAL NETWORKS

Meta-Relation-Based Attention
• Attention learning is also parameterized based on node type and link type

Page 86: CS249: GRAPH NEURAL NETWORKS

Downstream Tasks for Graphs

Page 87: CS249: GRAPH NEURAL NETWORKS

Typical Graph Functions
• Node level
  • Similarity search
  • Link prediction
  • Classification
  • Community detection
  • Ranking
• Graph level
  • Similarity search
  • Frequent pattern mining
  • Graph isomorphism test
  • Graph matching
  • Classification
  • Clustering
  • Graph generation

Page 88: CS249: GRAPH NEURAL NETWORKS

1. Semi-Supervised Node Classification
• Decoder using $z_v = h_v^L$
  • Feed into another fully connected layer
  • $\hat{y}_v = \sigma(\theta^T z_v)$
• Loss function
  • Cross-entropy loss; in the binary classification case:
  • $l_v = -[y_v \log \hat{y}_v + (1 - y_v) \log(1 - \hat{y}_v)]$

Page 89: CS249: GRAPH NEURAL NETWORKS

Applications of Node Classification
• Social networks
  • Whether an account is a bot or not
• Citation networks
  • A paper's research field
• Program-derived graphs
  • The type of a variable

Page 90: CS249: GRAPH NEURAL NETWORKS

2. Link Prediction
• Decoder using $z_v = h_v^L$
  • Given a node pair (u, v), determine its probability $p_{uv} = \sigma(z_u^T R z_v)$ (a sigmoid maps the bilinear score into [0, 1])
  • R can be different for different relation types
• Loss function
  • Cross-entropy loss
  • $l_{uv} = -[y_{uv} \log p_{uv} + (1 - y_{uv}) \log(1 - p_{uv})]$
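
The decoder and loss for one node pair, as a NumPy sketch (illustrative; R would be learned jointly with the encoder):

```python
import numpy as np

def link_probability(z_u, z_v, R):
    """p_uv = sigmoid(z_u^T R z_v)."""
    return 1.0 / (1.0 + np.exp(-(z_u @ R @ z_v)))

def link_loss(y_uv, p_uv, eps=1e-12):
    """Cross-entropy loss for one node pair."""
    return -(y_uv * np.log(p_uv + eps) + (1 - y_uv) * np.log(1 - p_uv + eps))
```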

Page 91: CS249: GRAPH NEURAL NETWORKS

Link Prediction Applications
• Social networks
  • Friend recommendation
• Citation networks
  • Citation recommendation
• Medical networks
  • Whether a drug binds a target or not
• Program-derived graphs
  • Code autocompletion

Page 92: CS249: GRAPH NEURAL NETWORKS

3. Graph Classification
• Decoder using $h_G = g(\{z_v\}_{v \in V})$
  • $g(\cdot)$: a readout function, e.g., sum
  • Feed $h_G$ into another fully connected layer
  • $\hat{y}_G = \sigma(\theta^T h_G)$
• Loss function
  • Cross-entropy loss; in the binary classification case:
  • $l_G = -[y_G \log \hat{y}_G + (1 - y_G) \log(1 - \hat{y}_G)]$

Page 93: CS249: GRAPH NEURAL NETWORKS

Graph Classification Applications
• Chemical compounds
  • Toxic or not
• Proteins
  • Has a certain function or not
• Program-derived graphs
  • Contains bugs or not

Page 94: CS249: GRAPH NEURAL NETWORKS

4. Graph Similarity Computation
• Decoder using $h_G = g(\{z_v\}_{v \in V})$
  • Given a graph pair $(G_1, G_2)$, determine its score $s_{G_1 G_2} = h_{G_1}^T R \, h_{G_2}$
• Loss function
  • E.g., square loss: $l_{G_1 G_2} = (y_{G_1 G_2} - s_{G_1 G_2})^2$

Page 95: CS249: GRAPH NEURAL NETWORKS

A Concrete Solution: SimGNN [Bai et al., AAAI 2019]
• Goal: learn a GNN $\phi: \mathcal{G} \times \mathcal{G} \to R^+$ to approximate the Graph Edit Distance between two graphs
• Two key components:
  1. Attention-based graph-level embedding
  2. Histogram features from pairwise node similarities

Page 96: CS249: GRAPH NEURAL NETWORKS

Graph Similarity Computation Applications
• Drug databases
  • Drug similarity search
• Program databases
  • Code recommendation
    • Search ninja code for novice code
    • Search Java code for COBOL code

Page 97: CS249: GRAPH NEURAL NETWORKS

Summary
• Introduction to Graphs
• Spectral analysis
• Shallow Embedding
• Graph Neural Networks