CS249: Graph Neural Networks
TRANSCRIPT
Content
• Introduction to Graphs
• Spectral Analysis
• Shallow Embedding
• Graph Neural Networks
Graph, Graph, Everywhere
Examples: Aspirin (a chemical compound); the yeast protein interaction network (from H. Jeong et al., Nature 411, 41 (2001)); the Internet; co-author networks; and many more.
Why Graph Mining?
• Graphs are ubiquitous
  • Chemical compounds (cheminformatics)
  • Protein structures, biological pathways/networks (bioinformatics)
  • Program control flow, traffic flow, and workflow analysis
  • XML databases, the Web, and social network analysis
• The graph is a general model
  • Trees, lattices, sequences, and items are degenerate (special-case) graphs
• Diversity of graphs
  • Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted vs. unweighted, homogeneous vs. heterogeneous
• Complexity of algorithms: many graph problems have high computational complexity
Representation of a Graph
• G = <V, E>
  • V = {u_1, …, u_n}: node set
  • E ⊆ V × V: edge set
• Adjacency matrix
  • A = (a_ij), i, j = 1, …, n
  • a_ij = 1 if <u_i, u_j> ∈ E
  • a_ij = 0 if <u_i, u_j> ∉ E
• Undirected graph vs. directed graph: A = A^T vs. A ≠ A^T
• Weighted graph: use W instead of A, where w_ij represents the weight of edge <u_i, u_j>
Example
• Nodes: Yahoo (y), Amazon (a), M'soft (m)
• Adjacency matrix A (rows/columns in order y, a, m):
      y  a  m
  y   1  1  0
  a   1  0  1
  m   0  1  0
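The adjacency matrix above can be written down and sanity-checked directly; a minimal numpy sketch (node order assumed [y, a, m], matching the slide):

```python
import numpy as np

# The 3-node web-graph example from the slide: Yahoo (y), Amazon (a), M'soft (m).
# Node order: [y, a, m]; a 1 marks an edge, and the y-y entry is Yahoo's self-link.
A = np.array([
    [1, 1, 0],   # y: self-link and a link with a
    [1, 0, 1],   # a: links with y and m
    [0, 1, 0],   # m: link with a
])

# An undirected graph has a symmetric adjacency matrix: A == A^T.
assert (A == A.T).all()

# Node degrees are the row sums of A.
degrees = A.sum(axis=1)
print(degrees)  # [2 2 1]
```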
Typical Graph Tasks
• Node level
  • Link prediction
  • Node classification
  • Similarity search
  • Community detection
  • Ranking
• Graph level
  • Graph prediction
  • Graph similarity search
  • Frequent pattern mining
  • MCS (maximum common subgraph) detection
  • Clustering
• Node-level prediction of graph properties
  • Betweenness score prediction
  • Travelling salesman problem
  • Network attack problem
  • Set cover problem
  • Maximum clique detection
Graph Techniques
• Approaches before GNNs
  • Heuristics
  • Graph signal processing
  • Graph kernels
  • Graphical models
Content
• Introduction to Graphs
• Spectral Analysis
• Shallow Embedding
• Graph Neural Networks
Spectral Analysis
• Graph Laplacians are key to understanding graphs
  • Unnormalized graph Laplacians
  • Normalized graph Laplacians
• Spectral clustering
  • Leverages eigenvectors of graph Laplacians to cluster the nodes of a graph
  • Has a close relationship with graph cuts
• Label propagation
  • Semi-supervised node classification on graphs
  • Also related to graph Laplacians
What Are Graph Laplacian Matrices?
• Matrices defined as functions of a graph's adjacency or weight matrix
• A tool to study graphs
• An entire field, spectral graph theory, studies these matrices
Examples of Graph Laplacians
• Given an undirected, weighted graph G = (V, E) with weight matrix W
  • w_ij = w_ji ≥ 0
  • n: total number of nodes
  • Degree of node v_i ∈ V: d_i = Σ_j w_ij
  • Degree matrix D: a diagonal matrix with the degrees on the diagonal, i.e., D_ii = d_i and D_ij = 0 for i ≠ j
• Three examples of graph Laplacians
  • The unnormalized graph Laplacian: L = D − W
  • The normalized graph Laplacians
    • Symmetric: L_sym = D^{−1/2} L D^{−1/2}
    • Random walk related: L_rw = D^{−1} L
The Unnormalized Graph Laplacian
• Definition: L = D − W
• Properties of L
  • For any vector f ∈ R^n: f' L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)^2
  • L is symmetric and positive semi-definite
  • The smallest eigenvalue of L is 0, and the corresponding eigenvector is the constant one vector 1 (the n-dimensional all-one vector; recall that an eigenvector can be scaled by any nonzero scalar α)
  • L has n non-negative, real-valued eigenvalues: 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n
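These Laplacian properties (the quadratic form, positive semi-definiteness, and the zero eigenvalue with the all-one eigenvector) can be checked numerically; a small sketch on a made-up weighted graph:

```python
import numpy as np

# A small undirected weighted graph: a triangle {0,1,2} plus a pendant node 3.
W = np.array([
    [0., 1., 1., 0.],
    [1., 0., 1., 0.],
    [1., 1., 0., 1.],
    [0., 0., 1., 0.],
])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # unnormalized graph Laplacian

# f' L f = 1/2 * sum_{i,j} w_ij (f_i - f_j)^2  >= 0 for any f
f = np.array([3., -1., 0., 2.])
quad = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                 for i in range(4) for j in range(4))
assert np.isclose(f @ L @ f, quad)

# Eigenvalues are real and non-negative; the smallest is 0,
# with the constant one vector as its eigenvector (L @ 1 = D@1 - W@1 = 0).
eigvals = np.linalg.eigvalsh(L)
assert eigvals.min() > -1e-9
assert np.allclose(L @ np.ones(4), 0.0)
```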
Question
• Will self-loops in the graph change L?
Number of Connected Components
• The multiplicity k of the eigenvalue 0 of L equals the number of connected components
• Consider k = 1, i.e., a connected graph, and f, the corresponding eigenvector for eigenvalue 0
  • Every element of f has to equal every other element (f' L f = 0 forces f_i = f_j across each edge, and the graph is connected)
k Connected Components
• The graph can be arranged as a block diagonal matrix, and so can the matrix L
• Each block L_i has eigenvalue 0 with a corresponding constant one eigenvector
• Therefore L has the eigenvalue 0 with multiplicity k, and the corresponding eigenvectors are the indicator vectors 1_{A_i}: ones for the nodes in A_i, i.e., the nodes in component i, and zeros elsewhere
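The multiplicity claim is easy to verify numerically; a sketch on a made-up graph with two components:

```python
import numpy as np

# Two disjoint triangles: a graph with 2 connected components.
blk = np.ones((3, 3)) - np.eye(3)          # adjacency of one triangle
W = np.block([[blk, np.zeros((3, 3))],
              [np.zeros((3, 3)), blk]])
L = np.diag(W.sum(axis=1)) - W             # unnormalized Laplacian

eigvals = np.linalg.eigvalsh(L)
# The multiplicity of eigenvalue 0 equals the number of components (here 2).
num_zero = int(np.sum(np.isclose(eigvals, 0.0, atol=1e-8)))
assert num_zero == 2
```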
The Normalized Graph Laplacians
• Symmetric: L_sym = D^{−1/2} L D^{−1/2} = I − D^{−1/2} W D^{−1/2}
• Random walk related: L_rw = D^{−1} L = I − D^{−1} W
Properties of L_sym and L_rw
• For any vector f ∈ R^n: f' L_sym f = (1/2) Σ_{i,j} w_ij (f_i/√d_i − f_j/√d_j)^2
• λ is an eigenvalue of L_rw with eigenvector v ⟺ λ is an eigenvalue of L_sym with eigenvector w = D^{1/2} v
• λ is an eigenvalue of L_rw with eigenvector v ⟺ λ and v solve the generalized eigenproblem L v = λ D v
• 0 is an eigenvalue of L_rw with eigenvector 1; 0 is an eigenvalue of L_sym with eigenvector D^{1/2} 1
• L_sym and L_rw are positive semi-definite and have n non-negative, real-valued eigenvalues
Example
• Graph built upon a Gaussian mixture model with four components
Spectral Clustering
Clustering Graphs and Network Data
• Applications
  • Bi-partite graphs, e.g., customers and products, authors and conferences
  • Web search engines, e.g., click-through graphs and Web graphs
  • Social networks, friendship/coauthor graphs
• Example: clustering books about politics [Newman, 2006]
Example of Graph Clustering
• Reference: ICDM'09 tutorial by Chris Ding
• Example: clustering Supreme Court justices according to their voting behavior
  • The voting data defines a weight matrix W over the justices

Example (continued)
Spectral Clustering Algorithms
• Goal: cluster the nodes in the graph into k clusters
• Idea: leverage the first k eigenvectors of L
• Major steps:
  • Compute the first k eigenvectors v_1, v_2, …, v_k of L
  • Represent each node i by the k values x_i = (v_1i, v_2i, …, v_ki)
  • Cluster the x_i's using k-means
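The steps above can be sketched in a few lines of numpy. This is a minimal sketch on a made-up graph of two triangles joined by one edge; for k = 2, thresholding the second eigenvector (the Fiedler vector) at 0 stands in for the k-means step:

```python
import numpy as np

# Two dense clusters {0,1,2} and {3,4,5} joined by a single bridge edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Step 1: first k = 2 eigenvectors of L (eigh sorts eigenvalues ascending).
_, vecs = np.linalg.eigh(L)
# Step 2: node i is represented by x_i = (v_1i, v_2i).
# Step 3: for k = 2, splitting the second eigenvector at 0 plays the role
# of k-means on these 2-d points.
labels = (vecs[:, 1] > 0).astype(int)

assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```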
Variants of Spectral Clustering Algorithms
• Normalized spectral clustering according to Shi and Malik (2000)
  • Compute the first k eigenvectors v_1, v_2, …, v_k of L_rw
• Normalized spectral clustering according to Ng, Jordan, and Weiss (2002)
  • Compute the first k eigenvectors v_1, v_2, …, v_k of L_sym
  • Normalize each x_i to have norm 1
Connections to Graph Cuts
• Min-cut
  • Minimize the number of edges cut to partition a graph into 2 disconnected components
Objective Function of Min-Cut
• Minimize cut(A, B) = Σ_{i∈A, j∈B} w_ij over partitions (A, B) of the node set
Algorithm
• Step 1: Calculate the graph Laplacian matrix L = D − W
• Step 2: Calculate the second eigenvector q (the Fiedler vector)
  • Q: Why the second? The first eigenvector is the constant one vector, which carries no partition information
• Step 3: Bisect q (e.g., at 0) to get two clusters
Minimum Cut with Constraints
• Plain min-cut tends to split off a single node; size constraints balance the partition, e.g., RatioCut(A, B) = cut(A, B)(1/|A| + 1/|B|)

Other Objective Functions
• E.g., the normalized cut Ncut(A, B) = cut(A, B)(1/vol(A) + 1/vol(B)), where vol(·) sums the degrees of the nodes in a set
Label Propagation
Label Propagation in the Network
• Given a network where some nodes have labels, can we classify the unlabeled nodes by using link information?
  • E.g., node 12 belongs to Class 1 and node 5 belongs to Class 2
Problem Formalization for Label Propagation
• Given n nodes
  • l of them with labels (e.g., y_1, y_2, …, y_l are known)
  • u of them without labels (e.g., y_{l+1}, y_{l+2}, …, y_n are unknown)
• Y is the n × K label matrix
  • K is the number of labels (classes)
  • Y_ik denotes the probability of node i belonging to class k
• The weighted adjacency matrix is W
• The probabilistic transition matrix T
  • T_ij = P(j → i) = w_ij / Σ_k w_kj
The Label Propagation Algorithm
• Step 1: Propagate Y ← TY
  • Y_i = Σ_j T_ij Y_j = Σ_j P(j → i) Y_j
  • The initialization of Y for the unlabeled nodes is not important
• Step 2: Row-normalize Y
  • The probabilities of each node belonging to the classes sum to 1
• Step 3: Reset the labels for the labeled nodes. Repeat Steps 1–3 until Y converges
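The three steps translate directly into numpy; a minimal sketch on a made-up 6-node path graph with one labeled node per class (the graph and labels are invented for illustration):

```python
import numpy as np

# A 6-node path graph; node 0 is labeled class 0, node 5 is labeled class 1.
n, K = 6, 2
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# T_ij = P(j -> i) = w_ij / sum_k w_kj  (column-normalize W)
T = W / W.sum(axis=0, keepdims=True)

Y = np.full((n, K), 0.5)           # init for unlabeled nodes is not important
labeled = {0: 0, 5: 1}

for _ in range(200):
    Y = T @ Y                                 # Step 1: propagate Y <- TY
    Y = Y / Y.sum(axis=1, keepdims=True)      # Step 2: row-normalize Y
    for i, k in labeled.items():              # Step 3: reset labeled nodes
        Y[i] = np.eye(K)[k]

# Nodes nearer to node 0 end up in class 0, nodes nearer to node 5 in class 1.
pred = Y.argmax(axis=1)
assert pred.tolist() == [0, 0, 0, 1, 1, 1]
```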
Example: Iter = 0
[Figure: the network, with labeled nodes 12 (Class 1) and 5 (Class 2)]

Example: Iter = 1
[Figure: labels begin to propagate to the neighbors of nodes 12 and 5]

Example: Iter = 2
[Figure: labels spread further; consider node 6 with neighbors 7, 11, 10, 3, 4, 0]
Y_6 = P(7→6) Y_7 + P(11→6) Y_11 + P(10→6) Y_10 + P(3→6) Y_3 + P(4→6) Y_4 + P(0→6) Y_0
    = (3/4, 0) + (0, 3/4) = (3/4, 3/4)
After normalization, Y_6 = (1/2, 1/2)
Other Label Propagation Algorithms
• Energy minimization and harmonic functions
  • Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, by Xiaojin Zhu et al., ICML'03
  • https://www.aaai.org/Papers/ICML/2003/ICML03-118.pdf
• Graph regularization
  • Learning with Local and Global Consistency, by Denny Zhou et al., NIPS'03
  • http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf
From the Energy Minimization Perspective
• Consider a binary classification problem: y ∈ {0, 1}
• Let f: V → R, which maps a node to a real number
  • f(i) = y_i, if i is labeled
  • f = (f_l, f_u): the labeled and unlabeled parts
• The energy function: E(f) = (1/2) Σ_{i,j} w_ij (f(i) − f(j))^2
• Intuition: if two nodes are connected, they should share similar labels
Minimizing the Energy Function Results in a Harmonic Function
• Note E(f) = f' (D − W) f = f' L f
• Goal: find f such that E(f) is minimized while f_l stays fixed
• Solution: ((D − W) f)_u = 0, i.e., f is harmonic on the unlabeled nodes
Solve f_u
• Consider W as a block matrix: W = [W_ll, W_lu; W_ul, W_uu]
• Closed-form solution: f_u = (I − T_uu)^{−1} T_ul f_l, where T = D^{−1} W
• Iterative solution: f_u^t = T_uu f_u^{t−1} + T_ul f_l, i.e., f_u = (Σ_{t=0}^∞ T_uu^t) T_ul f_l
  • Since (I − T_uu)^{−1} = I + T_uu + T_uu^2 + ⋯
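Both solutions can be checked against each other numerically; a sketch on the same kind of made-up path graph (here the closed form gives a linear interpolation between the two clamped endpoints):

```python
import numpy as np

# 6-node path graph; nodes 0 and 5 are labeled with f(0)=0, f(5)=1.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
T = np.diag(1.0 / W.sum(axis=1)) @ W       # T = D^{-1} W (row-stochastic)

l_idx, u_idx = [0, 5], [1, 2, 3, 4]        # labeled / unlabeled indices
f_l = np.array([0.0, 1.0])

T_uu = T[np.ix_(u_idx, u_idx)]
T_ul = T[np.ix_(u_idx, l_idx)]

# Closed form: f_u = (I - T_uu)^{-1} T_ul f_l
f_u = np.linalg.solve(np.eye(4) - T_uu, T_ul @ f_l)

# Iterative form: f_u^t = T_uu f_u^{t-1} + T_ul f_l converges to the same values
f_it = np.zeros(4)
for _ in range(500):
    f_it = T_uu @ f_it + T_ul @ f_l

assert np.allclose(f_u, [0.2, 0.4, 0.6, 0.8])   # harmonic = linear on a path
assert np.allclose(f_it, f_u, atol=1e-6)
```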
From the Graph Regularization Perspective
• Let F be an n × K matrix with nonnegative entries
  • For node i, pick the label k such that F_ik is the maximum among all k
• Let Y be an n × K matrix with Y_ik = 1 if node i is labeled as class k, and 0 otherwise
• Cost function: smoothness constraint + fitting constraint
Solve F
• Note the first (smoothness) component is related to trace(F' L_sym F)
• Closed-form solution: F* = (1 − α)(I − αS)^{−1} Y
  • where S = D^{−1/2} W D^{−1/2} and α = 1/(1 + μ) is a value between 0 and 1
• Iterative solution: F^{t+1} = α S F^t + (1 − α) Y
References
• A Tutorial on Spectral Clustering, by U. von Luxburg
  • http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf
• Learning from Labeled and Unlabeled Data with Label Propagation, by Xiaojin Zhu and Zoubin Ghahramani
  • http://www.cs.cmu.edu/~zhuxj/pub/CMU-CALD-02-107.pdf
• Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, by Xiaojin Zhu et al., ICML'03
  • https://www.aaai.org/Papers/ICML/2003/ICML03-118.pdf
• Learning with Local and Global Consistency, by Denny Zhou et al., NIPS'03
  • http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf
Content
• Introduction to Graphs
• Spectral Analysis
• Shallow Embedding
• Graph Neural Networks
Representing Nodes and Graphs
• Important for many graph-related tasks
• The discrete nature of graphs makes this very challenging
• Naïve solutions (e.g., adjacency-based feature vectors) have limitations:
  • Extremely high-dimensional
  • No global structure information integrated
  • Permutation-variant
• Even more challenging for whole-graph representations
  • Ex. graphlet-based feature vectors
  • Requires subgraph isomorphism tests: NP-hard
Source: https://haotang1995.github.io/projects/robust_graph_level_representation_learning_using_graph_based_structural_attentional_learning
Source: DOI: 10.1093/bioinformatics/btv130
Automatic Representation Learning
• Map each node/graph into a low-dimensional vector: Φ: V → R^d or Φ: G → R^d
• Earlier methods: shallow node embedding methods inspired by word2vec
  • DeepWalk [Perozzi, KDD'14]
  • LINE [Tang, WWW'15]
  • Node2Vec [Grover, KDD'16]
• Φ(v) = U x_v, where U is the embedding matrix and x_v is the one-hot encoding vector of node v (Source: DeepWalk)
LINE: Large-scale Information Network Embedding
• First-order proximity
  • Assumption: two nodes are similar if they are connected
  • Modeled link distribution: p_1(v_i, v_j) = 1 / (1 + exp(−u_i^T u_j)), where u_i is the embedding vector for node i
  • Limitation: links are sparse, not sufficient

Objective Function for First-Order Proximity
• Minimize the KL divergence between the empirical link distribution and the modeled link distribution
  • Empirical distribution: p̂_1(i, j) ∝ w_ij, where w_ij is the weight of edge (i, j)
  • This reduces to O_1 = −Σ_{(i,j)∈E} w_ij log p_1(v_i, v_j)
Second-Order Proximity
• Assumption: two nodes are similar if their neighbors are similar
• u_i: target embedding vector for node i; u_i': context embedding vector for node i
• Modeled distribution: p_2(v_j | v_i) = exp(u_j'^T u_i) / Σ_k exp(u_k'^T u_i)

Objective Function for Second-Order Proximity
• Minimize the KL divergence between the empirical link distribution and the modeled link distribution
  • Empirical distribution: p̂_2(v_j | v_i) = w_ij / d_i, where d_i = Σ_j w_ij
  • Objective function: O_2 = −Σ_{(i,j)∈E} w_ij log p_2(v_j | v_i)
Negative Sampling for Optimization
• For the objective derived from second-order proximity
  • For each positive link (i, j), sample K negative links (i, n)
  • An edge with weight w can be considered as w binary edges
• New objective function (per edge): log σ(u_j'^T u_i) + Σ_{k=1}^K E_{v_n ∼ P_n(v)} [log σ(−u_n'^T u_i)]
• Negative sampling distribution: P_n(v) ∝ d_v^{3/4}
Limitations of Shallow Embedding Techniques
• Too many parameters
  • Each node is associated with an embedding vector, and all of these are parameters
• Not inductive
  • Cannot handle new nodes
• Cannot handle node attributes
From Shallow Embedding to Graph Neural Networks
• The embedding function (encoder) becomes more complicated
  • Shallow embedding: ENC(v) = U x_v, where U is the embedding matrix and x_v is the one-hot encoding vector
  • Graph neural networks: ENC(v) is a neural network that depends on the graph structure
Knowledge Graph Embedding

Knowledge Graph
• What are knowledge graphs?
  • Multi-relational graph data (heterogeneous information networks)
  • Provide a structured representation for semantic relationships between real-world entities
• A triple (h, r, t) represents a fact, e.g., (Eiffel Tower, is located in, Paris)
Examples of KGs
• General-purpose KGs
• Common-sense KGs & NLP
• Bio & medical KGs
• Product graphs & e-commerce
Applications of KGs
• Foundational to knowledge-driven AI systems
• Enable many downstream applications: NLP tasks, QA & dialogue systems, computational biology, recommendation systems, etc.
Knowledge Graph Embedding
• Goal: encode entities as low-dimensional vectors and relations as parametric algebraic operations
• Applications: dialogue agents, question answering, machine comprehension, recommender systems, ...
Key Idea of KG Embedding Algorithms
• Define a score function for a triple: f_r(h, t)
  • Computed from the entity and relation representations
• Define a loss function to guide the training
  • E.g., an observed triple should score higher than a negative one
Summary of Existing Approaches
Source: Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (ICLR'19)
TransE: Score Function
• Relation as a translating embedding: h + r ≈ t
• Score function: f_r(h, t) = −||h + r − t|| = −d(h + r, t)
• Example (capital relation): China → Beijing, U.S. → D.C., UK → London

Bordes et al., Translating Embeddings for Modeling Multi-Relational Data, NeurIPS 2013
TransE: Objective Function
• Margin-based ranking loss
  • L = Σ_{(h,r,t)∈S} Σ_{(h',r,t')∈S'_{(h,r,t)}} [γ + d(h + r, t) − d(h' + r, t')]_+
  • S: observed triples; S'_{(h,r,t)}: corrupted (negative) triples; γ: the margin
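A tiny numerical illustration of the TransE idea. In practice the embeddings are learned; here they are hand-crafted (invented 2-d vectors) so that h + r ≈ t holds for one fact:

```python
import numpy as np

# Hand-crafted toy embeddings (not learned) so that China + capital = Beijing.
ent = {"China":   np.array([1.0, 0.0]),
       "Beijing": np.array([1.0, 2.0]),
       "Paris":   np.array([5.0, -3.0])}
rel = {"capital": np.array([0.0, 2.0])}

def score(h, r, t):
    # f_r(h, t) = -||h + r - t||  (higher is better; 0 is a perfect fit)
    return -np.linalg.norm(ent[h] + rel[r] - ent[t])

pos = score("China", "capital", "Beijing")   # perfect fit: 0
neg = score("China", "capital", "Paris")     # a corrupted tail
assert pos > neg

# Margin-based ranking loss for this pair: [gamma + d(h+r,t) - d(h'+r,t')]_+
gamma = 1.0
loss = max(0.0, gamma + (-pos) - (-neg))
assert loss == 0.0   # the negative is already beyond the margin
```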
TransE: Limitations
• One-to-one mapping: t = tr_r(h)
  • Given (h, r), t is unique
  • Given (r, t), h is unique
• Anti-symmetric
  • If r(h, t) holds, then r(t, h) does not
  • Cannot model symmetric relations, e.g., friendship
• Anti-reflexive
  • r(h, h) does not hold
  • Cannot model reflexive relations, e.g., synonym
DistMult
• Bilinear score function: f_r(h, t) = h^T A_r t
  • where A_r is a diagonal matrix with diagonal vector r
  • A simplification of the neural tensor network (NTN)
• Objective function: L = Σ_{(h,r,t)∈S} Σ_{(h',r,t')∈S'_{(h,r,t)}} [γ − f_r(h, t) + f_r(h', t')]_+
• Limitation: can only model symmetric relations
  • f_r(h, t) = f_r(t, h)

Yang et al., Embedding Entities and Relations for Learning and Inference in Knowledge Bases, ICLR 2015
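The symmetry limitation follows directly from the diagonal form of A_r and is easy to see numerically (the vectors below are made up for illustration):

```python
import numpy as np

# DistMult: f_r(h, t) = h^T diag(r) t. Because diag(r) is diagonal, the score
# is symmetric in h and t, which is exactly the limitation noted above.
h = np.array([0.3, -1.2, 0.7])
t = np.array([1.0, 0.4, -0.5])
r = np.array([0.9, 0.1, 2.0])   # diagonal of A_r

def distmult(h, r, t):
    return float(h @ np.diag(r) @ t)   # equivalently np.sum(h * r * t)

assert np.isclose(distmult(h, r, t), np.sum(h * r * t))
assert np.isclose(distmult(h, r, t), distmult(t, r, h))   # symmetric in h, t
```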
RotatE: Score Function
• Relation as a rotation operation in complex space
  • Head and tail entities live in a complex vector space, i.e., h, t ∈ C^k
  • Each relation r is an element-wise rotation from the head entity h to the tail entity t, i.e., t = h ∘ r, i.e., t_i = h_i r_i, where |r_i| = 1
  • Equivalently, r_i = e^{iθ_{r,i}}, i.e., rotate h_i by the angle θ_{r,i}
• Score function: f_r(h, t) = −||h ∘ r − t||

Zhiqing Sun, Zhihong Deng, Jian-Yun Nie, and Jian Tang. "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space." ICLR'19.
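A small numerical sketch of the rotation view (entity and angle values invented for illustration): a unit-modulus complex vector rotates the head onto the tail, and composing two rotations adds their angles:

```python
import numpy as np

# RotatE: entities in C^k; each relation is a unit-modulus complex vector.
theta = np.array([0.5, -1.2, 2.0])        # rotation angles for relation r
r = np.exp(1j * theta)                    # |r_i| = 1 element-wise
h = np.array([1 + 1j, 0.5 - 2j, -1 + 0j])
t = h * r                                 # the tail that r maps h onto exactly

def rotate_score(h, r, t):
    # f_r(h, t) = -||h o r - t||
    return -np.linalg.norm(h * r - t)

assert np.isclose(rotate_score(h, r, t), 0.0)   # perfect fit
assert rotate_score(h, r, h) < 0.0              # a mismatched tail scores worse

# Composition: rotating by r1 then r2 equals rotating by angle theta1 + theta2.
r1, r2 = np.exp(1j * 0.3), np.exp(1j * 0.9)
assert np.isclose((1 + 2j) * r1 * r2, (1 + 2j) * np.exp(1j * 1.2))
```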
RotatE: Geometric Interpretation
• Consider the 1-d case: the relation rotates the head embedding around the origin of the complex plane onto the tail
RotatE: Objective Function
• Smarter negative sampling
  • A negative triple with a higher score is more likely to be sampled
• Cross-entropy loss
RotatE: Pros and Cons
• Pros: can model relations with different properties
  • Symmetric: r_i = +1 or −1
  • Anti-symmetric: r ∘ r ≠ 1
  • Inverse relations: r_2 = r̄_1 (complex conjugate)
    • E.g., hypernym is the inverse relation of hyponym
  • Composition of relations: r_3 = r_1 ∘ r_2, i.e., θ_3 = θ_1 + θ_2, if r_j = e^{iθ_j} for j = 1, 2, 3
• Cons:
  • One-to-one mapping
  • Relations are commutative, i.e., r_1 ∘ r_2 = r_2 ∘ r_1, which is not always true, e.g., father's wife ≠ wife's father
Content
• Introduction to Graphs
• Spectral Analysis
• Shallow Embedding
• Graph Neural Networks
Notations
• An attributed graph G = (V, E)
  • V: vertex set
  • E: edge set
  • A: adjacency matrix
  • X ∈ R^{d_0 × |V|}: feature matrix for all the nodes
  • N(v): neighbors of node v
  • h_v^l: representation vector of node v at layer l; note h_v^0 = x_v
  • H^l ∈ R^{d_l × |V|}: representation matrix at layer l
The General Architecture of GNNs
• For a node v at layer t, its representation is a function of the representation of v itself and the representations of its neighbors from the previous layer: h_v^t = f(h_v^{t−1}, {h_u^{t−1} : u ∈ N(v)})
  • Aggregation of the neighbors
  • Transformation to a different space
  • Combination of the neighbors and the node itself
Comparison with CNNs
• Recall that a CNN convolves over a regular grid (a regular graph)
• A GNN extends convolution to irregular graph structures
Graph Convolutional Network (GCN)
• Kipf and Welling, ICLR'17
• Layer-wise rule: H^{l+1} = σ(f H^l W^l), with graph filter f = D̃^{−1/2} Ã D̃^{−1/2}, where Ã = A + I and D̃ is the degree matrix of Ã
• From a node v's perspective: h_v^{l+1} = σ(W^l Σ_{u ∈ N(v) ∪ {v}} h_u^l / √(d̃_u d̃_v))
• W^l: the mapping matrix at layer l, shared by all the nodes

A Toy Example: 2-layer GCN on a 4-node Graph
• Computation graph: each node's layer-2 representation aggregates its neighbors' layer-1 representations, which in turn aggregate the raw features
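A hand-rolled numpy sketch of one GCN layer under the Kipf & Welling propagation rule (nodes as rows here; the feature values and weights are random and untrained, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                      # add self-loops: A~ = A + I
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
filt = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]   # D~^{-1/2} A~ D~^{-1/2}

X = rng.normal(size=(4, 3))                # node features (n x d0), nodes as rows
W0 = rng.normal(size=(3, 2))               # layer weights, shared by all nodes

H1 = np.maximum(0.0, filt @ X @ W0)        # one propagation step + ReLU

assert H1.shape == (4, 2)                  # 4 nodes, 2 hidden dims
assert (H1 >= 0).all()                     # ReLU output is non-negative
assert np.allclose(filt, filt.T)           # the normalized filter is symmetric
```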
GraphSAGE
• Inductive Representation Learning on Large Graphs, William L. Hamilton*, Rex Ying*, Jure Leskovec, NeurIPS'17
• A more general form: h_v^l = σ(W^l · CONCAT(h_v^{l−1}, AGG({h_u^{l−1} : u ∈ N(v)})))

More about AGG
• Mean: AGG = Σ_{u∈N(v)} h_u^{l−1} / |N(v)|
• LSTM: AGG = LSTM(π({h_u^{l−1} : u ∈ N(v)})), where π(·) is a random permutation
• Pool: AGG = γ({h_u^{l−1} : u ∈ N(v)}), where γ(·) is element-wise mean/max pooling of the neighbor set
Message-Passing Neural Network
• Gilmer et al., 2017. Neural Message Passing for Quantum Chemistry. ICML.
• A general framework that subsumes most GNNs
  • Can also include edge information
• Two steps
  • Get messages from the neighbors at step k (message function: e.g., sum or MLP)
  • Update the node's latent representation based on the messages (update function: e.g., LSTM, GRU)
• A concrete model: GGNN, Li et al., Gated Graph Sequence Neural Networks, ICLR 2015
Graph Attention Network (GAT)
• How to decide the importance of neighbors?
  • GCN: a predefined weight
  • Others: no differentiation
  • GAT: decide the weights using learnable attention
  • Velickovic et al., 2018. Graph Attention Networks. ICLR.

The Attention Mechanism
• Potentially many possible designs
Heterogeneous Graph Transformer (HGT)
• How to handle heterogeneous types of nodes and relations?
  • Introduce different weight matrices for different types of nodes and relations
  • Introduce different attention weight matrices for different types of nodes and relations
• Example: a target node t (a Paper) with neighbors v_1 (an Author, linked by Write) and v_2 (a Paper, linked by Cite)
Hu et al., Heterogeneous Graph Transformer, WWW'20

Meta-Relation-based Parametrization
• Introduce node- and edge-dependent parameterization
• Leverage the meta relation <source node type, edge type, target node type> to parameterize the attention and message-passing weights
  • W_<Author, Write, Paper> = W_Author W_Write W_Paper
  • W_<Paper, Cite, Paper> = W_Paper W_Cite W_Paper

Meta-Relation-based Attention
• Attention learning is also parameterized based on node type and link type
Downstream Tasks for Graphs

Typical Graph Functions
• Node level
  • Similarity search
  • Link prediction
  • Classification
  • Community detection
  • Ranking
• Graph level
  • Similarity search
  • Frequent pattern mining
  • Graph isomorphism test
  • Graph matching
  • Classification
  • Clustering
  • Graph generation
1. Semi-supervised Node Classification
• Decoder using z_v = h_v^L
  • Feed into another fully connected layer: ŷ_v = σ(θ^T z_v)
• Loss function: cross-entropy loss
  • In a binary classification case: l_v = −[y_v log ŷ_v + (1 − y_v) log(1 − ŷ_v)]
Applications of Node Classification
• Social network: whether an account is a bot or not
• Citation network: a paper's research field
• A program-derived graph: the type of a variable
2. Link Prediction
• Decoder using z_v = h_v^L
  • Given a node pair (u, v), determine its probability p_uv = σ(z_u^T R z_v)
  • R can be different for different relation types
• Loss function: cross-entropy loss
  • l_uv = −[y_uv log p_uv + (1 − y_uv) log(1 − p_uv)]
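A minimal sketch of this decoder and loss. The z vectors would come from the last GNN layer; here they are invented for illustration, and R is the simplest (identity) choice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up node representations (in practice: z = h_v^L from the GNN).
z_u = np.array([0.5, -0.2, 1.0])
z_v = np.array([0.1, 0.3, 0.4])
R = np.eye(3)                      # identity R = plain dot-product decoder

# p_uv = sigmoid(z_u^T R z_v)
p_uv = sigmoid(z_u @ R @ z_v)
assert 0.0 < p_uv < 1.0

# Cross-entropy loss for an observed edge (y_uv = 1)
y_uv = 1.0
loss = -(y_uv * np.log(p_uv) + (1 - y_uv) * np.log(1 - p_uv))
assert loss > 0.0
```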
Link Prediction Applications
• Social network: friend recommendation
• Citation network: citation recommendation
• Medical network: whether a drug binds a target or not
• A program-derived graph: code autocomplete
3. Graph Classification
• Decoder using h_G = f({z_v}_{v∈V})
  • f(·): a readout function, e.g., sum
  • Feed h_G into another fully connected layer: ŷ_G = σ(θ^T h_G)
• Loss function: cross-entropy loss
  • In a binary classification case: l_G = −[y_G log ŷ_G + (1 − y_G) log(1 − ŷ_G)]
Graph Classification Applications
• Chemical compounds: toxic or not
• Proteins: has a certain function or not
• Program-derived graphs: contains bugs or not
4. Graph Similarity Computation
• Decoder using h_G = f({z_v}_{v∈V})
  • Given a graph pair (G_1, G_2), determine its score s_{G1,G2} = h_{G1}^T S h_{G2}
• Loss function, e.g., square loss
  • l_{G1,G2} = (y_{G1,G2} − s_{G1,G2})^2

A Concrete Solution: SimGNN [Bai et al., AAAI 2019]
• Goal: learn a GNN f: G × G → R^+ to approximate the Graph Edit Distance between two graphs
  1. Attention-based graph-level embedding
  2. Histogram features from pairwise node similarities
Graph Similarity Computation Applications
• Drug database: drug similarity search
• Program database: code recommendation
  • Search expert ("ninja") code given novice code
  • Search Java code given COBOL code
Summary
• Introduction to Graphs
• Spectral Analysis
• Shallow Embedding
• Graph Neural Networks