Arbitrary-Order Proximity Preserved Network Embedding
Ziwei Zhang (Tsinghua U), Peng Cui (Tsinghua U), Xiao Wang (Tsinghua U), Jian Pei (JD & Simon Fraser U), Xuanrong Yao (Tsinghua U), Wenwu Zhu (Tsinghua U)
Network Data is Ubiquitous
Social networks, biology networks, traffic networks
Network Embedding: Vector Representation of Nodes
(Figure labels: Generate, Embed)
Apply feature-based machine learning algorithms
Fast computation of node similarity
Support for parallel computing
Applications: link prediction, node classification, community detection, measuring centrality, anomaly detection, ...
High-Order Proximity
High-order proximity: key to capturing the underlying structure of networks
Advantages:
Solves the sparsity problem of network connections
Measures indirect relationships between nodes
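A minimal sketch of both advantages, assuming an undirected toy path graph 0-1-2-3 (our own example, not from the slides): powers of the adjacency matrix assign non-zero proximity to pairs with no direct edge, and $(A^2)_{ij}$ counts the common neighbors of $i$ and $j$.

```python
import numpy as np

# Hypothetical toy graph: undirected path 0 - 1 - 2 - 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

A2 = A @ A    # (A^2)[i, j] = number of common neighbors (length-2 walks)
A3 = A2 @ A   # (A^3)[i, j] = number of length-3 walks from i to j

# Nodes 0 and 2 are not adjacent but share neighbor 1.
print("A[0, 2] =", A[0, 2], " A2[0, 2] =", A2[0, 2])
# Nodes 0 and 3 are three hops apart: only the third order relates them.
print("A2[0, 3] =", A2[0, 3], " A3[0, 3] =", A3[0, 3])

# The fraction of node pairs with non-zero proximity grows with the order.
for name, M in [("A", A), ("A + A^2", A + A2), ("A + A^2 + A^3", A + A2 + A3)]:
    print(name, "non-zero fraction:", np.count_nonzero(M) / M.size)
```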
Different High-Order Proximities
Different networks/tasks require different high-order proximities
E.g., multi-scale classification (Bryan Perozzi, et al., ASONAM 2017)
E.g., networks with different scales and sparsity
Proximities of different orders can also be arbitrarily weighted
E.g., equal weights, exponentially decayed weights (Katz)
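The two weighting schemes can be made concrete with a small dense sketch. The helper `high_order_proximity`, the 4-cycle graph, and the decay factor `beta = 0.1` are illustrative assumptions; the Katz-style weights $\beta^i$ are truncated at order $q$ here.

```python
import numpy as np

def high_order_proximity(A, weights):
    """Return S = w_1 A + w_2 A^2 + ... + w_q A^q (dense; illustration only)."""
    S = np.zeros(A.shape)
    power = np.eye(A.shape[0])
    for w in weights:            # weights[0] multiplies A^1, weights[1] A^2, ...
        power = power @ A        # running power A^i
        S += w * power
    return S

# Hypothetical toy graph: undirected 4-cycle 0-1-2-3-0
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

q = 3
equal_w = [1.0] * q                            # equal weights
beta = 0.1                                     # assumed decay factor
katz_w = [beta ** i for i in range(1, q + 1)]  # beta, beta^2, beta^3 (truncated Katz)

print(high_order_proximity(A, equal_w))
print(high_order_proximity(A, katz_w))
```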
Existing Methods
Methods based on random walks
DeepWalk, B. Perozzi, et al. KDD 2014.
LINE, J. Tang, et al. WWW 2015.
Node2vec, A. Grover, et al. KDD 2016.
Random walks on networks + skip-gram model from NLP
Methods based on matrix factorization
GraRep, S. Cao, et al. CIKM 2015.
HOPE, M. Ou, et al. KDD 2016.
M-NMF, X. Wang, et al. AAAI 2017.
Objective function based on matrix factorization + optimization
Methods based on deep learning
SDNE, D. Wang, et al. KDD 2016.
DVNE, D. Zhu, et al. KDD 2018.
Deep auto-encoder to preserve non-linearity
Existing Methods (cont.)
Existing methods can only preserve one fixed high-order proximity
Different high-order proximities have to be calculated separately
(Figure: Proximity1, Proximity2, Proximity3, Proximity4, ... are each embedded separately into Embedding1, Embedding2, Embedding3, Embedding4, ...: time-consuming!)
→ How can arbitrary-order proximity be preserved simultaneously?
Key question: what is the underlying relationship between different proximities?
Problem Formulation
High-order proximity: a polynomial function of the adjacency matrix
$S = \mathcal{F}(A) = w_1 A^1 + w_2 A^2 + \cdots + w_q A^q$
$q$: order; $w_1, \dots, w_q$: weights, assumed to be non-negative
$A$: could be replaced by other variations (such as the Laplacian matrix)
Objective function: matrix factorization
$\min_{U^*, V^*} \lVert S - U^* V^{*T} \rVert_F^2$
$U^*, V^* \in \mathbb{R}^{N \times d}$: left/right embedding vectors
$d$: dimensionality of the space
Optimal solution: Singular Value Decomposition (SVD)
$[U, \Sigma, V]$: top-$d$ SVD results
$U^* = U\sqrt{\Sigma}$, $V^* = V\sqrt{\Sigma}$
However, direct calculation is time-consuming
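A minimal sketch of the direct route, assuming a small hypothetical random undirected graph and example weights $w_1 = 1, w_2 = 0.5$ (this is not the authors' released code). It forms $S$ densely and runs a full SVD, which is exactly what becomes expensive at scale: $S$ is generally dense even when $A$ is sparse, and a full SVD costs $O(N^3)$.

```python
import numpy as np

def svd_embedding(S, d):
    """Rank-d factorization S ~ U* V*^T from a full dense SVD."""
    U, sigma, Vt = np.linalg.svd(S)        # sigma is sorted in descending order
    root = np.sqrt(sigma[:d])
    U_star = U[:, :d] * root               # U sqrt(Sigma): column k scaled by sqrt(sigma_k)
    V_star = Vt[:d].T * root               # V sqrt(Sigma)
    return U_star, V_star, sigma

rng = np.random.default_rng(0)
N, d = 60, 8
upper = np.triu(rng.random((N, N)) < 0.1, k=1)   # hypothetical random graph
A = (upper | upper.T).astype(float)              # symmetric, no self-loops
S = 1.0 * A + 0.5 * (A @ A)                      # example weights w1 = 1, w2 = 0.5

U_star, V_star, sigma = svd_embedding(S, d)
err = np.linalg.norm(S - U_star @ V_star.T, "fro") ** 2
# Eckart-Young: the optimal error equals the sum of the discarded sigma_k^2
print(err, np.sum(sigma[d:] ** 2))
```

The square roots split $\Sigma$ evenly between the two factors, so that $U^* V^{*T} = U \Sigma V^T$.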
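Problem Transformation
$[U, \Sigma, V]$: top-$d$ SVD; $[\Lambda, X]$: top-$d$ eigen-decomposition
Theorem: for a symmetric $S$, $U = X$, $\Sigma = |\Lambda|$, $V = X\,\mathrm{sign}(\Lambda)$, with the top-$d$ eigenpairs ranked by $|\lambda|$
How to solve $[\Lambda, X]$ for $S = \mathcal{F}(A) = w_1 A^1 + w_2 A^2 + \cdots + w_q A^q$?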
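For a symmetric $S$ (an undirected network), the SVD follows from the eigen-decomposition: if $S = X \Lambda X^T$, then $S = X |\Lambda| (X\,\mathrm{sign}(\Lambda))^T$. The sketch below checks this numerically on a hypothetical random graph with $S = A$, chosen because its negative eigenvalues show why the top-$d$ eigenpairs must be ranked by absolute value. It is an illustration of the linear-algebra fact, not the paper's code.

```python
import numpy as np

def svd_from_eig(S, d):
    """Top-d SVD of a symmetric matrix S computed from its eigen-decomposition.

    If S = X diag(lam) X^T, then S = X diag(|lam|) (X diag(sign(lam)))^T,
    so U = X, Sigma = |lam|, V = X sign(lam). Eigenpairs are ranked by |lam|,
    since large negative eigenvalues give large singular values too.
    """
    lam, X = np.linalg.eigh(S)                 # ascending eigenvalues
    top = np.argsort(-np.abs(lam))[:d]         # d largest in absolute value
    lam, X = lam[top], X[:, top]
    return X, np.abs(lam), X * np.sign(lam)    # U, Sigma (diagonal), V

rng = np.random.default_rng(1)
N, d = 50, 6
upper = np.triu(rng.random((N, N)) < 0.15, k=1)
A = (upper | upper.T).astype(float)
S = A                                           # q = 1, w_1 = 1: has negative eigenvalues

U, sigma, V = svd_from_eig(S, d)
sigma_svd = np.linalg.svd(S, compute_uv=False)[:d]
print(np.allclose(sigma, sigma_svd))            # singular values agree
```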
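Eigen-decomposition Reweighting
$Ax = \lambda x \;\rightarrow\; A^2 x = \lambda^2 x \;\rightarrow\; \mathcal{F}(A)\,x = \mathcal{F}(\lambda)\,x$
Insight: high-order proximity simply reweights dimensions!
Eigenvectors as coordinates, eigenvalues as weights
(Figure: two routes from $A$ to $[X, \mathcal{F}(\Lambda)]$. Time-consuming: apply the polynomial $\mathcal{F}(\cdot)$ to $A$ to obtain $S$, then eigen-decompose $S$. Efficient: eigen-decompose $A$ into $[X, \Lambda]$, then apply the polynomial $\mathcal{F}(\cdot)$ to $\Lambda$.)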
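The reweighting identity can be checked numerically. This sketch assumes an undirected (hence symmetric) hypothetical random graph and arbitrary example weights; it compares the slow route (matrix powers) with the fast route (apply $\mathcal{F}$ only to the eigenvalues).

```python
import numpy as np

def poly_scalar(lam, weights):
    """F(lam) = w_1 lam + w_2 lam^2 + ..., applied elementwise to eigenvalues."""
    return sum(w * lam ** i for i, w in enumerate(weights, start=1))

def poly_matrix(A, weights):
    """F(A) = w_1 A + w_2 A^2 + ... via explicit matrix powers (the slow route)."""
    S, power = np.zeros(A.shape), np.eye(A.shape[0])
    for w in weights:
        power = power @ A
        S += w * power
    return S

rng = np.random.default_rng(2)
N = 40
upper = np.triu(rng.random((N, N)) < 0.1, k=1)
A = (upper | upper.T).astype(float)             # undirected => A symmetric
weights = [1.0, 0.5, 0.25]                      # hypothetical w_1..w_3

lam, X = np.linalg.eigh(A)                      # A = X diag(lam) X^T
S_slow = poly_matrix(A, weights)                # polynomial first, then decompose
S_fast = (X * poly_scalar(lam, weights)) @ X.T  # reweight eigenvalues only

# Same eigenvectors, eigenvalues mapped by F:  F(A) X = X diag(F(lam))
print(np.allclose(S_slow @ X, X * poly_scalar(lam, weights)))
print(np.allclose(S_slow, S_fast))
```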
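Eigen-decomposition Reweighting (cont.)
Re-ordering of dimensions
(Figure: reweighting $\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_d$ from the top-$d$ eigen-decomposition of $A$ by the polynomial function into $\lambda'_1, \lambda'_2, \lambda'_3, \dots, \lambda'_d$ does not necessarily give the top-$d$ eigen-decomposition of $S$ (×); reweighting $\lambda_1, \dots, \lambda_d, \dots, \lambda_l$ from the top-$l$ eigen-decomposition of $A$ and keeping the top-$d$ values $\lambda'_1, \dots, \lambda'_d$ gives the top-$d$ eigen-decomposition of $S$ (√))
$d$ vs. $l$: $l \approx 2d$
Proven for random (Erdős–Rényi) and random power-law networks
Verified in experiments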
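A sketch of the re-ordering step, assuming a symmetric $A$ and non-negative weights; `top_l_eig` uses a dense solver purely for illustration. `selection_is_safe` is our own sufficient check, not the slide's proof: because $|\mathcal{F}(\lambda)| \le \mathcal{F}(|\lambda|)$ for non-negative weights, if at least $d$ of the top-$l$ eigenvalues are positive, no eigenvalue outside the top-$l$ can outrank them after reweighting.

```python
import numpy as np

def poly_scalar(lam, weights):
    """F(lam) = w_1 lam + w_2 lam^2 + ..., elementwise."""
    return sum(w * lam ** i for i, w in enumerate(weights, start=1))

def top_l_eig(A, l):
    """Top-l eigenpairs of a symmetric A by |eigenvalue| (dense, for illustration)."""
    lam, X = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(lam))[:l]
    return lam[idx], X[:, idx]

def reweight_and_reorder(lam_l, X_l, weights, d):
    """Map lam -> F(lam) on the l candidates, then keep the d largest |F(lam)|."""
    f_lam = poly_scalar(lam_l, weights)
    keep = np.argsort(-np.abs(f_lam))[:d]
    return f_lam[keep], X_l[:, keep]

def selection_is_safe(lam_l, d):
    """Sufficient check for non-negative weights: >= d positive eigenvalues among the l."""
    return np.count_nonzero(lam_l > 0) >= d

rng = np.random.default_rng(3)
N, d = 80, 5
upper = np.triu(rng.random((N, N)) < 0.08, k=1)
A = (upper | upper.T).astype(float)
weights = [1.0, 0.5]                     # F(lam) = lam + 0.5 lam^2: can reorder dimensions
S = A + 0.5 * (A @ A)

lam_l, X_l = top_l_eig(A, 2 * d)         # l = 2d, as on the slide
f_lam, X_d = reweight_and_reorder(lam_l, X_l, weights, d)
exact = np.sort(np.abs(np.linalg.eigvalsh(S)))[::-1][:d]   # true top-d |eigenvalues| of S
print(selection_is_safe(lam_l, d), np.allclose(np.abs(f_lam), exact))
```

With $\mathcal{F}(\lambda) = \lambda + 0.5\lambda^2$, a negative eigenvalue such as $-2.5$ ranks above $2$ in $A$ but below it in $S$ ($3.125$ vs. $4$), which is why the top-$d$ must be re-selected.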
Preserving Arbitrary-Order Proximity
Shifting across different orders/weights:
Preserve arbitrary-order proximity simultaneously
Low marginal cost for preserving multiple proximities
Accurate (globally optimal) and efficient (linear time complexity)
(Figure: a single eigen-decomposition $[X, \Lambda]$ is shifted efficiently into Embedding1, Embedding2, Embedding3, Embedding4, ...)
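A minimal sketch of shifting, assuming a symmetric $A$ and a dense eigen-decomposition for a small demo; `shift_embedding` and the three proximity settings are hypothetical. The expensive decomposition is computed once, and each proximity then only costs a reweighting plus a scaling of the eigenvectors, with $U^* = X\sqrt{|\mathcal{F}(\Lambda)|}$ and $V^* = X\,\mathrm{sign}(\mathcal{F}(\Lambda))\sqrt{|\mathcal{F}(\Lambda)|}$.

```python
import numpy as np

def poly_scalar(lam, weights):
    return sum(w * lam ** i for i, w in enumerate(weights, start=1))

def shift_embedding(lam_l, X_l, weights, d):
    """One 'shift': turn cached eigenpairs of A into (U*, V*) for S = F(A).

    U* = X sqrt(|F(lam)|) and V* = X sign(F(lam)) sqrt(|F(lam)|), so that
    U* V*^T = X diag(F(lam)) X^T, the rank-d eigen-approximation of S.
    """
    f_lam = poly_scalar(lam_l, weights)
    keep = np.argsort(-np.abs(f_lam))[:d]
    f_lam, X = f_lam[keep], X_l[:, keep]
    root = np.sqrt(np.abs(f_lam))
    return X * root, X * (np.sign(f_lam) * root)

rng = np.random.default_rng(4)
N, d = 60, 6
upper = np.triu(rng.random((N, N)) < 0.1, k=1)
A = (upper | upper.T).astype(float)

# Expensive step, done once (full decomposition here, for a small demo)
lam, X = np.linalg.eigh(A)

# Cheap steps, one per proximity: hypothetical orders/weights to compare
settings = {"A": [1.0], "A^2": [0.0, 1.0], "A+0.5A^2+0.25A^3": [1.0, 0.5, 0.25]}
embeddings = {name: shift_embedding(lam, X, w, d) for name, w in settings.items()}
for name, (U_star, V_star) in embeddings.items():
    print(name, U_star.shape, V_star.shape)
```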
Algorithm Framework
Time complexity: $O\left(T(Nl^2 + Ml) + r(l + Nd)\right)$
$N$: number of nodes; $M$: number of edges; $T$: number of iterations; $d$: embedding dimension ($l \approx 2d$); $r$: number of shifts
Linear w.r.t. the network size
Low marginal cost for preserving multiple proximities
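A scalable sketch of the framework on a sparse graph. As an assumption, SciPy's `eigsh` (ARPACK's implicitly restarted Lanczos) stands in for the eigensolver; the slides do not say which solver AROPE uses. Each Lanczos iteration touches $A$ only through sparse products (roughly $O(Ml)$) plus $O(Nl^2)$ orthogonalization, matching the $T(Nl^2 + Ml)$ term, and each shift costs $O(l + Nd)$. `random_graph` and `shift` are hypothetical helpers.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def random_graph(n, m, seed=0):
    """Hypothetical sparse undirected test graph built from m random node pairs."""
    rng = np.random.default_rng(seed)
    rows, cols = rng.integers(0, n, m), rng.integers(0, n, m)
    mask = rows != cols                                    # drop self-loops
    A = sp.coo_matrix((np.ones(mask.sum()), (rows[mask], cols[mask])), shape=(n, n))
    A = (A + A.T).tocsr()                                  # symmetrize
    A.data[:] = 1.0                                        # binarize repeated pairs
    return A

def top_l_eigsh(A, l, seed=0):
    """Top-l eigenpairs of sparse symmetric A by magnitude, via ARPACK (Lanczos)."""
    v0 = np.random.default_rng(seed).standard_normal(A.shape[0])  # reproducible start
    return eigsh(A, k=l, which="LM", v0=v0, tol=1e-10)

def shift(lam, X, weights, d):
    """One shift: reweight l eigenvalues, keep the top-d by |F|, scale N x d vectors."""
    f = sum(w * lam ** i for i, w in enumerate(weights, start=1))
    keep = np.argsort(-np.abs(f))[:d]
    root = np.sqrt(np.abs(f[keep]))
    return X[:, keep] * root, X[:, keep] * (np.sign(f[keep]) * root)

A = random_graph(n=2000, m=10000)
d = 8
lam, X = top_l_eigsh(A, l=2 * d)                           # paid once
results = {}
for weights in ([1.0], [0.0, 1.0], [0.1, 0.01, 0.001]):     # r = 3 shifts
    results[tuple(weights)] = shift(lam, X, weights, d)
    print(weights, results[tuple(weights)][0].shape)
```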
Special Cases of the Proposed Method
Common Neighbors: the second order
$S = A^2$
Propagation: weighted combination of the second and the third order
$S = w_2 A^2 + w_3 A^3$
Katz Proximity: infinite order with exponentially decayed weights
$S = \sum_{i=1}^{+\infty} \beta^i A^i$
Eigenvector Centrality: the first dimension
$U^*(:, 1) \propto$ eigenvector centrality
Regardless of what the high-order proximity is
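Two of these special cases can be checked with eigen-reweighting on a hypothetical connected graph (a ring plus random chords). For Katz, the geometric series gives $\mathcal{F}(\lambda) = \beta\lambda / (1 - \beta\lambda)$ when $\beta < 1/\lambda_{\max}$, which should match the closed form $(I - \beta A)^{-1} - I$. For eigenvector centrality, with non-negative weights $|\mathcal{F}(\lambda)| \le \mathcal{F}(|\lambda|) \le \mathcal{F}(\lambda_{\max})$, so the first kept dimension is the leading eigenvector; power iteration is used as an independent reference. The choice `beta = 0.5 / lambda_max` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
# Hypothetical connected graph: a ring plus a few random chords
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
for i, j in rng.integers(0, N, size=(10, 2)):
    if i != j:
        A[i, j] = A[j, i] = 1.0

lam, X = np.linalg.eigh(A)
beta = 0.5 / lam.max()               # Katz requires beta < 1 / lambda_max

# Katz: F(lam) = sum_{i>=1} (beta lam)^i = beta lam / (1 - beta lam)
f_katz = beta * lam / (1 - beta * lam)
S_reweighted = (X * f_katz) @ X.T
S_closed = np.linalg.inv(np.eye(N) - beta * A) - np.eye(N)
print(np.allclose(S_reweighted, S_closed))

# Eigenvector centrality: the dimension with the largest |F(lam)| is the leading eigenvector
first = X[:, np.argmax(np.abs(f_katz))]
M = A + np.eye(N)                    # same eigenvectors; avoids oscillation on bipartite parts
v = np.ones(N)
for _ in range(2000):                # power iteration as an independent reference
    v = M @ v
    v /= np.linalg.norm(v)
print(np.isclose(abs(first @ v), 1.0))
```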
Experimental Setting: Datasets
Datasets:
BlogCatalog, Flickr, Youtube: online social networks where nodes represent users and edges represent relationships between users.
Wiki: Wikipedia hyperlinks, where each node represents a page and each edge represents a hyperlink between two pages. The edges are treated as undirected.
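Treating directed hyperlinks as undirected edges amounts to symmetrizing the adjacency matrix. A minimal sketch with SciPy, where `to_undirected` is a hypothetical helper and dropping self-links and collapsing reciprocal links to weight 1 are our own assumptions (the slide does not specify them):

```python
import numpy as np
import scipy.sparse as sp

def to_undirected(edges, n):
    """Binary symmetric adjacency matrix from a directed edge list (hypothetical helper)."""
    src, dst = np.asarray(edges, dtype=np.int64).T
    keep = src != dst                                   # drop self-links
    A = sp.coo_matrix((np.ones(keep.sum()), (src[keep], dst[keep])), shape=(n, n))
    A = (A + A.T).tocsr()                               # an edge in either direction counts
    A.data[:] = 1.0                                     # reciprocal links collapse to weight 1
    return A

# Page 0 links to 1 and 1 links back; 1 links to 2; 2 links to itself
A = to_undirected([(0, 1), (1, 0), (1, 2), (2, 2)], n=3)
print(A.toarray())
```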
Experimental Setting: Baselines
Baselines:
DeepWalk (KDD 2014): DFS random walk + skip-gram
LINE (WWW 2015): BFS random walk + skip-gram
Node2vec (KDD 2016): biased random walk + skip-gram
SDNE (KDD 2016): deep auto-encoder
NEU (IJCAI 2017): matrix factorization approximation
Our method:
AROPE: search $q$ from $\{1,2,3,4\}$ and grid-search the weights
AROPE-F: search $q$ from $\{1,2,3,4\}$ while fixing the weights $w_i = 0.1^i$
Limit the search space for hyper-parameters
Code: https://github.com/ZW-ZHANG/AROPE
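The AROPE-F search space is small enough to enumerate directly. This sketch (with a hypothetical helper name, not taken from the linked repository) lists the four candidate weight vectors $w_i = 0.1^i$, one per order $q$; since only the weights change between candidates, all four can be served by shifting from a single eigen-decomposition.

```python
def arope_f_settings(orders=(1, 2, 3, 4), base=0.1):
    """Candidate weight vectors for AROPE-F: w_i = base**i for i = 1..q, one per order q."""
    return {q: [base ** i for i in range(1, q + 1)] for q in orders}

for q, weights in arope_f_settings().items():
    print(q, weights)
```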
Experimental Results
Preserving the High-Order Proximity
Achieves the globally optimal solution while being extremely efficient
Experimental Results
Network Reconstruction
(Figure annotations: +100%, +100%, +100%)
Better preserves network structure
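The slides do not state how reconstruction is scored; a common choice, assumed here, is precision@k: rank all node pairs by the embedding's estimate $U^*_i \cdot V^*_j$ of $S_{ij}$ and check how many of the top $k$ are real edges. A minimal sketch with a hypothetical random graph, $S = A$, and a hypothetical helper name:

```python
import numpy as np

def reconstruction_precision_at_k(U_star, V_star, A, k):
    """Precision@k for network reconstruction (assumed metric).

    Every unordered pair (i, j), i < j, is scored by U*_i . V*_j; the share
    of the k top-scored pairs that are real edges of A is returned.
    """
    scores = U_star @ V_star.T
    i, j = np.triu_indices(A.shape[0], k=1)
    top = np.argsort(-scores[i, j])[:k]
    return np.mean(A[i[top], j[top]] > 0)

rng = np.random.default_rng(6)
N, d = 100, 10
upper = np.triu(rng.random((N, N)) < 0.05, k=1)
A = (upper | upper.T).astype(float)

lam, X = np.linalg.eigh(A)
keep = np.argsort(-np.abs(lam))[:d]              # S = A here (first order only)
root = np.sqrt(np.abs(lam[keep]))
U_star, V_star = X[:, keep] * root, X[:, keep] * (np.sign(lam[keep]) * root)

for k in (10, 100, 200):
    print(k, reconstruction_precision_at_k(U_star, V_star, A, k))
```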
Experimental Results
Link Prediction
(Figure annotations: +200%, +100%)
Good inference ability from preserving arbitrary-order proximity
Experimental Results
Node structural role classification (struc2vec, KDD 2017)
Captures the structural roles of nodes
Experimental Results
Parameter analysis
The optimal order varies greatly across tasks and datasets
Experimental Results
Scalability analysis
Linear scalability w.r.t. number of nodes and number of edges
(< 2 hours on a network with 1 million nodes and 10 million edges on a single PC)
Conclusion
Study the problem of preserving arbitrary-order proximity in network embedding
Different networks/tasks require different proximities
Eigen-decomposition Reweighting
The intrinsic relationship between different proximities is reweighting and reordering dimensions
Preserving arbitrary-order proximity
Incorporate many commonly used proximity measures as special cases
Experimental results:
+100% improvements in network reconstruction and link prediction
Capture the structural roles of nodes
Linear scalability
Thanks!
Ziwei Zhang, Tsinghua University
https://zw-zhang.github.io/
http://nrl.thumedialab.com/