
NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

Jiezhong Qiu

Tsinghua University

June 17, 2019

Joint work with Yuxiao Dong (MSR), Hao Ma (Facebook AI), Jian Li (IIIS, Tsinghua), Chi Wang (MSR), Kuansan Wang (MSR), and Jie Tang (DCST, Tsinghua)

1 / 32


Motivation and Problem Formulation

Problem Formulation
Given a network G = (V, E), we aim to learn a function f : V → R^p that captures neighborhood similarity and community membership.

Applications:

- link prediction
- community detection
- label classification

Figure 1: A toy example (Figure from DeepWalk).


Two Genres of Network Embedding Algorithm

- Local context methods:
  - LINE, DeepWalk, node2vec, metapath2vec.
  - Usually formulated as a skip-gram-like problem and optimized with SGD.

- Global matrix factorization methods:
  - NetMF, GraRep, HOPE.
  - Leverage global statistics of the input network.
  - Not necessarily a gradient-based optimization problem.
  - Usually require explicit construction of the matrix to be factorized.


Notations

Consider an undirected weighted graph G = (V, E), where |V| = n and |E| = m.

- Adjacency matrix A ∈ R_+^{n×n}:

    A_{i,j} = a_{i,j} > 0  if (i, j) ∈ E,
    A_{i,j} = 0            if (i, j) ∉ E.

- Degree matrix D = diag(d_1, ..., d_n), where d_i is the generalized degree of vertex i.

- Volume of the graph G: vol(G) = ∑_i ∑_j A_{i,j}.
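These notations can be made concrete on a tiny example. The following sketch (the 3-vertex graph and its edge weights are arbitrary choices for illustration) computes the degrees and graph volume exactly as defined above:

```python
import numpy as np

# Toy undirected weighted graph on n = 3 vertices; the weights are
# assumptions for the demo -- any symmetric non-negative A works.
A = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])

d = A.sum(axis=1)     # generalized degrees d_1, ..., d_n
D = np.diag(d)        # degree matrix D = diag(d_1, ..., d_n)
vol_G = A.sum()       # vol(G) = sum_i sum_j A_ij

print(d)       # [3. 2. 1.]
print(vol_G)   # 6.0
```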


Contents

Revisit DeepWalk and NetMF

NetSMF: Network Embedding as Sparse Matrix Factorization

Experimental Results


DeepWalk and NetMF

Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding


DeepWalk and NetMF

Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding

Levy & Goldberg (NIPS 14) show that skip-gram implicitly factorizes a matrix built from the following statistics:

  #(w, c)  co-occurrence count of word w and context c
  #(w)     occurrence count of word w
  #(c)     occurrence count of context c
  |D|      total number of word-context pairs
  b        number of negative samples



DeepWalk and NetMF

Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding

Levy & Goldberg (NIPS 14): in network terms, the corresponding statistics are built from:

  A  adjacency matrix
  D  degree matrix
  b  number of negative samples


DeepWalk and NetMF

Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding

Levy & Goldberg (NIPS 14): the random-walk + skip-gram pipeline can thus be replaced by a single matrix factorization step.


Contents

Revisit DeepWalk and NetMF

NetSMF: Network Embedding as Sparse Matrix Factorization

Experimental Results


Computation Challenges of NetMF

For small-world networks, the matrix

  (vol(G)/b) · (1/T ∑_{r=1}^T (D^{-1} A)^r) · D^{-1}

is always a dense matrix, because of the matrix-power term.

Why?
- In small-world networks, each pair of vertices (i, j) can reach each other within a small number of hops.
- This makes the corresponding matrix entry a positive value.

Idea
- A sparse matrix is easier to handle.
- Can we construct a sparse but 'good enough' matrix?
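The densification claim is easy to check numerically. The sketch below uses a sparse random graph as a toy stand-in for a small-world network (graph size, edge probability, and the seed are arbitrary choices) and tracks how the non-zero density of the powers of D^{-1}A grows with r:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Sparse symmetric adjacency, roughly 4 edges per vertex (toy assumption).
A = (rng.random((n, n)) < 0.08).astype(float)
A = np.triu(A, 1)
A = A + A.T
d = A.sum(axis=1)
d[d == 0] = 1.0                  # guard isolated vertices for the demo
P = A / d[:, None]               # D^{-1} A, the random-walk transition matrix

def density(M):
    """Fraction of entries that are (numerically) non-zero."""
    return np.count_nonzero(M > 1e-12) / M.size

Pr = P.copy()
for r in range(1, 6):
    print(r, round(density(Pr), 3))  # density grows rapidly with r
    Pr = Pr @ P
```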


Observation

Definition
For ∑_{r=1}^T α_r = 1 and α_r non-negative,

  L = D − ∑_{r=1}^T α_r D (D^{-1} A)^r    (1)

is a T-degree random-walk matrix polynomial.

Observation
For α_1 = · · · = α_T = 1/T:

  log°( (vol(G)/b) (1/T ∑_{r=1}^T (D^{-1} A)^r) D^{-1} )
    = log°( (vol(G)/b) D^{-1} (D − L) D^{-1} )
    ≈ log°( (vol(G)/b) D^{-1} (D − L̃) D^{-1} ),

where L̃ is a sparsifier of L.

14 / 32
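The first equality in the observation is pure algebra: with α_r = 1/T, we have D − L = D · (1/T) ∑_r (D^{-1}A)^r, so D^{-1}(D − L)D^{-1} equals the averaged power sum times D^{-1}. A quick numerical check (the random toy matrix and its size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 20, 5
A = rng.random((n, n))
A = (A + A.T) / 2                    # symmetric weighted adjacency (toy)
d = A.sum(axis=1)
D = np.diag(d)
D_inv = np.diag(1.0 / d)

P = D_inv @ A                        # D^{-1} A
S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / T

# Random-walk matrix polynomial with alpha_r = 1/T (Eq. 1)
L = D - D @ S

lhs = S @ D_inv                      # (1/T sum_r (D^{-1}A)^r) D^{-1}
rhs = D_inv @ (D - L) @ D_inv        # D^{-1} (D - L) D^{-1}
print(np.allclose(lhs, rhs))   # True
```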


Random-walk Matrix Polynomial Sparsification

Theorem [CCL+15]
For a random-walk matrix polynomial L = D − ∑_{r=1}^T α_r D (D^{-1} A)^r, one can construct, in time O(T² m ε^{-2} log² n), a (1 + ε)-spectral sparsifier L̃ with O(n log n ε^{-2}) non-zeros. For unweighted graphs, the time complexity can be reduced to O(T² m ε^{-2} log n).


NetSMF—Algorithm

The proposed NetSMF algorithm consists of three steps:

- Construct a random-walk matrix polynomial sparsifier, L̃, by calling the PathSampling algorithm proposed in [CCL+15].

- Construct a NetMF matrix sparsifier:

    trunc_log°( (vol(G)/b) D^{-1} (D − L̃) D^{-1} )

- Run truncated randomized singular value decomposition.


Algorithm Details

PathSampling:
- Sample an edge (u, v) from the edge set.
- Start a very short random walk from u, arriving at u′.
- Start a very short random walk from v, arriving at v′.
- Record the vertex pair (u′, v′).

Randomized SVD:
- Project the original matrix to a low-dimensional space with a Gaussian random matrix.
- Work with the projected small matrix.


NetSMF — System Design

Figure 2: The System Design of NetSMF.


Contents

Revisit DeepWalk and NetMF

NetSMF: Network Embedding as Sparse Matrix Factorization

Experimental Results


Setup

Label classification:

- Datasets: BlogCatalog, PPI, Flickr, YouTube, OAG.
- Classifier: logistic regression.
- Methods: NetSMF (T = 10), NetMF (T = 10), DeepWalk, LINE.

Table 1: Statistics of Datasets.

  Dataset | BlogCatalog | PPI    | Flickr    | YouTube   | OAG
  |V|     | 10,312      | 3,890  | 80,513    | 1,138,499 | 67,768,244
  |E|     | 333,983     | 76,584 | 5,899,882 | 2,990,443 | 895,368,962
  #Labels | 39          | 50     | 195       | 47        | 19


Experimental Results

[Figure: grouped line plots comparing DeepWalk, LINE, node2vec, NetMF, and NetSMF on BlogCatalog, PPI, Flickr, YouTube, and OAG.]

Figure 3: Predictive performance on varying the ratio of training data. The x-axis represents the ratio of labeled data (%), and the y-axis in the top and bottom rows denotes the Micro-F1 and Macro-F1 scores respectively.


Running Time

Table 2: Running Time

  Dataset     | LINE      | DeepWalk  | node2vec | NetMF   | NetSMF
  BlogCatalog | 40 mins   | 12 mins   | 56 mins  | 2 mins  | 13 mins
  PPI         | 41 mins   | 4 mins    | 4 mins   | 16 secs | 10 secs
  Flickr      | 42 mins   | 2.2 hours | 21 hours | 2 hours | 48 mins
  YouTube     | 46 mins   | 1 day     | 4 days   | ×       | 4.1 hours
  OAG         | 2.6 hours | –         | –        | ×       | 24 hours


Conclusion and Future Work

We propose NetSMF, a scalable, efficient, and effective networkembedding algorithm.

Future Work

- A distributed-memory implementation.
- Extension to directed, dynamic, and heterogeneous graphs.


Thanks.

- Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec (WSDM '18)
- NetSMF: Network Embedding as Sparse Matrix Factorization (WebConf '19)

Code for NetMF available at github.com/xptree/NetMF
Code for NetSMF available at github.com/xptree/NetSMF

Q&A


On the Large-dimensionality Assumption of [LG14]

Recall the objective of the skip-gram model:

  min_{X,Y} L(X, Y),

where

  L(X, Y) = |D| ∑_w ∑_c ( (#(w,c)/|D|) log g(x_w^T y_c) + b (#(w)/|D|)(#(c)/|D|) log g(−x_w^T y_c) ).

Theorem
For DeepWalk, when the length of the random walk L → ∞,

  #(w,c)/|D| →p (1/2T) ∑_{r=1}^T ( (d_w/vol(G)) (P^r)_{w,c} + (d_c/vol(G)) (P^r)_{c,w} ),

  #(w)/|D| →p d_w/vol(G)   and   #(c)/|D| →p d_c/vol(G).


NetSMF — Approximation Error

Denote M = D^{-1} (D − L) D^{-1} in

  trunc_log°( (vol(G)/b) D^{-1} (D − L) D^{-1} ),

and M̃ = D^{-1} (D − L̃) D^{-1} to be the sparsified version we constructed.

Theorem
The singular values of M − M̃ satisfy

  σ_i(M − M̃) ≤ 4ε / √(d_i d_min),  ∀i ∈ [n].

Theorem
Let ‖·‖_F be the matrix Frobenius norm. Then

  ‖ trunc_log°((vol(G)/b) M) − trunc_log°((vol(G)/b) M̃) ‖_F ≤ (4ε vol(G)) / (b √d_min) · √( ∑_{i=1}^n 1/d_i ).


Spectrally Similar

Definition
Suppose G = (V, E, A) and G̃ = (V, Ẽ, Ã) are two weighted undirected networks. Let L = D_G − A and L̃ = D_G̃ − Ã be their Laplacian matrices, respectively. We say G and G̃ are (1 + ε)-spectrally similar if

  ∀x ∈ R^n,  (1 − ε) · x^T L x ≤ x^T L̃ x ≤ (1 + ε) · x^T L x.
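The definition quantifies over all x ∈ R^n; as a sketch, one can at least sample random test vectors and check the two-sided bound. The helper name `spectrally_similar` and the sampling approach are assumptions for illustration (a rigorous check would compare generalized eigenvalues rather than sample):

```python
import numpy as np

def spectrally_similar(L, L_tilde, eps, trials=1000):
    """Sampled check of (1-eps) x'Lx <= x'L~x <= (1+eps) x'Lx.
    This only tests random vectors; the definition requires all x."""
    rng = np.random.default_rng(2)
    for _ in range(trials):
        x = rng.standard_normal(L.shape[0])
        q, q_t = x @ L @ x, x @ L_tilde @ x
        if not ((1 - eps) * q <= q_t <= (1 + eps) * q):
            return False
    return True

# Trivially, a graph is (1+eps)-spectrally similar to itself for any eps >= 0.
L = np.array([[ 2., -1., -1.],
              [-1.,  1.,  0.],
              [-1.,  0.,  1.]])
print(spectrally_similar(L, L, 0.1))   # True
```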


NetSMF—Algorithm

Algorithm 1: NetSMF
Input:  A social network G = (V, E, A) whose embedding we want to learn;
        the number of non-zeros M in the sparsifier; the embedding dimension d.
Output: An embedding matrix of size n × d, each row corresponding to a vertex.

 1  G̃ ← (V, ∅, Ã = 0)   /* create an empty network with Ẽ = ∅ and Ã = 0 */
 2  for i ← 1 to M do
 3      Uniformly pick an edge e = (u, v) ∈ E
 4      Uniformly pick an integer r ∈ [T]
 5      u′, v′, Z ← PathSampling(e, r)
 6      Add an edge (u′, v′, 2rm/(MZ)) to G̃
        /* parallel edges are merged into one edge, with their weights summed */
 7  end
 8  Compute L̃ to be the unnormalized graph Laplacian of G̃
 9  Compute M̃ = D^{-1} (D − L̃) D^{-1}
10  U_d, Σ_d, V_d ← RandomizedSVD(trunc_log°((vol(G)/b) M̃), d)
11  return U_d √Σ_d as network embeddings


NetSMF—Algorithm

Algorithm 2: PathSampling algorithm as described in [CCL+15].

1  Procedure PathSampling(e = (u, v), r)
2      Uniformly pick an integer k ∈ [r]
3      Perform a (k − 1)-step random walk from u to u_0
4      Perform an (r − k)-step random walk from v to u_r
5      Keep track of Z(p) = ∑_{i=1}^r 2/A_{u_{i−1},u_i} along the length-r path p between u_0 and u_r
6      return u_0, u_r, Z(p)

29 / 32
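Lines 2-6 of Algorithm 1 together with Algorithm 2 can be sketched in a few lines of Python. This is a minimal sequential sketch under assumptions: the graph is an adjacency dict (`A[x][y]` = edge weight), and `path_sampling` / `build_sparsifier` are illustrative names, not the paper's (parallel) implementation:

```python
import random
from collections import defaultdict

def path_sampling(A, u, v, r, rng):
    """One PathSampling call (sketch of Algorithm 2, [CCL+15])."""
    def walk(start, steps):
        path = [start]
        for _ in range(steps):
            nbrs = list(A[path[-1]])
            w = [A[path[-1]][y] for y in nbrs]
            path.append(rng.choices(nbrs, weights=w)[0])
        return path

    k = rng.randint(1, r)
    left = walk(u, k - 1)                 # (k-1)-step walk from u, ends at u_0
    right = walk(v, r - k)                # (r-k)-step walk from v, ends at u_r
    path = left[::-1] + right             # length-r path from u_0 to u_r via (u, v)
    Z = sum(2.0 / A[path[i]][path[i + 1]] for i in range(r))
    return path[0], path[-1], Z

def build_sparsifier(A, edges, M, T, m, seed=0):
    """Main loop of Algorithm 1 (sketch): edge weights of the sparsifier G~."""
    rng = random.Random(seed)
    G_tilde = defaultdict(float)
    for _ in range(M):
        u, v = rng.choice(edges)          # uniformly pick an edge
        r = rng.randint(1, T)             # uniformly pick r in [T]
        u0, ur, Z = path_sampling(A, u, v, r, rng)
        key = (min(u0, ur), max(u0, ur))
        G_tilde[key] += 2 * r * m / (M * Z)   # weight 2rm/(MZ); parallel edges merged
    return G_tilde

# Toy triangle graph (weights chosen arbitrarily for the demo).
A = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}
edges = [(0, 1), (0, 2), (1, 2)]
sparsifier = build_sparsifier(A, edges, M=200, T=3, m=len(edges))
```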


Randomized SVD

Algorithm 3: Randomized SVD on the NetMF Matrix Sparsifier

 1  Procedure RandomizedSVD(A, d)
 2      Sample a Gaussian random matrix O           // O ∈ R^{n×d}
 3      Compute the sample matrix Y = A^T O = AO    // Y ∈ R^{n×d}; A is symmetric
 4      Orthonormalize Y
 5      Compute B = AY                              // B ∈ R^{n×d}
 6      Sample another Gaussian random matrix P     // P ∈ R^{d×d}
 7      Compute the sample matrix Z = BP            // Z ∈ R^{n×d}
 8      Orthonormalize Z
 9      Compute C = Z^T B                           // C ∈ R^{d×d}
10      Run Jacobi SVD on C = U Σ V^T
11      return ZU, Σ, Y V
        /* result matrices are of shape n × d, d × d, n × d respectively */

30 / 32
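Algorithm 3 maps almost line-for-line onto NumPy. A minimal sketch, assuming a dense symmetric input matrix (NetSMF would pass a sparse matrix; `np.linalg.svd` stands in for the Jacobi SVD of line 10):

```python
import numpy as np

def randomized_svd(A, d, seed=0):
    """Sketch of Algorithm 3 for a symmetric n x n matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    O = rng.standard_normal((n, d))      # line 2: Gaussian test matrix
    Y, _ = np.linalg.qr(A @ O)           # lines 3-4: A^T O = AO (A symmetric), orthonormalized
    B = A @ Y                            # line 5
    P = rng.standard_normal((d, d))      # line 6
    Z, _ = np.linalg.qr(B @ P)           # lines 7-8
    C = Z.T @ B                          # line 9
    U, s, Vt = np.linalg.svd(C)          # line 10
    return Z @ U, s, Y @ Vt.T            # line 11: shapes n x d, (d,), n x d

# Usage: embeddings are U_d sqrt(Sigma_d), as in line 11 of Algorithm 1.
M = np.random.default_rng(1).standard_normal((30, 30))
M = (M + M.T) / 2                        # symmetric toy input
U_, s_, V_ = randomized_svd(M, 8)
emb = U_ * np.sqrt(s_)
print(emb.shape)   # (30, 8)
```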


Time and Space Complexity

Table 3: Time and Space Complexity of NetSMF.

  Step   | Time                                | Space
  Step 1 | O(MT log n) for weighted networks,  | O(M + n + m)
         | O(MT) for unweighted networks       |
  Step 2 | O(M)                                | O(M + n)
  Step 3 | O(Md + nd² + d³)                    | O(M + nd)


References I

[CCL+15] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496, 2015.

[LG14] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, 2014, pp. 2177–2185.
