multi-level graph convolutional networks for cross …multi-level graph convolutional networks for...

Multi-level Graph Convolutional Networks for Cross-platformAnchor Link Prediction

Hongxu ChenUniversity of Technology Sydney

[email protected]

Hongzhi Yin∗The University of Queensland

[email protected]

Xiangguo SunSoutheast University

[email protected]

Tong ChenThe University of Queensland

[email protected]

Bogdan GabrysUniversity of Technology Sydney

[email protected]

Katarzyna MusialUniversity of Technology Sydney

[email protected]

ABSTRACTCross-platform account matching plays a significant role in socialnetwork analytics, and is beneficial for a wide range of applications.However, existing methods either heavily rely on high-quality usergenerated content (including user profiles) or suffer from data in-sufficiency problem if only focusing on network topology, whichbrings researchers into an insoluble dilemma of model selection.In this paper, to address this problem, we propose a novel frame-work that considers multi-level graph convolutions on both localnetwork structure and hypergraph structure in a unified manner.The proposed method overcomes data insufficiency problem ofexisting work and does not necessarily rely on user demographicinformation. Moreover, to adapt the proposed method to be capableof handling large-scale social networks, we propose a two-phasespace reconciliation mechanism to align the embedding spacesin both network partitioning based parallel training and accountmatching across different social networks. Extensive experimentshave been conducted on two large-scale real-life social networks.The experimental results demonstrate that the proposed methodoutperforms the state-of-the-art models with a big margin.

CCS CONCEPTS• Information systems→ Data mining.

KEYWORDSAnchor Link Prediction; Account Matching; Network Embedding;ACM Reference Format:Hongxu Chen, Hongzhi Yin, Xiangguo Sun, Tong Chen, Bogdan Gabrys,and Katarzyna Musial. 2020. Multi-level Graph Convolutional Networksfor Cross-platform Anchor Link Prediction. In Proceedings of the 26th ACMSIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20),August 23–27, 2020, Virtual Event, CA, USA. ACM, New York, NY, USA,9 pages. https://doi.org/10.1145/3394486.3403201∗Corresponding author and having equal contribution with the first author.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected] ’20, August 23–27, 2020, Virtual Event, CA, USA© 2020 Association for Computing Machinery.ACM ISBN 978-1-4503-7998-4/20/08. . . $15.00https://doi.org/10.1145/3394486.3403201

1 INTRODUCTIONNowadays, most people participate in more than one Online So-cial Network (OSN), such as Facebook, Twitter, Weibo, Linkedin.More often than not, users sign up at different OSNs for differentpurposes, and different OSNs show different views and aspectsof people. For example, a user makes connections to their friendson Facebook, but uses Linkedin to connect to his/her colleagues,interested companies and seek job opportunities. Though differentOSNs exhibit distinct features and functionalities, a large portionof overlapping individual user accounts across different social plat-forms have been always witnessed. However, the information aboutmultiple accounts that belong to the same individual is not explic-itly given in most social networks due to either privacy concernsor lack of motivation [26, 27].

The problem of matching accounts that belong to the sameindividual from different social networks is defined as AccountMapping [34], Social Network De-anonymization[28, 44, 46]or Social Anchor Link Prediction [11, 26, 42] in Data Miningresearch field. Account Matching across different social platformsplays a fundamental and significant role in social network analyticsas it helps improve many downstream applications, such as on-line personalized services [5], link prediction [1], recommendersystems [25, 35, 40, 41], biology protein-protein alignment for age-ing related complexes [13], and criminal behaviour detection [34].Although much attention has been dedicated to this challengingsubject, there is still plenty of room for improvement. Previousstudies [19, 22, 24, 31] proposed to solve this problem by exploitingavailable auxiliary information such as self-generated user profiles,daily generated content and other demographic features (e.g., username, profile picture, location, gender, post, blogs, reviews, etc.).However, with the increased public awareness of privacy and in-formation rights, these information is becoming less available andaccessible.

Recently, with the advances in Network Embedding (NE) tech-niques, research attention related to this problem has been shiftedto focus on mining network structure information [11, 23, 26, 34]as it has been claimed that the social network structural data ismuch more reliable in terms of correctness and completeness. How-ever, only focusing on modelling the network structure itself makesalmost all existing methods suffer from data insufficiency prob-lems, especially in small-scale networks and cold-start settings (i.e.,a user is new to the network). Therefore, it has been a dilemma

arX

iv:2

006.

0196

3v1

[cs

.SI]

2 J

un 2

020

https://doi.org/10.1145/3394486.3403201

https://doi.org/10.1145/3394486.3403201

KDD ’20, August 23–27, 2020, Virtual Event, CA, USA H. Chen, H. Yin, et al.

confronting practitioners in the real-world scenarios, and effectivesolutions are urgently needed.

In light of this, we propose to exploit and integrate the hyper-graph information distilled from the original network for data en-hancement. In the rest of the paper, we use the terms “simple graph”and “hypergraph” to denote original network and hypergraphs ex-tracted from original network, respectively. Compared to simplegraphs, hypergraphs allow one edge (a.k.a., heperedge) to connectmore than two nodes simultaneously. This means non-pairwiserelations among nodes in a graph can be easily organized and rep-resented as hyperedges. Moreover, hypergraphs are robust, flexibleand can fit a wide variety of social networks, no matter the givennetworks are pure social networks or heterogenous social networkswith various types of attributes and links.

More specifically, we propose a novel embedding frameworkMulti-level Graph Convolutional Networks, namely MGCN,to jointly learn embeddings for network vertices at different lev-els of granularity w.r.t. flexible GCN kernels (i.e., simple graphGCN, hypergraph GCN). Simple graph structure information ofsocial networks reveals relationships among users (e.g., friendships,followers), while hypergraphs carry different semantic meaningsdepending on their specific definitions in a social network. Forexample, N-hop neighbour-based hypergraphs (N-hop neighboursof a user are connected via a same hyperedge) represent friendscircle in some extent. Centrality-based hypergraphs represent dif-ferent social levels (users with similar centrality values may be ofsame social status). Therefore, by defining various hypergraphsand intergating them into network embedding learning will fa-cilitate learning better user representations. To support this, ourproposed MGCN framework is flexible and can incorporate varioushypergraph definitions, which can take any hypergraphs as vectorrepresentations, making the model structure invariant to varioushypergraph definitions.

The rationale behind exploiting and integrating hypergraphs byextending GCN is that hypergraphs provide a more flexible networkrepresentation that can contain additional and richer informationcompared to individual, single graph GCNs on local network topol-ogy. It has been found that the optimal number of GCN layers isalways set to two in most cases because adding more layers cannotsignificantly improve the performance [17]. As a result, GCNs areonly able to capture the local information around a node in net-works. This phenomenon also makes solo GCN contradictory andthus perform mediocrely on account matching task as the key tothe task is to explore more and deeper information to make thepredictions. Intuitively, defining GCNs on hypergrpahs extractedfrom original networks will be complementary to the limitationsof existing GCN-based network embedding models.

Nevertheless, it is still a challenging task because social networksare large-scale with millions of nodes and billions of edges. Tra-ditional centralized training methods fail to scale for such largenetworks, due to high computation demands. To adapt MGCN forlarge scale social networks, and improve its scalability and effi-ciency, we propose a novel training method that first partitions thelarge-scale social networks into clusters and learns network embed-dings in a fully decentralized way. To align the learned embeddingspaces of different clusters, we propose a novel two-phase space

reconciliation mechanism. At the first stage, we align the embed-ding spaces learned from each cluster within the same network.In addition to the alignment between different subnetworks in thesame network, the second-phase space reconciliation aligns two dif-ferent networks through a small number of observed anchor nodes,which makes our MGCN framework achieve more accurate anchorlink prediction than state-of-the-art models and high efficiency onlarge social networks.

The main contributions of this paper are summrized as follows:

• We propose a novel framework for the challenging task ofpredicting anchor links across different social networks. Theproposed method MGCN takes both local and hypergraphlevel graph convolutions into consideration to learn networkembeddings, which is able to capture wider and richer net-work information for the task.

• In order to adapt the proposed framework to be able to copewith large scale social networks, we propose a series of treat-ments including network partitioning and space reconcilia-tion to handle the distributed training process.

• Extensive evaluations on large-scale real-world datasets havebeen conducted, and the experimental results demonstratethe superiority of the proposed MGCN model against state-of-the-art models.

2 PROPOSED METHOD2.1 Preliminaries2.1.1 ProblemDefinition. Given a pair of networksG1 = {V1, E1}and G2 = {V2, E2}, and a set of observed anchor links Sanchor ={(u,v)|u ∈ V1,v ∈ V2}, our goal is to predict those unobservedanchor links across G1 and G2. We treat this task as binary classifi-cation, that is, given a pair of nodes (u,v) where u ∈ V1,v ∈ V2,we predict if there is a link between them.

2.1.2 Hypergraph. In simple graphs, an edge connects two nodes,while an edge in a hypergraph (i.e. hyperedge) can connect morethan two nodes. We denote a hypergraph by Gh = {V, Eh }, whereV is the node set, Eh is the hyperedge set. For each hyperedgee ∈ Eh , we have e = {v1, · · · ,vp },vi ∈ V, 2 < p ≤ |V|.

2.2 Model OverviewTo predict anchor links, we introduce a novel multi-level graphconvolutional network (MGCN) to learn the embeddings of eachnetwork. Figure 1 is an illustration of our proposed MGCN frame-work, which consists of two levels of graph convolution operations.It firstly performs convolution on simple graphs (i.e., the original so-cial network in our case). After obtaining the node embeddings fromthe simple graph convolution, the node embeddings are refinedby an innovative convolution operation defined on hypergraphs.With the final embeddings of two social networks obtained, wealign the latent space of two networks via an embedding recon-ciliation process. Lastly, we deploy a fully connected network topredict whether an anchor link exists between any arbitrary pairof nodes from two networks. In addition, we present a paralleliz-able scheme that allows MGCN to efficiently handle large-scalenetworks through graph partitioning.

Multi-level Graph Convolutional Networks for Cross-platform Anchor Link Prediction KDD ’20, August 23–27, 2020, Virtual Event, CA, USA

Figure 1: Multi-level graph Convolution.2.3 Convolution on Simple GraphsGiven an original social network G = {V, E} (i.e., simple graph),assume that we have constructed a hypergraph Gh from G, whereeach hyperedge e ∈ Eh , e = {v1,v2, · · · ,vn },vi ∈ V . We firstperform simple graph convolutions in order to obtain the baseembeddings of all nodes, denoted by X ∈ R |V×d | , where d is thedimension of each node embedding vector. We start with a simplegraph convolution within hyperedge e by:

Xk+1e = σ (AeXk

e Wk ) (1)

where Ae ∈ R |V |×|V | is the adjacency matrix within hyperedge,σ (·) denotes the non-linear activation function such as ReLU (·) =max(0, ·), while Xk

e and Wk carry the latent representations andthe trainable weights in the k-th convolution layer. Specifically, incontrast to the plain GCN [21] that simply operates on the entiregraph, we perform the convolution operation on each hyperedgeindividually. The rationale is that we can incorporate fine-grainedlocal structural information from the hyperedges into the learnednode embeddings. To achieve this, we define a diagonal matrixSe ∈ R |V |×|V | for hyperedge e , where each entry Se (vi ,vj ) is:

Se (vi ,vj ) ={p(v, e), if vi = vj ,vi ∈ e0, otherwise (2)

where p(v, e) stands for the possibility of observing node v in hy-peredge e , and its calculation depends on particular definitions ofhyperedges (see Section 3.6 for possible options). Then, let A =I |V | + D− 1

2 AD− 12 where D ∈ R |V |×|V | is a diagonal matrix con-

taining each node’s degree in simple graph G, A is the adjacencymatrix of the simple graph G, and I |V | is the identity matrix. Then,the local adjacency matrix Ae for hyperedge e is calculated via:

Ae = Se ASe (3)

Intuitively, Ae can be viewed as an adjacency matrix for thedirectly connected nodes in hyperedge e , which is further weightedby the hyperedge connectivity in Se (vi ,vj ). As a result, when per-forming simple graph convolutions, we can simultaneously taketwo types of local node-node structural information into considera-tion, making the learned base embeddings more expressive. Basedon Equation 1, the convolution operation on the entire simple graphG can be obtained through the summation across all hyperedges:

Xk+1simple = f (⊕e ∈EhXk+1

e ) (4)

where ⊕ means the concatenation of the output for each hyper-edge e , and f (·) denotes a dense layer that maps the concatenatedembeddings back to a d-dimensional space.

2.4 Convolution on HypergraphsWith the base embeddings XK

simple learned in the simple graphconvolution stage for the final K-th convolution layer, we furtherinfuse the structural information of the constructed hypergraphGh into every node’s latent representation. In recent years, hyper-graph convolution network has started to attract attention fromthe network embedding research community [14, 20, 39]. Differentfrommost related works that deduce hypergraph convolution usingthe spectral convolution theory, we derive the mathematical formof hypergraph convolution by treating it as a generalized versionof simple graph convolution, which makes the inference processmore intuitive and natural to understand.

Given a hypergraph Gh = {V, Eh }, let H ∈ R |V |×|Eh | be anincidence matrix where each entry H(v, e) is determined by:

H(v, e) ={p(v, e), if v ∈ e0, otherwise (5)

where p(v, e) indicates the possibility that node v belongs to hyper-edge e . Let the diagonal matrix Dn ∈ R |V |×|V | denoting the degreeof nodes in the hypergraph such that Dn (v,v) =

∑e ∈Eh H(v, e).

Similarly, the degree of hyperedges can be denoted by a diagonalmatrix De ∈ R |Eh |× |Eh | where De (e, e) =

∑v ∈V H(v, e). Since H

indicates the correlation between nodes and hyperedges, we can useHH⊤ to quantify the pairwise relationships between nodes. Then,the weighted adjacency matrix Ah ∈ R |V |×|V | of hypergraph Gh

can be derived as:Ah = HH⊤ − Dn (6)

Having acquired the adjacency matrix of hypergraph, we cannaturally extend simple graph convolution to hypergraph Gh . Re-call that in the typical GCN framework presented in [21], for asimple graph Gs = {Vs , Es }, the standard graph convolution isdefined as:

Xk+1s = σ

((I |Vs | + D

− 12

s AsD− 1

2s

)Xks Wk

s

)(7)

where Ds contains all nodes’ degree of Gs , As is the adjacencymatrix of Gs . Apart from the identity matrix I |Vs | , the above stan-dard graph convolution, at its core, are dependent on the noderelationships encoded in the degree and adjacency matrices Dsand As . Therefore, by replacing its input with the correspondinginformation extracted from the hypergraph Gh , we can effectivelymodel hypergraph convolution in a similar way to the standardGCN at each layer k :

Xk+1 =σ

((I |V | + D

− 12

n AhD− 1

2n

)XkWk

)=σ

((I |V | + D

− 12

n(HH⊤ − Dn

)D− 1

2n

)XkWk

)=σ

(D− 1

2n HH⊤D

− 12

n XkWk) (8)

Let Θ = D− 1

2v HHT D

− 12

v , then we have:

Xk+1 = σ (ΘXkWk ) (9)


where Xk = XKsimple when k = 0. Suppose we also adopt K lay-

ers of convolution on hypergraph, then the final output of themulti-level graph convolutional network is denoted by XK . By thismean, the generated node embeddings in Xk+1

simple can both cap-ture pairwise relations (i.e., 1-hop neighbourhood) and high-ordernon-pairwise relations (i.e., hyperedges). As we will further discussin Section 3.5.1, this is especially important when the number ofobserved anchor nodes for training are limited.2.5 Learning Network EmbeddingsFor network embedding, the output embeddings from Equation 9are learned by maximizing the probability of positive edges andminimizing the probability of negative ones:

Oembeddinд =∑

(vi ,vj )∈Elogη(xK⊤

i xKj ).

+

M∑k=1

Evk∝P (v)[log

(1 − σ (xK⊤

i xKk )) ]

+

M∑k=1

Evk∝P (v)[log

(1 − σ (xK⊤

j xKk )) ]

(10)

where η(·, ·) is the sigmoid function to calculate the probabilityof observing edge (vi ,vj ).

For a given positive edge (vi ,vj ) in the training set, we use bidi-rectional negative sampling strategy [7] to draw negative edges fortraining. Specifically, we fix vi and generateM negative nodes vkvia a noise distribution Pn (v) ∼ d0.75v , where dv is the degree ofnodev . Then we fixvj and sampleM negative nodes with the sameprocess. By optimizing Equation 10, we can obtain optimal embed-dings in XK from the last layerK . Afterwards, the final embeddingsare further leveraged for downstream anchor link prediction task.

Algorithm 1: Graph PartitioningInput: G(V, E), Nmax , Nmin , iteration T .Output: partitions P = {G1(V1, E1), · · · ,Gn (Vn , En )}.

1 P = Louvain(G) //Generating partitions P from G accordingto Louvain algorithm[4].

2 for iter from 1 to T do3 for partition G′ ∈ P do4 if |V′ | < Nmin then5 add nodes ofV′

into other partitions, delete G′.

6 else if Nmin < |V′ | <= Nmax then7 continue8 else9 Pt = Louvain(G′) //Generating partitions Pt from

G′according to Louvain algorithm [4].

10 P = P ∪ Pt11 end12 end13 end14 return P

2.6 Anchor Link PredictionNote that after acquiring the final representations XK

1 and XK2

of two networks G1 and G2, we should not directly use them foranchor link prediction because the node representations are learned

in two different latent spaces, which may vary a lot in terms ofsemantic contexts. Instead, we first reconcile both of them into thesame latent space, and then use the aligned embeddings for anchorlink prediction. To reconcile XK

1 and XK2 into the same space, we

fix XK1 and project XK

2 into the same space as XK1 . Let γ (.|Γ, b) be

a projection function with a projection matrix Γ and bias b. Then,by aligning the embedding vectors of the anchor nodes in bothgraphs, we can learn the parameters in the projection function,thus ensuring accurate reconciliation for two latent spaces:

Oanchor =∑

(v,u)∈Sanchor∥XK

1 [v, :] − ϕ(XK2 [u, :]|Θ, b)∥

2 (11)

where γ (x|Γ, b) = xΓ + b, and Sanchor is the labeled anchor links.Then, for any pair of nodes (vi ,vj ),vi ∈ G1,vj ∈ G2, the rep-resentation of this pair can be denoted by the concatenation oftheir corresponding embeddings. We sent these pair embeddingsinto a fully connected network and finally output the predictionof whether they are anchor link, and use cross entropy as the lossfunction of anchor link prediction.

2.7 Handling Large-Scale NetworksAlthough GCN-based methods have been widely used in varioustasks, most related methods still suffer from the “last mile” technol-ogy when we deal with large-scale networks because most GCN-based methods need the global adjacency matrix as their inputs,and this easily causes out of memory issues for GPU computa-tion. Besides, when the network scale increases, it will also leadto growth in computation time. Thus, we need an effective graphpartition strategy so that we can deploy the proposed MGCN inparallel. To this end, we first present a graph partitioning approachvia Algorithm 1, and propose a two-phase reconciliation mecha-nism as shown in Figure 2. Specifically, we split the large networkinto several partitions according to modularity maximization, andthen deploy our model in every single partition. For each graph,we reconcile the latent spaces of all its partitions into the sameone using the reserved anchor nodes when partitioning the wholegraph. Then, we align the embeddings of G1 and G2 into the samelatent space using observed anchor nodes from two graphs.

2.7.1 Graph Partition. As Algorithm 1 depicts, to split the largenetwork into several partitions with acceptable size (from Nminnodes to Nmax nodes, for example), we first compute the partitionof the network which maximises the modularity using the Louvainalgorithm[4]. For each partition G′

= {V′, E′}, if the size is larger

than the upper bound, that is |V′ | > Nmax , we put G′as the input

again and repeat the algorithm to further split G′into more smaller

partitions. If |V′ | < Nmin , we randomly assign it to other createdpartitions.2.7.2 Reconcile Latent Embedding Spaces. We have noticedthat to deploy our model into different partitions independentlyactually produce the embeddings in different latent spaces. There-fore we need to further match different partitions into the samerepresentation space. Here, we select N nodes from the network asshared nodes across all partitions, and append theseN nodes as wellas their associated edges into all partitions. Then we select one ofthe partitions as a fixed one, and reconcile the others into the samespace with it. For example, for all P partitions {G1,G2, · · · ,GP },


Reconcile

Reconcile

Reconcile

Embedding of partition 1

Network 1




Network 1

Network 2Network 2



Embedding of partition 2 Embedding of

partition 1

Network 2Network 1

Phase-1 Space Reconciliation Phase-2 Space Reconciliation

Figure 2: Two-phase embedding space reconciliation.

we fix partition G1 and all other partitions’ embeddings are trans-formed via a linear function д(.). We maximize the following target:

Opar tit ion =P∑p=2

∑vi ∈Vshared

logσ((fp (x(p)i ))⊤x(1)i

)(12)

whereVshared is the set of shared nodes appearing in all partitions,and x(p)i is the representation of node vi in partition p. Havingmatched each partition into the same space, we can get the finalnetwork embeddings in a uniform space, we can eventually usethem as described in Section 2.6 to predict the anchor links.2.8 Optimization StrategyWe train MGCN model in a step-by-step manner. Specifically, wefirst train MGCN by optimizing the graph embedding objectivefunction Oembeddinд . After that, we optimize the graph partitionreconciliation objective function Opar tit ion (i.e., phase-1 spacereconciliation), then optimize the reconciliation objective Oanchor(i.e., phase-2 space reconciliation). Lastly, with the fully alignednode embeddings from both graphs, we optimize MGCN for theanchor link prediction task by minimizing the cross-entropy loss.3 EXPERIMENTS3.1 DatasetsFor anchor link prediction, we use two cross-platform datasets col-lected and published in previous research on aligning heterogenoussocial networks [5]. One is the Facebook-Twitter dataset, andthe other is the Douban-Weibo dataset. Facebook-Twitter con-tains 1,091,489 nodes, where the Facebook network has 422,291nodes and 3,710,789 social links while the Twitter network con-tains 669,198 nodes that are connected by 12,749,257 social links. InFacebook-Twitter, 328,224 aligned user pairs are identified acrosstwo networks. Douban-Weibo bridges two popular social media plat-forms in China, namely Douban with 141,614 nodes and 2,700,602social links and Weibo with 141,614 nodes with 6,280,561 sociallinks. There are 141,614 aligned users in the total 283,228 nodesacross these two networks in the Douban-Weibo dataset.

For parameter sensitivity and robustness analysis on anchorlink prediction, we follow [26] to generate two sub-networks from

Facebook. Specifically, we define a sparsity parameter αs to controlthe sample ratio of edges from the original Facebook network, andαc to control the ratio of shared edges in two sub-networks. For eachedge, we generate a random value p in [0, 1]. If p ≤ 1 − 2αs + αsαc ,the edge is discarded; If 1 − 2αs + αsαc < p ≤ 1 − αs , it is added inthe first sub-network; If 1 − αs < p ≤ 1 − αsαc , it is only kept inthe second sub-network; Otherwise, the edge is added in both sub-networks. The reason of using extracted sub-networks instead of thefull dataset is that we can customize the network sparsity viaαs , andthe node overlap level via αc . Hence, the flexible compositions ofgenerated datasets can simulate awide range of different applicationscenarios for testing different models’ performance. Besides, theyare relatively smaller than Facebook-Twitter and Douban-Weibo,thus enabling running time reduction for parameter sensitivityanalysis.3.2 Baseline MethodsWe compare our method against the following baselines:

• Autoencoder [32]. This method uses one-hot encodingsof nodes as the input and learns node representations byoptimizing the mean square error loss function.

• MAH [34]. This method enforces that a pair of nodes in thesame hyperedge should come closer to learn node represen-tations for anchor link prediction.

• DeepWalk [30]. This method uses random walk to samplenode sequences, and then learns node embeddings with theword2vec model.

• GCN [12]. This method defines convolutional networks ongraphs for node representation learning.

• PALE [26]. This method predicts anchor links via networkembedding by maximizing the log likelihood of observededges and latent space matching.

• HGNN [14]. This method proposes hypergraph convolu-tional networks for network embedding.

It is worth mentioning that the baselines we have chosen are allnetwork embedding-based. In both datasets, the user profile andcontent information are unavailable, making traditional methods[19, 22, 24] that rely on auxiliary data sources inapplicable.


0.520.540.560.58

0.60.620.640.660.68

0.70.72

Macro Precision Macro F1 Macro Recall

(a) Anchor Link Prediction on Facebook-Twitter

0.57

0.59

0.61

0.63

0.65

0.67

Macro Precision Macro F1 Macro Recall

Our methodAutoencoderGCNHGNNPALEDeepWalkMAH

(b) Anchor Link Prediction on Douban-WeiboFigure 3: Results on anchor link prediction.

3.3 Experimental Settings3.3.1 Evaluation Metrics. Following related works [22, 26], wetreat anchor link prediction as a binary classification task. Specifi-cally, with a pair of nodes (u,v) as input, we aim to predict whetherthey represent the same entity in two networks or not. As such, weleverage three widely-used classification metrics, namely MacroPrecision, Macro Recall, and Macro F1.3.3.2 Parameter Settings. For anchor link prediction, the ratioof positive and negative anchor links is set to 1 : 1 for both thetraining and test. We train all methods using 50% of the positiveand negative links and test them on the remaining portion. Inthe graph partition and reconciliation step, we set Nmin = 1, 000,Nmax = 15, 000, and N = 1, 000. The layer size K is 2 in our model.We construct the hypergraph via each node’s 10 hop neighbors.That is, we connect each node and its 10 hop neighbors with onehyperedge. Note that we also adopt three other hypergraph con-struction strategies, and their impact will be discussed in section3.6. The learning rate and embedding dimension are respectivelyfixed to 0.01 and 200 in our model. The negative link number inEquation (10) is set to M = 5. For all baseline methods, we adopttheir reported optimal parameters by default.

3.4 Performance on Anchor Link PredictionIn this section, we evaluate all models’ performance on anchor linkprediction on Facebook-Twitter and Douban-Weibo datasets. Wereport Macro Precision, F1, and Recall in Figure 3. We draw thefollowing observations.

Firstly, in terms of all evaluation metrics, our method has consis-tently and significantly outperformed all baselines on both datasets.Specifically, compared with the second best results on Macro Preci-sion, Macro F1 and Macro Recall, our proposed MGCN achieves animprovement of 9.7%, 9.1%, and 9.0% on Facebook-Twitter, and 0.6%,2.7%, 2.6% on Douban-Weibo, respectively. On one hand, MGCNperforms both local graph convolution and hypergraph convolutionoperations on social networks, so it can effectively preserve thestructural information in the learned node embeddings, leading tosuperior classification performance. On the other hand, traditionalnetwork embedding-based methods (e.g., DeepWalk and PALE) areunable to capture the complex, high-order node relationships, andtend to underperform on large-scale networks.

Secondly, as hypergraph-based baseline methods, HGNN showsstronger performance than MAH on both datasets. This is becauseHGNN largely benefits from the nonlinearity of neural networks,which offers higher model expressiveness while modeling hyper-edges. Compared with GCN and PALE that only consider pair-wise relations, both our method and HGNN can achieve betterperformance regarding Macro Precision and Macro Recall. Thisobservation indicates the advantages of exploring hypergraphs for

anchor link prediction. However, compared with both baselines,MGCN further incorporates node information extracted from localneighbourhood, thus enriching the granularity of learned nodeembeddings and yielding more competitive results.

Thirdly, we also notice that our method is more advantageouson Facebook-Twitter than on Douban-Weibo. One possible reasonis that the Douban-Weibo dataset has relatively higher densitycompared with Facebook-Twitter. When handling sparser datasets,GCN, DeepWalk and PALE suffer from severe performance decreasebecause they heavily rely on sufficient observed pairwise relationsfor node representation learning. This further demonstrates thatour MGCNmaintains high-level performance and shows promisingrobustness in the presence of data sparsity problem.

3.5 Analysis on Model RobustnessAs we have previously mentioned, existing anchor link predictionmethods are prone to suffer from performance downgrade whenexposed to sparse datasets, and our proposed MGCN alleviatesthis problem by thoroughly investigating structural informationwithin both simple graphs and the extracted hypergraphs. To testthe robustness of our model, we carry out further comparisonswith baselines on the two subnetworks extracted from Facebooknetwork. To be specific, we vary the data compositions in these twosubnetworks by adjusting the proportions of training labels (i.e.,observed anchor links), edges (i.e., user-user pairwise interactions)and network overlaps (i.e., shared same nodes), and record the per-formance fluctuations of different models. We choose Autoencoder,GCN, HGNN and PALE in this analysis as they have competitiveoverall effectiveness and are relatively stable on large-scale datasets.

3.5.1 Effect of Anchor Link Percentage. In practice, the avail-ability of the observed anchor nodes between two social networksthat can be used for training are usually very limited. To test theimpact of available anchor links, we firstly hold out 10% of theobserved anchor links for test, and change the ratio of anchor linksfrom 10% to 90% for training. Note that two parameters αs and αcare both fixed to 0.9 during this test. All experiments including thesampling are executed five times. We report the average resultsof our method and baselines in Figure 4, from which we can seethat even with a small portion of training labels, our method stillperforms the best compared with other baselines. This is particu-larly important because in the real-world, anchor links are oftensparsely observed, thus our method is the most competitive choicewhen there are insufficient labels for training.

3.5.2 Effect of Edge Percentage. While most GCN-based meth-ods heavily rely on the information passed along edges for noderepresentation learning, most real-life networks are naturally sparsein terms of the number of edges. So, we evaluate our method and


Table 1: Experimental results under different sparsity levels.

sparsity level αsMetric Model 10% 20% 30% 40% 50% 60% 70% 80% 90%

Macro Precision

Our method 0.8620 0.9071 0.9353 0.9345 0.9440 0.9631 0.9638 0.9624 0.9638Autoencoder 0.8338 0.8455 0.8601 0.8195 0.8336 0.8590 0.9204 0.9255 0.8819GCN 0.8340 0.8457 0.8881 0.8862 0.9115 0.9252 0.9366 0.9359 0.9434HGNN 0.7295 0.8334 0.8340 0.8351 0.8376 0.8770 0.9025 0.8787 0.8850PALE 0.8334 0.8337 0.7333 0.8337 0.8337 0.8338 0.8336 0.7648 0.7711

Macro F1


Macro Recall


Entries in bold are the best results. For the sparsity level, a lower αs leads to a sparser dataset. αc is fixed to 0.6 in this test.

0.6

0.7

0.8

0.9

1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Mac

ro P

reci

sion

Ratio of labeled anchor links

Our methodAutoencoderGCNHGNNPALE

(a) Macro Precision

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Mar

co F

1



(b) Macro F1

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Mac

ro R

ecal

l



(c) Macro RecallFigure 4: Results w.r.t. observed anchor link percentage.

0.680.7

0.720.740.760.78

0.80.820.840.86

Neighbor-basedhypergraph

Anchor-basedhypergraph

Centrality-basedhypergraph

Latenthypergraph

Marco Precision Macro F1 Macro Recall

Figure 5: Performance w.r.t. different hypergraph construc-tion methods.

baselines by adjusting the sparsity parameter αs mentioned in sec-tion 3.1 from 10% to 90%, and report the everage results achieved infive executions as well. Note that we still use the same evaluationset as in Section 3.5.1. As shown in Table 1, our method keeps stablew.r.t. different values of αs . The reason is that modeling hyper-graphs on top of simple graphs with MGCN can provide additionalstructural information when the availability of edges in physicalnetworks is limited.

3.5.3 Effect of Network Overlap Percentage. Network over-lap refers to shared entities (users in our case) in two differentnetworks, and the shared entities tend to have similar local neigh-borhood structures [3] in both networks. It characterizes the homo-geneity of two independent networks. In this section, we changethe parameter αc from 10% to 90% and show the average resultsof five executions in Table 2, from which we notice even in 10%overlap level, our method still keeps the best performance, and thesuperiority of our method becomes more obvious when αc is larger.

3.6 Impact of Hypergraph ConstructionStrategies

We supply four methods for extracting hypergraphs from originalnetworks, and compare their impacts to model performance below.

(1) Neighborhood-based hypergraph construction. This isthe default hypergraph construction method we use for ourexperiment in Section 3.4. For each node, we collect itsϕ-hopneighbors and connect them in one hyperedge. As such, fora sub-graph with N nodes, we finally have N hyperedges. ϕis optimized via grid search in {4, 6, 8, 10, 12} and is set to 10in our experiments.

(2) Anchor-based hypergraph construction. This method issimilar to the first one but we only consider the 10-hop neigh-bours of anchor nodes. That means, for a given sub-graphwith N nodes andM observed anchor nodes, we will resultinM hyperedges. SinceM ≪ N usually holds, this methodis more practical when graph partitions are not applied onlarge-scale graphs.

(3) Centrality-based hypergraph construction. We computethe following centrality values for each node: degree, be-tweenness, clustering coefficient, eigenvector, page rank,closeness centrality, node clique number, and communitiesa node belongs to. With these centrality-based properties,we generate a 20-dimensional vector (8-bit centrality-basedfeatures and 12-bit one-hot community encodings) for eachnode. By treating each dimension of the vector as a hyper-edge, then each node’s value on a specific dimension denotesthe probability that this node belongs to the hyperedge.


Table 2: Experimental results under different overlap levels.

overlap level αcMetric Model 10% 20% 30% 40% 50% 60% 70% 80% 90%

Macro Precision


Macro F1


Macro Recall


Entries in bold are the best results. For the overlap level, a higher αc leads to more overlaps in two networks. αs is fixed to 0.6 in this test.

(4) Latent feature-basedhypergraph construction. This Strat-egy uses Autoencoder to extract dense latent representationsof nodes (we set the latent dimension to 200), where eachlatent dimension serves as a hyperedge.

The performance w.r.t. different hypergraph construction strate-gies are shown in Figure 5. We set αc = αs = 0.3 for this test. Ingeneral, neighbor-based, anchor-based, and centrality-based hyper-graphs lead to very close results. Surprisingly, though centrality-based hypergraph construction strategy involves carefully hand-crafted features, it falls short in terms of Macro F1 and Macro Recall.This suggests that we do not have to design specific features toobtain performance improvements, which makes our method morepractical for large datasets.

0

5

10

15

20

25

30

35

2000

4000

6000

8000

1000

012

000

1400

016

000

1800

020

000

Runn

ing ti

me

(Sec

onds

)

Number of nodes

Our method GCN HGNN

Figure 6: Forward propagation time w.r.t. network scales.

3.7 Analysis on Model EfficiencyTo showcase the efficiency of MGCN, we calculate the running timeof forward propagation for 1,000 epochs w.r.t. an increasing scaleof the Facebook subnetworks and compare it with GCN and HGNN.All experiments are conducted on a linux server with two GTXTitan GPUs. The results are shown in Figure 6. From the modelarchitecture perspective, our method involves two GCN operationson both simple graphs and hypergraphs. However, compared withGCN and HGNN that only models simple graphs or hypergraphs,we can find that there is only a little additional time consumptionof MGCN. This verifies the necessity and efficacy of modeling hy-peredges in parallel. As a result, though MGCN achieves significantperformance gain over all baselines, it still has very close efficiencyto GCN and HGNN. Hence, for even larger datasets, our method

can offer state-of-the-art anchor link prediction performance whileretaining high-level scalability.

4 RELATEDWORK4.1 Anchor Link Prediction in Social NetworksTraditional methods. Traditionally, early studies solve the prob-lem of account matching by leveraging user profile (e.g., user name,age, location) and their generated contents such as textual reviewsand posts [15, 19, 22, 24]. However, due to the difficulty of obtaininghigh-quality and credible data from the Internet, these methodsinevitably suffer from the data insufficiency problem. As a result,these methods cannot achieve satisfactory results, and are subjectto constrained generalizability in practice. Other techniques adoptmatrix factorization to directly compute an alignment matrix [37],such as IsoRank [33], NetAligh [3], FINAL[43], and REGAL[18].However, such approaches can hardly scale up to very large net-works, because they take the entire adjacency matrices of networksas their input, which is highly demanding on storage and com-puting resources. Furthermore, they are prone to struggle whenhandling higher sparsity that comes with large-scale networks.Embedding-based approaches. There have been applications ofaccount matching by using network embedding techniques[10, 38].PALE [26] learns node embedding bymaximizing the co-occurrencelikelihood of connected vertices, then applies linear projectionor multi-layer perceptron (MLP) as the mapping function. Simi-lar methods also include IONE [23] which addresses this problemby modeling user-user following relationships in social networks.Though DALAUP [11] further employs active learning to learnnode embeddings, it is limited by its scalability as the active learn-ing scheme can be time-consuming on large-scale social networks.DeepLink [45] employs unbiased random walk to generate em-beddings using skip-gram, then adopts auto-encoder and MLP asthe mapping function. Manifold Alignment on Hypergraph (MAH)[34] uses hypergraphs to model high-order relations by exploitingthe idea that a pair of nodes in the same hyperedge should comecloser. MAH is a pure hypergraph-based approach, which is simpleand effective. However, it only considers sub-space learning forhyperedges, and is therefore vulnerable to noises and the loss ofimportant underlying network structure information. In contrast,


our proposed method proposes a decentralized hypergraph repre-sentation learning scheme, thus being able to handle large-scalesocial networks with a novel subgraph reconciliation mechanism.4.2 Network EmbeddingMainstream network embedding approaches include matrix factor-ization based methods, such as Multi Dimensional Scaling (MDS)[47], Spectral Clustering [29], Graph Factorization [2], etc., as wellas random walk-based methods [16, 30, 36] which firstly samplerandom walk node sequences, and then learn node embeddings viathe skip-gram model. The recently proposed Graph ConvolutionalNetworks (GCNs) [6, 17] successfully define convolutional kernelson graph-structured data to learn node representations by aggre-gating information passed from its surrounding neighbours. Morerecently, different from traditional GCNs that only model simplegraphs, hypergraphs have been infused into the context of graphconvolutions [14, 39], enabling the learning of richer structuralinformation. HGCN [14] introduces the concept of hypergraphLaplacian, and then proposes a hypergraph-based extension to theoriginal convolution on simple graphs. HyperGCN [39] also trainsGCNs on hypergraphs with the utilization of hypergraph spectraltheory. In this paper, we develop a specific GCN-like model thatinnovatively facilitates GCN operations at both hypergraph-leveland simple graph-level in a unified framework to allow for compre-hensive node representation learning.5 CONCLUSIONWe propose a multi-level graph convolution networks for anchorlink prediction. Through the fusion of simple graph and hyper-graph, our method steadily outperforms state-of-the-art methods.To handle large scale dataset, we also design a framework with net-work partitioning and two-phases reconciliation. The future workwould suggest to explore the automatic discovery of hypergraphsfor account matching problems, as well as scaling our frameworkto multiple social networks. Moreover, considering how to lever-age the unused links between partitions are also potential waysof improving this work. Applications of this work with temporalanalysis and recommendations [8, 9] will also be our future work.ACKNOWLEDGMENTThe work has been supported by Australian Research Council(Grant No. DP190101087, DP190101985, DP170103954 and FT200100825).REFERENCES[1] M. A. Ahmad, Z. Borbora, J. Srivastava, and N. Contractor. Link prediction across

multiple social networks. In ICDMW. IEEE, 2010.[2] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola.

Distributed large-scale natural graph factorization. In WWW13.[3] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang. Algorithms for

large, sparse network alignment problems. In ICDM. IEEE, 2009.[4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding

of communities in large networks. Journal of statistical mechanics: theory andexperiment, 2008.

[5] X. Cao and Y. Yu. Bass: A bootstrapping approach for aligning heterogenoussocial networks. In Joint European Conference onMachine Learning and KnowledgeDiscovery in Databases. Springer, 2016.

[6] H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li. Exploiting cen-trality information with graph convolutions for network representation learning.In ICDE. IEEE, 2019.

[7] H. Chen, H. Yin, W. Wang, H. Wang, Q. V. H. Nguyen, and X. Li. Pme: projectedmetric embedding on heterogeneous networks for link prediction. In KDD, 2018.

[8] T. Chen, H. Yin, Q. V. H. Nguyen, W.-C. Peng, X. Li, and X. Zhou. Sequence-awarefactorization machines for temporal predictive analytics. In ICDE, 2020.

[9] T. Chen, H. Yin, G. Ye, Z. Huang, Y. Wang, and M. Wang. Try this instead:Personalized and interpretable substitute recommendation. arXiv preprint, 2020.

[10] W. Chen, H. Yin, W. Wang, L. Zhao, and X. Zhou. Effective and efficient useraccount linkage across location based social networks. In ICDE, pages 1085–1096.IEEE, 2018.

[11] A. Cheng, C. Zhou, H. Yang, J. Wu, L. Li, J. Tan, and L. Guo. Deep active learningfor anchor user prediction. IJCAI, 2019.

[12] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networkson graphs with fast localized spectral filtering. In NIPS, 2016.

[13] F. E. Faisal, H. Zhao, and T. Milenković. Global network alignment in the contextof aging. Transactions on Computational Biology and Bioinformatics, 2014.

[14] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao. Hypergraph neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, 2019.

[15] O. Goga, P. Loiseau, R. Sommer, R. Teixeira, and K. P. Gummadi. On the reliabilityof profile matching across large online social networks. In KDD, 2015.

[16] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. InKDD, 2016.

[17] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on largegraphs. In NIPS, 2017.

[18] M. Heimann, H. Shen, T. Safavi, and D. Koutra. Regal: Representation learning-based graph alignment. In CIKM, 2018.

[19] T. Iofciu, P. Fankhauser, F. Abel, and K. Bischoff. Identifying users across socialtagging systems. In AAAI Conference on Weblogs and Social Media, 2011.

[20] J. Jiang, Y. Wei, Y. Feng, J. Cao, and Y. Gao. Dynamic hypergraph neural networks.In IJCAI, pages 2635–2641, 2019.

[21] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolu-tional networks. ICLR, 2017.

[22] J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, and H.-W. Hon. What’s in a name?an unsupervised approach to link users across communities. In WSDM, 2013.

[23] L. Liu, W. K. Cheung, X. Li, and L. Liao. Aligning users across social networksusing network embedding. In Ijcai, pages 1774–1780, 2016.

[24] A. Malhotra, L. Totti, W. Meira Jr, P. Kumaraguru, and V. Almeida. Studying userfootprints in different online social networks. In ASONAM. IEEE, 2012.

[25] T. Man, H. Shen, J. Huang, and X. Cheng. Context-adaptive matrix factorizationfor multi-context recommendation. In CIKM, 2015.

[26] T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng. Predict anchor links across socialnetworks via an embedding approach. In IJCAI, 2016.

[27] K. Musiał and P. Kazienko. Social networks on the internet. WWW, 2013.[28] A. Narayanan and V. Shmatikov. De-anonymizing social networks. In 2009 30th

IEEE symposium on security and privacy. IEEE, 2009.[29] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an

algorithm. In NIPS, 2002.[30] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social repre-

sentations. In KDD, 2014.[31] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, and S. Lattanzi. Linking users

across domains with location data: Theory and validation. In WWW, 2016.[32] G. Salha, S. Limnios, R. Hennequin, V.-A. Tran, and M. Vazirgiannis. Gravity-

inspired graph autoencoders for directed link prediction. In CIKM, 2019.[33] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction

networks with application to functional orthology detection. Proceedings of theNational Academy of Sciences, 2008.

[34] S. Tan, Z. Guan, D. Cai, X. Qin, J. Bu, and C. Chen. Mapping users across networksby manifold alignment on hypergraph. In AAAI, 2014.

[35] J. Tang, H. Gao, H. Liu, and A. Das Sarma. etrust: Understanding trust evolutionin an online world. In KDD, 2012.

[36] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale infor-mation network embedding. In WWW, 2015.

[37] H. T. Trung, N. T. Toan, T. Van Vinh, H. T. Dat, D. C. Thang, N. Q. V. Hung, andA. Sattar. A comparative study on network alignment techniques. Expert Systemswith Applications, 2020.

[38] W. Wang, H. Yin, X. Du, W. Hua, Y. Li, and Q. V. H. Nguyen. Online userrepresentation learning across heterogeneous social networks. In SIGIR, 2019.

[39] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar. Hypergcn:A new method for training graph convolutional networks on hypergraphs. InNIPS, 2019.

[40] H. Yin, Q. Wang, K. Zheng, Z. Li, J. Yang, and X. Zhou. Social influence-basedgroup representation learning for group recommendation. In ICDE. IEEE, 2019.

[41] H. Yin, L. Zou, Q. V. H. Nguyen, Z. Huang, and X. Zhou. Joint event-partnerrecommendation in event-based social networks. In ICDE. IEEE, 2018.

[42] J. Zhang and S. Y. Philip. Integrated anchor and social link predictions acrosssocial networks. In AAAI, 2015.

[43] S. Zhang and H. Tong. Final: Fast attributed network alignment. In KDD, 2016.[44] Y. Zhang, J. Tang, Z. Yang, J. Pei, and P. S. Yu. Cosnet: Connecting heterogeneous

social networks with local and global consistency. In KDD, 2015.[45] F. Zhou, L. Liu, K. Zhang, G. Trajcevski, J. Wu, and T. Zhong. Deeplink: A deep

learning approach for user identity linkage. In IEEE INFOCOM. IEEE, 2018.[46] X. Zhou, X. Liang, H. Zhang, and Y. Ma. Cross-platform identification of anony-

mous identical users in multiple social media networks. TKDE, 2015.[47] L. Zlatkov. Multidimensional scaling (mds). 1978.

multi-level graph convolutional networks for cross …multi-level graph convolutional networks for...

Documents