heterogeneous deep graph infomax - arxiv

Heterogeneous Deep Graph Infomax

Yuxiang Ren,1 Bo Liu,2 Chao Huang,2 Peng Dai,2 Liefeng Bo,2 Jiawei Zhang,11Florida State University, IFM Lab

2JD Finance America Corporation, AI [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Graph representation learning is to learn universalnode representations that preserve both node attributes andstructural information. The derived node representations canserve various downstream tasks, such as node classification andnode clustering. When a graph is heterogeneous, the prob-lem becomes more challenging than the homogeneous graphrepresentation learning problem. Inspired by emerging mutualinformation-based learning algorithms, in this paper we proposean unsupervised graph neural network Heterogeneous DeepGraph Infomax (HDGI ) for heterogeneous graph representationlearning. We use the meta-path to model the structure involvingsemantics in heterogeneous graphs and utilize graph convolutionmodule and semantic-level attention mechanism to capture indi-vidual node local representations. By maximizing the local-globalmutual information, HDGI effectively learns node representa-tions based on the diverse information in heterogeneous graphs.Experiments show that HDGI remarkably outperforms state-of-the-art unsupervised graph representation learning methodson both node classification and clustering tasks. By feedingthe learned representations into a parametric model, such aslogistic regression, we even achieve comparable performance innode classification tasks when comparing with state-of-the-artsupervised end-to-end GNN models.

I. INTRODUCTION

Numerous real-world applications, such as socialnetworks [40], financial platforms [22] and knowledgegraphs [31] exhibit the favorable property of graphstructure data. Meanwhile, handling graph data is verychallenging. Because each node has its unique attributes,and the connections between nodes convey essentialinformation. When learning from nodes’ attributes and theconnection information simultaneously, the task becomesmore challenging.

Traditional machine learning methods focus on individualnodes’ features but ignore the structural information, whichobstructs their ability to process graph data. Graph neuralnetworks (GNNs) learn nodes’ new feature vectors througha recursive neighborhood aggregation scheme [35]. With thesupport of sufficient training samples, a rich body of super-vised graph neural network models have been developed [15],[29], [37]. However, labeled data is not always availablein graph representation learning tasks. Those algorithms arenot applicable to the unsupervised learning settings. For thepurpose of alleviating the training sample scarcity problem,unsupervised graph representation learning has aroused ex-tensive research interest. The goal of this task is to learna low-dimensional representation of each graph node. Therepresentation preserves graph topological structure and node

Author

Subject

Paper(a) Node Type (c) Heterogeneous Graph

Author1

Author2

Author3

Paper1

Paper3

Paper4

Subject1

Subject2

Subject3

(b) Edge Type

Write

Belong-to

Paper2

Paper1 Author1 Paper1

Paper1 Paper1Subject2

Meta-path: Paper-Author-Paper(PAP)

Meta-path: Paper-Subject-Paper(PSP)(d) Meta-paths

Figure 1: An example of heterogeneous bibliographic graph.

content. Meanwhile, the learned representations can be appliedto conventional sample-based machine learning algorithms.

Most of the existing unsupervised graph representationlearning models can be grouped into factorization-based mod-els and edge-based models. Precisely, factorization-based mod-els capture the global graph information by factorizing thesample affinity matrix [39], [36], [39]. Through the affinitymatrix factorization, the full graph structure can be wellgrasped. However, those methods ignore the node attributesand local neighborhood relationships. Edge-based models ex-ploit the local and higher-order neighborhood information byedge connections or random-walk paths. Nodes tend to havesimilar representations if they are connected or co-occur inthe same path [16], [5], [10], [20]. Edge-based models areprone to preserve limited order node proximity. They lacka mechanism to preserve the global graph structure. Therecently proposed mutual information-based models [12], [30],[25] demonstrate that mutual information maximization hasattractive advantages in encouraging an encoder to considersboth global and local features. For graph structure data, globaland local graph structure also need to be comprehensivelyconsidered. For homogeneous graphs, DGI [30] maximizesthe mutual information between graph patch representationsand the corresponding high-level summaries of graphs. How-ever, the networked data in the real world usually containcomplex structures (involving multiple types of nodes andedges), which can be formally modeled as heterogeneousgraphs (HG). Compared with homogeneous graphs, hetero-geneous graphs contain more detailed information and richsemantics among multi-typed nodes. Taking the bibliographic

arX

iv:1

911.

0853

8v5

[cs

.LG

] 1

3 N

ov 2

020

network in Figure 1 as an example, it contains three typesof nodes (Author, Paper, and Subject) as well as two typesof edges (Write and Belong-to). Besides, the individual nodesthemselves also carry much attribute information (e.g., papertextual contents). Due to the diversity of node and edge types,the heterogeneous graph itself becomes more complicated.The diverse (direct or indirect) connections between nodesalso convey more semantic information. The models initiallyproposed for homogeneous graphs may encounter significantchallenges to handle relations with different semantics inheterogeneous graphs.

In this paper, we explore the use of mutual informationmaximization for heterogeneous graph representation learning.To address the aforementioned challenges in heterogeneousgraphs, we propose a novel meta-path based unsupervisedgraph neural network model, namely Heterogeneous DeepGraph Infomax (HDGI ). In heterogeneous graph studies,meta-path [27] has been widely used to represent the com-posite relations with different semantics. As illustrated inFigure 1(d), the relations between paper can be expressed byPAP and PSP, which represent papers written by the sameauthor and papers belonging to the same subject, respectively.HDGI utilizes the structure of meta-paths to model connectionsemantics in heterogeneous graphs. Based on different meta-paths, HDGI disassembles heterogeneous graphs into homo-geneous graphs of specific semantics. In these homogeneousgraphs, HDGI applies a convolutional style GNN to capturethe local representation of nodes with specific semantics. Afterthat, through a semantic-level attention mechanism, HDGIaggregates node representations of different semantics. Bymaximizing local-global mutual information, HDGI learnshigh-level representations containing graph-level structural in-formation without any supervised label. The node attributescan be simultaneously fused into the representations in theprocess of maximizing local-global mutual information. Inorder to verify the effectiveness and competitiveness of learnedrepresentations, we perform experiments on both classificationtasks and clustering tasks. The experimental results show thatthe performance of HDGI can often beat supervised state-of-art comparative graph neural network models.

In summary, our contributions in this paper can be summa-rized as follows:

• This paper presents the first model to apply mutual infor-mation maximization to representation learning in heteroge-neous graphs.

• Our proposed method, HDGI , is a novel unsupervised graphneural network. It handles graph heterogeneity by utilizingan attention mechanism on meta-paths and deals with theunsupervised settings by applying mutual information max-imization.

• Our experiments demonstrate that the representationslearned by HDGI are effective for both node classificationand clustering tasks. Moreover, its performance can also beatstate-of-the-art GNN models, where they have additionalsupervised label information.

The rest of this paper is organized as follows. We discuss therelated work in Section 2. In Section 3, we present the problemformulation along with important terminologies used in ourmethod. We present HDGI in detail in Section 4. Experimentalresults and analyses are provided in Section 5. Finally, weconclude the paper in Section 6.

II. RELATED WORK

A. Graph representation learning

Graph representation learning has become a non-trivialtopic [3] because of the ubiquity of graphs in real-world sce-narios. As a data type containing rich structural information,many models [20], [9], [28], [19], [32] acting on graphs learnthe representations of nodes based on the structure of thegraph. Node2vec [9] learns a mapping of nodes to a low-dimensional space of features that maximizes the likelihood ofpreserving network neighborhoods of nodes. DeepWalk [20]uses the set of random walks over the graph in SkipGram tolearn node embeddings. Tang et al. [28] optimize a carefullydesigned objective function that preserves both the local andglobal network structures. Qu et al. [19] propose preservingasymmetric transitivity by approximating high-order proximitybased on asymmetric transitivity. Wang et al. [32] attemptto retrieve structural information through matrix factorizationincorporating the community structure. However, all the abovemethods are proposed for homogeneous graphs.

In order to handle graph heterogeneity, metapath2vec [4]samples random walks under the guidance of meta-paths andlearns node embeddings through the skip-gram. HIN2Vec [7]learns the embedding vectors of nodes and meta-paths simulta-neously while conducts prediction tasks. Wang et al. [33] con-sider the attention mechanism in heterogeneous graph learningthrough the model HAN, where information from multiplemeta-path defined connections can be learned effectively. Fromthe perspective of attributed graphs, SHNE [38] capturesboth structural closeness and unstructured semantic relationsthrough joint optimization of heterogeneous SkipGram anddeep semantic encoding. HGAT [21] employs a hierarchicalattention mechanism considering both node-level and schema-level attention to handle graph heterogeneity. Many learningmethods [23], [31] on knowledge graphs can often be appliedto other heterogeneous graphs.

B. Graph neural network

The core idea of GNN is aggregating the neighbors’ featureinformation through neural networks to learn the new fea-tures [35], which combine the independent information of thenode and corresponding structural information in the graph.A propagation model incorporating gated recurrent units topropagate information across all nodes is proposed in [17].Joan Bruna et al. [2] extends convolution to general graphsby a novel Fourier transformation in graphs. Kipf et al. [15]propose Graph Convolutional Network (GCN). Hamilton etal. [10] introduce GraphSAGE that generates embeddings bydirectly aggregating features from a node’s local neighbor-hood. Graph Attention Network (GAT) [29] first imports the

2

attention mechanism into graphs. Other successful GNNs arealso based on supervised learning including SplineCNN [6],AdaGCN [26] and AS-GCN [13]. The unsupervised learningGNNs can be mainly divided into two categories, i.e., randomwalk-based [20], [9], [16], [5], [10] and mutual information-based [30]. DGI [30] is a general GNN for learning noderepresentations within graph-structured data in an unsuper-vised manner. DGI relies on maximizing mutual informationbetween patch representations and corresponding high-levelsummaries of graphs. DGI is the pioneer of our work, but itcan only be applied to homogeneous graphs. HDGI extendsmutual information-based GNNs to heterogeneous graphs.

C. Mutual information

Information theory [24] has been applied in various do-mains. Recently mutual information is utilized in variouslearning tasks successfully. Belghazi et al. [1] propose aneural network-based mutual information estimator, enablingthe mutual information estimation to be linearly scalable indimensionality. Deep InfoMax (DIM) [12] investigates unsu-pervised learning of representations by maximizing mutualinformation in the computer vision domain. DGI [30] extendsthe mechanism into the graph domain and works on node rep-resentation learning. Sun et al. [25] take advantage of mutualinformation maximization to learn graph-level representations.

III. PROBLEM FORMULATION

In this section, we first introduce the concept of hetero-geneous graph and the problem definition of heterogeneousgraph representation learning. Next, we define meta-pathbased adjacency matrix, which is critical in the followingalgorithm description.

Definition 3.1 (Heterogeneous Graph (HG)): A heteroge-neous graph is defined as G = (V, E) with a node typemapping function φ : V → T and an edge type mappingfunction ψ : E → R. Each node v ∈ V belongs to oneparticular type in the node type set T : φ(v) ∈ T , and eachedge e ∈ E belongs to a particular edge type in the edge typeset R : ψ(e) ∈ R. Heterogeneous graphs have the propertythat |T | + |R| > 2. The attributes and content of nodes canbe encoded as initial feature matrix X .Problem Definition. (Heterogeneous Graph RepresentationLearning): Given a heterogeneous graph G and the initialfeature matrix X , the representation learning task in G is tolearn the low dimensional node representations H ∈ R|V|×dwhich contains both structure information of G and nodeattributes of X . The learned representation H can be applied todownstream graph-related tasks such as node classification andnode clustering, etc. We focus on learning the representationof a specific type of node in this paper. We represent the setof nodes for representation learning as the target-type nodesVt.

In a heterogeneous graph, two neighboring nodes can beconnected by different types of edges. Meta-paths [27], whichrepresent node and edge types between two neighboring nodes

in a HG, have been proposed to model such rich informa-tion. Formally, a path v1

R1−−→ v2R2−−→ · · · Rn−1−−−→ vn is

defined as a meta-path between nodes v1 and vn, whereinR = R1 ◦ R2 ◦ · · · ◦ Rn−1 defines the composite relationsbetween node v1 and vn [4]. In this paper, we intend toutilize symmetric and undirected meta paths to denote thecloseness among target-type nodes Vt, which can help simplifythe problem setting. We represent the set of meta paths usedin this paper as {Φ1,Φ2, · · · ,ΦP }, where Φi denotes the i-thmeta path type. For example, in Figure 1(d), Paper-Author-Paper (PAP) and Paper-Subject-Paper (PSP) are two types ofmeta-paths, which contain the semantic “papers written bythe same author” and “papers belonging to the same subject”respectively.

Definition 3.2 (Meta-path based Adjacency Matrix): For themeta-path Φi, if there exists a path instance between nodevi ∈ Vt and vj ∈ Vt, we call that vi and vj are “connectedneighbors” based on Φi. Such neighborhood information canbe represented by a meta-path based adjacent matrix AΦi ∈R|Vt|×|Vt|, where AΦi

ij = AΦiji = 1 if vi, vj are connected by

meta-path Φi and AΦiij = AΦi

ji = 0 otherwise.In the next section, we will propose the novel method HDGI

for heterogeneous graph representation learning, which is anunsupervised GNN model and can integrate information fromdifferent meta-paths through an attention mechanism.

IV. HDGI METHODOLOGY

Table I: Symbols and Definitions

Symbol Interpretation

Φ Meta-pathAΦ Meta-path based adjacency matrixX The initial node feature matrixVt The set of nodes with the target typeN The number of nodes in Vt

G The given heterogeneous graphD Mutual information based discriminatorC Negative samples generatorR Global representation encoder~s The graph-level summary vectorHΦ Node-level representations~q Semantic-level attention vectorSΦ Attention weight of meta-path φH Final negative nodes representationsH Final positive nodes representations

A. HDGI Architecture Overview

A high-level illustration of the proposed HDGI is shownin Figure 2. We summarize the notations used for modeldescription in Table I. The input of HDGI is a heterogeneousgraph G containing N target-type nodes whose initial d-dimension features are denoted by X ∈ RN×d, and meta-pathset {Φi}Pi=1. Based on {Φi}Pi=1 we can calculate the meta-path set based adjacency matrices {AΦi}Pi=1. The meta-pathbased local representation encoding described in Section IV-Bhas two steps: (1) learning individual node representationHΦi from X and each AΦi , i = 1, 2.., P and (2) generatingnode representation H by aggregating {HΦi}Pi=1 through a

3

�p<latexit sha1_base64="WCGR7w1G2tH4JSifByA2kYmPye8=">AAAB73icbVA9SwNBEJ3zM8avqKXNYiJYhbtYaBm0sYxgPiA5wt5mLlmyt3fu7gnhyJ+wsVDE1r9j579xk1yhiQ8GHu/NMDMvSATXxnW/nbX1jc2t7cJOcXdv/+CwdHTc0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4dua3n1BpHssHM0nQj+hQ8pAzaqzUqfQaI95PKv1S2a26c5BV4uWkDDka/dJXbxCzNEJpmKBadz03MX5GleFM4LTYSzUmlI3pELuWShqh9rP5vVNybpUBCWNlSxoyV39PZDTSehIFtjOiZqSXvZn4n9dNTXjtZ1wmqUHJFovCVBATk9nzZMAVMiMmllCmuL2VsBFVlBkbUdGG4C2/vEpatap3Wa3d18r1mzyOApzCGVyAB1dQhztoQBMYCHiGV3hzHp0X5935WLSuOfnMCfyB8/kDJMWPYA==</latexit>

�1<latexit sha1_base64="W7QkdFkCiV07P6ffAjx390knCXg=">AAAB73icbVA9SwNBEJ3zM8avqKXNYiJYhbtYaBm0sYxgPiA5wt5mL1myt3fuzgnhyJ+wsVDE1r9j579xk1yhiQ8GHu/NMDMvSKQw6Lrfztr6xubWdmGnuLu3f3BYOjpumTjVjDdZLGPdCajhUijeRIGSdxLNaRRI3g7GtzO//cS1EbF6wEnC/YgOlQgFo2ilTqXXGIm+V+mXym7VnYOsEi8nZcjR6Je+eoOYpRFXyCQ1puu5CfoZ1SiY5NNiLzU8oWxMh7xrqaIRN342v3dKzq0yIGGsbSkkc/X3REYjYyZRYDsjiiOz7M3E/7xuiuG1nwmVpMgVWywKU0kwJrPnyUBozlBOLKFMC3srYSOqKUMbUdGG4C2/vEpatap3Wa3d18r1mzyOApzCGVyAB1dQhztoQBMYSHiGV3hzHp0X5935WLSuOfnMCfyB8/kDxPuPIQ==</latexit>

…

…

Semantic-Level Attention

Semantic-Level Attention



~s<latexit sha1_base64="QjjJAQFimeyT/sNxC5Un++RRQbM=">AAAB8HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIiHwYI2VvmYMPu3WV3j4Rc+BU2Fhpj68+x89+4wBUKvmSSl/dmMjPPjwXXxnW/ndzG5tb2Tn63sLd/cHhUPD5p6ihRDBssEpFq+1Sj4CE2DDcC27FCKn2BLX98N/dbE1SaR+GjmcbYk3QY8oAzaqz0VO5OkKV6Vu4XS27FXYCsEy8jJchQ7xe/uoOIJRJDwwTVuuO5semlVBnOBM4K3URjTNmYDrFjaUgl6l66OHhGLqwyIEGkbIWGLNTfEymVWk+lbzslNSO96s3F/7xOYoKbXsrDODEYsuWiIBHERGT+PRlwhcyIqSWUKW5vJWxEFWXGZlSwIXirL6+TZrXiXVWqD9VS7TaLIw9ncA6X4ME11OAe6tAABhKe4RXeHOW8OO/Ox7I152Qzp/AHzucPZi6QIw==</latexit>

D<latexit sha1_base64="3ttfGuZZ21mYv+6l8y+bFgj1sJ4=">AAAB9HicbVBNTwIxFHzFL8Qv1KOXRjDxRHbxoEeiHjxiIkgCG9ItXWjodte2S0I2/A4vHjTGqz/Gm//GLuxBwUmaTGbey5uOHwuujeN8o8La+sbmVnG7tLO7t39QPjxq6yhRlLVoJCLV8YlmgkvWMtwI1okVI6Ev2KM/vsn8xwlTmkfywUxj5oVkKHnAKTFW8qq9kJgRJSK9nVX75YpTc+bAq8TNSQVyNPvlr94goknIpKGCaN11ndh4KVGGU8FmpV6iWUzomAxZ11JJQqa9dB56hs+sMsBBpOyTBs/V3xspCbWehr6dzDLqZS8T//O6iQmuvJTLODFM0sWhIBHYRDhrAA+4YtSIqSWEKm6zYjoiilBjeyrZEtzlL6+Sdr3mXtTq9/VK4zqvowgncArn4MIlNOAOmtACCk/wDK/whiboBb2jj8VoAeU7x/AH6PMHMs6RuA==</latexit>

D<latexit sha1_base64="3ttfGuZZ21mYv+6l8y+bFgj1sJ4=">AAAB9HicbVBNTwIxFHzFL8Qv1KOXRjDxRHbxoEeiHjxiIkgCG9ItXWjodte2S0I2/A4vHjTGqz/Gm//GLuxBwUmaTGbey5uOHwuujeN8o8La+sbmVnG7tLO7t39QPjxq6yhRlLVoJCLV8YlmgkvWMtwI1okVI6Ev2KM/vsn8xwlTmkfywUxj5oVkKHnAKTFW8qq9kJgRJSK9nVX75YpTc+bAq8TNSQVyNPvlr94goknIpKGCaN11ndh4KVGGU8FmpV6iWUzomAxZ11JJQqa9dB56hs+sMsBBpOyTBs/V3xspCbWehr6dzDLqZS8T//O6iQmuvJTLODFM0sWhIBHYRDhrAA+4YtSIqSWEKm6zYjoiilBjeyrZEtzlL6+Sdr3mXtTq9/VK4zqvowgncArn4MIlNOAOmtACCk/wDK/whiboBb2jj8VoAeU7x/AH6PMHMs6RuA==</latexit>

+<latexit sha1_base64="lGbw8jcJW3K0wEshrSQp8aQG0q8=">AAAB6HicbVDJSgNBEK2JWxK3qEcvjUEQhDCjBz0GvXhMwCyYDKGnU5O06Vno7hHDkC/w4kGRXP0B/8WbX6Od5aCJDwoe71VRVc+LBVfatr+szMrq2vpGNpff3Nre2S3s7ddVlEiGNRaJSDY9qlDwEGuaa4HNWCINPIENb3A98RsPKBWPwls9jNENaC/kPmdUG6l62ikU7ZI9BVkmzpwUy7l4fPfx+F3pFD7b3YglAYaaCapUy7Fj7aZUas4EjvLtRGFM2YD2sGVoSANUbjo9dESOjdIlfiRNhZpM1d8TKQ2UGgae6Qyo7qtFbyL+57US7V+6KQ/jRGPIZov8RBAdkcnXpMslMi2GhlAmubmVsD6VlGmTTd6E4Cy+vEzqZyXnvHRWNWlcwQxZOIQjOAEHLqAMN1CBGjBAeIIXeLXurWfrzRrPWjPWfOYA/sB6/wGSBpCB</latexit>

�<latexit sha1_base64="mUKsBb88/ta0Ga10SLxFp+4HiNw=">AAAB6HicbVDJTgJBEK3BDXBDPXrpSEy8SGb0oEeiF4+QyBJhQnqaGmjpWdLdYyQTvsCLB43h6g/4L978Gm2Wg4IvqeTlvapU1fNiwZW27S8rs7K6tr6RzeU3t7Z3dgt7+3UVJZJhjUUikk2PKhQ8xJrmWmAzlkgDT2DDG1xP/MYDSsWj8FYPY3QD2gu5zxnVRqqedgpFu2RPQZaJMyfFci4e3308flc6hc92N2JJgKFmgirVcuxYuymVmjOBo3w7URhTNqA9bBka0gCVm04PHZFjo3SJH0lToSZT9fdESgOlhoFnOgOq+2rRm4j/ea1E+5duysM40Riy2SI/EURHZPI16XKJTIuhIZRJbm4lrE8lZdpkkzchOIsvL5P6Wck5L51VTRpXMEMWDuEITsCBCyjDDVSgBgwQnuAFXq1769l6s8az1ow1nzmAP7DefwCVDpCD</latexit>

a�1<latexit sha1_base64="kFBbjL47VHCYdsB69VZ5dhko2/U=">AAAB83icbVDLSsNAFL2pr1pfVZduBlvBVUnqQpdFNy4r2Ac0IUymk3boZBLmIZTQ33DjQhG3/ow7/8Zpm4W2HrhwOOde7r0nyjhT2nW/ndLG5tb2Tnm3srd/cHhUPT7pqtRIQjsk5ansR1hRzgTtaKY57WeS4iTitBdN7uZ+74lKxVLxqKcZDRI8EixmBGsr+XUc5n57zEJvVg+rNbfhLoDWiVeQGhRoh9Uvf5gSk1ChCcdKDTw300GOpWaE01nFN4pmmEzwiA4sFTihKsgXN8/QhVWGKE6lLaHRQv09keNEqWkS2c4E67Fa9ebif97A6PgmyJnIjKaCLBfFhiOdonkAaMgkJZpPLcFEMnsrImMsMdE2pooNwVt9eZ10mw3vqtF8aNZat0UcZTiDc7gED66hBffQhg4QyOAZXuHNMc6L8+58LFtLTjFzCn/gfP4AAf6RAQ==</latexit>

a�1<latexit sha1_base64="kFBbjL47VHCYdsB69VZ5dhko2/U=">AAAB83icbVDLSsNAFL2pr1pfVZduBlvBVUnqQpdFNy4r2Ac0IUymk3boZBLmIZTQ33DjQhG3/ow7/8Zpm4W2HrhwOOde7r0nyjhT2nW/ndLG5tb2Tnm3srd/cHhUPT7pqtRIQjsk5ansR1hRzgTtaKY57WeS4iTitBdN7uZ+74lKxVLxqKcZDRI8EixmBGsr+XUc5n57zEJvVg+rNbfhLoDWiVeQGhRoh9Uvf5gSk1ChCcdKDTw300GOpWaE01nFN4pmmEzwiA4sFTihKsgXN8/QhVWGKE6lLaHRQv09keNEqWkS2c4E67Fa9ebif97A6PgmyJnIjKaCLBfFhiOdonkAaMgkJZpPLcFEMnsrImMsMdE2pooNwVt9eZ10mw3vqtF8aNZat0UcZTiDc7gED66hBffQhg4QyOAZXuHNMc6L8+58LFtLTjFzCn/gfP4AAf6RAQ==</latexit>

a�p<latexit sha1_base64="lye4TivgGDiAIoY6CQpIXDJjbtM=">AAAB83icbVBNS8NAEJ3Ur1q/qh69LLaCp5LUgx6LXjxWsB/QhLDZbtqlm82yuxFK6N/w4kERr/4Zb/4bt20O2vpg4PHeDDPzIsmZNq777ZQ2Nre2d8q7lb39g8Oj6vFJV6eZIrRDUp6qfoQ15UzQjmGG075UFCcRp71ocjf3e09UaZaKRzOVNEjwSLCYEWys5NdxmPvtMQvlrB5Wa27DXQCtE68gNSjQDqtf/jAlWUKFIRxrPfBcaYIcK8MIp7OKn2kqMZngER1YKnBCdZAvbp6hC6sMUZwqW8Kghfp7IseJ1tMksp0JNmO96s3F/7xBZuKbIGdCZoYKslwUZxyZFM0DQEOmKDF8agkmitlbERljhYmxMVVsCN7qy+uk22x4V43mQ7PWui3iKMMZnMMleHANLbiHNnSAgIRneIU3J3NenHfnY9lacoqZU/gD5/MHYfiRQA==</latexit>

a�p<latexit sha1_base64="lye4TivgGDiAIoY6CQpIXDJjbtM=">AAAB83icbVBNS8NAEJ3Ur1q/qh69LLaCp5LUgx6LXjxWsB/QhLDZbtqlm82yuxFK6N/w4kERr/4Zb/4bt20O2vpg4PHeDDPzIsmZNq777ZQ2Nre2d8q7lb39g8Oj6vFJV6eZIrRDUp6qfoQ15UzQjmGG075UFCcRp71ocjf3e09UaZaKRzOVNEjwSLCYEWys5NdxmPvtMQvlrB5Wa27DXQCtE68gNSjQDqtf/jAlWUKFIRxrPfBcaYIcK8MIp7OKn2kqMZngER1YKnBCdZAvbp6hC6sMUZwqW8Kghfp7IseJ1tMksp0JNmO96s3F/7xBZuKbIGdCZoYKslwUZxyZFM0DQEOmKDF8agkmitlbERljhYmxMVVsCN7qy+uk22x4V43mQ7PWui3iKMMZnMMleHANLbiHNnSAgIRneIU3J3NenHfnY9lacoqZU/gD5/MHYfiRQA==</latexit>

C<latexit sha1_base64="jxtliYu5Zkj9yDfOUeXcLRmbg1Q=">AAAB8nicbVBNS8NAEJ3Urxq/qh69LBbBU0nqQS9isRePFewHpKFstpt26WYTdjdCCf0ZXjwo0qv/w7sX8d+4aXvQ1gcDj/dmmDcTJJwp7TjfVmFtfWNzq7ht7+zu7R+UDo9aKk4loU0S81h2AqwoZ4I2NdOcdhJJcRRw2g5G9dxvP1KpWCwe9DihfoQHgoWMYG0krxthPSSYZ/VJr1R2Ks4MaJW4C1K++bCvk+mX3eiVPrv9mKQRFZpwrJTnOon2Myw1I5xO7G6qaILJCA+oZ6jAEVV+Nos8QWdG6aMwlqaERjP190SGI6XGUWA684hq2cvF/zwv1eGVnzGRpJoKMl8UphzpGOX3oz6TlGg+NgQTyUxWRIZYYqLNl2zzBHf55FXSqlbci0r13inXbmGOIpzAKZyDC5dQgztoQBMIxPAEL/BqaevZerOm89aCtZg5hj+w3n8A1C2Umg==</latexit>

C<latexit sha1_base64="jxtliYu5Zkj9yDfOUeXcLRmbg1Q=">AAAB8nicbVBNS8NAEJ3Urxq/qh69LBbBU0nqQS9isRePFewHpKFstpt26WYTdjdCCf0ZXjwo0qv/w7sX8d+4aXvQ1gcDj/dmmDcTJJwp7TjfVmFtfWNzq7ht7+zu7R+UDo9aKk4loU0S81h2AqwoZ4I2NdOcdhJJcRRw2g5G9dxvP1KpWCwe9DihfoQHgoWMYG0krxthPSSYZ/VJr1R2Ks4MaJW4C1K++bCvk+mX3eiVPrv9mKQRFZpwrJTnOon2Myw1I5xO7G6qaILJCA+oZ6jAEVV+Nos8QWdG6aMwlqaERjP190SGI6XGUWA684hq2cvF/zwv1eGVnzGRpJoKMl8UphzpGOX3oz6TlGg+NgQTyUxWRIZYYqLNl2zzBHf55FXSqlbci0r13inXbmGOIpzAKZyDC5dQgztoQBMIxPAEL/BqaevZerOm89aCtZg5hj+w3n8A1C2Umg==</latexit>

R<latexit sha1_base64="OtkLdMO+A/gL1vGusgv+WEs+9KA=">AAAB8nicbVBNS8NAEJ3Urxq/qh69BIvgqST1oBex6MVjFfsBaSib7aZdutkNuxuhhP4MLx4U6dX/4d2L+G/ctD1o64OBx3szzJsJE0aVdt1vq7Cyura+Udy0t7Z3dvdK+wdNJVKJSQMLJmQ7RIowyklDU81IO5EExSEjrXB4k/utRyIVFfxBjxISxKjPaUQx0kbyOzHSA4xYdj/ulspuxZ3CWSbenJSvPuzLZPJl17ulz05P4DQmXGOGlPI9N9FBhqSmmJGx3UkVSRAeoj7xDeUoJirIppHHzolRek4kpCmunan6eyJDsVKjODSdeUS16OXif56f6ugiyChPUk04ni2KUuZo4eT3Oz0qCdZsZAjCkpqsDh4gibA2X7LNE7zFk5dJs1rxzirVO7dcu4YZinAEx3AKHpxDDW6hDg3AIOAJXuDV0taz9WZNZq0Faz5zCH9gvf8A6viUqQ==</latexit>


…

…




~h�1i<latexit sha1_base64="2EHsTJuGSBBZqnpMurcECEvplhk=">AAAB/XicbVC5TsNAEF1zhnCZo6NZkSBRRXYooIygoQwSOaTYWOvNOFllfWh3HSlYFr9CQwFCtPwHHX/DJnEBCU8a6em9Gc3M8xPOpLKsb2NldW19Y7O0Vd7e2d3bNw8O2zJOBYUWjXksuj6RwFkELcUUh24igIQ+h44/upn6nTEIyeLoXk0ScEMyiFjAKFFa8szjqjMGmg1zjz1kTnPIPDuvembFqlkz4GViF6SCCjQ988vpxzQNIVKUEyl7tpUoNyNCMcohLzuphITQERlAT9OIhCDdbHZ9js+00sdBLHRFCs/U3xMZCaWchL7uDIkaykVvKv7n9VIVXLkZi5JUQUTni4KUYxXjaRS4zwRQxSeaECqYvhXTIRGEKh1YWYdgL768TNr1mn1Rq9/VK43rIo4SOkGn6BzZ6BI10C1qohai6BE9o1f0ZjwZL8a78TFvXTGKmSP0B8bnD/H1lOI=</latexit>

~xi<latexit sha1_base64="4256gQVBSauEIXRKoPlmZdL85dc=">AAAB8nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIiH8lxIXvLHmzY273s7hHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044vpv7nQlVmknxaKYJDWI8FCxiBBsr+dXehJLsadZn1X654tbcBdA68XJSgRzNfvmrN5AkjakwhGOtfc9NTJBhZRjhdFbqpZommIzxkPqWChxTHWSLk2fowioDFEllSxi0UH9PZDjWehqHtjPGZqRXvbn4n+enJroJMiaS1FBBlouilCMj0fx/NGCKEsOnlmCimL0VkRFWmBibUsmG4K2+vE7a9Zp3Vas/1CuN2zyOIpzBOVyCB9fQgHtoQgsISHiGV3hzjPPivDsfy9aCk8+cwh84nz/uaJEE</latexit>

~xi<latexit sha1_base64="4256gQVBSauEIXRKoPlmZdL85dc=">AAAB8nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIiH8lxIXvLHmzY273s7hHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044vpv7nQlVmknxaKYJDWI8FCxiBBsr+dXehJLsadZn1X654tbcBdA68XJSgRzNfvmrN5AkjakwhGOtfc9NTJBhZRjhdFbqpZommIzxkPqWChxTHWSLk2fowioDFEllSxi0UH9PZDjWehqHtjPGZqRXvbn4n+enJroJMiaS1FBBlouilCMj0fx/NGCKEsOnlmCimL0VkRFWmBibUsmG4K2+vE7a9Zp3Vas/1CuN2zyOIpzBOVyCB9fQgHtoQgsISHiGV3hzjPPivDsfy9aCk8+cwh84nz/uaJEE</latexit>

~xj<latexit sha1_base64="G5/Ckn7iqgb+5R+byVQyxLqXgJY=">AAAB/HicbVC7TsNAEDzzDOFlSEljkSBRRXYooIygoQwSeUixZZ3P6+TI+aG7c4RlmV+hoQAhWj6Ejr/hkriAhJFWGs3sanfHSxgV0jS/tbX1jc2t7cpOdXdv/+BQPzruiTjlBLokZjEfeFgAoxF0JZUMBgkHHHoM+t7kZub3p8AFjaN7mSXghHgU0YASLJXk6rWGPQWS25IyH/LHonAfGq5eN5vmHMYqsUpSRyU6rv5l+zFJQ4gkYViIoWUm0skxl5QwKKp2KiDBZIJHMFQ0wiEIJ58fXxhnSvGNIOaqImnM1d8TOQ6FyEJPdYZYjsWyNxP/84apDK6cnEZJKiEii0VBygwZG7MkDJ9yIJJlimDCqbrVIGPMMZEqr6oKwVp+eZX0Wk3rotm6a9Xb12UcFXSCTtE5stAlaqNb1EFdRFCGntEretOetBftXftYtK5p5UwN/YH2+QPlQpTs</latexit>

~xj<latexit sha1_base64="G5/Ckn7iqgb+5R+byVQyxLqXgJY=">AAAB/HicbVC7TsNAEDzzDOFlSEljkSBRRXYooIygoQwSeUixZZ3P6+TI+aG7c4RlmV+hoQAhWj6Ejr/hkriAhJFWGs3sanfHSxgV0jS/tbX1jc2t7cpOdXdv/+BQPzruiTjlBLokZjEfeFgAoxF0JZUMBgkHHHoM+t7kZub3p8AFjaN7mSXghHgU0YASLJXk6rWGPQWS25IyH/LHonAfGq5eN5vmHMYqsUpSRyU6rv5l+zFJQ4gkYViIoWUm0skxl5QwKKp2KiDBZIJHMFQ0wiEIJ58fXxhnSvGNIOaqImnM1d8TOQ6FyEJPdYZYjsWyNxP/84apDK6cnEZJKiEii0VBygwZG7MkDJ9yIJJlimDCqbrVIGPMMZEqr6oKwVp+eZX0Wk3rotm6a9Xb12UcFXSCTtE5stAlaqNb1EFdRFCGntEretOetBftXftYtK5p5UwN/YH2+QPlQpTs</latexit>

~h�p

i<latexit sha1_base64="2lwB9tFbzIgK+V2R9Arasv5cOQI=">AAAB/XicbVC5TsNAEB1zhnCZo6NZkSBRRXYooIygoQwSOaTYWOvNOlllfWh3HSlYFr9CQwFCtPwHHX/DJnEBCU8a6em9Gc3M8xPOpLKsb2NldW19Y7O0Vd7e2d3bNw8O2zJOBaEtEvNYdH0sKWcRbSmmOO0mguLQ57Tjj26mfmdMhWRxdK8mCXVDPIhYwAhWWvLM46ozpiQb5h57yJzmkHlJXvXMilWzZkDLxC5IBQo0PfPL6cckDWmkCMdS9mwrUW6GhWKE07zspJImmIzwgPY0jXBIpZvNrs/RmVb6KIiFrkihmfp7IsOhlJPQ150hVkO56E3F/7xeqoIrN2NRkioakfmiIOVIxWgaBeozQYniE00wEUzfisgQC0yUDqysQ7AXX14m7XrNvqjV7+qVxnURRwlO4BTOwYZLaMAtNKEFBB7hGV7hzXgyXox342PeumIUM0fwB8bnD1H+lSE=</latexit>

~h�1

j<latexit sha1_base64="cIMxhzIEBVJeSv4z5Buv1XeEssc=">AAACBXicbVC5TsNAEF2HK4TLQAmFRYJEFdmhgDKChjJI5JBiY63X43jJ+tDuOlJkuaHhV2goQIiWf6Djb9gcBSQ8aaSn92Y0M89LGRXSNL+10srq2vpGebOytb2zu6fvH3REknECbZKwhPc8LIDRGNqSSga9lAOOPAZdb3g98bsj4IIm8Z0cp+BEeBDTgBIsleTqxzV7BCS3JWU+5GFRuA/3ud0KqWsVNVevmnVzCmOZWHNSRXO0XP3L9hOSRRBLwrAQfctMpZNjLilhUFTsTECKyRAPoK9ojCMQTj79ojBOleIbQcJVxdKYqr8nchwJMY481RlhGYpFbyL+5/UzGVw6OY3TTEJMZouCjBkyMSaRGD7lQCQbK4IJp+pWg4SYYyJVcBUVgrX48jLpNOrWeb1x26g2r+ZxlNEROkFnyEIXqIluUAu1EUGP6Bm9ojftSXvR3rWPWWtJm88coj/QPn8AkLWYmQ==</latexit>

~h�p

j<latexit sha1_base64="R60/sKF3f3iOWW61WiDwniCNsEg=">AAACBXicbVC5TsNAEF2HK4TLQAmFRYJEFdmhgDKChjJI5JBiY63X43jJ+tDuOlJkuaHhV2goQIiWf6Djb9gcBSQ8aaSn92Y0M89LGRXSNL+10srq2vpGebOytb2zu6fvH3REknECbZKwhPc8LIDRGNqSSga9lAOOPAZdb3g98bsj4IIm8Z0cp+BEeBDTgBIsleTqxzV7BCS3JWU+5GFRuA/3ud0KqZsWNVevmnVzCmOZWHNSRXO0XP3L9hOSRRBLwrAQfctMpZNjLilhUFTsTECKyRAPoK9ojCMQTj79ojBOleIbQcJVxdKYqr8nchwJMY481RlhGYpFbyL+5/UzGVw6OY3TTEJMZouCjBkyMSaRGD7lQCQbK4IJp+pWg4SYYyJVcBUVgrX48jLpNOrWeb1x26g2r+ZxlNEROkFnyEIXqIluUAu1EUGP6Bm9ojftSXvR3rWPWWtJm88coj/QPn8A8K+Y2A==</latexit>

~hj

<latexit sha1_base64="mo9OPr63pVQbd6a0VL0flLStufE=">AAAB/HicbVC7TsNAEDyHVwgvQ0oaiwSJKrJDAWUEDWWQyEOKLet8XidHzg/dnSNZlvkVGgoQouVD6PgbLokLSBhppdHMrnZ3vIRRIU3zW6tsbG5t71R3a3v7B4dH+vFJX8QpJ9AjMYv50MMCGI2gJ6lkMEw44NBjMPCmt3N/MAMuaBw9yCwBJ8TjiAaUYKkkV6837RmQ3JaU+ZBPisJ9bLp6w2yZCxjrxCpJA5XouvqX7cckDSGShGEhRpaZSCfHXFLCoKjZqYAEkykew0jRCIcgnHxxfGGcK8U3gpiriqSxUH9P5DgUIgs91RliORGr3lz8zxulMrh2cholqYSILBcFKTNkbMyTMHzKgUiWKYIJp+pWg0wwx0SqvGoqBGv15XXSb7esy1b7vt3o3JRxVNEpOkMXyEJXqIPuUBf1EEEZekav6E170l60d+1j2VrRypk6+gPt8wfMspTc</latexit>

~hi<latexit sha1_base64="41SAisgSHUsdjqt4AbCLNlWxDu0=">AAAB8nicbVA9TwJBEJ3DL8Qv1NLmIphYkTsstCTaWGIiSAIXsrfMwYa93cvuHgm58DNsLDTG1l9j579xgSsUfMkkL+/NZGZemHCmjed9O4WNza3tneJuaW//4PCofHzS1jJVFFtUcqk6IdHImcCWYYZjJ1FI4pDjUzi+m/tPE1SaSfFopgkGMRkKFjFKjJW61d4EaTaa9Vm1X654NW8Bd534OalAjma//NUbSJrGKAzlROuu7yUmyIgyjHKclXqpxoTQMRli11JBYtRBtjh55l5YZeBGUtkSxl2ovycyEms9jUPbGRMz0qveXPzP66YmugkyJpLUoKDLRVHKXSPd+f/ugCmkhk8tIVQxe6tLR0QRamxKJRuCv/ryOmnXa/5Vrf5QrzRu8ziKcAbncAk+XEMD7qEJLaAg4Rle4c0xzovz7nwsWwtOPnMKf+B8/gDV6JD0</latexit>

Meta-path based local representation encoder

Meta-path based local representation encoder

Global representation encoder

Discriminator

Negative samples generator (a)

(a)

(d)

(d)

(b)

(c)

Figure 2 The high-level structure of HDGI . (a) Local representation encoder is a hierarchical structure: learning noderepresentations in terms of every meta-path based adjacency matrix respectively and then aggregating them throughsemantic-level attention. (b) Global representation encoder R outputs a graph-level summary vector ~s. (c) Negativesamples generator C is responsible for generating negative nodes. (d) The discriminator D maximizes mutualinformation between positive nodes and the graph-level summary.

semantic-level attention mechanism. A global representationencoder R is proposed to derive a graph summary vector~s from H (see Section IV-C). The discriminator D will betrained with the objective to maximize mutual informationbetween positive nodes and the graph-level summary ~s. At thesame time, HDGI is optimized in an end-to-end manner bybackpropagation with the object of mutual information maxi-mization. In Section IV-D we elaborate the mutual informationbased discriminator D and the negative sample generator C.

B. Meta-path based local representation encoder

Meta-path based local representation encoder has a hier-archical structure. The first step is meta-path specific noderepresentation learning, which encodes nodes in terms ofeach meta-path based adjacency matrix. The second step,namely heterogeneous graph node representation learning, isaggregating these representations relating to different meta-paths by a semantic-level attention mechanism.

1) Meta-path specific node representation learning: Weuse meta-path based adjacency matrices to convey multipledifferent semantics due to the heterogeneity of input graphs.Given the meta-path Φ, the adjacency matrix AΦ ∈ R|Vt|×|Vt|can represent the relational information about the meta-path Φbased connections. Each of AΦi , i = 1, 2, ..P can be viewedas a homogeneous graph. The initial node feature matrixX ∈ RN×F is constructed by stacking the feature vectors inX . At this step, our target is to derive a node representationcontaining the information of initial node feature X and AΦi ,

with a node-level encoder:

HΦi = aΦi(X,AΦi) (1)

Two kinds of the encoder are considered in this work.The first is Graph Convolutional Network (GCN) [15]. GCNintroduces a spectral graph convolution operator for the graphrepresentation learning. The node representations learned byGCN should be:

HΦi = (DΦi− 1

2 AΦiDΦi− 1

2 )XWΦi (2)

where AΦi = AΦi +I , DΦi is the diagonal node degree matrixof AΦi . Matrix WΦi ∈ Rd×F is the filter parameter matrix,which is not shared between different AΦ.

The second encoder we consider is the Graph AttentionNetworks (GAT) [29]. GAT effectively updates the noderepresentations by aggregating the information from theirneighbors, including self-neighbor. For the m-th node, its K-head attention output can be computed as:

~hΦim =

K

‖k=1

σ(∑

j∈NΦim

αΦi,kmj WΦi~xj) (3)

where ‖ is the concatenation operator, WΦi is the lineartransformation parameter matrix and NΦi

m is neighbor setdefined by Φi. α

Φi,kmj is the normalized attention coefficient

computed by the k-th attention mechanism.After the meta-path specific node representation learning,

we obtain the set of node representations {HΦi}Pi=1. They areaggregated to get the heterogeneous graph node representation.

4

2) Heterogeneous graph node representation learning: Therepresentations learned based on the specific meta-path containonly the semantic-specific information. In order to aggregatethe more general representations of the nodes, we need to com-bine these representations {HΦ1 , HΦ2 , . . . ,HΦP }. The vitalissue to accomplish the aggregation is exploring how mucheach meta-path should contribute to the final representations.Here we add a semantic-level attention layer Latt to learn theweights:

{βΦ1 , βΦ2 , . . . , βΦP } = Latt(HΦ1 , HΦ2 , . . . ,HΦP ) (4)

Specifically, the importance of the meta-path Φi is calculatedby

eΦi =1

N

N∑n=1

tanh(~qT · [Wsem · ~hΦin +~b]) (5)

where Wsem is a linear transformation parameter matrix. ~qis the learnable shared attention vector. βΦi is obtained bynormalizing {eΦi}Pi=1 with a softmax function:

βΦi = softmax(eΦi) =exp(eΦi)∑Pj=1 exp(eΦj )

(6)

The heterogeneous graph node representation H is obtainedby a linear combination of {HΦi}Pi=1, that is

H =

P∑i=1

βΦi ·HΦi (7)

Our semantic attention layer is inspired by HAN [33],but there are still some differences in the learning direction.HAN utilizes classification cross-entropy as the loss function.Available labels in training set guide the learning direction.The attention weights learned in HDGI are guided by thebinary cross-entropy loss, which indicates whether the nodebelongs to the original graph. Therefore, the weights learnedin HDGI serve for the existence of a node. Because noclassification label involves, the weights get no bias from theknown labels.

The representations H serve as the final output localfeatures. It should be mentioned that all parameters in themeta-path based local representation encoder are shared forpositive and negative nodes. Negative nodes are generated bythe negative samples generator, which we will introduce insection IV-D2. The global representation encoder leveragesthe representations H to output the graph-level summary,described in the following section.

C. Global Representation Encoder

The learning objective of HDGI is to maximize the mutualinformation between local representations and global repre-sentation. The local representations of nodes are included inH . We need the summary vector ~s to represent the entireheterogeneous graph’s global information. Based on H , weexamined three candidate encoder functions:Averaging encoder function. Our first candidate encoderfunction is the averaging operator, where we simply take the

mean of the node representations to output the graph-levelsummary vector ~s:

~s = σ

(1

N

N∑i=1

~hi

)(8)

σ is a PReLU activation funtion.Pooling encoder function. In this pooling encoder function,each node’s vector is independently fed through a fully-connected layer. An elementwise max-pooling operator hasapplied to summary the information from the nodes set:

~spool = max(σ(Wpool~hi + b), i ∈ {1, 2, . . . , N}) (9)

where max denotes the element-wise max operator and σ isa nonlinear activation function.Set2vec encoder function. The final encoder function weexamine is Set2vec [18], which is based on an LSTM archi-tecture. Because the original set2vec in [18] works on orderednode sequences. However, we need a summary of the graphconcluding comprehensive information from each node insteadof merely graph structure. Therefore, we apply the LSTMs toa random permutation of the node’s neighbor on an unorderedset.

D. HDGI Learning

1) Mutual information based discriminator: It is inconve-nient to calculate the mutual information between randomvariables directly. Belghazi et al. [1] prove that the KL-divergence admits the Donsker-Varadhan representation andthe f -divergence representation as dual representations. Thedual representations provide a lower-bound to the mutualinformation of random variables X and Y :

MI(X;Y ) ≥ EPXY[Tω(x, y)]− log(EPX⊗PY

[eTω(x,y)]) (10)

Here, PXY is the joint distribution, and PX ⊗ PY is theproduct of margins. Tω is a deep neural network baseddiscriminator parametrized by ω. The expectations in equ.10can be estimated using samples from PXY and PX ⊗PY . Theexpressive power of the discriminator ensures to approximatethe MI with high accuracy. Following the philosophy in [12],we estimate and maximize the mutual information by traininga discriminator D to distinguish positive sample set Pos ={

[~hn, ~s]}N

n=1with negative sample set Neg =

{[~hm, ~s]

}M

m=1.

The sample (~hi, ~s) is denoted as positive as node ~hi belongsto the original graph (the joint distribution), and (

~hj , ~s) is

negative as the node ~hj is the generated fake one (the productof marginals). The discriminator D is a bilinear layer:

D(~hi, ~s) = σ(~hTi WD~s) (11)

Here WD is a learnable matrix, and σ is the sigmoid activationfunction. Velickovic et al. [30] prove that the binary cross-entropy loss amounts to maximizing the mutual informationwith theoretical guarantee. Therefore, we can maximize the

5


… C<latexit sha1_base64="s4eeBHAv45kU4GGev0AQOtaRd00=">AAAB9HicbVBNTwIxFHyLX4hfqEcvjWDiieziQY9ELh4xETCBDemWLjR027XtkpANv8OLB43x6o/x5r+xC3tQcJImk5n38qYTxJxp47rfTmFjc2t7p7hb2ts/ODwqH590tEwUoW0iuVSPAdaUM0HbhhlOH2NFcRRw2g0mzczvTqnSTIoHM4upH+GRYCEj2FjJr/YjbMYE87Q5rw7KFbfmLoDWiZeTCuRoDcpf/aEkSUSFIRxr3fPc2PgpVoYRTuelfqJpjMkEj2jPUoEjqv10EXqOLqwyRKFU9gmDFurvjRRHWs+iwE5mGfWql4n/eb3EhDd+ykScGCrI8lCYcGQkyhpAQ6YoMXxmCSaK2ayIjLHCxNieSrYEb/XL66RTr3lXtfp9vdK4zesowhmcwyV4cA0NuIMWtIHAEzzDK7w5U+fFeXc+lqMFJ985hT9wPn8AMUiRtw==</latexit>

~x1<latexit sha1_base64="h3Dgwo2azxQRZ6c/6N2FJoOGn94=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsstCTaWGIiH8lxIXvLHmzY273szhHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwkRwA6777RQ2Nre2d4q7pb39g8Oj8vFJ26hUU9aiSijdDYlhgkvWAg6CdRPNSBwK1gnHd3O/M2HacCUfYZqwICZDySNOCVjJr/YmjGZPs75X7Zcrbs1dAK8TLycVlKPZL3/1BoqmMZNABTHG99wEgoxo4FSwWamXGpYQOiZD5lsqScxMkC1OnuELqwxwpLQtCXih/p7ISGzMNA5tZ0xgZFa9ufif56cQ3QQZl0kKTNLloigVGBSe/48HXDMKYmoJoZrbWzEdEU0o2JRKNgRv9eV10q7XvKta/aFeadzmcRTRGTpHl8hD16iB7lETtRBFCj2jV/TmgPPivDsfy9aCk8+coj9wPn8AmVCQzA==</latexit>

~x2<latexit sha1_base64="7JqHzxKaPt/AZk7nwhzEXjHZj+8=">AAAB8nicbVA9TwJBEJ3DL8Qv1NJmI5hYkbuz0JJoY4mJfCRwIXvLHmzY273s7hHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTlpapIrRJJJeqE2JNORO0aZjhtJMoiuOQ03Y4vpv77QlVmknxaKYJDWI8FCxiBBsrdau9CSXZ06zvV/vliltzF0DrxMtJBXI0+uWv3kCSNKbCEI617npuYoIMK8MIp7NSL9U0wWSMh7RrqcAx1UG2OHmGLqwyQJFUtoRBC/X3RIZjradxaDtjbEZ61ZuL/3nd1EQ3QcZEkhoqyHJRlHJkJJr/jwZMUWL41BJMFLO3IjLCChNjUyrZELzVl9dJy695VzX/wa/Ub/M4inAG53AJHlxDHe6hAU0gIOEZXuHNMc6L8+58LFsLTj5zCn/gfP4AmtWQzQ==</latexit>

~x3<latexit sha1_base64="E3bgoourCSzwVAKflXbi1+qf+ys=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsotCTaWGIiYHJcyN4yBxv2di+7e0Ry4WfYWGiMrb/Gzn/jAlco+JJJXt6bycy8MOFMG9f9dgobm1vbO8Xd0t7+weFR+fiko2WqKLSp5FI9hkQDZwLahhkOj4kCEoccuuH4du53J6A0k+LBTBMIYjIULGKUGCv51d4EaPY06zeq/XLFrbkL4HXi5aSCcrT65a/eQNI0BmEoJ1r7npuYICPKMMphVuqlGhJCx2QIvqWCxKCDbHHyDF9YZYAjqWwJgxfq74mMxFpP49B2xsSM9Ko3F//z/NRE10HGRJIaEHS5KEo5NhLP/8cDpoAaPrWEUMXsrZiOiCLU2JRKNgRv9eV10qnXvEatfl+vNG/yOIroDJ2jS+ShK9REd6iF2ogiiZ7RK3pzjPPivDsfy9aCk8+coj9wPn8AnFqQzg==</latexit>

~x4<latexit sha1_base64="BqBaaLdWmAprKfvyHM00xvOONXQ=">AAAB8nicbVBNTwIxEO3iF+IX6tFLI5h4IrtookeiF4+YCJIsG9ItXWjotpt2lkg2/AwvHjTGq7/Gm//GAntQ8CWTvLw3k5l5YSK4Adf9dgpr6xubW8Xt0s7u3v5B+fCobVSqKWtRJZTuhMQwwSVrAQfBOolmJA4FewxHtzP/ccy04Uo+wCRhQUwGkkecErCSX+2OGc2epr3Laq9ccWvuHHiVeDmpoBzNXvmr21c0jZkEKogxvucmEGREA6eCTUvd1LCE0BEZMN9SSWJmgmx+8hSfWaWPI6VtScBz9fdERmJjJnFoO2MCQ7PszcT/PD+F6DrIuExSYJIuFkWpwKDw7H/c55pREBNLCNXc3orpkGhCwaZUsiF4yy+vkna95l3U6vf1SuMmj6OITtApOkceukINdIeaqIUoUugZvaI3B5wX5935WLQWnHzmGP2B8/kDnd+Qzw==</latexit>

~x5<latexit sha1_base64="dUcnki3QNeDgLiBzZMmc132ooLU=">AAAB8nicbVBNTwIxEO3iF+IX6tFLI5h4IrsYo0eiF4+YCJIsG9ItXWjotpt2lkg2/AwvHjTGq7/Gm//GAntQ8CWTvLw3k5l5YSK4Adf9dgpr6xubW8Xt0s7u3v5B+fCobVSqKWtRJZTuhMQwwSVrAQfBOolmJA4FewxHtzP/ccy04Uo+wCRhQUwGkkecErCSX+2OGc2epr3Laq9ccWvuHHiVeDmpoBzNXvmr21c0jZkEKogxvucmEGREA6eCTUvd1LCE0BEZMN9SSWJmgmx+8hSfWaWPI6VtScBz9fdERmJjJnFoO2MCQ7PszcT/PD+F6DrIuExSYJIuFkWpwKDw7H/c55pREBNLCNXc3orpkGhCwaZUsiF4yy+vkna95l3U6vf1SuMmj6OITtApOkceukINdIeaqIUoUugZvaI3B5wX5935WLQWnHzmGP2B8/kDn2SQ0A==</latexit>


~x1<latexit sha1_base64="h3Dgwo2azxQRZ6c/6N2FJoOGn94=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsstCTaWGIiH8lxIXvLHmzY273szhHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwkRwA6777RQ2Nre2d4q7pb39g8Oj8vFJ26hUU9aiSijdDYlhgkvWAg6CdRPNSBwK1gnHd3O/M2HacCUfYZqwICZDySNOCVjJr/YmjGZPs75X7Zcrbs1dAK8TLycVlKPZL3/1BoqmMZNABTHG99wEgoxo4FSwWamXGpYQOiZD5lsqScxMkC1OnuELqwxwpLQtCXih/p7ISGzMNA5tZ0xgZFa9ufif56cQ3QQZl0kKTNLloigVGBSe/48HXDMKYmoJoZrbWzEdEU0o2JRKNgRv9eV10q7XvKta/aFeadzmcRTRGTpHl8hD16iB7lETtRBFCj2jV/TmgPPivDsfy9aCk8+coj9wPn8AmVCQzA==</latexit>~x2

<latexit sha1_base64="7JqHzxKaPt/AZk7nwhzEXjHZj+8=">AAAB8nicbVA9TwJBEJ3DL8Qv1NJmI5hYkbuz0JJoY4mJfCRwIXvLHmzY273s7hHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTlpapIrRJJJeqE2JNORO0aZjhtJMoiuOQ03Y4vpv77QlVmknxaKYJDWI8FCxiBBsrdau9CSXZ06zvV/vliltzF0DrxMtJBXI0+uWv3kCSNKbCEI617npuYoIMK8MIp7NSL9U0wWSMh7RrqcAx1UG2OHmGLqwyQJFUtoRBC/X3RIZjradxaDtjbEZ61ZuL/3nd1EQ3QcZEkhoqyHJRlHJkJJr/jwZMUWL41BJMFLO3IjLCChNjUyrZELzVl9dJy695VzX/wa/Ub/M4inAG53AJHlxDHe6hAU0gIOEZXuHNMc6L8+58LFsLTj5zCn/gfP4AmtWQzQ==</latexit>

~x3<latexit sha1_base64="E3bgoourCSzwVAKflXbi1+qf+ys=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsotCTaWGIiYHJcyN4yBxv2di+7e0Ry4WfYWGiMrb/Gzn/jAlco+JJJXt6bycy8MOFMG9f9dgobm1vbO8Xd0t7+weFR+fiko2WqKLSp5FI9hkQDZwLahhkOj4kCEoccuuH4du53J6A0k+LBTBMIYjIULGKUGCv51d4EaPY06zeq/XLFrbkL4HXi5aSCcrT65a/eQNI0BmEoJ1r7npuYICPKMMphVuqlGhJCx2QIvqWCxKCDbHHyDF9YZYAjqWwJgxfq74mMxFpP49B2xsSM9Ko3F//z/NRE10HGRJIaEHS5KEo5NhLP/8cDpoAaPrWEUMXsrZiOiCLU2JRKNgRv9eV10qnXvEatfl+vNG/yOIroDJ2jS+ShK9REd6iF2ogiiZ7RK3pzjPPivDsfy9aCk8+coj9wPn8AnFqQzg==</latexit>



~x1<latexit sha1_base64="h3Dgwo2azxQRZ6c/6N2FJoOGn94=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsstCTaWGIiH8lxIXvLHmzY273szhHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwkRwA6777RQ2Nre2d4q7pb39g8Oj8vFJ26hUU9aiSijdDYlhgkvWAg6CdRPNSBwK1gnHd3O/M2HacCUfYZqwICZDySNOCVjJr/YmjGZPs75X7Zcrbs1dAK8TLycVlKPZL3/1BoqmMZNABTHG99wEgoxo4FSwWamXGpYQOiZD5lsqScxMkC1OnuELqwxwpLQtCXih/p7ISGzMNA5tZ0xgZFa9ufif56cQ3QQZl0kKTNLloigVGBSe/48HXDMKYmoJoZrbWzEdEU0o2JRKNgRv9eV10q7XvKta/aFeadzmcRTRGTpHl8hD16iB7lETtRBFCj2jV/TmgPPivDsfy9aCk8+coj9wPn8AmVCQzA==</latexit>

~x6<latexit sha1_base64="WJAozPMxNsXGtPkpJGXmDzSLgvU=">AAAB8nicbVBNTwIxEO3iF+IX6tFLI5h4IruYqEeiF4+YCJIsG9ItXWjotpt2lkg2/AwvHjTGq7/Gm//GAntQ8CWTvLw3k5l5YSK4Adf9dgpr6xubW8Xt0s7u3v5B+fCobVSqKWtRJZTuhMQwwSVrAQfBOolmJA4FewxHtzP/ccy04Uo+wCRhQUwGkkecErCSX+2OGc2epr3Laq9ccWvuHHiVeDmpoBzNXvmr21c0jZkEKogxvucmEGREA6eCTUvd1LCE0BEZMN9SSWJmgmx+8hSfWaWPI6VtScBz9fdERmJjJnFoO2MCQ7PszcT/PD+F6DrIuExSYJIuFkWpwKDw7H/c55pREBNLCNXc3orpkGhCwaZUsiF4yy+vkna95l3U6vf1SuMmj6OITtApOkceukINdIeaqIUoUugZvaI3B5wX5935WLQWnHzmGP2B8/kDoOmQ0Q==</latexit>

~x2<latexit sha1_base64="7JqHzxKaPt/AZk7nwhzEXjHZj+8=">AAAB8nicbVA9TwJBEJ3DL8Qv1NJmI5hYkbuz0JJoY4mJfCRwIXvLHmzY273s7hHJhZ9hY6Extv4aO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTlpapIrRJJJeqE2JNORO0aZjhtJMoiuOQ03Y4vpv77QlVmknxaKYJDWI8FCxiBBsrdau9CSXZ06zvV/vliltzF0DrxMtJBXI0+uWv3kCSNKbCEI617npuYoIMK8MIp7NSL9U0wWSMh7RrqcAx1UG2OHmGLqwyQJFUtoRBC/X3RIZjradxaDtjbEZ61ZuL/3nd1EQ3QcZEkhoqyHJRlHJkJJr/jwZMUWL41BJMFLO3IjLCChNjUyrZELzVl9dJy695VzX/wa/Ub/M4inAG53AJHlxDHe6hAU0gIOEZXuHNMc6L8+58LFsLTj5zCn/gfP4AmtWQzQ==</latexit>

~x7<latexit sha1_base64="MBB8xBrSJ7kAPH3CmW5+6zOALKY=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsssCTaWGIiYHJcyN4yBxv2di+7e0Ry4WfYWGiMrb/Gzn/jAlco+JJJXt6bycy8MOFMG9f9dgobm1vbO8Xd0t7+weFR+fiko2WqKLSp5FI9hkQDZwLahhkOj4kCEoccuuH4du53J6A0k+LBTBMIYjIULGKUGCv51d4EaPY06zeq/XLFrbkL4HXi5aSCcrT65a/eQNI0BmEoJ1r7npuYICPKMMphVuqlGhJCx2QIvqWCxKCDbHHyDF9YZYAjqWwJgxfq74mMxFpP49B2xsSM9Ko3F//z/NRE10HGRJIaEHS5KEo5NhLP/8cDpoAaPrWEUMXsrZiOiCLU2JRKNgRv9eV10qnXvKta/b5ead7kcRTRGTpHl8hDDdREd6iF2ogiiZ7RK3pzjPPivDsfy9aCk8+coj9wPn8Aom6Q0g==</latexit>


~x6<latexit sha1_base64="WJAozPMxNsXGtPkpJGXmDzSLgvU=">AAAB8nicbVBNTwIxEO3iF+IX6tFLI5h4IruYqEeiF4+YCJIsG9ItXWjotpt2lkg2/AwvHjTGq7/Gm//GAntQ8CWTvLw3k5l5YSK4Adf9dgpr6xubW8Xt0s7u3v5B+fCobVSqKWtRJZTuhMQwwSVrAQfBOolmJA4FewxHtzP/ccy04Uo+wCRhQUwGkkecErCSX+2OGc2epr3Laq9ccWvuHHiVeDmpoBzNXvmr21c0jZkEKogxvucmEGREA6eCTUvd1LCE0BEZMN9SSWJmgmx+8hSfWaWPI6VtScBz9fdERmJjJnFoO2MCQ7PszcT/PD+F6DrIuExSYJIuFkWpwKDw7H/c55pREBNLCNXc3orpkGhCwaZUsiF4yy+vkna95l3U6vf1SuMmj6OITtApOkceukINdIeaqIUoUugZvaI3B5wX5935WLQWnHzmGP2B8/kDoOmQ0Q==</latexit>

~x7<latexit sha1_base64="MBB8xBrSJ7kAPH3CmW5+6zOALKY=">AAAB8nicbVA9TwJBEN3DL8Qv1NJmI5hYkTsssCTaWGIiYHJcyN4yBxv2di+7e0Ry4WfYWGiMrb/Gzn/jAlco+JJJXt6bycy8MOFMG9f9dgobm1vbO8Xd0t7+weFR+fiko2WqKLSp5FI9hkQDZwLahhkOj4kCEoccuuH4du53J6A0k+LBTBMIYjIULGKUGCv51d4EaPY06zeq/XLFrbkL4HXi5aSCcrT65a/eQNI0BmEoJ1r7npuYICPKMMphVuqlGhJCx2QIvqWCxKCDbHHyDF9YZYAjqWwJgxfq74mMxFpP49B2xsSM9Ko3F//z/NRE10HGRJIaEHS5KEo5NhLP/8cDpoAaPrWEUMXsrZiOiCLU2JRKNgRv9eV10qnXvKta/b5ead7kcRTRGTpHl8hDDdREd6iF2ogiiZ7RK3pzjPPivDsfy9aCk8+coj9wPn8Aom6Q0g==</latexit>



…


Figure 3: The example of generating negative samples

mutual information with the binary cross-entropy loss of thediscriminator:

L(Pos,Neg,~s) =1

N +M

(N∑

n=1

EPos[logD(~hn, ~s)]

+

M∑m=1

ENeg[log(1−D(~hm, ~s))]

)(12)

In essence, the discriminator works to maximize the mutualinformation between a high-level global representation and lo-cal representations (node-level), which encourages the encoderto learn the information presenting in all globally relevantlocations. The information about a class label can be one ofthe cases. The above loss can be optimized through gradientdescent. The representations of nodes can be learned when theoptimization is completed.

2) Negative samples generator: The negative sample set{[~hm, ~s]

}M

m=1is composed of the samples that do not exist

in the heterogeneous graph. As our target is to maximize themutual information between positive nodes and the graph-level summary vector, the generated negative samples willaffect the structural information captured by the model. Inthis way, we need high-quality negative samples that keepthe structural information preciseness. We extend the negativesample generation approach proposed in [30] to heterogeneousgraph setting. In heterogeneous graph G, we have rich andcomplex structural information characterized by meta-pathbased adjacency matrices. Our negative samples generator:

X, {AΦ1 , AΦ2 , . . . , AΦP } = C(X, {AΦ1 , AΦ2 , . . . , AΦP })(13)

keeps all meta-path based adjacency matrices unchanged butshuffles the rows of the initial node feature matrix X , whichchanges the index of nodes to corrupt the node-level connec-tions among them. We provide a simple example to illustratethe procedure of generating negative samples in Figure 3.

V. EVALUATION

In this section, we evaluate the proposed HDGI frameworkin three real-world heterogeneous graphs. We first introduce

the datasets and experimental settings. Then we report themodel performance as compared to other state-of-the-art com-petitive methods. The evaluation results show the superiorityof our developed model.

A. Datasets

We evaluate the performance of HDGI on three heteroge-neous graphs and summarize their details in Table II.

Table II: Statistics of experimented datasets.

Dataset Node-type # Nodes Edge-type # Edges feature Meta-path

ACMPaper (P) 3025 Paper-Author 9744 1870 PAP

Author (A) 5835 Paper-Subject 3025 PSPSubject (S) 56

IMDB

Movie (M) 4275 Movie-Actor 12838 MAMActor (A) 5431 Movie-Director 4280 6344 MDM

Director (D) 2082 Movie-keyword 20529 MKMKeyword (K) 7313

DBLP

Author (A) 4057 Author-Paper 19645 APAPaper (P) 14328 Paper-Conference 14328 334 APCPA

Conference (C) 20 Paper-Term 88420 APTPATerm (T) 8789

• DBLP [8]: This is a research paper set, which con-tains scientific publications and the corresponding au-thors. The target author node can be divided into fourareas: database, data mining, information retrieval, andmachine learning. We use the area of authors as la-bels. The initial features are generated based on au-thors’ profiles with the bag-of-words embeddings. Themeta-paths we defined in DBLP are Author-Paper-Author(APA), Author-Paper-Conference-Paper-Author (APCPA),and Author-Paper-Term-Paper-Author (APTPA).

• ACM [33]: This is another academic paper data in whichtarget paper nodes are categorized into 3 classes: database,wireless communication, and data Mining. We extract 2meta-paths from this graph: Paper-Author-Paper (PAP) andPaper-Subject-Paper (PSP). The initial features are con-structed from paper keywords with the TF-IDF based em-bedding techniques.

• IMDB [34]: It is a knowledge graph data about movies(target nodes) categorized into three types: Action, Comedy,and Drama. The meta-paths we choose are Movie-Actor-Movie (MAM), Movie-Director-Movie (MDM), and Movie-Keyword-Movie (MKM). The feature of a movie is com-posed of {color, title, language, keywords, country, rating,year} with a TF-IDF encoding.

B. Experiment Setup

The most commonly used tasks to measure the quality oflearned representations are node classification [20], [9], [11]and node clustering [4], [33]. We evaluate HDGI from bothtwo types of tasks.

1) Compared Baselines: HDGI is compared with the fol-lowing supervised and unsupervised methods:Unsupervised methods• Raw Feature: The initial features are used as embeddings.• Metapath2vec (M2V) [4]: A meta-path based graph

embedding method for heterogeneous graph. We test allmeta-paths and report the best result.

6

• DeepWalk (DW) [20]: A random walk based graphembedding method, but it is designed to deal with ho-mogeneous graph.

• DeepWalk+Raw Feature(DW+F): It concatenates thelearned DeepWalk embeddings with the raw features asthe final representations.

• DGI [30]: A mutual information based graph represen-tation method for homogeneous graph.

• HDGI-C: The graph convolutional network is utilized tocapture local representations in HDGI .

• HDGI-A: This is another variant of HDGI , which usesgraph attention mechanism to learn local representations.

Supervised methods• GCN [15]: A semi-supervised methods for node classi-

fication on homogeneous graphs.• RGCN [23]: It performs representation learning on all

nodes labeled with entity types in heterogeneous graphs.• GAT [29]: GAT applies the attention mechanism in

homogeneous graphs for node classification.• HAN [33]: HAN employs node-level attention and

semantic-level attention to capture the information fromall meta-paths.

For methods designed for homogeneous graphs, i.e., Deep-Walk, DGI, GCN, GAT, we do not consider graph hetero-geneity and construct meta-path based adjacency matrix, thenwe report the best performance. We test all meta-paths forMetapath2vec and report the best result. For RGCN, becauseour task is to learn the representations of target-type nodes, thecross-entropy loss is calculated by the classification in target-type nodes only. In the node classification task, a training setis used to learn a simple classifier for unsupervised methods,while the supervised methods can output the result as end-to-end. For the node clustering, we will not use any label in thisunsupervised learning task and make comparison among allunsupervised learning methods.

2) Reproducibility: For the proposed HDGI , includ-ing HDGI-C and HDGI-A, we optimize the model withAdam [14]. The dimension of node-level representations inHDGI-C is set as 512, and the dimension of ~q is set as8. For HDGI-A, we set the dimension of node-level rep-resentations as 64, and the attention head is set as 4. Thedimension of ~q is set as 8 as well. We employ Pytorchto implement our model and conduct experiments in theserver with 4 GTX-1080ti GPUs. The detailed parametersof all other comparison methods are provided in the open-source codes: https://github.com/YuxiangRen/Heterogeneous-Deep-Graph-Infomax.

C. Performance Comparison

1) Node classification task: In the node classification task,we train a logistic regression classifier for unsupervised learn-ing methods, while the supervised methods output the classi-fication result as end-to-end models. We conduct the experi-ments with two different training set sizes (20% or 80% of fulldatasets). The sizes of the validation set and test set are fixed

at 10% of full datasets. For unsupervised methods, the datasetdivision is used for training the logistic regression classifier buthas no relationship with representation learning. To keep theresults stable, we repeat the classification process 10 times andreport the average Macro-F1 and Micro-F1 in Table ??. X, Aand Y in Table ?? denote initial features, the adjacency matrixand node labels, respectively. They are used to reflect therequired input of different methods. We observe that HDGI-Coutperforms all other unsupervised learning methods. Whencompeting with the supervised learning methods (designedfor homogeneous graphs like GCN and GAT), HDGI canperform much better. This observation proves that the typeand semantic information are critical and need to be handledcarefully instead of directly ignoring them in heterogeneousgraphs. The result of RGCN is suboptimal because the originalRGCN is a featureless approach, and we follow the code toassign a one-hot vector to each node.

In addition, the unified learning of all types of nodes inthe same latent space is beneficial to entity type classification.However, it may not be applicable to label classification. HDGIis also competitive with the result reported from HAN [33],which is designed for heterogeneous graphs. The reason shouldbe that HDGI can capture more global structural informa-tion when exploring the mutual information in reconstructingthe representation. At the same time, supervised loss basedGNNs overemphasize the direct neighborhoods [30]. On theother hand, this observation suggests that the features learnedthrough supervised learning in graph structures may havelimitations, either from the structure or a task-based prefer-ence. These limitations can damage the representations from amore general perspective. Both HDGI-C and HDGI-A achievestunning performance, which verifies the effectiveness of thegeneral framework of HDGI as well.

2) Node clustering task: In the node clustering task, weuse the K-Means to conduct the clustering based on thelearned representations. The number of clusters K is set asthe number of target node classes. We do not use any labelin this unsupervised learning task and make the comparisonamong all unsupervised learning methods. We also repeat theclustering process 10 times and report the average NMI andARI in Table IV. DeepWalk cannot perform well because theyare not able to handle the graph heterogeneity. Metapath2veccannot handle diversified semantic information simultaneously,which makes the representations not effective enough. Theverification based on node clustering tasks also demonstratesthat HDGI can learn effective representations by consideringthe structural information, the semantic information, and thenode independent content simultaneously.

3) HDGI-A vs HDGI-C: From the comparison betweenHDGI-C and HDGI-A in node classification tasks, the resultsreflect some interesting findings. HDGI-C achieves betterperformance than HDGI-A in all experiments, which meansthat the graph convolution works better than the attentionmechanism in capturing local network structures. The reasonmight be that the graph attention mechanism is strictly limitedto the direct neighbors of nodes. The graph convolution

7

https://github.com/YuxiangRen/Heterogeneous-Deep-Graph-Infomax

https://github.com/YuxiangRen/Heterogeneous-Deep-Graph-Infomax

Table III: The results of node classification tasks

Available data X A X, A, Y X, A

Dataset Train Metric Raw M2V DW GCN RGCN GAT HAN DW+F DGI HDGI-A HDGI-C

ACM

20% Micro-F1 0.8590 0.6125 0.5503 0.9250 0.5766 0.9178 0.9267 0.8785 0.9104 0.9178 0.9227

Macro-F1 0.8585 0.6158 0.5582 0.9248 0.5801 0.9172 0.9268 0.8789 0.9104 0.9170 0.9232

80% Micro-F1 0.8820 0.6378 0.5788 0.9317 0.5939 0.9250 0.9400 0.8965 0.9175 0.9333 0.9379

Macro-F1 0.8802 0.6390 0.5825 0.9317 0.5918 0.9248 0.9403 0.8960 0.9155 0.9330 0.9379

DBLP

20% Micro-F1 0.7552 0.6985 0.2805 0.8192 0.1932 0.8244 0.8992 0.7163 0.8975 0.9062 0.9175

Macro-F1 0.7473 0.6874 0.2302 0.8128 0.2132 0.8148 0.8923 0.7063 0.8921 0.8988 0.9094

80% Micro-F1 0.8325 0.8211 0.3079 0.8383 0.2175 0.8540 0.9100 0.7860 0.9150 0.9192 0.9226

Macro-F1 0.8152 0.8014 0.2401 0.8308 0.2212 0.8476 0.9055 0.7799 0.9052 0.9106 0.9153

IMDB

20% Micro-F1 0.5112 0.3985 0.3913 0.5931 0.4350 0.5985 0.6077 0.5262 0.5728 0.5482 0.5893

Macro-F1 0.5107 0.4012 0.3888 0.5869 0.4468 0.5944 0.6027 0.5293 0.5690 0.5522 0.5914

80% Micro-F1 0.5900 0.4203 0.3953 0.6467 0.4476 0.6540 0.6600 0.6017 0.6003 0.5861 0.6592

Macro-F1 0.5884 0.4119 0.4001 0.6457 0.4527 0.6550 0.6586 0.6049 0.5950 0.5834 0.6646

“X” learning with initial features. “A” learning with the adjacency matrix. “Y” learning with node labels.

Table IV: Evaluation results on the node clustering task

Data ACM DBLP IMDB

Method NMI ARI NMI ARI NMI ARI

DeepWalk 25.47 18.24 7.40 5.30 1.23 1.22Raw Feature 32.62 30.99 11.21 6.98 1.06 1.17DeepWalk+F 32.54 31.20 11.98 6.99 1.23 1.22Metapath2vec 27.59 24.57 34.30 37.54 1.15 1.51DGI 41.09 34.27 59.23 61.85 0.56 2.6

HDGI-A 57.05 50.86 52.12 49.86 0.8 1.29HDGI-C 54.35 49.48 60.76 62.67 1.87 3.7

considering hierarchical dependencies can see farther. Theresults of the clustering task can also verify this analysis.

4) Different global representation encoder functions: Wepresent the results of HDGI-C with different global representa-tion encoder functions working on the node classification taskin Figure 4. The simple average function performs the best.However, we can find that this advantage is very subtle. In fact,each function can perform well on experimental datasets. Nev-ertheless, for larger and more complex heterogeneous graphs,a specified and sophisticated function may perform better. Thedesign of the global encoder function for heterogeneous graphswith different scales and structures is an open question, whichis worthy of further discussion.

VI. CONCLUSION

In this paper, we propose an unsupervised graph neuralnetwork, HDGI , which learns node representations in het-erogeneous graphs. HDGI combines several state-of-the-arttechniques. It employs convolution style GNNs along with asemantic-level attention mechanism to capture individual localrepresentations of nodes. Through maximizing the local-globalmutual information, HDGI learns high-level representationscontaining graph-level structural information. It exploits thestructure of meta-path to model the connection semantics in

IMDB DBLP ACMDataset

0.5

0.6

0.7

0.8

0.9

1.0

Mac

ro-F

1

averagemax_poolingset2vec

(a) Macro-F1

IMDB DBLP ACMDataset

0.5

0.6

0.7

0.8

0.9

1.0

Micr

o-F1

averagemax_poolingset2vec

(b) Micro-F1

Figure 4: The comparison between different global represen-tation encoder functions

heterogeneous graphs. Node attributes are fused into repre-sentations through the local-global mutual information max-imization simultaneously. We demonstrate the effectivenessof learned representations for both node classification andclustering tasks on three heterogeneous graphs. HDGI isincredibly competitive in node classification tasks with state-of-the-art supervised methods, where they have the additionalsupervised label information. We are optimistic that mutual

8

information maximization is a promising future direction forunsupervised representation learning.

REFERENCES

[1] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, SherjilOzair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm. Mine:mutual information neural estimation. ICML, 2018.

[2] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spec-tral networks and locally connected networks on graphs. In arXivpreprint arXiv:1312.6203, 2013.

[3] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on networkembedding. IEEE Transactions on Knowledge and Data Engineering,31(5):833–852, 2018.

[4] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metap-ath2vec: Scalable representation learning for heterogeneous networks.In Proceedings of the 23rd ACM SIGKDD international conference onknowledge discovery and data mining, pages 135–144. ACM, 2017.

[5] Alberto Garcia Duran and Mathias Niepert. Learning graph representa-tions with embedding propagation. In Advances in neural informationprocessing systems, pages 5119–5130, 2017.

[6] Matthias Fey, Jan Eric Lenssen, Frank Weichert, and Heinrich Muller.Splinecnn: Fast geometric deep learning with continuous b-spline ker-nels. In Proceedings of the IEEE Conference on Computer Vision andPattern Recognition, pages 869–877, 2018.

[7] Tao-yang Fu, Wang-Chien Lee, and Zhen Lei. Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning.In Proceedings of the 2017 ACM on Conference on Information andKnowledge Management, pages 1797–1806. ACM, 2017.

[8] Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, and Jiawei Han. Graph-based consensus maximization among multiple supervised and unsuper-vised models. In Advances in Neural Information Processing Systems,pages 585–593, 2009.

[9] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learningfor networks. In Proceedings of the 22nd ACM SIGKDD internationalconference on Knowledge discovery and data mining, pages 855–864.ACM, 2016.

[10] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representationlearning on large graphs. In Advances in Neural Information ProcessingSystems, pages 1024–1034, 2017.

[11] William L. Hamilton, Rex Ying, and Jure Leskovec. Representationlearning on graphs: Methods and applications. In IEEE Data Engineer-ing Bulletin, 2017.

[12] R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Gre-wal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deeprepresentations by mutual information estimation and maximization. InInternational Conference on Learning Representation, 2019.

[13] Wenbing Huang, Tong Zhang, Yu Rong, and Junzhou Huang. Adaptivesampling towards fast graph representation learning. In Advances inNeural Information Processing Systems, pages 4558–4567, 2018.

[14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochasticoptimization. In International Conference on Learning Representation,2015.

[15] T. N. Kipf and M. Welling. Semi-supervised classification with graphconvolutional networks. In International Conference on LearningRepresentation, 2017.

[16] T. N. Kipf and M. Welling. Variational graph auto-encoders. InInternational Conference on Learning Representation, 2017.

[17] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gatedgraph sequence neural networks. In ICLR, 2016.

[18] Manjunath Kudlur Oriol Vinyals, Samy Bengio. Order matters: Se-quence to sequence for sets. In International Conference on LearningRepresentation, 2016.

[19] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu.Asymmetric transitivity preserving graph embedding. In Proceedingsof the 22nd ACM SIGKDD international conference on Knowledgediscovery and data mining, pages 1105–1114. ACM, 2016.

[20] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Onlinelearning of social representations. In Proceedings of the 20th ACMSIGKDD international conference on Knowledge discovery and datamining, pages 701–710. ACM, 2014.

[21] Yuxiang Ren and Jiawei Zhang. Hgat: Hierarchical graph attentionnetwork for fake news detection. arXiv, pages arXiv–2002, 2020.

[22] Yuxiang Ren, Hao Zhu, Jiawei ZHang, Peng Dai, and Liefeng Bo.Ensemfdet: An ensemble approach to fraud detection based on bipartitegraph. arXiv preprint arXiv:1912.11113, 2019.

[23] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne VanDen Berg, Ivan Titov, and Max Welling. Modeling relational data withgraph convolutional networks. In European Semantic Web Conference,pages 593–607. Springer, 2018.

[24] Claude E Shannon. A mathematical theory of communication. Bellsystem technical journal, 27(3):379–423, 1948.

[25] Fan-Yun Sun, Jordan Hoffmann, and Jian Tang. Infograph: Unsupervisedand semi-supervised graph-level representation learning via mutualinformation maximization. arXiv preprint arXiv:1908.01000, 2019.

[26] Ke Sun, Zhouchen Lin, and Zhanxing Zhu. Adagcn: Adaboost-ing graph convolutional networks into deep models. arXiv preprintarXiv:1908.05081, 2019.

[27] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu.Pathsim: Meta path-based top-k similarity search in heterogeneousinformation networks. Proceedings of the VLDB Endowment, 4(11):992–1003, 2011.

[28] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, andQiaozhu Mei. Line: Large-scale information network embedding. InProceedings of the 24th international conference on world wide web,pages 1067–1077. International World Wide Web Conferences SteeringCommittee, 2015.

[29] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero,Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprintarXiv:1710.10903, 2017.

[30] Petar Velickovic, William Fedus, William L Hamilton, Pietro Lio,Yoshua Bengio, and R Devon Hjelm. Deep graph infomax. InternationalConference on Learning Representation, 2019.

[31] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graphembedding: A survey of approaches and applications. IEEE Transactionson Knowledge and Data Engineering, 29(12):2724–2743, 2017.

[32] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and ShiqiangYang. Community preserving network embedding. In Thirty-First AAAIConference on Artificial Intelligence, 2017.

[33] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui,and Philip S Yu. Heterogeneous graph attention network. In The WorldWide Web Conference, pages 2022–2032. ACM, 2019.

[34] Chao-Yuan Wu, Alex Beutel, Amr Ahmed, and Alexander J Smola.Explaining reviews and ratings with paco: Poisson additive co-clustering.In WWW, pages 127–128. ACM, 2016.

[35] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. Howpowerful are graph neural networks? arXiv preprint arXiv:1810.00826,2018.

[36] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and EdwardChang. Network representation learning with rich text information. InInternational Joint Conference on Artificial Intelligence, 2015.

[37] Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and JureLeskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. arXiv preprint arXiv:1802.08773, 2018.

[38] Chuxu Zhang, Ananthram Swami, and Nitesh V. Chawla. Shne: Rep-resentation learning for semantic-associated heterogeneous networks.In Proceedings of the Twelfth ACM International Conference on WebSearch and Data Mining, WSDM ’19, pages 690–698. ACM, 2019.

[39] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Collectiveclassification via discriminative matrix factorization on sparsely labelednetworks. In Proceedings of the 25th ACM International on Conferenceon Information and Knowledge Management, pages 1563–1572. ACM,2016.

[40] Jiawei Zhang. Social network fusion and mining: a survey. arXivpreprint arXiv:1804.09874, 2018.

9

heterogeneous deep graph infomax - arxiv

Documents