
Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs

Josh Payne
IBM T. J. Watson Research Center

Yorktown Heights, NY 10598
[email protected]

Abstract

From social networks to protein complexes to disease genomes to visual data, hypergraphs are everywhere. However, the scope of research studying deep learning on hypergraphs is still quite sparse and nascent, as there has not yet existed an effective, unified framework for using hyperedge and vertex embeddings jointly in the hypergraph context, despite a large body of prior work that has shown the utility of deep learning over graphs and sets. Building upon these recent advances, we propose Deep Hyperedges (DHE), a modular framework that jointly uses contextual and permutation-invariant vertex membership properties of hyperedges in hypergraphs to perform classification and regression in transductive and inductive learning settings. In our experiments, we use a novel random walk procedure and show that our model achieves and, in most cases, surpasses state-of-the-art performance on benchmark datasets. Additionally, we study our framework’s performance on a variety of diverse, non-standard hypergraph datasets and propose several avenues of future work to further enhance DHE.

1 Introduction and Background

As data becomes more plentiful, we find ourselves with access to increasingly complex networks that can be used to express many different types of information. Hypergraphs have been used to recommend music [1], model cellular and protein-protein interaction networks [2, 3], classify images and perform 3D object recognition [4, 5], diagnose Alzheimer’s disease [6], analyze social relationships [7, 8], and classify gene expression [9]—the list goes on. However, despite the clear utility of hypergraphs as expressive objects, the field of research studying the application of deep learning to hypergraphs is still emerging. Certainly, there is a much larger and more mature body of research studying the application of deep learning to graphs, and the first notions of graph neural networks were introduced just over a decade ago by Gori et al. in [10] and Scarselli et al. in [11]. But despite its nascency, the field of deep learning on hypergraphs has recently seen a rise in attention from researchers, which we investigate in Section 2.

In 2017, Zaheer et al. [12] proposed a novel architecture, DeepSets, for performing deep machine learning tasks on permutation-invariant and equivariant objects. This work was extended and generalized by Hartford et al. [13] to describe and study interactions between multiple sets.

In this work, we go a step further, proposing a novel framework that utilizes ideas from context-based graph embedding approaches and permutation-invariant learning to perform transductive and inductive inference on hypergraphs—sets of sets with underlying contextual graph structure. This framework can be used for classification and regression of vertices and hyperedges alike; in this work, we will focus on the classification of hyperedges.

Preprint. Under review.


1.1 Graphs and Hypergraphs

A hypergraph H = (V, E) is comprised of a finite set of vertices V = {v_1, v_2, ..., v_n} and a set of hyperedges E = {e_1, e_2, ..., e_m} ⊆ 2^V. We consider connected hypergraphs with |V| ≥ 2. The dual, H′, of a hypergraph H is a hypergraph such that the hyperedges and vertices of H′ are given by the vertices and hyperedges of H, respectively. That is, a vertex v′_i of H′ is a member of a hyperedge e′_j of H′ if and only if its corresponding hyperedge e_i of H contained the vertex v_j of H corresponding to e′_j. Note that (H′)′ = H. A graph G is a hypergraph where |e_i| = 2 for each e_i ∈ E. Graphs are well-studied in the field of machine learning, but are not capable of completely representing the information captured in hypergraphs in general. Ihler et al. showed in [14], for instance, that if |V| ≥ 4, there does not exist a representation of a hypergraph by a graph with the same cut properties in the general case. This fact has practical implications as well: for instance, one cannot represent the protein complexes in the CORUM dataset [15] using pairwise relationships, as the proteins in a complex may not interact with each other independently. These hyperedges are said to be indecomposable [16]. Hence, while the theory built for studying graphs can be utilized to some capacity in the hypergraph context, there is a need for learning procedures that can effectively generalize to hypergraphs.
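To make the dual construction concrete, here is a minimal Python sketch (the dict-of-sets hypergraph representation and the function name are ours, not part of the paper):

```python
from collections import defaultdict

def dual(hyperedges):
    """Compute the dual H' of a hypergraph given as a dict mapping
    hyperedge ids to sets of vertex ids. Each original vertex becomes
    a hyperedge of H' containing the ids of the hyperedges it belonged to."""
    dual_edges = defaultdict(set)
    for e_id, vertices in hyperedges.items():
        for v in vertices:
            dual_edges[v].add(e_id)
    return dict(dual_edges)

# Example: H has hyperedges e1 = {a, b, c} and e2 = {b, c}.
H = {"e1": {"a", "b", "c"}, "e2": {"b", "c"}}
print(dual(H))             # {'a': {'e1'}, 'b': {'e1', 'e2'}, 'c': {'e1', 'e2'}}
assert dual(dual(H)) == H  # (H')' = H
```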

1.2 Transductive and Inductive Learning

In transductive (or semi-supervised) inference tasks, one often seeks to learn from a small amount of labeled training data, where the model has access to labeled and unlabeled data at training time. Formally, we have training instances {x_i}_1^n where {x_i}_1^l are labeled instances and {x_i}_{l+1}^{l+u} are unlabeled instances, and corresponding labels y_i in {y_i}_1^n. Our aim is to learn a function F : {x_i}_1^n → {y_i}_1^n; x_i ↦ y_i. Typically, in the case of transductive learning on graphs and hypergraphs, we seek to leverage topological information to represent the vertices in some continuous vector space R^d by embeddings which capture the vertices’ or hyperedges’ context (homophily and structural equivalence). As such, in the pre-training procedure for finding vertex embeddings, we want to find an embedding function Φ : V → R^d that maximizes the likelihood of observing a vertex in the sampled neighborhood N(v) of v given Φ(v):

    max_Φ  Σ_{v ∈ V} log P(N(v) | Φ(v))    (1)

The procedure for finding contextual embeddings for hyperedges is similar. Once we’ve learned Φ, we can use the embeddings Φ(v) to learn F in our transductive learning procedure.

In inductive (or supervised) inference tasks, we are given a training sample {x_i}_1^n ⊆ X to be seen by our model, and we want to learn a function g : X → Y; x_i ↦ y_i that can generalize to unseen instances. This type of learning is particularly useful for dynamic graphs and hypergraphs, when we may find unseen vertices as time passes, or when we want to apply transfer learning to new graphs and hypergraphs altogether. Here, the representation learning function Φ is typically dependent on the input features f(v) for a vertex v.

Several approaches have been proposed for each of these learning paradigms. Each has its uses, and each can be applied within this framework. However, for testing purposes, we focus primarily on the transductive case, leaving a quantitative investigation of the inductive case in this framework to future work.

2 Related Work

In 2013, Mikolov et al. proposed an unsupervised learning procedure, skip-gram, in [17], which uses negative sampling to create context-based embeddings for terms given sentences in a corpus of text. This procedure was used by Perozzi et al. in DeepWalk [18] to treat random walks on a graph as “sentences” and vertices as “terms”, which outperformed existing spectral clustering and weighted vote-based relational neighbor classifiers [19, 20, 21, 22, 23]. Similar random walk-based approaches followed, such as random walks with bias parameters [24], methods that utilize network attributes as additional features [25, 26, 27], and approaches that could be extended to perform inductive learning [25, 28]. Graph convolutions were formally defined by Bruna et al. [29] and elaborated upon by Kipf and Welling in [30], who proposed graph convolutional networks. Graph convolutions have also been used in variational graph autoencoding in [31]. Each of these approaches is limited to the graph domain, but all are relevant to the proposed framework.

Figure 1: Architectural overview of the Deep Hyperedges model. On the left, a diagram illustrating the dual transformation for a hypergraph H. On the right, an outline of the transductive learning procedure.

Deep learning on sets has been studied in [32, 12] in the permutation-invariant and permutation-equivariant cases. In particular, these architectures employ a linear transformation (adding up the representations) between layers of nonlinear transformations to learn continuous representations of permutation-invariant objects, which we utilize in this work. Multiset learning has been studied in [13], but these methods do not directly employ the context-based inference that can be drawn from the hypergraph structure. Further, methods that make use of membership exclusively, such as PointNet and PointNet++ [33, 34], and those that make use of the hypergraph structure, such as HGNNs [8], have been compared, and in the hypergraph context it is typically better to include contextual information at inference time.

Hypergraph learning is lesser studied, but has nonetheless seen a variety of approaches. In 2007, Zhou et al. proposed methods for hypergraph clustering and embedding in [35], but these methods incur high computational and space complexity. Random walks on hypergraphs have been established, and have likewise been demonstrated as useful in inference tasks [36, 37, 38], but these methods do not directly account for the set membership and contextual properties of hyperedges simultaneously and efficiently. Very recently, hypergraph convolution and attention approaches have been proposed [8, 39], which define a hypergraph Laplacian matrix and perform convolutions on this matrix. Our framework specifically uses a random walk-based model for its benefits, such as parallelizability, scalability, accommodation of inductive learning, and baseline comparability between random walk models in the graph domain and our own random walk procedure; however, convolution approaches could conceivably be integrated into this framework, and this line of exploration is left to future work.

3 Deep Hyperedges

We propose Deep Hyperedges (DHE), a framework that utilizes context and set membership to perform transductive and inductive learning on hypergraphs. We will be focusing on classification of hyperedges, but recall that the hyperedges of the dual H′ of a hypergraph H correspond to the vertices of H, so vertex classification may be approached similarly. While DHE is conducive to random walk, spectral, and convolutional approaches for the vertex and hyperedge embedding steps, we investigate a random walk approach for the sake of parallelizability and scalability, as mentioned in DeepWalk [18], and inductive inference, as mentioned in GraphSAGE [28]. Deep Hyperedges is composed of four main learning stages: vertex and hyperedge random walk generation, embedding of vertices and hyperedges, contextual and permutation-invariant representation learning, and cumulative inference.


3.1 Random Walks and Embeddings

To test this framework using random walk models, we propose a simple new random walk model for hypergraphs that seeks to represent each vertex in a way that places more weight on capturing co-member information in its hyperedge. Currently proposed random walk models on hypergraphs traverse to a new hyperedge with each vertex selection, as is done in most graph random walk models, meaning that the random walk will jump around very rapidly. To combat this, we propose Subsample and Traverse (SaT) Walks. In this procedure, we start at a vertex v_m in a selected hyperedge e_i. We define the probability of traversing to be inversely proportional to the cardinality of the current hyperedge; that is, p = min(α/|e_i| + β, 1), where α, β ≥ 0 are tunable parameters. The number of samples the walk draws from a given hyperedge is geometrically distributed, and its expectation is thus influenced by the cardinality of the hyperedge itself. Using SaT Walks, we can construct random walks of vertices in the hypergraph for embedding in the next stage. Likewise, we may perform SaT Walks on H′ to embed the hyperedges of H. For ease of notation, we refer to these walks as Traverse and Select (TaS) Walks. Using TaS Walks, we can construct random walks of hyperedges in the hypergraph for contextual embedding in the next stage. Additionally, one could easily define in-out and return parameters for TaS Walks (à la node2vec [24]) for controlling the degree of homophily vs. structural equivalence representation (by inclining the search strategy toward BFS or DFS) when embedding the hyperedges with skip-gram.
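To illustrate, here is a minimal sketch of a SaT Walk as described above (the incidence-structure arguments and helper names are our own assumptions, not the paper's code):

```python
import random

def sat_walk(hyperedges, memberships, start_vertex, length, alpha=1.0, beta=0.0):
    """Sketch of a Subsample and Traverse (SaT) Walk.
    hyperedges: dict edge_id -> list of member vertex ids
    memberships: dict vertex_id -> list of edge ids containing that vertex
    At each step, with probability p = min(alpha / |e| + beta, 1) we traverse
    to another hyperedge incident to the current vertex; otherwise we keep
    subsampling co-members of the current hyperedge."""
    v = start_vertex
    e = random.choice(memberships[v])
    walk = [v]
    for _ in range(length - 1):
        p = min(alpha / len(hyperedges[e]) + beta, 1.0)
        if random.random() < p:
            e = random.choice(memberships[v])   # traverse to a new hyperedge
        v = random.choice(hyperedges[e])        # subsample a co-member vertex
        walk.append(v)
    return walk
```

Running the same procedure on the dual's incidence structures yields the TaS Walks over hyperedges described above.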

We employ the skip-gram technique proposed in [17] and used in [18] to create context-based embeddings Φ(v), Φ(e) of the vertices and hyperedges after the random walks have been generated. These works explain the procedure in detail.
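For example, given walks produced by SaT Walks and TaS Walks, the skip-gram embeddings can be fit with an off-the-shelf implementation; a sketch using gensim (≥ 4.0), with the embedding dimensions from Section 4.2 and otherwise illustrative settings:

```python
from gensim.models import Word2Vec

# vertex_walks / edge_walks: lists of walks, each a list of string ids,
# e.g. produced by sat_walk above and its TaS counterpart on the dual.
vertex_model = Word2Vec(sentences=vertex_walks, vector_size=16,
                        window=5, sg=1, negative=5, min_count=0)
edge_model = Word2Vec(sentences=edge_walks, vector_size=128,
                      window=5, sg=1, negative=5, min_count=0)

Phi_v = {v: vertex_model.wv[v] for v in vertex_model.wv.index_to_key}  # Φ(v)
Phi_e = {e: edge_model.wv[e] for e in edge_model.wv.index_to_key}      # Φ(e)
```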

3.2 Contextual and Permutation-Invariant Representation Learning

At this stage, we employ two distinct networks. We have two objectives: 1) we would like to learn a representation of each hyperedge that captures its contextual information, and 2) we would also like to create a representation of each hyperedge that captures the membership of its vertices in a manner that is invariant to permutation.

To address the first objective, we construct a context network (c-network) that applies j hidden layers h to the hyperedge embedding Φ(e) to output a learned contextual representation h_j(Φ(e_i)) = c(e_i). The second objective requires a linear transformation at a hidden layer for permutation invariance. We use the φ and ρ networks of the DeepSets architecture. The membership representation network (m-network) takes as input Φ(v) for each v ∈ e_i, applies a nonlinear transformation to these inputs individually (forming our φ network), adds up the representations to obtain Σ_{v ∈ e_i} φ(Φ(v)), and finally applies k hidden layers to this representation (forming our ρ network) to obtain a membership representation m(e_i) of the hyperedge e_i that is invariant to permutation. We then concatenate the representations output by the c-network and m-network and apply l hidden layers and a final softmax layer: softmax(h_l(c(e_i) ‖ m(e_i))). Optionally, we apply m hidden layers to the input feature vector f(e_i) to obtain h_m(f(e_i)), and our formulation would then be

    softmax(h_l(c(e_i) ‖ m(e_i) ‖ h_m(f(e_i))))    (2)
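A minimal PyTorch sketch of the architecture in Equation (2), without the optional feature network (the class name and layer sizes are illustrative choices, not the exact configuration reported in Section 4.2):

```python
import torch
import torch.nn as nn

class DHESketch(nn.Module):
    def __init__(self, d_e=128, d_v=16, n_classes=7, hidden=100):
        super().__init__()
        # c-network: contextual representation c(e) from the hyperedge embedding Φ(e)
        self.c_net = nn.Sequential(nn.Linear(d_e, hidden), nn.ReLU(), nn.Dropout(0.3))
        # m-network: DeepSets-style φ (applied per vertex) and ρ (applied after the sum)
        self.phi = nn.Sequential(nn.Linear(d_v, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh())
        self.rho = nn.Sequential(nn.Linear(hidden, 30), nn.Tanh())
        # hidden layers applied to the concatenation c(e) ‖ m(e), then class logits
        self.head = nn.Sequential(nn.Linear(hidden + 30, hidden), nn.ReLU(),
                                  nn.Dropout(0.4), nn.Linear(hidden, n_classes))

    def forward(self, edge_emb, vertex_embs):
        # edge_emb: (d_e,) vector Φ(e); vertex_embs: (num_vertices, d_v) rows Φ(v)
        c = self.c_net(edge_emb)                        # contextual representation
        m = self.rho(self.phi(vertex_embs).sum(dim=0))  # permutation-invariant sum
        return self.head(torch.cat([c, m]))             # logits; softmax in the loss
```

Summing over the vertex dimension before ρ is what makes m(e_i) invariant to the order in which member vertices are presented.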

This is the formulation shown in Figure 1. For the inductive formulation, we want to learn a function that can generalize to unseen vertices. Hence, this function will only depend initially on an input feature vector f(e_i) and a uniform sampling and pooling aggregation procedure that arises naturally with SaT Walks and TaS Walks (à la GraphSAGE [28]). The forward propagation algorithm for embedding generation is described in detail in that work, and with these embeddings, we proceed as before.
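One way to realize such an inductive embedding, under our reading, is a GraphSAGE-style mean aggregator over the features of uniformly sampled members (function and argument names here are hypothetical):

```python
import random
import torch

def inductive_edge_embedding(e_id, hyperedges, features, sample_size=10):
    """Mean-pool the input features f(v) of a uniform sample of a hyperedge's
    vertices. Because this depends only on features, not on a learned lookup
    table, it extends to hyperedges and vertices unseen at training time."""
    members = list(hyperedges[e_id])
    sampled = random.sample(members, min(sample_size, len(members)))
    return torch.stack([features[v] for v in sampled]).mean(dim=0)
```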

4 Evaluation

4.1 Datasets

Figure 2: t-SNE [40] embedding visualizations comparing our hyperedge embedding procedure (without additional features), DANE, VGAE, and DeepWalk.

Table 1: Experimental results on the Cora and PubMed datasets (micro- and macro-F1).

                       Cora                                      PubMed
           10%           30%           50%           10%           30%           50%
Method   Micro  Macro  Micro  Macro  Micro  Macro  Micro  Macro  Micro  Macro  Micro  Macro
DW       0.757  0.750  0.806  0.794  0.829  0.818  0.805  0.787  0.817  0.803  0.816  0.803
N2V      0.748  0.726  0.820  0.812  0.824  0.816  0.803  0.785  0.811  0.797  0.810  0.798
GraRep   0.757  0.744  0.793  0.789  0.800  0.792  0.795  0.779  0.803  0.790  0.805  0.794
LINE     0.734  0.719  0.812  0.811  0.835  0.825  0.804  0.789  0.813  0.801  0.811  0.799
TADW     0.751  0.723  0.801  0.780  0.835  0.825  0.836  0.834  0.859  0.858  0.864  0.863
ANE      0.720  0.715  0.803  0.791  0.812  0.799  0.798  0.788  0.826  0.819  0.828  0.820
GAE      0.769  0.757  0.806  0.792  0.810  0.799  0.829  0.824  0.831  0.826  0.831  0.826
VGAE     0.789  0.774  0.805  0.791  0.812  0.799  0.830  0.824  0.835  0.829  0.836  0.830
SAGE     0.763  0.748  0.810  0.800  0.815  0.808  0.817  0.809  0.825  0.816  0.827  0.818
DANE     0.787  0.775  0.828  0.813  0.850  0.838  0.861  0.858  0.873  0.871  0.878  0.875
DHE      0.790  0.771  0.830  0.818  0.842  0.833  0.870  0.868  0.887  0.886  0.894  0.891

We tested DHE on several standard and non-standard datasets for experimentation. Because far more literature exists on graph deep learning than on hypergraph deep learning, a typical benchmarking approach is to take a graph dataset, represent the data in a way that creates hyperedges with cardinality greater than two (without modifying the data itself), and compare with graph-based learning benchmarks. For this phase of the evaluation, we used the Cora and PubMed datasets, citation networks in which vertices are papers, edges are citations, and each paper is labeled as belonging to one of 7 or 3 fairly balanced classes, respectively. The hypergraph extension involves creating a hyperedge for each paper, where the paper is the centroid and its 1-neighborhood forms the hyperedge. Each vertex also has a feature vector, which is a publication content representation in F_2^1433 and R^500 for the Cora and PubMed datasets, respectively. As the mapping between papers and hyperedges is bijective, it remains to classify the hyperedge for the inference task.
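The 1-neighborhood hyperedge construction can be read as the following sketch over a generic citation edge list (the representation and names are ours):

```python
from collections import defaultdict

def citation_hyperedges(edges):
    """Build one hyperedge per paper: the centroid paper together with its
    1-neighborhood in the citation graph (treated here as undirected)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {paper: {paper} | nbrs for paper, nbrs in neighbors.items()}

# edges = [("p1", "p2"), ("p1", "p3")] yields
# {"p1": {"p1", "p2", "p3"}, "p2": {"p1", "p2"}, "p3": {"p1", "p3"}}
```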

We also tested this model on several other datasets, including a protein complex network, a social group meetup network, and a disease genomics network. These are described and evaluated as supplementary material in Appendix A.

4.2 Implementation and Testing

To evaluate DHE on the benchmark datasets, we compare to several state-of-the-art approaches (in order): DeepWalk by Perozzi et al., 2014 [18], node2vec by Grover et al., 2016 [24], GraRep by Cao et al., 2015 [41], and LINE by Tang et al., 2015 [42], which do not use network attributes; and TADW by Yang et al., 2015 [27], ANE by Huang et al., 2017 [20], Graph Auto-Encoder (GAE) by Kipf and Welling, 2016 [31], Variational Graph Auto-Encoder (VGAE) by Kipf and Welling, 2016 [31], GraphSAGE by Hamilton et al., 2017 [28], and DANE by Gao and Huang, 2018 [26], which do use network attributes. We benchmark micro-F1 and macro-F1 scores for both datasets with randomly sampled train:test splits of 10:90, 30:70, and 50:50, as reported by [26] (none is used for validation). We’ve also explored comparisons between DHE and its individual components, DeepSets + SaT Walks and Multilayer Perceptron + TaS Walks, more broadly in Appendix A.
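A sketch of this evaluation protocol using scikit-learn (the helper and `model_factory` are stand-ins, not the paper's code):

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, train_fraction, model_factory, runs=5, seed=0):
    """Average micro/macro F1 over random train:test splits, mirroring
    the 10:90 / 30:70 / 50:50 protocol (no validation split)."""
    micros, macros = [], []
    for r in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction, random_state=seed + r)
        model = model_factory().fit(X_tr, y_tr)
        pred = model.predict(X_te)
        micros.append(f1_score(y_te, pred, average="micro"))
        macros.append(f1_score(y_te, pred, average="macro"))
    return np.mean(micros), np.mean(macros)
```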


We use 25 random walks of length 25 for each hyperedge and vertex and embed each hyperedge and vertex into R^128 and R^16, respectively, using skip-gram. In the m-network, each component of the φ network has two hidden layers of 100 neurons each, with a dropout rate of 0.5 to prevent overfitting [43] and a tanh activation function as suggested in [12]. Our ρ network has a final fully connected layer of 30 neurons. Meanwhile, our c-network has a hidden layer of 100 neurons with a ReLU activation function [44] and a dropout rate of 0.3. Finally, a feature network is utilized which has two densely connected hidden layers of 100 neurons each, a dropout rate of 0.3, and a ReLU activation function. The outputs of these networks are concatenated, and the final hidden layer is a 100-neuron densely connected layer with a dropout rate of 0.4 which connects directly to the output softmax layer. We use stochastic gradient descent as our optimization algorithm and categorical cross-entropy loss. The scores reported are the average of 5 runs for each test. Our model scales well and has been tested on hyperedges of cardinalities up to 1000 while maintaining low epoch runtime (less than 30 seconds per epoch on a CPU). DHE outperforms the other methods in most cases, with the additional, unique advantage that it can be extended to hypergraphs.

5 Conclusion

In this work, we’ve proposed Deep Hyperedges, a framework that jointly uses contextual and permutation-invariant vertex membership properties of hyperedges in hypergraphs to perform classification and regression in transductive and inductive learning settings. We’ve also proposed a random walk model that can be used for obtaining embeddings of hyperedges and vertices (hyperedges of the dual of the hypergraph) for testing with the framework. With these, we demonstrated that DHE achieves and, in most cases, surpasses state-of-the-art performance on benchmark graph datasets, and we explored nonstandard hypergraph datasets in Appendix A. Finally, we’ve identified several exciting avenues of future work, including deeper exploration of the inductive learning case, tuning of random walk parameters, and usage of convolutional models in the hyperedge embedding step. One aim of this framework is to provide flexibility in allowing other approaches to be integrated, and in doing so, encourage collaborative efforts toward cracking the hypergraph learning problem.

References

[1] J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, and X. He, “Music recommendation by unified hypergraph: combining social media information and music content,” in Proceedings of the 18th ACM International Conference on Multimedia. ACM, 2010, pp. 391–400.

[2] S. Klamt, U.-U. Haus, and F. Theis, “Hypergraphs and cellular networks,” PLoS Computational Biology, vol. 5, no. 5, p. e1000385, 2009.

[3] J. Lugo-Martinez and P. Radivojac, “Classification in biological networks with hypergraphlet kernels,” arXiv preprint arXiv:1703.04823, 2017.

[4] J. Yu, D. Tao, and M. Wang, “Adaptive hypergraph learning and its application in image classification,” IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3262–3272, 2012.

[5] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, “3-D object retrieval and recognition with hypergraph analysis,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4290–4303, 2012.

[6] M. Liu, J. Zhang, P.-T. Yap, and D. Shen, “View-aligned hypergraph learning for Alzheimer’s disease diagnosis with incomplete multi-modality data,” Medical Image Analysis, vol. 36, pp. 123–134, 2017.

[7] D. Yang, B. Qu, J. Yang, and P. Cudre-Mauroux, “Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach,” in The World Wide Web Conference. ACM, 2019, pp. 2147–2157.

[8] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, “Hypergraph neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3558–3565.

[9] Z. Tian, T. Hwang, and R. Kuang, “A hypergraph-based learning algorithm for classifying gene expression and arrayCGH data with prior knowledge,” Bioinformatics, vol. 25, no. 21, pp. 2831–2838, 2009.

[10] M. Gori, G. Monfardini, and F. Scarselli, “A new model for learning in graph domains,” in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol. 2. IEEE, 2005, pp. 729–734.

[11] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.

[12] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” in Advances in Neural Information Processing Systems, 2017, pp. 3391–3401.

[13] J. Hartford, D. R. Graham, K. Leyton-Brown, and S. Ravanbakhsh, “Deep models of interactions across sets,” arXiv preprint arXiv:1803.02879, 2018.

[14] E. Ihler, D. Wagner, and F. Wagner, “Modeling hypergraphs by graphs with the same mincut properties,” Information Processing Letters, vol. 45, no. 4, pp. 171–175, 1993.

[15] M. Giurgiu, J. Reinhard, B. Brauner, I. Dunger-Kaltenbach, G. Fobo, G. Frishman, C. Montrone, and A. Ruepp, “CORUM: the comprehensive resource of mammalian protein complexes—2019,” Nucleic Acids Research, vol. 47, no. D1, pp. D559–D563, 2018.

[16] K. Tu, P. Cui, X. Wang, F. Wang, and W. Zhu, “Structural deep embedding for hyper-networks,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[17] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” 2013.

[18] B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), 2014. [Online]. Available: http://dx.doi.org/10.1145/2623330.2623732

[19] L. Tang and H. Liu, “Leveraging social media networks for classification,” Data Mining and Knowledge Discovery, vol. 23, no. 3, pp. 447–478, 2011.

[20] X. Huang, J. Li, and X. Hu, “Label informed attributed network embedding,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017, pp. 731–739.

[21] L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, 2009, pp. 1107–1116.

[22] ——, “Relational learning via latent social dimensions,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 817–826.

[23] S. A. Macskassy and F. Provost, “A simple relational classifier,” New York University, Stern School of Business, Tech. Rep., 2003.

[24] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 855–864.

[25] Z. Yang, W. W. Cohen, and R. Salakhutdinov, “Revisiting semi-supervised learning with graph embeddings,” 2016.

[26] H. Gao and H. Huang, “Deep attributed network embedding,” in IJCAI, vol. 18, 2018, pp. 3364–3370.

[27] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Chang, “Network representation learning with rich text information,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[28] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.

[29] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013.

[30] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.

[31] ——, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308, 2016.

[32] S. Ravanbakhsh, J. Schneider, and B. Poczos, “Deep learning with sets and point clouds,” arXiv preprint arXiv:1611.04500, 2016.

[33] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” CoRR, vol. abs/1612.00593, 2016.

[34] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep hierarchical feature learning on point sets in a metric space,” in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.

[35] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in Neural Information Processing Systems, 2007, pp. 1601–1608.

[36] U. Chitra and B. J. Raphael, “Random walks on hypergraphs with edge-dependent vertex weights,” CoRR, vol. abs/1905.08287, 2019. [Online]. Available: http://arxiv.org/abs/1905.08287

[37] S. N. Satchidanand, H. Ananthapadmanaban, and B. Ravindran, “Extended discriminative random walk: a hypergraph approach to multi-view multi-relational transductive learning,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[38] A. Sharma, S. Joty, H. Kharkwal, and J. Srivastava, “Hyperedge2vec: Distributed representations for hyperedges,” 2018.

[39] S. Bai, F. Zhang, and P. H. Torr, “Hypergraph convolution and hypergraph attention,” arXiv preprint arXiv:1901.08150, 2019.

[40] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.

[41] S. Cao, W. Lu, and Q. Xu, “GraRep: Learning graph representations with global structural information,” in Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 2015, pp. 891–900.

[42] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web (WWW ’15), 2015. [Online]. Available: http://dx.doi.org/10.1145/2736277.2741093

[43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[44] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.

A Dataset and Model Comparisons

In this appendix, we explore and compare performance results between the three models presented (Deep Hyperedges, DeepSets + SaT Walks, Multilayer Perceptron + TaS Walks), as well as between the seven datasets we investigated. The training:validation:test split used is 80:10:10, and network features are not used—only the hypergraph structure. As mentioned, these results could improve in a future work with the use of graph convolutions in this framework, as well as model tweaking.

From left to right, we can view a t-SNE plot of the hyperedge embeddings constructed using TaS Walks, training accuracy, training loss, validation accuracy, and validation loss. The green plot represents DHE, the blue plot represents DeepSets + SaT Walks, and the red plot represents MLP + TaS Walks. This is meant to provide an at-a-glance overview of the relative effectiveness and trends of each model for each dataset, and across datasets. We observe that DHE typically performs the best in validation and test accuracy, and that the relative level of performance between DeepSets + SaT Walks and MLP + TaS Walks depends on how important cardinality, vertex membership, and context are in the data.

These tests are all available as Jupyter Notebooks for ease of access and convenience of testing onother hypergraph data.


The Cora dataset¹ is a computer science publication citation network dataset where vertices are papers and hyperedges are the 1-neighborhoods of centroid papers. Each paper (and thus each hyperedge) is classified into one of seven classes based on topic. DHE achieved around 82.29% test accuracy.

The PubMed diabetes publication citation network dataset² is a network where vertices are papers and hyperedges are the cited works of a given paper. Each paper is classified into one of three classes based on the type of diabetes it studies. DHE achieved around 82.35% test accuracy.

The CORUM protein complex dataset³ represents a network where vertices are proteins and hyperedges are collections of proteins that interact to form a protein complex. Each hyperedge is labeled based on whether or not the collection forms a protein complex. We generate negative examples by selecting n proteins at random to form a negative hyperedge, where n is selected uniformly between the minimum hyperedge cardinality and the maximum hyperedge cardinality. DHE achieves 95.53% test accuracy, with DeepSets + SaT Walks close behind, as a critical signal factor is the cardinality of the hyperedge.

CORUM dataset, same distribution. We generate negative examples by selecting n proteins at random to form a negative hyperedge, where n is selected from the cardinality distribution in the data. This is a trickier dataset, and DHE achieves 70.23% test accuracy. The key insight brought by differences in cardinality, which DeepSets + SaT Walks enjoyed on the previous dataset, is no longer available, and it trails behind MLP + TaS Walks. Perhaps if we’d let it train for more epochs, it would’ve caught up, as DeepSets + SaT Walks captures context as well—just not as readily.
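Both negative-sampling schemes can be sketched as follows (the names are ours; `cardinalities` is the list of positive hyperedge sizes observed in the data):

```python
import random

def negative_hyperedge(proteins, cardinalities, match_distribution=False):
    """Sample one negative example: n proteins chosen at random, which with
    high probability do not form a complex. In the first setting, n is uniform
    between the minimum and maximum observed cardinality; in the harder
    setting, n is drawn from the empirical cardinality distribution."""
    if match_distribution:
        n = random.choice(cardinalities)
    else:
        n = random.randint(min(cardinalities), max(cardinalities))
    return frozenset(random.sample(proteins, n))
```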

The Meetups dataset⁴ is a highly unbalanced social networking dataset where vertices are members and hyperedges are meetup events. Each meetup event is classified into one of 33 types. The model performed fairly decently, achieving around 60.10% accuracy.

¹ https://relational.fit.cvut.cz/dataset/CORA
² https://linqs.soe.ucsc.edu/data
³ https://mips.helmholtz-muenchen.de/corum/
⁴ https://www.kaggle.com/sirpunch/meetups-data-from-meetupcom


Meetups dataset, balanced. We grouped “Tech” and “Career & Business” meetups together into one class and every other type of meetup into another class to give the dataset more balance. This time, DHE performed much better, achieving around 88.31% test accuracy. DeepSets + SaT Walks performed even better, with 91.31% test accuracy.

The DisGeNet⁵ dataset is a fairly unbalanced disease genomics dataset with 23 classes, where vertices are genes and hyperedges are diseases. Each disease is classified into one of 23 MeSH codes (if it has multiple, we randomly select one). The model achieved around 40% test accuracy—more than we expected, given that additional features are not being trained on and given the linear inseparability of the hyperedge embeddings.

⁵ http://www.disgenet.org/downloads
