![Page 1: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/1.jpg)
DeepWalk: Online Learning of SocialRepresentations 1
Presented by Carlos Oliver for COMP 766
January 20, 2020
1Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. ”Deepwalk: Online learning of social representations.”
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM,2014.
![Page 2: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/2.jpg)
Motivation
I Can we predict the label of a node given the graph?
I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)
![Page 3: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/3.jpg)
Motivation
I Can we predict the label of a node given the graph?
I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)
![Page 4: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/4.jpg)
Motivation
I Can we predict the label of a node given the graph?
I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)
![Page 5: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/5.jpg)
IdeaI Separate labels from underlying structure.
I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering
I Limitations: often require full graph to compute ordomain-specific knowledge.
![Page 6: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/6.jpg)
IdeaI Separate labels from underlying structure.
I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering
I Limitations: often require full graph to compute ordomain-specific knowledge.
![Page 7: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/7.jpg)
IdeaI Separate labels from underlying structure.
I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering
I Limitations: often require full graph to compute ordomain-specific knowledge.
![Page 8: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/8.jpg)
Relationship between social graphs and natural language
I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.
I Seeing words co-occur gives us information about thestructure of the language (or graph).
![Page 9: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/9.jpg)
Relationship between social graphs and natural language
I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.
I Seeing words co-occur gives us information about thestructure of the language (or graph).
![Page 10: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/10.jpg)
Relationship between social graphs and natural language
I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.
I Seeing words co-occur gives us information about thestructure of the language (or graph).
![Page 11: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/11.jpg)
Random walks ∼ sentences
I Since co-occurence tells us about the structural context.Sample random walks from each node.
I Bonus: natural way to split up graph (i.e. don’t need wholegraph to get a node’s embedding)
![Page 12: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/12.jpg)
Random walks ∼ sentences
I Since co-occurence tells us about the structural context.Sample random walks from each node.
I Bonus: natural way to split up graph (i.e. don’t need wholegraph to get a node’s embedding)
![Page 13: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/13.jpg)
Learning objective
I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.
arg minΦ− logP(v1, .., vi−1|Φ(vi ))
I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.
I Nodes with similar Φ will have similar local graph ‘structure’.
![Page 14: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/14.jpg)
Learning objective
I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.
arg minΦ− logP(v1, .., vi−1|Φ(vi ))
I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.
I Nodes with similar Φ will have similar local graph ‘structure’.
![Page 15: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/15.jpg)
Learning objective
I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.
arg minΦ− logP(v1, .., vi−1|Φ(vi ))
I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.
I Nodes with similar Φ will have similar local graph ‘structure’.
![Page 16: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/16.jpg)
Speedups: SkipGram
I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.
I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes
I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))
![Page 17: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/17.jpg)
Speedups: SkipGram
I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.
I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes
I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))
![Page 18: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/18.jpg)
Speedups: SkipGram
I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.
I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes
I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))
![Page 19: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/19.jpg)
Speedups: SkipGram
I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.
I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes
I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))
![Page 20: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/20.jpg)
Speedups: Hierarchical Softmax
I Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.
I
P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))
![Page 21: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/21.jpg)
Speedups: Hierarchical Softmax
I Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.
I
P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))
![Page 22: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/22.jpg)
Speedups: Hierarchical SoftmaxI Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.
I
P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))
![Page 23: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/23.jpg)
Training loop
![Page 24: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/24.jpg)
Evaluation
I Task: Multi-label node classification on large social graphs.I Evaluation: micro,macro F1 score
I F1 score is the harmonic mean of precision and recallI Macro is the arithmetic mean of F1 over all classesI Micro is total proportion of correct labels over all samples.
![Page 25: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/25.jpg)
Parameter Sensitivity
![Page 26: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/26.jpg)
Summary
I Modelling random walks as sentences in a language gives us:
I Continuous & Low Dimensional representations →robustness
I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical
Softmax
![Page 27: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/27.jpg)
Summary
I Modelling random walks as sentences in a language gives us:
I Continuous & Low Dimensional representations →robustness
I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical
Softmax
![Page 28: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/28.jpg)
Summary
I Modelling random walks as sentences in a language gives us:
I Continuous & Low Dimensional representations →robustness
I Online training → walks naturally partition the graph
I Scalable implementation → SkipGram & HierarchicalSoftmax
![Page 29: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/29.jpg)
Summary
I Modelling random walks as sentences in a language gives us:
I Continuous & Low Dimensional representations →robustness
I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical
Softmax
![Page 30: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/30.jpg)
Limitations
I Representations themselves left unexplored.
I Number of parameters ∝ number of nodes in the graph.
I Similarity is only defined on a single graph.
![Page 31: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/31.jpg)
Limitations
I Representations themselves left unexplored.
I Number of parameters ∝ number of nodes in the graph.
I Similarity is only defined on a single graph.
![Page 32: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,](https://reader033.vdocuments.mx/reader033/viewer/2022051812/602bd807b2e8f056ab6ecc84/html5/thumbnails/32.jpg)
Limitations
I Representations themselves left unexplored.
I Number of parameters ∝ number of nodes in the graph.
I Similarity is only defined on a single graph.