Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, Alexandros G. Dimakis
University of Texas, Austin, USA
August 12, 2015
E. R. Elenberg Beyond Triangles 1/20
Introduction
• Perform analytics on large graphs
- World Wide Web, social networks, bioinformatics
• More descriptive than triangle count, clustering coefficient
• Scalable, distributed algorithms
3-profile
• Count the induced subgraphs formed by selecting all triples of vertices
[Figure: the four possible 3-vertex induced subgraphs: H0 (empty), H1 (single edge), H2 (wedge), H3 (triangle)]
Definition
Let $n_i$ be the number of $H_i$'s in a graph G. The vector $n(G) = [n_0, n_1, n_2, n_3]$ is called the 3-profile of G.
- Always sums to $\binom{|V|}{3}$, the total number of 3-subgraphs
Examples
• 4-clique: n(K4) = [0, 0, 0, 4]
• 5-cycle: n(C5) = [0, 5, 5, 0]
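The two examples above can be checked by brute force. The following is a minimal sketch, not the method from the talk: it enumerates every vertex triple and classifies it by the number of edges it induces, which is enough to tell H0 through H3 apart. The function name `three_profile` and the edge-list representation are assumptions made for illustration.

```python
from itertools import combinations


def three_profile(vertices, edges):
    """Return [n0, n1, n2, n3]: number of vertex triples inducing 0..3 edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(vertices, 3):
        # A 3-vertex induced subgraph is determined by its edge count.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        counts[k] += 1
    return counts


k4_edges = list(combinations(range(4), 2))        # 4-clique
c5_edges = [(i, (i + 1) % 5) for i in range(5)]   # 5-cycle

print(three_profile(range(4), k4_edges))  # [0, 0, 0, 4]
print(three_profile(range(5), c5_edges))  # [0, 5, 5, 0]
```

This enumeration takes time cubic in |V|, so it is only useful as a reference on small graphs.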
Related Terms
For each v ∈ V :
Definition
The local 3-profile counts how many times v participates in each $H_i$ with 2 other vertices.
Definition
The ego 3-profile is the 3-profile of ego graph N(v).
- Graph induced by set of neighbors Γ(v)
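As a small illustration of the two definitions above, the sketch below computes both quantities by direct enumeration. It assumes a simple undirected graph given as an edge list, classifies each triple only by its edge count, and uses hypothetical helper names (`neighbors`, `local_three_profile`, `ego_three_profile`); it is a reference check, not the distributed algorithm from the talk.

```python
from itertools import combinations


def neighbors(vertices, edges):
    """Build the neighbor sets Γ(v) from an undirected edge list."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def local_three_profile(v, vertices, edges):
    """Count triples {v, a, b} by the number of edges they induce."""
    adj = neighbors(vertices, edges)
    counts = [0, 0, 0, 0]
    others = [u for u in vertices if u != v]
    for a, b in combinations(others, 2):
        k = (a in adj[v]) + (b in adj[v]) + (b in adj[a])
        counts[k] += 1
    return counts


def ego_three_profile(v, vertices, edges):
    """3-profile of the graph induced by the neighbors Γ(v) (v itself excluded)."""
    adj = neighbors(vertices, edges)
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(adj[v]), 3):
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        counts[k] += 1
    return counts


c5_edges = [(i, (i + 1) % 5) for i in range(5)]
k4_edges = list(combinations(range(4), 2))
print(local_three_profile(0, range(5), c5_edges))  # [0, 3, 3, 0]
print(ego_three_profile(0, range(4), k4_edges))    # [0, 0, 0, 1]
```

Summing the local 3-profile over all vertices gives three times the global 3-profile, since every triple is counted once from each of its three vertices.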
Motivation
• Global 3-profile concisely describes local connectivity
- Molecule classification
• Local and ego 3-profiles are feature vectors for each vertex
- Spam detection
- Generative models
Introduction
• Problem: Compute (or approximate) 3-profile quantities for a large graph
• Approach: Edge sub-sampling and distributed implementation
Contributions
1 Derive a 3-profile sparsifier with provable guarantees
2 Design distributed, graph engine algorithms to calculate local and ego 3-profiles
3 Evaluate performance on real-world datasets
Related Work
Well studied across several communities:
• Graph sub-sampling
[Kim, Vu ’00] [Tsourakakis, et al. ’08 -’11] [Ahmed, et al. ’14]
• Large-scale triangle counting
[Satish, et al. ’14] [Shank ’07] [Suri, Vassilvitskii ’11]
• Subgraph counting
[Alon, et al. ’97] [Kloks, et al. ’00] [Kowaluk, et al. ’13]
• Graphlets
[Przulj ’07] [Shervashidze, et al. ’09]
Outline
1 Introduction
2 3-profile Sparsifier
- Edge Sub-sampling Process
- Concentration Bound
3 3-PROF Algorithm
4 Experiments
5 Conclusions
Edge Sub-sampling Process
• Sub-sample each edge in the graph independently with probability p
• Relate the original and sub-sampled graphs via a 1-step Markov chain
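The sub-sampling step can be sketched in a few lines. This is an illustrative single-machine version, assuming the graph is held as an in-memory edge list; the function name `subsample_edges` and the `seed` parameter are hypothetical and only there to make runs repeatable.

```python
import random


def subsample_edges(edges, p, seed=None):
    """Keep each edge independently with probability p."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so p = 1 keeps every edge
    # and p = 0 keeps none.
    return [e for e in edges if rng.random() < p]


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
sampled = subsample_edges(edges, p=0.5, seed=7)
print(len(sampled), "of", len(edges), "edges kept")
```

The vertex set is left unchanged; only edges are dropped, so the total number of vertex triples is the same in the original and sub-sampled graphs.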
[Figure: 1-step Markov chain from the subgraph types of the Original graph to those of the Sub-sampled graph; arrows are labeled with the transition probabilities 1, (1−p), (1−p)^2, (1−p)^3, p, 2p(1−p), 3p(1−p)^2, p^2, 3p^2(1−p), p^3]
$$
\text{Estimator} =
\begin{bmatrix}
1 & 1-p & (1-p)^2 & (1-p)^3 \\
0 & p & 2p(1-p) & 3p(1-p)^2 \\
0 & 0 & p^2 & 3p^2(1-p) \\
0 & 0 & 0 & p^3
\end{bmatrix}^{-1}
\text{Sub-sampled}
$$
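The estimator above amounts to solving an upper-triangular 4×4 linear system. The sketch below is one way to do that with back-substitution in plain Python, assuming p > 0; the names `transition_matrix` and `estimate_profile` are hypothetical. The usage example feeds in the expected sub-sampled profile of the 5-cycle rather than a random sample, so it only demonstrates that the inversion recovers the original profile.

```python
def transition_matrix(p):
    """Entry [i][j]: probability that an original H_j becomes H_i after sampling."""
    q = 1.0 - p
    return [
        [1.0, q, q * q, q ** 3],
        [0.0, p, 2 * p * q, 3 * p * q * q],
        [0.0, 0.0, p * p, 3 * p * p * q],
        [0.0, 0.0, 0.0, p ** 3],
    ]


def estimate_profile(sub_profile, p):
    """Solve M x = sub_profile by back-substitution (M is upper triangular)."""
    m = transition_matrix(p)
    x = [0.0] * 4
    for i in range(3, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, 4))
        x[i] = (sub_profile[i] - tail) / m[i][i]
    return x


p = 0.5
exact = [0, 5, 5, 0]  # 3-profile of the 5-cycle
m = transition_matrix(p)
expected_sub = [sum(m[i][j] * exact[j] for j in range(4)) for i in range(4)]
print(expected_sub)
print(estimate_profile(expected_sub, p))
```

The diagonal entries are 1, p, p^2, p^3, so the system becomes ill-conditioned as p shrinks; that is consistent with accuracy being reported separately for each value of p.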
Main Result
Theorem (3-profile sparsifiers)
For all (ε,p)-balanced graphs∗, the $\ell_\infty$-norm of the 3-profile sparsifier error is bounded by $\varepsilon \binom{|V|}{3}$ with high probability.
Definition
A graph is (ε,p)-balanced if the majority of “triangles,” “wedges,” or “single-edges” do not depend on one common edge.
Proof Sketch:
- Apply multivariate polynomial concentration inequalities [Kim, Vu ’00] to each estimator
$f(G, p) = e_1 e_2 e_4 + e_4 e_5 e_6 + \dots$
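To make the polynomial view concrete, the sketch below treats only the triangle term: f is a sum of products of three edge indicators, one product per triangle, and its exact expectation under sub-sampling should equal p^3 times the number of triangles. The code enumerates all edge subsets of a small graph, so it is exponential in the number of edges and purely illustrative. It does not reproduce the concentration argument, and the function name `expected_triangle_count` is an assumption.

```python
from fractions import Fraction
from itertools import combinations


def expected_triangle_count(vertices, edges, p):
    """Exact E[f], where f sums e_i*e_j*e_k over triangles and each edge is kept w.p. p."""
    edges = [frozenset(e) for e in edges]
    triangles = [
        t for t in combinations(vertices, 3)
        if all(frozenset(pair) in edges for pair in combinations(t, 2))
    ]
    total = Fraction(0)
    for r in range(len(edges) + 1):
        for kept in combinations(edges, r):
            kept_set = set(kept)
            prob = p ** r * (1 - p) ** (len(edges) - r)
            # f evaluated on this outcome: number of triangles that survive.
            f = sum(
                all(frozenset(pair) in kept_set for pair in combinations(t, 2))
                for t in triangles
            )
            total += prob * f
    return total


p = Fraction(1, 2)
k4_edges = list(combinations(range(4), 2))
print(expected_triangle_count(range(4), k4_edges, p))  # 1/2, i.e. 4 * p**3
```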
3-PROF
Vertex program in the Gather-Apply-Scatter framework
1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)
2 For each edge va: Scatter
$n_{3,va} = |\Gamma(v) \cap \Gamma(a)|$,
$n^{c}_{2,va} = |\Gamma(v)| - |\Gamma(v) \cap \Gamma(a)| - 1$, ...
3 For each vertex v: Gather and Apply
$n_{3,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n_{3,va}$
$n^{c}_{2,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n^{c}_{2,va}$, ...
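The three steps above can be sketched as a single-machine loop. This is not GraphLab code and does not use the Gather-Apply-Scatter API; it only mirrors the arithmetic for the two quantities shown on the slide, with the remaining ones (elided as "...") left out. The function name `edge_pivot_counts` and the dictionary outputs are assumptions.

```python
def edge_pivot_counts(vertices, edges):
    """Return per-vertex dicts (n3, n2c) computed from per-edge pivot counts."""
    # Step 1: store the neighbor set Γ(v) of every vertex.
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    n3, n2c = {}, {}
    for v in vertices:
        tri_sum = 0
        wedge_sum = 0
        for a in adj[v]:
            # Step 2: per-edge quantities for edge va, seen from v.
            common = len(adj[v] & adj[a])
            tri_sum += common                      # n_{3,va}
            wedge_sum += len(adj[v]) - common - 1  # n^c_{2,va}
        # Step 3: every triangle, and every wedge centered at v,
        # is seen from two edges at v, hence the factor 1/2.
        n3[v] = tri_sum // 2
        n2c[v] = wedge_sum // 2
    return n3, n2c


c5_edges = [(i, (i + 1) % 5) for i in range(5)]
print(edge_pivot_counts(range(5), c5_edges))
```

The only non-trivial operation per edge is a neighbor-set intersection, which is also what triangle counting needs; this matches the claim in the summary that 3-profile counting costs roughly the same as triangle counting.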
Implementation
• GraphLab PowerGraph v2.2
• Multicore server
- 256 GB RAM, 72 logical cores
• EC2 cluster (Amazon Web Services)
- 20 c3.8xlarge, 60 GB RAM, 32 logical cores each
Datasets
Name         Vertices     Edges (undirected)
Twitter      41,652,230   1,202,513,046
PLD          39,497,204   582,567,291
LiveJournal  4,846,609    42,851,237
Wikipedia    3,515,067    42,375,912
DBLP         317,080      1,049,866
Results: 3-profile Sparsifier Accuracy, 5 runs
[Figure: PLD, accuracy of 3-profile estimates (exact/approx) at p = 0.7, 0.4, 0.1, 0.01; y-axis range 0.985 to 1.015; series: triangles, wedges, edge, empty]
Results: Multicore, 3 runs
Compare 3-PROF to GraphLab’s default triangle count
[Figure: Twitter and PLD, multicore (p = 1); running time in seconds, y-axis range 0 to 600; series: 3-prof, Trian]
Results: AWS, 5 runs
Compare EGO-PAR to naive, serial algorithm (EGO-SER)
[Figure: LiveJournal, AWS c3.8xlarge; running time in seconds (log scale, 10^-1 to 10^5) for 100, 1K, and 10K egos; series: Ego-ser 12 nodes, Ego-par 12 nodes; annotations ">1000 sec" and ">10000 sec"]
Results: AWS, 5 runs
[Figure: LiveJournal, AWS c3.8xlarge, 10K egos; running time in seconds, y-axis range 0 to 14; series: Ego-par on 12, 16, and 20 nodes]
Summary
1 Edge sub-sampling produces fast, accurate 3-profile estimates
2 3-profile counting consumes roughly the same resources as triangle counting
3 Distributed algorithms scale well over large data and large computing clusters
github.com/eelenberg/3-profiles
(Backup) Edge Pivot Equations
$\sum_{a \in \Gamma(v)} \binom{n_{3,va}}{2} = F_2(v) + 3F_3(v)$
(Backup) Results: 3-profile Sparsifier Accuracy, 5 runs
[Figure: Twitter, accuracy of 3-profile estimates (exact/approx) at p = 0.7, 0.5, 0.3, 0.1; y-axis range 0.996 to 1.004; series: triangles, wedges, edge, empty]
(Backup) Results: 3-PROF vs. TRIAN, AWS, 3 runs
[Figure, panel "LiveJournal Running Time": LiveJournal, AWS c3.8xlarge; running time in seconds (0 to 7) on 12, 16, and 20 nodes; series: 3-prof and Trian, each at p = 1, 0.5, 0.1]
[Figure, panel "PLD Running Time": PLD, AWS c3.8xlarge; running time in seconds (0 to 120) on 12, 16, and 20 nodes; series: 3-prof and Trian, each at p = 1, 0.5, 0.1]
(Backup) Results: 3-PROF vs. TRIAN, AWS, 3 runs
[Figure, panel "LiveJournal Network Usage": LiveJournal, AWS c3.8xlarge; network sent in bytes (×10^10, 0 to 1.0) on 12, 16, and 20 nodes; series: 3-prof and Trian, each at p = 1, 0.5, 0.1]
[Figure, panel "PLD Network Usage": PLD, AWS c3.8xlarge; network sent in bytes (×10^11, 0 to 1.2) on 12, 16, and 20 nodes; series: 3-prof and Trian, each at p = 1, 0.5, 0.1]
(Backup) Results: AWS, 5 runs
[Figure, panel "EGO-SER": LiveJournal, AWS c3.8xlarge, 100 egos; running time in seconds (0 to 120); series: Ego-ser on 12, 16, and 20 nodes]
[Figure, panel "EGO-PAR": LiveJournal, AWS c3.8xlarge, 100 egos; running time in seconds (0 to 12); series: Ego-par on 12, 16, and 20 nodes]