THE SHORTEST PATH IS NOT ALWAYS A STRAIGHT LINE
leveraging semi-metricity in large-scale graph analysis
Vasiliki Kalavri, KTH Royal Institute of Technology
Tiago Simas, Telefonica Research
Dionysios Logothetis, Facebook
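Weighted graphs capture relationship strength
• weights can encode distance, similarity, social proximity, rating or preference
• weighted analyses find influential nodes, optimal propagation paths, communities and recommendations
[Figure: Alice, Bob and Max connected by weighted edges such as "42 likes" and "3 likes"]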
Sparsification techniques reduce the graph size and still give exact or good approximate results: an analysis f run on the sparsified graph G' satisfies f(G') ≈ f(G).
THE METRIC BACKBONE
Reduces the graph size while maintaining relevant structure
The minimum subgraph of a weighted graph that preserves the shortest paths of the original graph
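[Figure: an example weighted graph on vertices A, B, C, D, E with edges AB, BC, CD, DE, CE and AD (edge weights 1, 2, 2, 3, 4, 10); its metric backbone drops CE (weight 4) and AD (weight 10) and keeps AB, BC, CD and DE]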
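The definition can be made concrete with a brute-force sketch that does exactly what the algorithm later tries to avoid: solve all-pairs shortest paths (Floyd-Warshall here) and keep every edge whose weight equals the distance between its endpoints. The `EDGES` dictionary is our own reading of the example figure; which edge carries which weight is an assumption, not something the text states.

```python
import math
from itertools import product

# Example graph from the slides. Which edge carries which weight is our
# reading of the figure (an assumption), not something the text states.
EDGES = {
    ("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
    ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10,
}


def all_pairs_shortest_paths(edges):
    """Floyd-Warshall on an undirected graph with positive edge weights."""
    nodes = sorted({x for edge in edges for x in edge})
    dist = {(i, j): 0 if i == j else math.inf for i, j in product(nodes, nodes)}
    for (u, v), w in edges.items():
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    for k in nodes:  # allow k as an intermediate vertex
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def metric_backbone(edges):
    """Keep an edge only if no indirect path is strictly shorter, i.e. its
    weight equals the shortest-path distance between its endpoints."""
    dist = all_pairs_shortest_paths(edges)
    return {edge: w for edge, w in edges.items() if w <= dist[edge]}


print(sorted(metric_backbone(EDGES)))
# [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

The O(N²) distance table is what makes this approach impractical for large graphs.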
WHAT CAN WE USE IT FOR?
• Exact computations
  • any algorithm that depends on the shortest paths
  • reachability, connectivity
  • betweenness centrality, closeness centrality
• Approximation
  • PageRank, random walks
  • eigenvector centrality
  • community detection, clustering
Improves community detection modularity and recommender system accuracy
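Since the backbone preserves every shortest path, shortest-path-based computations such as reachability or distance queries return the same answers on the smaller graph. The sketch below checks this on the example graph with Dijkstra's algorithm built on Python's `heapq` module; the weights are again our assumed reading of the figure, and `BACKBONE` simply drops CE and AD as the slides do.

```python
import heapq

# Example graph; the per-edge weights are our assumed reading of the figure.
GRAPH = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}
# The slides' backbone: everything except the semi-metric edges CE and AD.
BACKBONE = {e: w for e, w in GRAPH.items() if e not in {("C", "E"), ("A", "D")}}


def adjacency(edges):
    """Undirected adjacency lists: vertex -> [(neighbour, weight), ...]."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, source):
    """Single-source shortest-path distances for positive weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


full, bb = adjacency(GRAPH), adjacency(BACKBONE)
for s in full:
    assert dijkstra(full, s) == dijkstra(bb, s)  # same distances from every source
print(f"{len(BACKBONE)} of {len(GRAPH)} edges kept; all distances identical")
# 4 of 6 edges kept; all distances identical
```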
IMPACT ON LARGE-SCALE SYSTEMS
• Graph databases
  • fewer edges => smaller path search space
• Batch graph processing
  • CPU and memory requirements depend on the number of messages
  • the number of messages is proportional to the number of edges
  • fewer edges => improved analysis performance
• Graph compression
  • fewer edges => storage reduction
BACKGROUND
SEMI-METRICITY
In a weighted graph, an edge is semi-metric if there exists a shorter indirect path between its endpoints.
CE is 1st-order semi-metric: C-D-E is a shorter 2-hop path
AD is 2nd-order semi-metric:
A-B-C-D is a shorter 3-hop path
AB, BC, CD, DE are metric
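Both the definition and the order of semi-metricity can be checked mechanically. This sketch assumes the order is one less than the fewest hops of any indirect path that beats the edge, which matches the two examples (CE is beaten by a 2-hop path, AD by a 3-hop one). It runs a hop-bounded Bellman-Ford relaxation that ignores the edge under test; the function names and example weights are our own.

```python
import math

# Example graph; the per-edge weights are our assumed reading of the figure.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def min_hops_of_shorter_path(edges, u, v):
    """Fewest hops of an indirect u-v path strictly shorter than edge (u, v),
    or None if there is none. Hop-bounded Bellman-Ford without the direct edge."""
    limit = edges[u, v]
    others = [(e, w) for e, w in edges.items() if e != (u, v)]
    nodes = {x for e in edges for x in e}
    best = {x: math.inf for x in nodes}  # lightest u->x walk using <= k edges
    best[u] = 0
    for k in range(1, len(nodes)):
        new = dict(best)
        for (a, b), w in others:  # relax both directions of each undirected edge
            new[b] = min(new[b], best[a] + w)
            new[a] = min(new[a], best[b] + w)
        best = new
        if best[v] < limit:
            return k
    return None


def semi_metric_order(edges):
    """Map each edge to None (metric) or its assumed order of semi-metricity."""
    result = {}
    for u, v in edges:
        hops = min_hops_of_shorter_path(edges, u, v)
        result[u, v] = None if hops is None else hops - 1
    return result


for edge, order in sorted(semi_metric_order(EDGES).items()):
    print("-".join(edge), "metric" if order is None else f"order-{order} semi-metric")
```

On the example graph it reports CE as order-1 and AD as order-2 semi-metric and the other four edges as metric, matching the slides. It repeats a full relaxation per edge, so it only suits small graphs.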
BACKBONE ALGORITHM
BACKBONE CALCULATION
• Calculating the backbone:
  • find all semi-metric edges: 1 BFS per edge?
  • compute all-pairs shortest paths (APSP) and store O(N²) paths
Can we calculate or approximate the backbone without solving APSP?
ORDER OF SEMI-METRICITY
Most semi-metric edges are 1st-order semi-metric
A 3-PHASE BACKBONE ALGORITHM
1. Find 1st-order semi-metric edges: only look at triangles
Scalable & practical for large graphs
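Phase 1 only has to inspect each edge's triangles: an edge is 1st-order semi-metric exactly when some triangle offers a shorter 2-hop detour. Below is a single-machine sketch of that check, with a hypothetical function name and the assumed example weights; a distributed implementation would gather the same neighbour information through messages instead.

```python
from collections import defaultdict

# Example graph; the per-edge weights are our assumed reading of the figure.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def first_order_semi_metric(edges):
    """Edge (u, v) is 1st-order semi-metric if a common neighbour x closes a
    triangle with w(u, x) + w(x, v) < w(u, v)."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    flagged = set()
    for (u, v), w in edges.items():
        common = adj[u].keys() & adj[v].keys()  # third vertices of triangles on (u, v)
        if any(adj[u][x] + adj[x][v] < w for x in common):
            flagged.add((u, v))
    return flagged


semi = first_order_semi_metric(EDGES)
remaining = {e: w for e, w in EDGES.items() if e not in semi}
print(semi, sorted(remaining))
# {('C', 'E')} [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

On the example it flags only CE; AD has no triangle, so it is left for the later phases.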
EXAMPLE
Phase 1
[Figure: in the example graph, the triangle C-D-E shows that CE (weight 4) is 1st-order semi-metric; after Phase 1, CE is removed and AB, BC, CD, DE and AD remain]
2. Identify metric edges in 2-hop paths
Most semi-metric edges have already been removed at this point
Phase 2
[Figure: the graph remaining after Phase 1, with AB, BC, CD and DE marked M (metric)]
The lowest-weight edge of every vertex is metric
[Figure: a vertex u whose lowest-weight edge is uv]
Any indirect path from u to v would have larger weight: it has to leave u through some other edge, which is at least as heavy as uv, and every further hop adds positive weight.
[Figure: after Phase 2, only AD remains unlabeled (marked ?)]
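The lowest-weight-edge rule shown in the Phase 2 example can be sketched as follows, assuming strictly positive weights. Ties are all labeled metric, since the argument still holds when another edge of u is equally light: any detour starts with an edge at least as heavy and then adds more. This covers only the rule the example illustrates; any further 2-hop checks in the actual Phase 2 are not spelled out on the slides and are not modeled here. The function name and weights are assumptions.

```python
from collections import defaultdict

# Graph left after Phase 1 (CE removed); weights are our assumed reading of the figure.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1, ("D", "E"): 2, ("A", "D"): 10}


def lowest_weight_edges(edges):
    """Label as metric every edge that is the lightest (ties included) at one
    of its endpoints; valid when all weights are positive."""
    incident = defaultdict(list)
    for (u, v), w in edges.items():
        incident[u].append(((u, v), w))
        incident[v].append(((u, v), w))
    metric = set()
    for items in incident.values():
        lightest = min(w for _, w in items)
        metric.update(e for e, w in items if w == lightest)
    return metric


metric = lowest_weight_edges(EDGES)
unlabeled = sorted(set(EDGES) - metric)
print(sorted(metric), unlabeled)
# [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')] [('A', 'D')]
```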
3. Run a BFS for the remaining unlabeled edges
Phase 3 only has to handle 1%-9% of the edges
Phase 3
[Figure: a BFS starts from one endpoint of AD, the only unlabeled edge; AB, BC, CD and DE are marked M]
Explore only paths that are shorter than the edge being tested
If the BFS arrives at the target, the edge is semi-metric; if it never does, the edge is metric
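A sequential sketch of the Phase 3 check. We read "BFS" as a breadth-first, label-correcting search from one endpoint that skips the edge under test and discards any partial path that is already at least as long as that edge; the slides do not say which exact variant the implementation uses, so `is_semi_metric` and the example weights are our own illustration.

```python
from collections import defaultdict, deque

# Graph after Phases 1-2; AD is the only unlabeled edge. Weights are assumed.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1, ("D", "E"): 2, ("A", "D"): 10}


def is_semi_metric(edges, u, v):
    """Search outward from u, ignoring the direct edge (u, v) and pruning any
    path whose length reaches w(u, v). Reaching v means a shorter indirect
    path exists, so the edge is semi-metric."""
    limit = edges[u, v]
    adj = defaultdict(list)
    for (a, b), w in edges.items():
        if {a, b} == {u, v}:
            continue  # the edge under test is not an indirect path
        adj[a].append((b, w))
        adj[b].append((a, w))
    best = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, w in adj[x]:
            d = best[x] + w
            if d >= limit or d >= best.get(y, float("inf")):
                continue  # prune: not shorter than the edge, or no improvement
            if y == v:
                return True
            best[y] = d
            queue.append(y)
    return False


print(is_semi_metric(EDGES, "A", "D"))
# True
```

For AD it finds A-B-C-D (weight 6 < 10 under our assumed weights), so AD is dropped from the backbone.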
Metric Backbone
[Figure: the metric backbone of the example graph keeps only AB, BC, CD and DE; CE and AD have been removed]
DISTRIBUTED IMPLEMENTATION
Implementation in the vertex-centric model
Code available: http://grafos.ml/okapi.html#analytics
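A toy, single-process simulation can show why Phase 1 maps well onto the vertex-centric model and why message volume tracks the edge count. It is hypothetical: the two-superstep structure, the function name and the example weights are our assumptions, and it does not reproduce the Okapi code or the Giraph API.

```python
from collections import defaultdict

# Example graph; the per-edge weights are our assumed reading of the figure.
EDGES = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 2, ("C", "E"): 4, ("A", "D"): 10}


def phase1_vertex_centric(edges):
    """Hypothetical Pregel-style simulation of Phase 1 in two supersteps.
    Returns the 1st-order semi-metric edges and the number of messages sent."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    # Superstep 0: every vertex sends its weighted neighbour list to each neighbour.
    inbox = defaultdict(list)
    messages = 0
    for x, nbrs in adj.items():
        for y in nbrs:
            inbox[y].append((x, dict(nbrs)))
            messages += 1

    # Superstep 1: vertex v looks for a detour v-x-y shorter than its own edge v-y.
    semi = set()
    for v, received in inbox.items():
        for x, x_nbrs in received:
            for y, w_xy in x_nbrs.items():
                if y != v and y in adj[v] and adj[v][x] + w_xy < adj[v][y]:
                    semi.add(tuple(sorted((v, y))))
    return semi, messages


semi, messages = phase1_vertex_centric(EDGES)
print(semi, messages)
# {('C', 'E')} 12
```

In superstep 0 every vertex sends one message per incident edge, so the message count equals the sum of the degrees, twice the number of undirected edges (12 for the 6-edge example).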
EVALUATION
EVALUATION GOALS
• How does our algorithm compare to APSP?
• Are large, real-world graphs semi-metric?
• Can we improve graph analysis performance?
COMPARISON TO APSP
Computing APSP in Giraph:
• multiple single-source shortest paths (SSSPs)
• multiple multi-source shortest paths (MSSPs), i.e. SSSPs from several sources in parallel
With SSSPs: on the order of months for million-edge graphs
With MSSPs: on the order of days for million-edge graphs
Our algorithm is 120-180x faster than SSSP and 11-14x faster than MSSP: on the order of hours for million-edge graphs
ALGORITHM PHASES
• Phase 1: very fast and scalable; removes up to 90% of semi-metric edges
• Phase 2: moderately fast; labels up to 60% of the unlabeled edges
• Phase 3: slow; labels the remaining 1-9% of the total edges
Phase 1 is the fastest and most useful phase
PHASE 1 SCALABILITY
<200s on a billion-edge graph
almost linear scalability
SEMI-METRICITY IN REAL GRAPHS
Graph | Vertices | Edges | Edge weighting | Semi-metricity
--- | --- | --- | --- | ---
Facebook | 190M | 49.9B | custom | 26.5%
Twitter | 40M | 1.5B | Jaccard | 39%
Tuenti | 12M | 685M | Jaccard | 59%
Livejournal | 4.8M | 34M | Jaccard | 40%
NotreDame | 0.3M | 1.5M | Jaccard, Adamic-Adar | 45%-29%
DBLP | 318K | 1M | Jaccard, Adamic-Adar | 23%-9%
Twitter-ego | 81K | 1.7M | Jaccard, Adamic-Adar | 57%-39%
Movielens | 1.6K | 1.9M | Jaccard | 88%
Facebook | 1K | 143K | #messages, message size | 78%-77%
US-Airports | 0.5K | 6K | #passengers | 72%
C-Elegans | 0.3K | 2.3K | #connections | 17%
The semi-metricity column gives the percentage of 1st-order semi-metric edges, which translates into a reduction in memory and communication
QUERY SPEEDUP ON NEO4J
6.7x speedup
APACHE GIRAPH SPEEDUP
4x speedup, including the time to calculate the backbone
APACHE GIRAPH SPEEDUP
6x speedup
COMMUNICATION REDUCTION
Up to 70% for highly semi-metric graphs
BEST PRACTICES
When to use the backbone?
• semi-metric weighting schemes, e.g. neighborhood similarity
• the overhead can be amortized, e.g. many algorithms on the same graph or multiple distance queries
• lossy compression is acceptable
When not to use the backbone?
• metric weighting schemes
• one-off analyses
• lossless compression is required
RECAP: MAIN CONTRIBUTIONS
• An algorithm for computing the metric backbone without solving APSP
• An open-source distributed implementation
• Graph query and graph analytics speedup on Neo4j and Apache Giraph