graphit: a dsl for high-performance graph...
TRANSCRIPT
![Page 1: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/1.jpg)
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe
GraphIt: A DSL for High-Performance Graph Analytics
�1
![Page 2: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/2.jpg)
void pagerank(Graph &graph, double * new_rank, double * old_rank, int * out_degree, int max_iter){ for (i = 0; i < max_iter; i++) {
for (src : graph.vertices()) { for (dst : graph.getOutgoingNeighbors(node)) {
new_rank[dst] += old_rank[src]/out_degree[src]; } } for (node : graph.vertices()) {
new_rank[node] = base_score + damping*new_rank[node]; } swap (old_rank, new_rank); }
}
PageRank Example in C++
!2
![Page 3: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/3.jpg)
void pagerank(Graph &graph, double * new_rank, double * old_rank, int * out_degree, int max_iter){ for (i = 0; i < max_iter; i++) {
for (src : graph.vertices()) { for (dst : graph.getOutgoingNeighbors(node)) {
new_rank[dst] += old_rank[src]/out_degree[src]; } } for (node : graph.vertices()) {
new_rank[node] = base_score + damping*new_rank[node]; } swap (old_rank, new_rank); }
}
PageRank Example in C++
!3
![Page 4: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/4.jpg)
void pagerank(Graph &graph, double * new_rank, double * old_rank, int * out_degree, int max_iter){ for (i = 0; i < max_iter; i++) {
for (src : graph.vertices()) { for (dst : graph.getOutgoingNeighbors(node)) {
new_rank[dst] += old_rank[src]/out_degree[src]; } } for (node : graph.vertices()) {
new_rank[node] = base_score + damping*new_rank[node]; } swap (old_rank, new_rank); }
}
PageRank Example in C++
!4
![Page 5: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/5.jpg)
!5
Hand-Optimized C++More than 23x faster
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores.
template<typename APPLY_FUNC> void edgeset_apply_pull_parallel(Graph &g, APPLY_FUNC apply_func) { int64_t numVertices = g.num_nodes(), numEdges = g.num_edges(); parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { local_new_rank[socketId][n] = new_rank[n]; } } int numPlaces = omp_get_num_places(); int numSegments = g.getNumSegments("s1"); int segmentsPerSocket = (numSegments + numPlaces - 1) / numPlaces; #pragma omp parallel num_threads(numPlaces) proc_bind(spread){ int socketId = omp_get_place_num(); for (int i = 0; i < segmentsPerSocket; i++) { int segmentId = socketId + i * numPlaces; if (segmentId >= numSegments) break; auto sg = g.getSegmentedGraph(std::string("s1"), segmentId); #pragma omp parallel num_threads(omp_get_place_num_procs(socketId)) proc_bind(close){ #pragma omp for schedule(dynamic, 1024) for (NodeID localId = 0; localId < sg->numVertices; localId++) { NodeID d = sg->graphId[localId]; for (int64_t ngh = sg->vertexArray[localId]; ngh < sg->vertexArray[localId + 1]; ngh++) { NodeID s = sg->edgeArray[ngh]; local_new_rank[socketId][d] += contrib[s]; }}}}} parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { new_rank[n] += local_new_rank[socketId][n]; }}} struct updateVertex { void operator() (NodeID v) { double old_score = old_rank[v]; new_rank[v] = (beta_score + (damp * new_rank[v])); error[v] = fabs((new_rank[v] - old_rank[v])) ; old_rank[v] = new_rank[v]; new_rank[v] = ((float) 0) ; }; }; void pagerank(Graph &g, double *new_rank, double *old_rank, int *out_degree, int max_iter) { for (int i = (0); i < (max_iter); i++) { parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { contrib[v] = (old_rank[v] / out_degree[v]);}; edgeset_apply_pull_parallel(edges, updateEdge()); parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { updateVertex()(v_iter); }; }
![Page 6: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/6.jpg)
!6
Hand-Optimized C++More than 23x faster
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores.
NUMA Optimized
Cache Optimized
Multi-Threaded
Load Balanced
template<typename APPLY_FUNC> void edgeset_apply_pull_parallel(Graph &g, APPLY_FUNC apply_func) { int64_t numVertices = g.num_nodes(), numEdges = g.num_edges(); parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { local_new_rank[socketId][n] = new_rank[n]; } } int numPlaces = omp_get_num_places(); int numSegments = g.getNumSegments("s1"); int segmentsPerSocket = (numSegments + numPlaces - 1) / numPlaces; #pragma omp parallel num_threads(numPlaces) proc_bind(spread){ int socketId = omp_get_place_num(); for (int i = 0; i < segmentsPerSocket; i++) { int segmentId = socketId + i * numPlaces; if (segmentId >= numSegments) break; auto sg = g.getSegmentedGraph(std::string("s1"), segmentId); #pragma omp parallel num_threads(omp_get_place_num_procs(socketId)) proc_bind(close){ #pragma omp for schedule(dynamic, 1024) for (NodeID localId = 0; localId < sg->numVertices; localId++) { NodeID d = sg->graphId[localId]; for (int64_t ngh = sg->vertexArray[localId]; ngh < sg->vertexArray[localId + 1]; ngh++) { NodeID s = sg->edgeArray[ngh]; local_new_rank[socketId][d] += contrib[s]; }}}}} parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { new_rank[n] += local_new_rank[socketId][n]; }}} struct updateVertex { void operator() (NodeID v) { double old_score = old_rank[v]; new_rank[v] = (beta_score + (damp * new_rank[v])); error[v] = fabs((new_rank[v] - old_rank[v])) ; old_rank[v] = new_rank[v]; new_rank[v] = ((float) 0) ; }; }; void pagerank(Graph &g, double *new_rank, double *old_rank, int *out_degree, int max_iter) { for (int i = (0); i < (max_iter); i++) { parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { contrib[v] = (old_rank[v] / out_degree[v]);}; edgeset_apply_pull_parallel(edges, updateEdge()); parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { updateVertex()(v_iter); }; }
![Page 7: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/7.jpg)
!7
Hand-Optimized C++More than 23x faster
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores.
NUMA Optimized
Cache Optimized
Multi-Threaded
Load Balanced
(1) Hard to write correctly(2) Extremely difficult to experiment
with different combinations of optimizations
template<typename APPLY_FUNC> void edgeset_apply_pull_parallel(Graph &g, APPLY_FUNC apply_func) { int64_t numVertices = g.num_nodes(), numEdges = g.num_edges(); parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { local_new_rank[socketId][n] = new_rank[n]; } } int numPlaces = omp_get_num_places(); int numSegments = g.getNumSegments("s1"); int segmentsPerSocket = (numSegments + numPlaces - 1) / numPlaces; #pragma omp parallel num_threads(numPlaces) proc_bind(spread){ int socketId = omp_get_place_num(); for (int i = 0; i < segmentsPerSocket; i++) { int segmentId = socketId + i * numPlaces; if (segmentId >= numSegments) break; auto sg = g.getSegmentedGraph(std::string("s1"), segmentId); #pragma omp parallel num_threads(omp_get_place_num_procs(socketId)) proc_bind(close){ #pragma omp for schedule(dynamic, 1024) for (NodeID localId = 0; localId < sg->numVertices; localId++) { NodeID d = sg->graphId[localId]; for (int64_t ngh = sg->vertexArray[localId]; ngh < sg->vertexArray[localId + 1]; ngh++) { NodeID s = sg->edgeArray[ngh]; local_new_rank[socketId][d] += contrib[s]; }}}}} parallel_for(int n = 0; n < numVertices; n++) { for (int socketId = 0; socketId < omp_get_num_places(); socketId++) { new_rank[n] += local_new_rank[socketId][n]; }}} struct updateVertex { void operator() (NodeID v) { double old_score = old_rank[v]; new_rank[v] = (beta_score + (damp * new_rank[v])); error[v] = fabs((new_rank[v] - old_rank[v])) ; old_rank[v] = new_rank[v]; new_rank[v] = ((float) 0) ; }; }; void pagerank(Graph &g, double *new_rank, double *old_rank, int *out_degree, int max_iter) { for (int i = (0); i < (max_iter); i++) { parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { contrib[v] = (old_rank[v] / out_degree[v]);}; edgeset_apply_pull_parallel(edges, updateEdge()); parallel_for(int v_iter = 0; v_iter < builtin_getVertices(edges); v_iter ++) { updateVertex()(v_iter); }; }
![Page 8: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/8.jpg)
!8
Locality
Parallelism Work-Efficiency
PushPullPartitioningVertex-Parallel
Bitvector….
Optimization Tradeoff Space
Edge-aware Vertex-parallel
![Page 9: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/9.jpg)
!9
Optimizations
![Page 10: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/10.jpg)
!10
Graphs
Optimizations
![Page 11: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/11.jpg)
!11
Graphs
Algorithms
Optimizations
![Page 12: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/12.jpg)
!12
Hardware
Algorithms
Graphs
Optimizations
![Page 13: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/13.jpg)
!13
Hardware
Algorithms
Graphs
OptimizationsBad sets of optimizations can be > 100x
slower
![Page 14: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/14.jpg)
GraphIt
!14
A Domain-Specific Language for Graph Analytics
• Decouple algorithm from optimization for graph programs
• Algorithm: What to Compute
• Optimization (schedule): How to Compute
• Optimization (schedule) representation
• Easy to use for users to try different combinations
• Powerful enough to beat hand-hand-optimized libraries by up to 4.8x
![Page 15: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/15.jpg)
GraphIt
!15
A Domain-Specific Language for Graph Analytics
• Decouple algorithm from optimization for graph programs
• Algorithm: What to Compute
• Optimization (schedule): How to Compute
• Optimization (schedule) representation
• Easy to use for users to try different combinations
• Powerful enough to beat hand-optimized libraries by up to 4.8x
![Page 16: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/16.jpg)
GraphIt DSL
!16
Algorithm Representation (Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 17: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/17.jpg)
GraphIt DSL
!17
Algorithm Representation (Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 18: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/18.jpg)
Algorithm Language
!18
![Page 19: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/19.jpg)
Algorithm Language
!19
edges.apply(func)
![Page 20: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/20.jpg)
Algorithm Language
!20
edges.apply(func) edges.from(vertexset) .to(vertexset)
.srcFilter(filtF) .dstFilter(filtF) .apply(func)
![Page 21: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/21.jpg)
Algorithm Language
!21
edges.apply(func) edges.from(vertexset) .to(vertexset)
.srcFilter(filtF) .dstFilter(filtF) .apply(func)
vertices.apply(func)
![Page 22: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/22.jpg)
PageRank Example
!22
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
![Page 23: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/23.jpg)
PageRank Example
!23
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 24: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/24.jpg)
PageRank Example
!24
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 25: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/25.jpg)
PageRank Example
!25
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 26: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/26.jpg)
PageRank Example
!26
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 27: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/27.jpg)
GraphIt DSL
!27
Algorithm Representation (Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 28: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/28.jpg)
GraphIt DSL
!28
Algorithm Representation (Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 29: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/29.jpg)
Scheduling Language
!29
func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 30: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/30.jpg)
Scheduling Language
!30
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 31: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/31.jpg)
Scheduling Functions
Schedule 1
!31
schedule: program->configApplyDirection(“s1”, “SparsePush”);
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 32: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/32.jpg)
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
Scheduling Functions
Schedule 1
!32
schedule: program->configApplyDirection(“s1”, “SparsePush”);
![Page 33: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/33.jpg)
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
Scheduling Functions
Schedule 1
!33
schedule: program->configApplyDirection(“s1”, “SparsePush”);
Pseudo Generated Codedouble * new_rank = new double[num_verts];double * old_rank = new double[num_verts];int * out_degree = new int[num_verts];
…
for (NodeID src : vertices) { for(NodeID dst : G.getOutNgh(src)){ new_rank[dst] += old_rank[src] / out_degree[src]; } }
….
![Page 34: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/34.jpg)
Pseudo Generated Code
Scheduling Functions
Schedule 2
!34
schedule: program->configApplyDirection(“s1”, “SparsePush”); program->configApplyParallelization(“s1”, “dynamic-vertex-parallel”);
double * new_rank = new double[num_verts];double * old_rank = new double[num_verts];int * out_degree = new int[num_verts];
…
parallel_for (NodeID src : vertices) { for(NodeID dst : G.getOutNgh(src)){ atomic_add (new_rank[dst], old_rank[src] / out_degree[src] ); } }
….
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 35: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/35.jpg)
Pseudo Generated Codedouble * new_rank = new double[num_verts];double * old_rank = new double[num_verts];int * out_degree = new int[num_verts];
…
parallel_for (NodeID dst : vertices) { for(NodeID src : G.getInNgh(dst)){ new_rank[dst] += old_rank[src] / out_degree[src]; } }
….
Scheduling Functions
Schedule 3
!35
schedule: program->configApplyDirection(“s1”, “DensePull”); program->configApplyParallelization(“s1”, “dynamic-vertex-parallel”);
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 36: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/36.jpg)
Pseudo Generated Codedouble * new_rank = new double[num_verts];double * old_rank = new double[num_verts];int * out_degree = new int[num_verts];
…for (Subgraph sg : G.subgraphs) { parallel_for (NodeID dst : verticesa) { for(NodeID src : G.getInNgh(dst)){ new_rank[dst] += old_rank[src] / out_degree[src]; } } }….
Scheduling Functions
Schedule 4
!36
schedule: program->configApplyDirection(“s1”, “DensePull”); program->configApplyParallelization(“s1”, “dynamic-vertex-parallel”); program->configApplyNumSSG(“s1”, “fixed-vertex-count”, 10);
Algorithm Specification func updateEdge (src: Vertex, dst: Vertex) new_rank[dst] += old_rank[src] / out_degree[src] end
func main() for i in 1:max_iter
#s1# edges.apply(updateEdge); vertices.apply(updateVertex); end
end
func updateVertex (v: Vertex) new_rank[v] = beta_score + 0.85*new_rank[v]; old_rank[v] = new_rank[v];
new_rank[v] = 0; end
![Page 37: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/37.jpg)
Speedups of Schedules
!37
Spee
dups
0
6.25
12.5
18.75
25
Schedule1 Schedule2 Schedule3 Schedule4
Twitter graph with 41M vertices and 1.47B edges Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores
![Page 38: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/38.jpg)
!38
• Direction optimizations (configApplyDirection),
• SparsePush, DensePush, DensePull, DensePull-
SparsePush, DensePush-SparsePush
• Parallelization strategies (configApplyParallelization)
• serial, dynamic-vertex-parallel, static-vertex-parallel,
edge-aware-dynamic-vertex-parallel, edge-parallel
• Cache (configApplyNumSSG)
• fixed-vertex-count, edge-aware-vertex-count
• NUMA (configApplyNUMA)
• serial, static-parallel, dynamic-parallel
• AoS, SoA (fuseFields)
• Vertexset data layout (configApplyDenseVertexSet)
• bitvector, boolean array
Many More Optimizations
![Page 39: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/39.jpg)
GraphIt DSL
!39
Algorithm Representation (Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 40: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/40.jpg)
GraphIt DSL
!40
Algorithm Representation
(Algorithm Language)
Optimization Representation Autotuner
• Scheduling Language• Schedule Representation
(e.g. Graph Iteration Space)
![Page 41: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/41.jpg)
State of the Art and GraphIt
!41
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores
(PPoPP13)
(SOSP13) (PPoPP18)
(ASPLOS12)
(OSDI16)
(VLDB15)
(OOPSLA18)
![Page 42: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/42.jpg)
!42
State of the Art and GraphIt
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores
(PPoPP13)
(SOSP13) (PPoPP18)
(ASPLOS12)
(OSDI16)
(VLDB15)
(OOPSLA18)
![Page 43: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/43.jpg)
!43
State of the Art and GraphIt
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores
(PPoPP13)
(SOSP13) (PPoPP18)
(ASPLOS12)
(OSDI16)
(VLDB15)
(OOPSLA18)
![Page 44: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/44.jpg)
!44
Ease-of-UseReduces the lines of code
by up to an order of magnitude compare to the
next fastest framework
Intel Xeon E5-2695 v3 CPUs with 12 cores each for a total of 24 cores
(PPoPP13)
(SOSP13) (PPoPP18)
(ASPLOS12)
(OSDI16)
(VLDB15)
(OOPSLA18)
![Page 45: GraphIt: A DSL for High-Performance Graph Analyticsgroups.csail.mit.edu/commit/2019tensorworkshop/slides/shun.pdf · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec80931e2976d52e90010f2/html5/thumbnails/45.jpg)
Summary• Performance of graph programs depends highly on data,
algorithm, and hardware
• GraphIt decouples algorithm from optimization to achieve high-performance and programmability
• Results are for multicore so far, but we are working on a GPU backend
• Open source (graphit-lang.org)
!45