Elixir: A System for Synthesizing Concurrent Graph Programs
Dimitrios Prountzos (1), Roman Manevich (2), Keshav Pingali (1)
1. The University of Texas at Austin
2. Ben-Gurion University of the Negev
Goal
Allow programmers to easily implement correct and efficient parallel graph algorithms
• Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
• Difficult to parallelize due to their irregular nature
• Best algorithm and implementation are usually
– Platform dependent
– Input dependent
• Need to easily experiment with different solutions
• Focus: fixed graph structure
– Only change labels on nodes and edges
– Each activity touches a fixed number of nodes
Example: Single-Source Shortest-Path
• Problem formulation
– Compute the shortest distance from source node S to every other node
• Many algorithms
– Bellman-Ford (1957)
– Dijkstra (1959)
– Chaotic relaxation (Miranker 1969)
– Delta-stepping (Meyer et al. 1998)
• Common structure
– Each node has a label dist with the known shortest distance from S
• Key operation
– relax-edge(u,v)
[Figure: example weighted graph with source S and nodes A through G, plus a close-up of edge (A,C)]
relax-edge(A,C): if dist(A) + W_AC < dist(C) then dist(C) = dist(A) + W_AC
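A minimal Python sketch of the relax-edge rule above. The dictionary representation of the dist labels and the concrete values (dist(A) = 2, edge weight 1) are assumptions chosen for illustration, not taken from the talk.

```python
import math

def relax_edge(dist, u, v, w):
    """Relax edge (u, v) of weight w; return True if dist[v] was lowered."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False

# Hypothetical labels: A is known at distance 2, C has not been reached yet.
dist = {"A": 2, "C": math.inf}
changed = relax_edge(dist, "A", "C", 1)
```

Returning a flag makes it easy for a scheduler to tell whether the relaxation did any work.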
Dijkstra’s Algorithm
Scheduling of relaxations:
• Use a priority queue of nodes, ordered by label dist
• Iterate over nodes u in priority order
• On each step: relax all neighbors v of u
– Apply relax-edge to all (u,v)
[Figure: the example graph with priority-queue contents shown as <C,3> <B,5> | <B,5> <E,6> <D,7> | <B,5>]
Chaotic Relaxation
Scheduling of relaxations:
• Use an unordered set of edges
• Iterate over edges (u,v) in any order
• On each step:
– Apply relax-edge to edge (u,v)
[Figure: the example graph with a worklist of edges (S,A), (B,C), (C,D), (C,E)]
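The unordered schedule can be sketched in Python as follows. This is only an illustration under stated assumptions: the worklist is a plain list, "any order" is modeled with a seeded random pick, and the small graph and its weights are hypothetical rather than the graph from the slide. Edge weights are assumed non-negative so that the loop terminates.

```python
import math
import random

def chaotic_relaxation(nodes, edges, source, seed=0):
    """edges is a list of (u, v, w); returns a dict of shortest distances."""
    rng = random.Random(seed)
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    out_edges = {n: [] for n in nodes}
    for u, v, w in edges:
        out_edges[u].append((u, v, w))
    worklist = list(out_edges[source])
    while worklist:
        # Pick an arbitrary edge: swap a random entry to the end and pop it.
        i = rng.randrange(len(worklist))
        worklist[i], worklist[-1] = worklist[-1], worklist[i]
        u, v, w = worklist.pop()
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            # Lowering dist[v] may enable relaxations on v's outgoing edges.
            worklist.extend(out_edges[v])
    return dist

# Hypothetical example graph.
nodes = ["S", "A", "B", "C", "D"]
edges = [("S", "A", 2), ("S", "B", 5), ("A", "C", 1), ("C", "B", 1), ("B", "D", 3)]
dist = chaotic_relaxation(nodes, edges, "S")
```

Whatever order the edges are picked in, the final labels are the same; only the amount of redundant work changes.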
Insights Behind Elixir
Parallel Graph Algorithm = Operators + Schedule
• Operators: what should be done
• Schedule: how it should be done
– Order activity processing: static schedule, dynamic schedule
– Identify new activities: operator delta
• Unordered/ordered algorithms
Reference: “TAO of parallelism”, PLDI 2011
[Diagram legend: a marker symbol denotes an activity]
Insights Behind Elixir
[Diagram repeated: Parallel Graph Algorithm = Operators + Schedule]
Dijkstra-style Algorithm
q = new PrQueue
q.enqueue(SRC)
while (! q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}
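A runnable Python rendering of the Dijkstra-style pseudocode, offered as a sketch rather than as the code Elixir generates. The adjacency-list format, the use of heapq as the priority queue, the skip of stale queue entries, and the example graph are all assumptions added here.

```python
import heapq
import math

def dijkstra_style_sssp(adj, src):
    """adj maps every node to a list of (neighbor, weight) pairs."""
    dist = {n: math.inf for n in adj}
    dist[src] = 0
    q = [(0, src)]  # priority queue ordered by the dist label
    while q:
        d, a = heapq.heappop(q)
        if d > dist[a]:
            continue  # stale entry: a was re-enqueued with a smaller label
        for b, w in adj[a]:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
                heapq.heappush(q, (dist[b], b))
    return dist

# Hypothetical example graph.
adj = {
    "S": [("A", 2), ("B", 5)],
    "A": [("C", 1)],
    "C": [("B", 1)],
    "B": [("D", 3)],
    "D": [],
}
dist = dijkstra_style_sssp(adj, "S")
```

The operator (the if-statement and assignment) is the same as in chaotic relaxation; only the order in which activities are processed differs.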
Contributions
• Language
– Operators/Schedule separation
– Allows exploration of implementation space
• Operator Delta Inference
– Precise Delta required for efficient fixpoint computations
• Automatic Parallelization
– Inserts synchronization to atomically execute operators
– Avoids data-races / deadlocks
– Specializes parallelization based on scheduling constraints
[Diagram repeated: Parallel Graph Algorithm = Operators + Schedule, with a Synchronization component added]
SSSP in Elixir
Graph type:
Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]
Operator:
relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔
        [ bd = ad + w ]
Fixpoint statement:
sssp = iterate relax ≫ schedule
Operators
The relax operator has three parts:
• Redex pattern: nodes(node a, dist ad), nodes(node b, dist bd), edges(src a, dst b, wt w)
• Guard: bd > ad + w
• Update: bd = ad + w
[Figure: edge from a to b with weight w and labels ad, bd; if bd > ad + w, the label of b is rewritten to ad + w]
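One way to picture the three parts of an operator is the Python sketch below. The Operator class, apply_operator and the sample labels are hypothetical names and values introduced only for illustration; they are not part of the Elixir language or its generated code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Operator:
    guard: Callable[[int, int, int], bool]   # decides whether the redex is active
    update: Callable[[int, int, int], int]   # computes the new label of b

# relax: guard bd > ad + w, update bd = ad + w
relax = Operator(
    guard=lambda ad, bd, w: bd > ad + w,
    update=lambda ad, bd, w: ad + w,
)

def apply_operator(op, dist, edge):
    """Match the redex pattern on one edge, then test the guard and update."""
    a, b, w = edge                 # redex pattern: bind a, b and the weight w
    ad, bd = dist[a], dist[b]      # bind the dist labels of both endpoints
    if not op.guard(ad, bd, w):
        return False
    dist[b] = op.update(ad, bd, w)
    return True

# Hypothetical labels for a single edge (a, b) of weight 1.
dist = {"a": 2, "b": 9}
first = apply_operator(relax, dist, ("a", "b", 1))
second = apply_operator(relax, dist, ("a", "b", 1))
```

After the first application the guard no longer holds on this edge, so the second call leaves the labels unchanged.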
Fixpoint Statement
sssp = iterate relax ≫ schedule
• iterate relax: apply the operator until a fixpoint is reached
• schedule: the scheduling expression
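The meaning of "apply operator until fixpoint" can be sketched in Python with the simplest possible schedule: sweep over all edges repeatedly until no guard holds. This naive sweep is an assumption made for illustration; it re-checks every edge on every pass, which is exactly the cost a precise operator delta is meant to avoid. The edge list is hypothetical.

```python
import math

def iterate_to_fixpoint(dist, edges):
    """Apply relax until no edge satisfies the guard; return the number of updates."""
    applications = 0
    changed = True
    while changed:
        changed = False
        for a, b, w in edges:
            if dist[b] > dist[a] + w:      # guard
                dist[b] = dist[a] + w      # update
                applications += 1
                changed = True
    return applications

# Hypothetical example; edges are listed in a deliberately unhelpful order.
edges = [("C", "B", 1), ("A", "C", 1), ("S", "A", 2), ("S", "B", 5)]
dist = {"S": 0, "A": math.inf, "B": math.inf, "C": math.inf}
count = iterate_to_fixpoint(dist, edges)
```

At the fixpoint no edge satisfies bd > ad + w, so the dist labels are the shortest distances from S.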
Scheduling Examples
sssp = iterate relax ≫ schedule
• Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
• Dijkstra-style: metric ad ≫ group b
Dijkstra-style pseudocode:
q = new PrQueue
q.enqueue(SRC)
while (! q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}
Operator Delta Inference
[Diagram repeated: Parallel Graph Algorithm = Operators + Schedule]
Identifying the Delta of an Operator
[Figure: relax1 is applied to edge (a,b); question marks on the neighboring edges ask which of them become active]
Delta Inference Example
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (c,b) with weight w2]
Query program sent to the SMT solver:
assume (da + w1 < db)
assume ¬(dc + w2 < db)
db_post = da + w1
assert ¬(dc + w2 < db_post)
Result: (c,b) does not become active
Delta Inference Example – Active
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (b,c) with weight w2]
Query program sent to the SMT solver:
assume (da + w1 < db)
assume ¬(db + w2 < dc)
db_post = da + w1
assert ¬(db_post + w2 < dc)
Result: apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a
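The two query programs can be mimicked in Python with a bounded brute-force search standing in for the SMT solver. This is a sketch only: find_counterexample, the bound of 4 and the restriction to small non-negative integers are assumptions, and an exhaustive search over a small range is not a proof the way a solver's answer is.

```python
from itertools import product

def find_counterexample(assumptions, assertion, bound=4):
    """Search small non-negative values for a state that violates the assertion."""
    for da, db, dc, w1, w2 in product(range(bound), repeat=5):
        if all(f(da, db, dc, w1, w2) for f in assumptions):
            db_post = da + w1  # effect of relax1 on edge (a, b)
            if not assertion(db_post, dc, w2):
                return {"da": da, "db": db, "dc": dc, "w1": w1, "w2": w2}
    return None

def relax1_enabled(da, db, dc, w1, w2):
    return da + w1 < db

# Query 1: incoming edge (c, b) was inactive before relax1; can it become active?
incoming = find_counterexample(
    [relax1_enabled, lambda da, db, dc, w1, w2: not (dc + w2 < db)],
    lambda db_post, dc, w2: not (dc + w2 < db_post),
)

# Query 2: outgoing edge (b, c) was inactive before relax1; can it become active?
outgoing = find_counterexample(
    [relax1_enabled, lambda da, db, dc, w1, w2: not (db + w2 < dc)],
    lambda db_post, dc, w2: not (db_post + w2 < dc),
)
```

No counterexample exists for the incoming edge, matching the slide's conclusion that (c,b) does not become active, while the outgoing edge does yield one, so (b,c) belongs to the delta of relax.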
System Architecture
Algorithm Spec → Elixir (synthesize code, insert synchronization) → C++ Program
Galois/OpenMP Parallel Runtime:
• Parallel thread-pool
• Graph implementations
• Worklist implementations
Experiments
Explored dimensions:
• Grouping: statically group multiple instances of operator
• Unrolling: statically unroll operator applications by factor K
• Dynamic scheduler: choose different policy/implementation for the dynamic worklist
• ...
Compare against hand-written parallel implementations
SSSP Results
[Plot: time (ms, axis 0 to 800) vs. threads (1, 2, 4, 8, 16, 20, 24) for Elixir and Lonestar]
• 24 core Intel Xeon @ 2 GHz
• USA Florida Road Network (1 M nodes, 2.7 M edges)
Implementation variant: Group + Unroll improve locality
Breadth-First Search Results
Inputs: scale-free graph (1 M nodes, 8 M edges); USA road network (24 M nodes, 58 M edges)
[Plot: time (ms, axis 0 to 1000) vs. threads (1, 2, 4, 8, 16, 20, 24) for Elixir (Variant 1), Lonestar, and Cilk]
[Plot: time (ms, axis 0 to 7000) vs. threads (1, 2, 4, 8, 16, 20, 24) for Elixir (Variant 2), Lonestar, and Cilk]
Conclusion
• Graph algorithm = Operators + Schedule
– Elixir language: imperative operators + declarative schedule
• Allows exploring implementation space
• Automated reasoning for efficiently computing fixpoints
• Correct-by-construction parallelization
• Performance competitive with hand-parallelized code
Thank You!
Backup Slides
Related Work
• DSL-Synthesis
– SPIRAL [Püschel et al. IEEE’05], Pochoir [Tang et al. SPAA’11], Green-Marl [Hong et al. ASPLOS’12]
• Synthesis from logical specifications
– [Itzhaky et al. OOPSLA’10], [Srivastava et al. POPL’10], Sketching [Solar-Lezama et al. PLDI’08], Paraglide [Vechev et al. PLDI’08]
• Term and Graph Rewriting
– PROGRES [Schürr’99], GrGen [Geiß’06], GP [Plump’09]
• Finite Differencing [Paige’82]
Read paper for…
• Full scheduling language
• Parallelizing ordered iterations
– Automatic reasoning to enable level-parallel execution
– Specialization of dynamic scheduler
• Synchronization details
• Synthesis procedures
Influence Patterns
[Figure: six ways a redex on edge (a,b) can overlap a redex on edge (c,d): b=c; a=c; a=d; a=c and b=d; a=d and b=c; b=d]
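The six overlap cases can be enumerated mechanically, as in the Python sketch below. It assumes that a and b are distinct nodes and that c and d are distinct nodes (no self-loops); the function name and the string encoding of each pattern are hypothetical.

```python
from itertools import product

def influence_patterns():
    """List the ways edge (a, b) can share nodes with edge (c, d)."""
    patterns = []
    # Each of a and b is identified with c, with d, or with neither.
    for a_img, b_img in product((None, "c", "d"), repeat=2):
        if a_img is not None and a_img == b_img:
            continue  # a and b are distinct, so they cannot equal the same node
        if a_img is None and b_img is None:
            continue  # the two edges are disjoint: no influence
        equalities = []
        if a_img is not None:
            equalities.append(f"a={a_img}")
        if b_img is not None:
            equalities.append(f"b={b_img}")
        patterns.append(" ".join(equalities))
    return patterns

patterns = influence_patterns()
```

Each resulting pattern corresponds to one query of the kind shown in the delta inference examples.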