interfaces for runtime correctness checking of parallel programs · 2020. 8. 11. · generic tool...
TRANSCRIPT
Interfaces for Runtime Correctness Checking of Parallel ProgramsJoachim Protze ([email protected])
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
2
Motivation
• OpenMP 3 introduced tasks (2008)
• Several data race detection tools for OpenMP tasks popped up just last year
• How can we effectively reduce the porting effort for new programming paradigms?
Memory accesses
Concurrency
Synchronization
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
3
Synchronization in OpenMP
Parallel region
• Encountering a parallel directive
happens before execution of the
parallel region
• Encountering a barrier directive
happens before execution of code
following the barrier region
• Encountering the implicit barrier
happens before the master
continues code following the
parallel region
parallel-begin
implicit-task-begin
implicit-task-end !
parallel-end
barrier-begin
barrier-end
barrier-begin
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
4
Synchronization in OpenMP
Task region
• Encountering a task directive
happens before execution of the
task region
• Finishing execution of a child task
happens before execution of code
following a taskwait, barrier, or
taskgroup region
• Finishing a predecessor task
happens before a dependent task
starts execution
• Deferring a task happens before
scheduling the task again
task-create
task-begin
taskwait-end
task-end
task-begin
task-end
+task-dependencies
task depend(out:a)
task depend(in:a)
taskwait
task
task
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
5
Archer based on ThreadSanitizer
• ThreadSanitizer comes with clang and gcc (-fsanitize=thread)
• Compiler instrumentation of memory accesses− Less overhead than binary instrumentation (e.g., PIN, valgrind)
• ThreadSanitizer is not aware of OpenMP synchronization
• Happens before analysis with simplified fast track algorithm.− 4 records of memory access to a word, storing (epoch,tid,r/w)
• Archer annotates OpenMP synchronization− Initially instrumentation of the LLVM/OpenMP runtime
− Now based on OMPT events
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
6
Data race analysis overhead for SPEC OMP 2012 (train)
• Expected overhead according to base tool: 2-20x
• 359.botsspar and 370.mgrid331 > 20x − Both run <1 second with high synchronization rate
▪ 359.botsspar: 353400 task switches
▪ 370.mgrid331: 6383 parallel regions
0
10
20
30
40
50
350 351 352 357 358 359 360 362 363 367 370 371 372 376
To
ol
Slo
wd
ow
n
2 Threads
4 Threads
12 Threads
99.8
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
7
Concurrency for OpenMP Tasksth
read
• Execution of
the thread as
observed by a
tool
• Lamport
happens-
before
• Separating
the logical
slices
Wallclock time
• Observed actual
execution with
HB
Happens-before
Observed execution order
Wallclock timeWallclock timeLogical clock
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
8
TLC: Marking execution within a thread as concurrentth
read
• Execution of
the thread as
observed by
the tool
• Lamport
happens-
before
• Separating
the logical
slices
• Observed actual
execution with
HB
Happens-before
Observed execution order
Fork / spawn
Wallclock time Wallclock timeWallclock timeLogical clock
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
9
Generic events
• Fork(curr, *new)− Fork(curr, *new, *msg)
• Join(curr, next)
• Switch(curr, next)− Switch(curr, next, msg)
• Send(curr, *msg)
• Recv(curr, msg)
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
10
Concurrency / Synchronization in Shared Memory
Parallel, Tasks, Loops
• Fork → P2P synchronization, concurrency
• Join → P2P synchronization
• Barrier → global synchronization− Can translate into N2N synchronization
• Dependencies → P2P synchronization
• Locks → ?− Should be flexible to enable lock-set and HB analysis
• Parallel loop → concurrency for each iteration
• Doacross loops → P2P synchronization
Fork(curr, *new)
Join(curr, next)
Switch(curr, next)
Send(curr, *msg)
Recv(curr, msg)
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
11
Applying this semantics to MPI
MPI Non-Blocking
• MPI_Isend / MPI_Irecv → concurrency, P2P synchronization− Bind the new execution unit handle to the request
• MPI_Wait → synchronize
• Buffer access → read/write
thre
ad
task
taskMPI_Irecv
MPI_Wait
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
12
Applying this semantics to MPI
MPI One-sided
• MPI One-sided epochs → concurrency, P2P synchronization
• MPI One-sided target completion → synchronize
• Remote memory access → read/write
Device Offloading
Generic Tool Interface for Runtime Correctness Checking
Joachim Protze
14
Basic memory operations in device offloading
• Memory access
• Alloc/release memory
• (Dis-)Associate memory
• Update memory (memcopy)
OpenMP mapping semantics:
• Alloc alloc + associate
• Map-to ((alloc +) associate +) update to device
• Map-from update from device (+ disassociate (+ release))
• Update-to/from update to/from device
• Release disassociate + release
Challenge: semantics of global/static memory
Distributed Memory ?
Thank you for your attention.