1
EFFICIENT DYNAMIC VERIFICATION ALGORITHMS FOR MPI APPLICATIONS
Dissertation Defense
Sarvani Vakkalanka
Committee: Prof. Ganesh Gopalakrishnan (advisor), Prof. Mike Kirby (co-advisor), Prof. Suresh Venkatasubramanian, Prof. Matthew Might, Prof. Stephen Siegel (Univ. of Delaware)
2
Necessity for Verification
• Software testing is ad hoc.
• Software errors are expensive – $59.5 Billion/yr (2001 NIST study).
• Software written today is complex and uses many existing libraries.
• Our focus – contribute to parallel scientific software written using MPI.
3
Motivation
• Concurrent software debugging is hard!
• Very little formal support for Message Passing concurrency.
• Active testing (schedule enforcement) is important.
• Reducing redundant (equivalent) verification runs is crucial.
• Verification for portability – another important requirement.
4
Approaches to Verification
• Testing methods suffer from bug omissions.
• Static analysis based methods generate many false alarms.
• Model based verification is tedious.
• Dynamic verification – no false alarms.
5
Contributions
• New dynamic verification algorithms for MPI.
• New Happens-Before models for Message Passing concurrency.
• Verification to handle resource dependency.
• MPI dynamic verification tool ISP that handles non-trivial codes for safety properties.
6
Agenda
• Intro to Dynamic Verification
• Intro to MPI
– Four MPI Operations (S, R, W, B).
– MPI Ordering Guarantees.
– Applying DPOR to MPI.
• Dynamic verification algorithms avoiding redundant searches and handling resource dependencies
• Formal MPI Transition System
• Experimental Results
• Conclusions
7
EFFICIENT DYNAMIC VERIFICATION
Code written using mature libraries (MPI, OpenMP, PThreads, …)
API calls made from real programming languages (C, Fortran, C++)
Runtime semantics determined by realistic compilers and runtimes
Dynamic verification abstracts these details.
(Static analysis and model based verification can play important supportive roles.)
8
Growing Importance of Dynamic Verification
Exponential number of TOTAL Interleavings – most are EQUIVALENT – generate only RELEVANT ones !!
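The explosion is easy to quantify: p processes, each executing n atomic steps, admit (p·n)!/(n!)^p distinct interleavings. A quick sketch (the process and step counts below are illustrative, not taken from the slide):

```python
from math import factorial

def total_interleavings(procs: int, steps: int) -> int:
    """Number of distinct interleavings of `procs` processes,
    each executing `steps` atomic actions: (procs*steps)! / (steps!)^procs."""
    return factorial(procs * steps) // factorial(steps) ** procs

print(total_interleavings(2, 2))   # 2 processes x 2 steps: 6 interleavings
print(total_interleavings(5, 5))   # 5 processes x 5 steps: well over 10 billion
```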
9
[Diagram: five processes P0–P4; TOTAL > 10 Billion Interleavings !! The actions shown include a++, b--, and the shared-global writes g=2 and g=3.]
Dynamic Partial Order Reduction
10
[Diagram, continued: among > 10 billion total interleavings, the writes g=2 and g=3 are the only dependent actions – only these 2 interleavings are RELEVANT; all other actions are pairwise independent.]
DPOR
• A state σ consists of the following sets:
– enabled(σ)
– backtrack(σ): a sufficient subset of enabled(σ).
– If backtrack(σ) = enabled(σ), then the full state space is explored.
• Co-enabledness of transitions.
• Dependence among transitions.
11
Co-enabledness & Dependence
12
[Diagram: from a state, dependent transitions t1 and t2; exploring t1 adds {t2} to the backtrack set and exploring t2 adds {t1}, yielding the backtrack set {t1, t2}.]
DPOR Concepts
• DPOR requires the identification of dependence and co-enabledness among transitions.
• Identifying dependence is simple:
– Two lock accesses on the same mutex.
– Two writes to the same global variable.
– Similar concepts for MPI.
• Identifying co-enabledness is difficult (i.e., can the two transitions ever be simultaneously enabled?).
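The "identifying dependence is simple" bullets can be sketched directly. The tuple encoding (pid, op, object) below is a hypothetical representation for illustration, not ISP's:

```python
# Toy dependence check in the spirit of DPOR: transitions are
# (pid, op, obj) tuples; `op` is one of "lock", "unlock", "read", "write".
def dependent(t1, t2) -> bool:
    p1, op1, o1 = t1
    p2, op2, o2 = t2
    if p1 == p2 or o1 != o2:               # same process, or different objects
        return False
    if {op1, op2} <= {"lock", "unlock"}:   # two accesses to the same mutex
        return True
    return "write" in (op1, op2)           # conflicting accesses to a shared var

assert dependent(("P1", "lock", "l"), ("P2", "lock", "l"))
assert dependent(("P1", "write", "x"), ("P2", "write", "x"))
assert not dependent(("P1", "write", "x"), ("P2", "write", "y"))
assert not dependent(("P1", "read", "x"), ("P2", "read", "x"))
```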
13
14
P1:          P2:
  lock(l)      lock(l)
  x = 1        y = 1
  x = 2        x = 2
  unlock(l)    unlock(l)
Illustration of DPOR Concepts
15
Illustration of DPOR Concepts
16
Thread Verification vs MPI Verification
• Thread verification – well studied!
– Well known dynamic verification tools for thread verification [CHESS, INSPECT].
– Thread verification follows traditional dynamic partial order reduction (DPOR); DPOR does not extend directly to MPI.
• MPI verification – not so!
– Requires a formal definition.
– Out-of-order completion semantics.
– Must define dependence.
INTRODUCTION TO MPI
17
18
IBM Blue Gene(Picture Courtesy IBM)
LANL’s Petascale machine“Roadrunner”(AMD Opteron CPUs and IBM PowerX Cell)
• The choice for ALL large-scale parallel simulations (earthquake, weather, …).
• Runs “everywhere”.
• Very mature codes exist in MPI – tens of person years.
• Performs critical simulations in science and engineering.
The Ubiquity of MPI
19
Overview of Message Passing Interface (MPI) API
• One of the major Standardization Successes.
• Lingua franca of Parallel Computing
• Runs on parallel machines of a WIDE range of sizes
• Standard is published at www.mpi-forum.org
• MPI 2.0 includes over 300 functions
20
MPI Execution Environment
• The MPI execution environment consists of two main components:
– MPI processes.
– The MPI runtime daemon.
• All processes are statically created.
• Process ranks are between 0 and n-1.
• The MPI processes issue instructions to the MPI runtime.
• The MPI runtime implements and executes the MPI library.
21
MPI Execution Contd…
• Every process starts execution with MPI_Init(int *argc, char ***argv);
• MPI_Finalize – at the end.
22
• Abbreviated as S
MPI_Isend (void *buff, …, int dest, int tag, MPI_Comm comm, MPI_Request *handle);
23
• Abbreviated as R
MPI_Irecv (void *buff, …, int src, int tag, MPI_Comm comm, MPI_Request *handle);
24
• Abbreviated as W
MPI_Wait (MPI_Request *handle, MPI_Status *status);
25
• Abbreviated as B.
• All processes must invoke B before any can get past it.
MPI_Barrier (MPI_Comm comm);
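The barrier semantics stated above can be simulated with plain Python threads; threading.Barrier here is a stand-in for MPI_Barrier, not MPI itself:

```python
# Sketch of B's semantics: no process observes "after" until every
# process has invoked the barrier.
import threading

N = 4
barrier = threading.Barrier(N)
events = []
lock = threading.Lock()

def process(rank: int):
    with lock:
        events.append(("before", rank))
    barrier.wait()                  # B: all must arrive before any proceeds
    with lock:
        events.append(("after", rank))

threads = [threading.Thread(target=process, args=(r,)) for r in range(N)]
for t in threads: t.start()
for t in threads: t.join()

# every "before" entry precedes every "after" entry
first_after = min(i for i, (tag, _) in enumerate(events) if tag == "after")
assert all(tag == "before" for tag, _ in events[:first_after])
assert first_after == N
```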
26
MPI Ordering Guarantees
(Slides 26–28: the diagrams illustrating the ordering guarantees were not transcribed.)
29
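The diagrams for these slides were not transcribed. The central guarantee they illustrate is MPI's non-overtaking rule: two sends from the same source to the same destination (on the same communicator and tag) match receives in the order they were issued. A minimal queue model of this rule (a hypothetical sketch, not MPI code):

```python
from collections import defaultdict, deque

class MatchEngine:
    """Pending sends are kept FIFO per (src, dst) channel, so an earlier
    send can never be overtaken by a later one on the same channel."""
    def __init__(self):
        self.pending = defaultdict(deque)

    def send(self, src, dst, payload):
        self.pending[(src, dst)].append(payload)

    def recv(self, dst, src):
        # the receive matches the OLDEST unmatched send on the channel
        return self.pending[(src, dst)].popleft()

m = MatchEngine()
m.send(0, 1, "first")
m.send(0, 1, "second")
assert m.recv(1, 0) == "first"     # matched in issue order
assert m.recv(1, 0) == "second"
```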
Applying DPOR to MPI
Programs like this – almost impossible to test on real platforms.
30
Why DPOR does not work!
31
Modifying Runtime Doesn’t Help!
• Assume that the MPI runtime is modified to support verification
• The sends are matched with receives in the order they are issued to the MPI runtime
• Is this sufficient?
32
Crooked Barrier Example
P0:              P1:                P2:
  Isend(1, req)    Irecv(*, req)      Barrier
  Barrier          Barrier            Isend(1, req)
  Wait(req)        Irecv(2, req1)     Wait(req)
                   Wait(req)
                   Wait(req1)
(Column alignment reconstructed from the transcript.)
Verification Support does not work!
33
Our Main Algorithms
• Partial Order avoiding Elusive Interleavings (POE).
• POEOPT : Reduced interleavings even further.
• POEMSE: Handle resource dependencies.
34
Illustration of POE (slides 34–37; scheduler animation frames condensed)

Example program:
P0:              P1:                P2:
  Isend(1, req)    Irecv(*, req)      Barrier
  Barrier          Barrier            Isend(1, req)
  Wait(req)        Recv(2)            Wait(req)
                   Wait(req)

[Diagram frames: the ISP scheduler intercepts each call before it reaches the MPI runtime, issues sendNext to let processes advance, and collects Isend(1), Irecv(*), and the Barriers, tracking IntraCB edges among each process's calls. Wildcard matches are decided only at such fence points: in one interleaving Irecv(*) is dynamically rewritten to match P0's Isend; in the replay it is rewritten to Irecv(2), matching P2's Isend, after which P1's Recv(2) has No Matching Send – Deadlock!]
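The nondeterminism POE replays in this example can be enumerated by hand: P1's wildcard receive may match the Isend from P0 or the one from P2, and only the second choice strands the later Recv(2). A toy check (process names follow the slide; the encoding is illustrative):

```python
def deadlocks_if_wildcard_matches(sender: str) -> bool:
    """P0 and P2 each issue one Isend to P1; P1 posts Irecv(*) then Recv(2).
    Returns True iff choosing `sender` for Irecv(*) leads to deadlock."""
    available = {"P0", "P2"}
    available.remove(sender)           # Irecv(*) consumes this send
    return "P2" not in available       # Recv(2) still needs a send from P2

assert deadlocks_if_wildcard_matches("P0") is False   # benign interleaving
assert deadlocks_if_wildcard_matches("P2") is True    # POE's replay finds the deadlock
```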
38
Notations
• MPI_Isend: Si,j(k), where
– i is the process issuing the send,
– j is the dynamic execution count of S in process i, and
– k is the destination process rank where the message is to be sent.
• MPI_Irecv: Ri,j(k), where k is the source.
• MPI_Barrier: Bi,j.
• MPI_Wait: Wi,j’(hi,j), where hi,j is the request handle of Si,j(k) or Ri,j(k).
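The notation above can be rendered as a small data type; this encoding is illustrative only, not the dissertation's formalism:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str          # "S", "R", "B", or "W"
    proc: int          # i: the issuing process
    index: int         # j: dynamic execution count within process i
    arg: object = None # dest/src rank, or the handle a Wait waits on

# Constructors mirroring the slide's notation
S = lambda i, j, k: Event("S", i, j, k)
R = lambda i, j, k: Event("R", i, j, k)
B = lambda i, j:    Event("B", i, j)
W = lambda i, j, h: Event("W", i, j, h)

s01 = S(0, 1, 1)       # S0,1(1): P0's first send, destination rank 1
w02 = W(0, 2, s01)     # W0,2(h0,1): waits on that send's handle
assert w02.arg is s01
```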
POE Issue: Redundancy
39
POE explores both match-sets, resulting in 2 interleavings, while just 1 interleaving is sufficient.
SOLUTION: Explore only one of the match-sets for the single wildcard receive.
DOES NOT WORK! BREAKS PERSISTENCE.
POE and Persistent Sets
Add only this match-set to backtrack
40
Maintaining persistent backtrack sets is important.
Otherwise, the verification algorithm is broken.
POE Issue: Buffering Deadlocks
When no sends are buffered
Deadlock!
41
POE Issue: Redundancy
Simple optimization: if there are no more sends targeting a wildcard receive, then add only one of the match-sets to the backtrack set.
42
Redundancy : POEOPT
P0:           P1:            P2:           P3:
  S0,1(1)       R1,1(*)        R2,1(*)       S3,1(2)
  W0,2(h0,1)    W1,2(h1,1)     W2,2(h2,1)    W3,2(h3,1)
                S1,3(3)                      R3,3(1)
                W1,4(h1,3)                   W3,4(h3,3)
                R1,5(*)                      S3,5(1)
                W1,6(h1,5)                   W3,6(h3,5)
(Per-process event listing reconstructed from the transcript; the match-set arrows are not recoverable.)
43
Detecting Matching
• Exploring all non-deterministic matchings in a state is not a solution
• The IntraHB relation is not sufficient to detect matchings across processes
• We introduce the notion of Inter-HB
44
InterHB Relation
45
Redundancy : POEOPT
P0:           P1:            P2:           P3:
  S0,1(1)       R1,1(*)        R2,1(*)       S3,1(2)
  W0,2(h0,1)    W1,2(h1,1)     W2,2(h2,1)    W3,2(h3,1)
                S1,3(3)                      R3,3(1)
                W1,4(h1,3)                   W3,4(h3,3)
                R1,5(*)                      S3,5(1)
                W1,6(h1,5)                   W3,6(h3,5)
(Same program; InterHB edges shown in the original diagram are not recoverable.)
46
Redundancy : POEOPT
P0:           P1:            P2:           P3:           P4:           P5:
  S0,1(1)       R1,1(*)        R2,1(*)       S3,1(2)       R4,1(*)       S5,1(1)
  W0,2(h0,1)    W1,2(h1,1)     W2,2(h2,1)    W3,2(h3,1)    W4,2(h4,1)    W5,2(h5,1)
                R1,3(3)                      S3,3(1)
                W1,4(h1,3)                   W3,4(h3,3)
NO PATH
47
Slack/Buffering Deadlocks
Deadlocks only when S0,1 or S1,1 or both are buffered
48
Buffer All Sends ??? ZERO SLACK
49
Buffer All Sends ??? ZERO SLACK
50
Buffer All Sends ??? INF SLACK
51
Buffer All Sends ??? INF SLACK
52
Buffer All Sends ??? ONLY S0,0
Deadlock!
53
Slack/Buffering : POEMSE
P0:           P1:            P2:
  S0,1(1)       S1,1(2)        R2,1(*)
  W0,2(h0,1)    W1,2(h1,1)     W2,2(h2,1)
  S0,3(2)       R1,3(0)        R2,3(0)
  W0,4(h0,3)    W1,4(h1,3)     W2,4(h2,3)
(Per-process event listing reconstructed from the transcript.)
54
Slack/Buffering : POEMSE
(Same program as the previous slide; this animation frame marks one of the culprit waits as a No-op.)
55
POEMSE
• Finds all paths between a wildcard receive and a matching send.
• If there is a path without a culprit wait in it, then it does nothing.
• If every path contains at least one culprit wait, then the algorithm finds all ways to break the paths by trying to select exactly one wait in every path.
– We call this finding minimal wait sets.
– NP-Complete problem (proved by reduction from 1-in-3 SAT).
– Finding all minimal wait sets is #P-Complete.
56
Minimal Wait Sets
• Find the power-set of all the culprit waits.
• Sort the power-set by size in non-decreasing order.
• For each subset, if it breaks all paths:
– Delete all its supersets from the power-set.
• If it does not break all the paths:
– Delete it from the power-set.
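The procedure above is a minimal-hitting-set computation over the culprit waits. A direct sketch, assuming each path is given as the set of culprit waits on it (the example inputs are invented):

```python
from itertools import combinations

def minimal_wait_sets(culprits, paths):
    """Return all minimal subsets of `culprits` that intersect (break)
    every path, following the slide's sort-and-prune procedure."""
    subsets = [frozenset(c)
               for r in range(1, len(culprits) + 1)
               for c in combinations(sorted(culprits), r)]  # non-decreasing size
    minimal, dominated = [], set()
    for s in subsets:
        if s in dominated:
            continue
        if all(s & path for path in paths):                 # s breaks every path
            minimal.append(s)
            dominated.update(t for t in subsets if s < t)   # delete its supersets
        # subsets that miss some path are simply skipped
    return minimal

result = minimal_wait_sets({"w1", "w2", "w3"},
                           [{"w1", "w2"}, {"w2", "w3"}])
assert set(result) == {frozenset({"w2"}), frozenset({"w1", "w3"})}
```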
57
Slack/Buffering : POEMSE
(Same program as the earlier POEMSE slide; another exploration frame from the original diagram.)
58
59
MPI TRANSITION SYSTEM
MPI State and Transitions
• An MPI function is in one or more of the following states:
– Issued (I)
– Matched (M)
– Complete (C)
– Returned (R)
• A global state is <I, M, C, R, pc> – initial state is
• Two kinds of transitions:
– Process transitions.
– MPI runtime transitions.
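The bookkeeping can be sketched as four growing sets with the obvious ordering constraints. This is an illustrative model, not ISP's implementation; note that a nonblocking operation may be Returned before it is Complete:

```python
# Issued/Matched/Complete/Returned sets: an operation may appear in several
# sets at once, but it can only be Matched after it is Issued, and only
# Complete after it is Matched.
class MPIState:
    def __init__(self):
        self.I, self.M, self.C, self.R = set(), set(), set(), set()

    def issue(self, op):
        self.I.add(op)

    def match(self, op):
        assert op in self.I          # must be issued first
        self.M.add(op)

    def complete(self, op):
        assert op in self.M          # must be matched first
        self.C.add(op)

    def ret(self, op):
        assert op in self.I          # a nonblocking op may return before completing
        self.R.add(op)

st = MPIState()
st.issue("S0,1"); st.ret("S0,1")     # Isend returns immediately
st.match("S0,1"); st.complete("S0,1")
assert "S0,1" in st.I & st.M & st.C & st.R
```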
60
Process Transitions
61
MPI Runtime Book-keeping Sets
• No ancestor of x is in the Ready set.
• All ancestors are matched.
• All ancestors are complete.
62
63
IntraHB - Crooked Barrier Example
P0:              P1:                P2:
  Isend(1, req)    Irecv(*, req)      Barrier
  Barrier          Barrier            Isend(1, req)
  Wait(req)        Irecv(2, req1)     Wait(req)
                   Wait(req)
                   Wait(req1)
(Column alignment reconstructed from the transcript; the IntraHB edges of the original diagram are not recoverable.)
MPI Runtime Transitions
Zero Buffering
64
MPI Runtime Transitions Contd..
65
MPI Runtime Transition
Conditional Happens-Before
Dynamic source re-write
66
Simple MPI Example
67
(Slides 68–70: the example’s diagrams were not transcribed.)
Dependence and Independence Properties
MPI Dependence
71
RESULTS – Real Benchmarks
• Game of Life
– EuroPVM / MPI 2007 versions of Gropp and Lusk done in seconds.
• MADRE – Siegel’s Memory-Aware Data Redistribution Engine
– Found previously documented deadlock.
• ParMETIS – Hypergraph Partitioner
– Initial run of days reduced now to seconds on a laptop.
• MPI-BLAST – Genome sequencer using BLAST
– Runs to completion on small instances.
• A few MPI Spec Benchmarks
– Some benchmarks exhibit interleaving explosion; others OK.
• ADLB
– Initial experiments have been successful.
72
Results
• Resource leak caught in ParMETIS.
• ISP used in the development cycle of an A* algorithm:
– Found 3 deadlocks during various implementation phases.
– All deadlocks were unintentional (not seeded).
73
Results Contd…

Umpire Program             POE                                  Marmot
any_src-can-deadlock7.c    Deadlock detected, 2 interleavings   Deadlock caught in 5/10 runs
any_src-can-deadlock10.c   Deadlock detected, 1 interleaving    Deadlock caught in 7/10 runs
basic-deadlock10.c         Deadlock detected in 1 interleaving  Deadlock caught in 10/10 runs
74
75
POEOPT (results chart not transcribed)
76
POEMSE (results chart not transcribed)
77
How well did we do?
• Verisoft Project
– Used for telephone switch software verification at Bell Labs.
– Available.
• The Java PathFinder Project
– Developed at NASA for Java control software.
– On SourceForge.
• The CHESS Project
– Microsoft Research; available for academic institutions.
– In use within Microsoft product groups and used by academics.
• Inspect: UV group’s unique Pthread verifier.
– Available for download.
• ISP: dynamic verification tool for MPI.
– Implements the dynamic verification algorithms from this dissertation.
– Available for download with the PTP (Parallel Tools Platform).
CONCLUSIONS
• First efficient and practical dynamic reduction based algorithms for real MPI programs.
• Verification for portability with respect to buffering.
• First Happens-Before model for MPI.
• ISP scheduler directly based on our theory.
• ISP + GEM released and demoed widely.
78
Questions & Answers
79
THANK YOU
POE ALGORITHM
80
POE Proof of Correctness
81
POE Illustration
82
83
Dynamic Verification
• No modeling effort for the programmer.
• Program is the model – the actual program is verified.
• Push-button interface: easy to use.
• On the downside – verification is a function of input.
– Most programs are fairly data independent.
Dynamic verification methods are ideal for programmers!