Source Level Debugging of Parallel Programs
Roland Wismüller
LRR-TUM, TU München
Germany
Outline
• Introduction: source level debuggers
• Debuggers for parallel programs
• Current / future work at LRR-TUM
What is a Debugger?
• A tool to remove bugs? – No!
• A tool to find bugs? – No!
• A tool to examine program executions? – Yes!
Source Level Debugging
Compilation Example
Setting a Breakpoint
Printing a Variable
Continue Execution
4) cont must execute original instruction
• replace trap with original instruction
    call foo
    trap
    move r0,r1
• execute a single step
    call foo
    add #4,sp
    move r0,r1
• insert trap again
    call foo
    add #4,sp
    move r0,r1
• continue execution
    call foo
    trap
    move r0,r1
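The four steps above can be sketched on a toy machine. This is only an illustration, not how any particular debugger is implemented: the instruction memory is a Python list of strings, and the names `ToyDebugger`, `set_breakpoint`, `run` and `cont` are assumptions made for this sketch.

```python
TRAP = "trap"

class ToyDebugger:
    """Toy model: instruction memory is a list of strings (an assumption)."""

    def __init__(self, code):
        self.code = list(code)   # instruction memory of the debuggee
        self.saved = {}          # breakpoint address -> original instruction
        self.pc = 0
        self.trace = []          # instructions that were actually executed

    def set_breakpoint(self, addr):
        self.saved[addr] = self.code[addr]
        self.code[addr] = TRAP

    def _step(self):
        # Stands in for a hardware single step: execute exactly one instruction.
        self.trace.append(self.code[self.pc])
        self.pc += 1

    def run(self):
        # Run until a trap is reached; return its address, or None at program end.
        while self.pc < len(self.code):
            if self.code[self.pc] == TRAP:
                return self.pc
            self._step()
        return None

    def cont(self):
        addr = self.pc
        if addr in self.saved and self.code[addr] == TRAP:
            self.code[addr] = self.saved[addr]  # replace trap with original instruction
            self._step()                        # execute a single step
            self.code[addr] = TRAP              # insert trap again
        return self.run()                       # continue execution

dbg = ToyDebugger(["call foo", "add #4,sp", "move r0,r1"])
dbg.set_breakpoint(1)
stopped_at = dbg.run()   # stops at the breakpoint
dbg.cont()               # executes the original instruction, keeps the breakpoint
```

After `cont`, the trace contains the original `add #4,sp` rather than the trap, and the trap is back in memory for the next time the breakpoint is reached.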
Continue Execution
4) cont must execute original instruction
• execute a single step
    call foo
    add #4,sp
    move r0,r1
Problem:
• there may be no support for single stepping
• instead of a single step: replace next instruction with a trap
    call foo
    add #4,sp
    move r0,r1
• continue execution
    call foo
    add #4,sp
    trap
• insert trap & original instruction
    call foo
    add #4,sp
    trap
• continue execution
    call foo
    trap
    move r0,r1
Still a problem:
• original instruction may be a jump / call / ret
• we have to emulate these instructions!
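A minimal sketch of the variant without single stepping, again on a toy instruction list. The function name, the `snapshots` return value and the mnemonics in `CONTROL_TRANSFER` are assumptions made for illustration. The sketch also assumes that the breakpoint is not on the last instruction.

```python
TRAP = "trap"
CONTROL_TRANSFER = ("jmp", "call", "ret")  # toy mnemonics, assumed for this sketch

def cont_without_single_step(code, saved, pc):
    """Step over the breakpoint at `pc` using a temporary trap at pc + 1.

    code  : list of instruction strings (debuggee memory, patched in place)
    saved : dict mapping breakpoint address -> original instruction
    Returns (new_pc, snapshots); snapshots records memory after each patch.
    """
    original = saved[pc]
    if original.split()[0] in CONTROL_TRANSFER:
        # The successor is not pc + 1, so a temporary trap there would be missed.
        raise NotImplementedError("jump / call / ret must be emulated")
    snapshots = []
    code[pc] = original              # replace trap with original instruction
    next_original = code[pc + 1]
    code[pc + 1] = TRAP              # replace next instruction with a trap
    snapshots.append(tuple(code))
    pc += 1                          # continue: original runs, temporary trap is hit
    code[pc] = next_original         # restore the original next instruction
    code[pc - 1] = TRAP              # insert the breakpoint trap again
    snapshots.append(tuple(code))
    return pc, snapshots

code = ["call foo", TRAP, "move r0,r1"]
new_pc, snapshots = cont_without_single_step(code, {1: "add #4,sp"}, 1)
```

The two snapshots correspond to the two memory states on the slides: first with the temporary trap after `add #4,sp`, then with the breakpoint trap restored. The early `raise` marks the case that remains a problem: a control transfer has no fixed successor address.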
Continue Execution
4) cont must execute original instruction
• execute a single step
    call foo
    add #4,sp
    move r0,r1
A different problem:
• multithreading: another thread may bypass our breakpoint
Continue Execution
4) cont must execute original instruction
    call foo
    trap
    move r0,r1
Solution:
• don’t remove the trap
• execute original instruction somewhere else
    add #4,sp
Continue Execution
4) cont must execute original instruction
    call foo
    trap
    move r0,r1
    add #4,sp (executed somewhere else)
Still a problem:
• instruction may depend on the PC value
• we have to emulate these instructions!
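Why PC-dependent instructions break out-of-line execution can be shown with a toy CPU. Everything here is hypothetical: the two opcodes `addi` and `lea_pcrel`, the scratch address 1000, and the function names are assumptions chosen only to make the effect visible.

```python
def execute(instr, pc, regs):
    """Toy CPU: execute one instruction that is located at address `pc`."""
    op, arg = instr
    if op == "addi":           # sp += arg, independent of the PC
        regs["sp"] += arg
    elif op == "lea_pcrel":    # r0 = pc + arg, depends on the PC
        regs["r0"] = pc + arg
    else:
        raise ValueError(op)

def step_out_of_line(instr, bp_addr, scratch_addr, regs):
    """The trap stays at bp_addr; a copy of the instruction runs at scratch_addr."""
    execute(instr, scratch_addr, regs)
    return bp_addr + 1         # resume behind the breakpoint

def step_emulated(instr, bp_addr, regs):
    """The debugger emulates the instruction using its original address."""
    execute(instr, bp_addr, regs)
    return bp_addr + 1

# PC-independent instruction: running the copy elsewhere is harmless.
regs_a = {"sp": 100, "r0": 0}
step_out_of_line(("addi", 4), bp_addr=1, scratch_addr=1000, regs=regs_a)

# PC-relative instruction: the copy sees the wrong PC.
regs_b = {"sp": 100, "r0": 0}
step_out_of_line(("lea_pcrel", 8), bp_addr=1, scratch_addr=1000, regs=regs_b)

# Emulation with the original address gives the intended result.
regs_c = {"sp": 100, "r0": 0}
step_emulated(("lea_pcrel", 8), bp_addr=1, regs=regs_c)
```

In this sketch `regs_b["r0"]` ends up as 1008 instead of the intended 9, which is the reason such instructions have to be emulated. The breakpoint trap itself is never removed, so other threads cannot bypass it.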
Optimization Effects
• print i
  – reads i5
  – prints z !!
• variable table
  – i: register i5, short
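The effect can be reproduced with a toy register file. The first half models the situation on the slide: the variable table gives one fixed location for i. The second half is a hypothetical refinement that is not taken from the slides: table entries that are valid only for a PC range. The values 7 and 42 and the ranges are invented for the example.

```python
registers = {"i5": None}

# Naive variable table: one fixed location per variable.
variable_table = {"i": "i5"}

def print_var(name):
    return registers[variable_table[name]]

registers["i5"] = 7        # i = 7; i lives in register i5
before = print_var("i")    # correct value of i

registers["i5"] = 42       # the optimizer reuses i5 for z once i is dead; z = 42
after = print_var("i")     # the debugger still reads i5 and shows the value of z

# Hypothetical refinement: each entry is (first_pc, end_pc, register).
ranged_table = {
    "i": [(0, 10, "i5")],
    "z": [(10, 20, "i5")],
}

def print_var_at(name, pc):
    for first_pc, end_pc, reg in ranged_table.get(name, []):
        if first_pc <= pc < end_pc:
            return registers[reg]
    return "<optimized out>"
```

With the ranged table, `print_var_at("i", 15)` reports that i is not available instead of silently printing the value of z.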
Parallel Debugging
• Additional properties of parallel programs
• Requirements for parallel debuggers
• Problems and solution techniques
Parallel Programs
• Multiple processes and/or threads
  – created dynamically
  – many of them
• Program distributed across several hosts
• Additional state components:
  – communication subsystem
Multiple Processes / Threads
• Naming processes / threads
  – system ids
    • may not be unique, not persistent
    • not user friendly
  – debugger generated ids
    • usually: small integers
    • selection based on additional information
  – naming processes / threads that do not exist yet
    • DETOP: pattern matching
Thread Selection in DETOP
[Figure: thread selection in DETOP; labels: function, executable, system id, debugger id, node list, selection pattern]
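How a selection pattern can also name threads that do not exist yet is sketched below. This is not DETOP's pattern syntax, which the slides do not show. The sketch assumes shell-style wildcards via Python's `fnmatch`, and the classes `ThreadInfo` and `Pattern` as well as all sample values are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class ThreadInfo:
    debugger_id: int   # small integer assigned by the debugger
    system_id: int     # id from the operating system, may be reused
    node: str
    executable: str
    function: str      # function the thread is currently executing

@dataclass(frozen=True)
class Pattern:
    node: str = "*"
    executable: str = "*"
    function: str = "*"

    def matches(self, t):
        return (fnmatchcase(t.node, self.node)
                and fnmatchcase(t.executable, self.executable)
                and fnmatchcase(t.function, self.function))

threads = [
    ThreadInfo(1, 4711, "node0", "master", "main"),
    ThreadInfo(2, 4712, "node1", "worker", "solve"),
    ThreadInfo(3, 4711, "node2", "worker", "recv"),
]

workers = Pattern(executable="worker*")
selected = [t.debugger_id for t in threads if workers.matches(t)]

# The pattern is stored, not the matching ids, so a thread created
# later is covered as soon as it appears.
late = ThreadInfo(4, 4800, "node3", "worker", "solve")
late_matches = workers.matches(late)
```

Threads 1 and 3 deliberately share a system id on different nodes, which is why the debugger-generated id is used to report the selection.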
Scalability
• Input: use process / thread sets
  – commands are executed for each member
  – e.g. [1,2,3] print i or [2,7] break 123
  – sometimes: named sets
  – problems:
    • command semantics may differ for the processes, e.g. different executables / call stacks
    • when to evaluate named sets?
DETOP User Interface
Scalability
• Output: aggregation
  – simple case: aggregate identical results
      [1]: 12.3
      [2]: 4.1
      [3]: 12.3
      [4]: 12.3
      [5]: 12.3
    becomes
      [1,3-5]: 12.3
      [2]: 4.1
  – complex case: aggregate partially identical results
  – impossible cases: asynchronous events
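The simple case, aggregating identical results, can be sketched in a few lines. The function names `format_ids` and `aggregate` are assumptions. Results are kept as strings so that they are grouped exactly as printed.

```python
from collections import defaultdict

def format_ids(ids):
    """Compress integer ids into ranges, e.g. [1, 3, 4, 5] -> '1,3-5'."""
    ids = sorted(ids)
    parts = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:
            prev = i
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)

def aggregate(results):
    """results: {process id: printed result}; returns one line per distinct result."""
    groups = defaultdict(list)
    for pid, value in results.items():
        groups[value].append(pid)
    # Order the output lines by the smallest process id in each group.
    ordered = sorted((min(pids), pids, value) for value, pids in groups.items())
    return [f"[{format_ids(pids)}]: {value}" for _, pids, value in ordered]

results = {1: "12.3", 2: "4.1", 3: "12.3", 4: "12.3", 5: "12.3"}
lines = aggregate(results)
```

For the five results on the slide this yields the two lines `[1,3-5]: 12.3` and `[2]: 4.1`.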
Aggregating Stacks: Call Tree
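One way to aggregate call stacks into a call tree is to merge them by common prefix and to label each node with the threads whose stack passes through it. The slide's figure is not preserved in this transcript, so the data layout and the sample stacks below are assumptions.

```python
def build_call_tree(stacks):
    """stacks: {thread id: [outermost frame, ..., innermost frame]}"""
    root = {"threads": set(), "children": {}}
    for tid, frames in stacks.items():
        node = root
        node["threads"].add(tid)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"threads": set(), "children": {}})
            node["threads"].add(tid)
    return root

def render(node, depth=0):
    """Return indented lines of the form 'function [thread ids]'."""
    lines = []
    for name in sorted(node["children"]):
        child = node["children"][name]
        ids = ",".join(str(t) for t in sorted(child["threads"]))
        lines.append("  " * depth + f"{name} [{ids}]")
        lines.extend(render(child, depth + 1))
    return lines

stacks = {
    1: ["main", "solve", "recv"],
    2: ["main", "solve", "recv"],
    3: ["main", "io", "write"],
}
tree_lines = render(build_call_tree(stacks))
```

Threads 1 and 2 share the whole stack and collapse into one branch, while thread 3 splits off below `main`.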
Concurrency Issues
• What happens if a thread stops?
  – stop all threads in all processes (BP option)
  – stop all threads in the same process (BP option)
  – stop only that thread
• What happens if I continue a thread?
  – start all threads in all processes (separate command)
  – start all threads in the same process (use pattern)
  – start only that thread
• When does the debugger accept input?
  – only when all processes are stopped
  – always
Additional State Components
• E.g. message buffers, blocked processes
• Usually no support from debuggers
  – additional dependency on programming library implementation
• Often other tools (visualizers) will show this information
  – use them together with the debugger (?)
  → interoperable tools
Interoperable Tools
• Multiple, loosely coupled tools are used on the same program
• Concrete scenario:
  – debugger that can ‘time-warp’
  – i.e. return to previous program states without rerunning the program
  – speed up debugging cycle of long-running programs
‘Time-Warp’ Debugger
• Tools that need to interoperate:
  – parallel debugger (DETOP)
  – checkpointing system for parallel programs (CoCheck, based on Condor)
  – deterministic execution controller (codex)
  – means to specify the state to return to (VISTOP: state based program flow visualizer)
Preconditions for Interoperability
• Common monitoring infrastructure
  – OMIS / OCM
• Mechanisms for informing tools about state modifications made by other tools
  – e.g. VISTOP must know when DETOP stops a process, as event buffer must be read
• Mechanisms for direct tool interaction
  – e.g. VISTOP to CoCheck: ‘restart from checkpoint’
OMIS
• Basis:
  – objects + services
  – event / action paradigm
  – scalability by using object sets
  – location transparency
• Example:
    thread_creates_proc([t_1,t_2]):
        thread_stop([$proc, $new_proc])
        thread_get_backtrace([$thread],0)
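The event / action paradigm with object sets can be illustrated by a small dispatcher. This is not the OMIS interface. The class `ToyMonitor`, its methods, and the way event parameters are passed are assumptions made for this sketch; only the event name and the overall shape follow the example above.

```python
class ToyMonitor:
    """Event/action dispatcher illustrating the paradigm; not the OMIS API."""

    def __init__(self):
        self.requests = []   # (event name, set of watched objects, actions)
        self.log = []        # what the actions did, kept for inspection

    def define(self, event, objects, actions):
        # One request covers a whole set of objects (scalability by object sets).
        self.requests.append((event, frozenset(objects), list(actions)))

    def raise_event(self, event, thread, **params):
        for name, objects, actions in self.requests:
            if name == event and thread in objects:
                for action in actions:
                    action(self, thread, params)

def stop_both(monitor, thread, params):
    # Corresponds to stopping the creating and the newly created process.
    monitor.log.append(("stop", params["proc"], params["new_proc"]))

def get_backtrace(monitor, thread, params):
    # Corresponds to requesting a backtrace of the thread that raised the event.
    monitor.log.append(("backtrace", thread))

monitor = ToyMonitor()
monitor.define("thread_creates_proc", ["t_1", "t_2"], [stop_both, get_backtrace])
monitor.raise_event("thread_creates_proc", "t_1", proc="p_1", new_proc="p_9")
monitor.raise_event("thread_creates_proc", "t_3", proc="p_3", new_proc="p_10")
```

Only the event from `t_1` triggers the actions, because `t_3` is not in the object set of the request.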
Interoperability Problems
• A tool may violate preconditions of another tool
  – DETOP can stop a process
  – checkpointing is initiated by sending a signal
  – stopped process won’t handle signal !
  – we cannot hide the state change from the checkpointer
  → this case cannot be handled easily
The End
• Debuggers are far from trivial
• Parallel debuggers are even more complex
• Lots of open (maybe unsolvable) research issues
• Interoperability may ease implementation of enhanced functionality