debugging components
October 24, 2003
SEESCOA
STWW - Programma
Debugging Components
Koen De Bosschere
RUG-ELIS
Problem description
Components are loosely coupled and do not have a common notion of time
Components have contracts (e.g. timing contracts)
Components are activated asynchronously by the scheduler
Components can be replaced at run-time
Traditional debugging techniques are not adequate
Traditional debugging inadequate?
Execution is non-deterministic: no two runs can be guaranteed to be identical (scheduling, timing differences, replacing components, …), so cyclic debugging is not applicable
Timing is part of correctness: the intrusion caused by the debugger might violate the contracts
Input might not be repeatable if generated by an external device (e.g. camera or microphone)
Debugging is often a matter of trial and error, and a good portion of luck and experience is needed; the use of multithreading only adds to that.
Two approaches
On-chip debugging techniques
Software debugging techniques
On-chip debugging techniques
Logic Analyser
ROM monitor
ROM emulator
In-Circuit Emulator
Background Debug Mode
JTAG
These add-ons take up valuable chip area (up to 10%)
Hardware manufacturers believe in design for debuggability
Software debugging techniques
Execution must be repeatable to allow for cyclic debugging
Program flow must be identical
Input must be identical
Execution must be observable to allow for debugging
We must be able to use breakpoints, watch points, etc. without altering the program flow
Re-execution must be deterministic
Example code
class G {
    public static int global = 5;
}
class Thread1 extends Thread {
    public void run() { G.global += 2; }  // unsynchronized load, add, store
}
class Thread2 extends Thread {
    public void run() { G.global *= 3; }  // unsynchronized load, multiply, store
}
class Main {
    public static void main(String[] args) throws InterruptedException {
        Thread1 t1 = new Thread1(); Thread2 t2 = new Thread2();
        G.global = 5;
        t1.start(); t2.start();
        t1.join(); t2.join();  // join() may throw InterruptedException
        System.out.println("global = " + G.global);
    }
}
Possible executions
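Initially G.global=5; L(x) = load of value x, S(x) = store of value x; Thread1 does +2, Thread2 does *3
Thread1 L(5), Thread2 L(5), Thread1 S(7), Thread2 S(15): G.global=15
Thread1 L(5), Thread2 L(5), Thread2 S(15), Thread1 S(7): G.global=7
Thread1 L(5), Thread1 S(7), Thread2 L(7), Thread2 S(21): G.global=21
Thread2 L(5), Thread2 S(15), Thread1 L(15), Thread1 S(17): G.global=17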
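The outcomes of the unsynchronized example can be checked by enumerating every interleaving of the two load/store pairs. The following is a minimal sketch under that simplified model (each thread is one load followed by one store); the class and method names are hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: enumerates every interleaving of Thread1 (load, +2, store)
// and Thread2 (load, *3, store) acting on one shared integer.
class Interleavings {
    // A schedule is a string over {'1','2'} naming the thread that takes the next step.
    // Each thread takes exactly two steps: first a load, then a store.
    static TreeSet<Integer> outcomes(int initial) {
        TreeSet<Integer> results = new TreeSet<>();
        for (String schedule : schedules()) {
            int global = initial;
            int reg1 = 0, reg2 = 0;                 // thread-local copies of the loaded value
            boolean loaded1 = false, loaded2 = false;
            for (char c : schedule.toCharArray()) {
                if (c == '1') {
                    if (!loaded1) { reg1 = global; loaded1 = true; }  // L
                    else { global = reg1 + 2; }                       // S
                } else {
                    if (!loaded2) { reg2 = global; loaded2 = true; }  // L
                    else { global = reg2 * 3; }                       // S
                }
            }
            results.add(global);
        }
        return results;
    }

    // All orderings of two steps of thread 1 and two steps of thread 2.
    static List<String> schedules() {
        List<String> out = new ArrayList<>();
        build("", 2, 2, out);
        return out;
    }

    private static void build(String prefix, int left1, int left2, List<String> out) {
        if (left1 == 0 && left2 == 0) { out.add(prefix); return; }
        if (left1 > 0) build(prefix + '1', left1 - 1, left2, out);
        if (left2 > 0) build(prefix + '2', left1, left2 - 1, out);
    }

    public static void main(String[] args) {
        System.out.println(outcomes(5));  // prints [7, 15, 17, 21]
    }
}
```

Six schedules exist, but they collapse to the four final values 7, 15, 17 and 21, which is why a single run says little about the program's behaviour.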
Causes of non-determinism
Sequential programs:
Input
Certain system calls (time)
…
Parallel programs:
Race conditions on shared variables
Load balancing
…
Execution Replay
Goal: make repeated equivalent re-executions possible
Method: two phases
Record phase: record all non-deterministic events during an execution in a trace file
Replay phase: use trace file to produce the same execution
Question: what & where to trace?
Synchronization Replay
Input Replay
Data race detection
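The two phases can be illustrated with a single source of non-determinism, here a clock. This is a minimal sketch, assuming an in-memory list stands in for the trace file; TracedSource and ReplayDemo are hypothetical names, not part of any tool mentioned in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: wraps one non-deterministic source so that its results
// are logged during the record phase and fed back during the replay phase.
class TracedSource {
    private final LongSupplier real;   // the real, non-deterministic source (null during replay)
    private final List<Long> trace;    // stands in for the trace file
    private final boolean replaying;
    private int next = 0;              // read position in the trace during replay

    private TracedSource(LongSupplier real, List<Long> trace, boolean replaying) {
        this.real = real;
        this.trace = trace;
        this.replaying = replaying;
    }

    static TracedSource record(LongSupplier real) {
        return new TracedSource(real, new ArrayList<>(), false);
    }

    static TracedSource replay(List<Long> trace) {
        return new TracedSource(null, trace, true);
    }

    long get() {
        if (replaying) return trace.get(next++);  // replay phase: reuse the recorded value
        long v = real.getAsLong();                // record phase: observe the event and log it
        trace.add(v);
        return v;
    }

    List<Long> trace() { return trace; }
}

class ReplayDemo {
    // The "program": its result depends on two non-deterministic readings.
    static long program(TracedSource clock) {
        long a = clock.get();
        long b = clock.get();
        return b - a;
    }

    public static void main(String[] args) {
        TracedSource rec = TracedSource.record(System::nanoTime);
        long first = program(rec);
        long again = program(TracedSource.replay(rec.trace()));
        System.out.println(first == again);  // prints true
    }
}
```

Only the events that cannot be recomputed are logged; everything else is reproduced by simply re-executing the program.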
Requirements for execution replay
Record must have low intrusion
Replay must be accurate
Record phase must be space efficient
Replay phase must be time efficient
Synchronization Replay
[Figure: Execution 1 is recorded into a trace file (happens-before relation); the trace file is replayed to produce Execution 2]
Input replay
[Figure: application and kernel layers; labels: system calls, I/O instructions]
Example code
class G {
    public static int global = 5;
    public static Object s = new Object();  // lock guarding global
}
class Thread1 extends Thread {
    public void run() { synchronized (G.s) { G.global += 2; } }
}
class Thread2 extends Thread {
    public void run() { synchronized (G.s) { G.global *= 3; } }
}
class Main {
    public static void main(String[] args) throws InterruptedException {
        Thread1 t1 = new Thread1(); Thread2 t2 = new Thread2();
        G.global = 5;
        t1.start(); t2.start();
        t1.join(); t2.join();  // join() may throw InterruptedException
    }
}
Possible executions
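Initially G.global=5; L(x) = load of value x, S(x) = store of value x; Thread1 does +2, Thread2 does *3
Thread1 L(5), Thread1 S(7), Thread2 L(7), Thread2 S(21): G.global=21
Thread1 L(5), Thread2 L(5), Thread2 S(15), Thread1 S(7): G.global=7
Thread1 L(5), Thread2 L(5), Thread1 S(7), Thread2 S(15): G.global=15
Thread2 L(5), Thread2 S(15), Thread1 L(15), Thread1 S(17): G.global=17
With the lock on G.s, each load/store pair runs atomically, so only the executions ending in 21 and 17 remain possible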
Record phase
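[Figure: the two synchronized executions, with every event labelled with a logical timestamp]
Execution ending in G.global=21 (+2: L(5), S(7); then *3: L(7), S(21)); recorded timestamp sequences: 1,2,3,7,9,10 / 3,4,5,6 / 4,6,7,8
Execution ending in G.global=17 (*3: L(5), S(15); then +2: L(15), S(17)); recorded timestamp sequences: 1,2,3,10,11,12 / 3,7,8,9 / 4,5,6,7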
Replay phase
[Figure: the two recorded executions (G.global=21 and G.global=17) replayed along a logical time axis from 1 to 12, with each event placed at its recorded timestamp]
Execution Replay in Java
Requires recording the choices made by synchronization constructs such as synchronized, wait, notify, etc.
During replay, the synchronization operations are replaced by waitforlogicaltime(t) operations.
component system T
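One way to picture the replacement of synchronization operations by a wait on logical time is sketched below. It is an illustration only, assuming a single monitor and a clock that counts completed critical sections; it is not the RecPlay or JaReC implementation, and LogicalClockMonitor and ReplayOrderDemo are hypothetical names.

```java
// Hypothetical sketch: a monitor that stamps each critical section with a logical
// time during record, and enforces the recorded order during replay.
class LogicalClockMonitor {
    private int clock = 0;  // logical time: number of critical sections completed so far

    // Record phase: run the critical section under the lock and return its timestamp.
    synchronized int recordEnter(Runnable criticalSection) {
        int t = clock;
        criticalSection.run();
        clock++;
        return t;
    }

    // Replay phase: block until the logical clock reaches the recorded timestamp.
    synchronized void replayEnter(int t, Runnable criticalSection) throws InterruptedException {
        waitForLogicalTime(t);
        criticalSection.run();
        clock++;
        notifyAll();  // wake the threads waiting for a later timestamp
    }

    // Must be called with the monitor held (both callers are synchronized).
    private void waitForLogicalTime(int t) throws InterruptedException {
        while (clock != t) wait();
    }
}

class ReplayOrderDemo {
    static int global;

    // Replays the +2 and *3 updates of the example at the given logical times.
    static int replay(int time1, int time2) {
        LogicalClockMonitor m = new LogicalClockMonitor();
        global = 5;
        Thread t1 = new Thread(() -> enter(m, time1, () -> global += 2));
        Thread t2 = new Thread(() -> enter(m, time2, () -> global *= 3));
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
        return global;
    }

    private static void enter(LogicalClockMonitor m, int t, Runnable body) {
        try { m.replayEnter(t, body); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println(replay(0, 1));  // +2 first, then *3: prints 21
        System.out.println(replay(1, 0));  // *3 first, then +2: prints 17
    }
}
```

Whatever the scheduler does during replay, the thread whose timestamp is not yet due blocks in waitForLogicalTime, so the recorded order is reproduced.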
Input Replay
Execution will only yield the same results if the input is repeatable too
Solution: recording input by capturing all I/O events and regenerating them during replay
Input replay generates a huge amount of data…
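Capturing and regenerating input can be sketched at the level of a Java stream. This is an assumption made for illustration: it works inside the application rather than at the system-call boundary, and RecordingInputStream and InputReplayDemo are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: copies every byte the application reads into a log so that
// exactly the same input can be regenerated during replay.
class RecordingInputStream extends InputStream {
    private final InputStream source;
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();  // stands in for the trace file

    RecordingInputStream(InputStream source) { this.source = source; }

    @Override
    public int read() throws IOException {
        int b = source.read();
        if (b >= 0) log.write(b);  // record phase: every byte read is also logged
        return b;
    }

    // Replay phase: a stream that regenerates exactly the recorded bytes.
    InputStream replayStream() { return new ByteArrayInputStream(log.toByteArray()); }

    int recordedBytes() { return log.size(); }
}

class InputReplayDemo {
    // The "application": consumes its whole input and returns a checksum.
    static int checksum(InputStream in) {
        try {
            int sum = 0;
            for (int b = in.read(); b >= 0; b = in.read()) sum += b;
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecordingInputStream rec =
            new RecordingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        int recorded = checksum(rec);
        int replayed = checksum(rec.replayStream());
        System.out.println(recorded + " " + replayed);  // prints 6 6
    }
}
```

The log grows by one byte for every byte consumed, which makes the data-volume concern concrete for sources such as a camera or microphone.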
Data race detection
A data race occurs if two threads perform a store/store, load/store or store/load pair on the same location in parallel, i.e. without any ordering between the two operations.
Automatic data race detection: check data race condition on all load/store pairs that are not ordered.
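Examples (initially G.global=5; Thread1 does +2, Thread2 does *3):
Thread1 L(5), Thread2 L(5), Thread1 S(7), Thread2 S(15): G.global=15
Thread1 L(5), Thread2 L(5), Thread2 S(15), Thread1 S(7): G.global=7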
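The check on unordered load/store pairs can be sketched with vector clocks, one common way to represent the happens-before relation. The choice of vector clocks is an assumption of this sketch, not something the slides state, and the clock values are written by hand instead of being derived from a trace; Access and RaceDetector are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one memory access, tagged with the vector clock of its thread.
class Access {
    final int thread;
    final boolean store;     // true for a store, false for a load
    final String location;
    final int[] clock;

    Access(int thread, boolean store, String location, int... clock) {
        this.thread = thread;
        this.store = store;
        this.location = location;
        this.clock = clock;
    }
}

class RaceDetector {
    // a happens before b if a <= b in every component and a < b in at least one.
    static boolean happensBefore(int[] a, int[] b) {
        boolean strictlyLess = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyLess = true;
        }
        return strictlyLess;
    }

    // Race: different threads, same location, at least one store, and no ordering.
    static boolean isRace(Access x, Access y) {
        return x.thread != y.thread
            && x.location.equals(y.location)
            && (x.store || y.store)
            && !happensBefore(x.clock, y.clock)
            && !happensBefore(y.clock, x.clock);
    }

    static int countRaces(List<Access> accesses) {
        int races = 0;
        for (int i = 0; i < accesses.size(); i++)
            for (int j = i + 1; j < accesses.size(); j++)
                if (isRace(accesses.get(i), accesses.get(j))) races++;
        return races;
    }

    // Unsynchronized example: the two threads never synchronize, so their clocks are unordered.
    static List<Access> unsynchronizedExample() {
        List<Access> a = new ArrayList<>();
        a.add(new Access(0, false, "G.global", 1, 0));  // Thread1 load
        a.add(new Access(0, true,  "G.global", 2, 0));  // Thread1 store
        a.add(new Access(1, false, "G.global", 0, 1));  // Thread2 load
        a.add(new Access(1, true,  "G.global", 0, 2));  // Thread2 store
        return a;
    }

    // Synchronized example: Thread2 acquires the lock after Thread1 released it,
    // so Thread2's clock includes Thread1's component.
    static List<Access> synchronizedExample() {
        List<Access> a = new ArrayList<>();
        a.add(new Access(0, false, "G.global", 1, 0));
        a.add(new Access(0, true,  "G.global", 2, 0));
        a.add(new Access(1, false, "G.global", 2, 1));
        a.add(new Access(1, true,  "G.global", 2, 2));
        return a;
    }

    public static void main(String[] args) {
        System.out.println(countRaces(unsynchronizedExample()));  // prints 3
        System.out.println(countRaces(synchronizedExample()));    // prints 0
    }
}
```

In the unsynchronized case the load/store, store/load and store/store pairs are all unordered; the load/load pair is not a race because neither access writes.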
Implementation
RecPlay for Solaris (SPARC) and Linux (x86)
Uses JiTI for dynamic instrumentation
Record overhead: 1.6%
JaReC for Java (on top of the JVM)
Uses JVMPI for dynamic instrumentation
Record overhead: 25% on average
Input-Replay for Linux (Tornado)
Uses ptrace
Performance modeling JVM
Java workload separable in different components:
Virtual Machine (SUN, IBM, JikesRVM, JRockit, …)
Java application (SPECjvm98, SPECjbb2000, …)
Input to the application
Measure execution characteristics (AMD Duron):
IPC, branch & cache behavior, …
Statistical analysis:
Principal Components Analysis
Cluster Analysis
Quantify difference between SPECcpu2000 and Java workloads
JVM Results
Java workloads are mostly clustered:
by benchmark for large workloads
by VM for small workloads
SPECjvm98: behavior with the small input set is not indicative of execution behavior with the large input set
Comparing Java vs. C:
No significant difference: IPC, number of branches, data TLB
Significant difference: data cache behaviour, instruction TLB, return stack usage
PCA for SPECjvm98 – s1 input set
[Figure: scatter plot of principal component 2 against principal component 1; legend: blackdown 141, ibm 141, jikes base, jikes adaptive, sun 141, kaffe, jRockit]
PCA for SPECjvm98 – s100 input set
[Figure: scatter plot of principal component 2 against principal component 1; legend: blackdown 141, ibm 141, jikes base, jikes adaptive, jrockit, kaffe, sun 141]
PCA for SPECcpu vs. Java
Conclusions
Debugging multithreaded/distributed systems is not an easy task
Faithful record/replay requires extra resources (time + space)
Record/replay enables the developer to effectively debug a complex multithreaded program
The choice of Java VM has an impact on the low-level behavior of the processor
Java benchmarks should be large enough to be realistic
Output
14 refereed conference papers (OOPSLA, ParCo, WBT,…)
12 workshop papers
5 journal publications (FGCS, CACM, Parallel Computing,…)
1 PhD
12 master theses
Java and Embedded Systems Symposium Nov 2002 [150 people]
AADEBUG 2003 workshop, Sept 2003 [60 people]