cooperative concurrency bug isolation guoliang jin, aditya thakur, ben liblit, shan lu university of...

Post on 05-Jan-2016


Instrumentation and Sampling Strategies
for
Cooperative Concurrency Bug Isolation

Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu
University of Wisconsin–Madison

2

Cooperative Concurrency Bug Isolation

• They are synchronization mistakes in multi-threaded programs.

• Several types:
– Atomicity violation
– Data race
– Deadlock, etc.

[Figure: example interleavings of read(x) and write(x) across thread 1 and thread 2, each annotated as a correct outcome (☺), a failure (☹), or an uncertain outcome (?).]

3

Concurrency bugs are common in the field

• Developers are poor at parallel programming
• Interleaving testing is inefficient
• Applications with concurrency bugs are shipped to the users

4

Concurrency bugs lead to failures in the field

• Disasters in the past
– Therac-25, the 2003 Northeast blackout

• More threats in the multi-core era

5

Failure diagnosis is critical

6

Concurrency Bug Failure Example

Concurrency bug from Apache HTTP Server. Correct run (☺): the two calls to log_writer() do not interleave. idx is shared by both threads.

thread 1                               thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…
                                       …
                                       log_writer() {
                                         …
                                         memcpy(&buf[idx], s, strlen(s));
                                         …
                                         temp = idx;
                                         idx = temp + strlen(s);
                                         …
                                         return SUCCESS;
                                       }
                                       …

8

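Concurrency Bug Failure Example

Concurrency bug from Apache HTTP Server. Failing run (☹): thread 2 executes log_writer() between thread 1's memcpy and thread 1's update of the shared idx, so both memcpy calls write to the same buf[idx].

thread 1                               thread 2
…
log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
                                       …
                                       log_writer() {
                                         …
                                         memcpy(&buf[idx], s, strlen(s));
                                         …
                                         temp = idx;
                                         idx = temp + strlen(s);
                                         …
                                         return SUCCESS;
                                       }
                                       …
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}
…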
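The two schedules of log_writer() can be replayed deterministically in a single thread, which makes the corruption easy to see. The following is a minimal sketch, not Apache's code: the split into log_copy() and log_advance(), the buffer size, and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 64

/* Shared state, named after the slide: a log buffer and its write index. */
static char buf[BUF_SIZE];
static size_t idx;

/* First half of log_writer(): copy the entry at the current index. */
static void log_copy(const char *s) {
    assert(idx + strlen(s) <= BUF_SIZE);
    memcpy(&buf[idx], s, strlen(s));
}

/* Second half of log_writer(): advance the index past the entry. */
static void log_advance(const char *s) {
    size_t temp = idx;
    idx = temp + strlen(s);
}

static void log_reset(void) {
    memset(buf, 0, sizeof buf);
    idx = 0;
}

/* Correct schedule: each call finishes before the next one starts. */
static void run_serialized(const char *s1, const char *s2) {
    log_reset();
    log_copy(s1); log_advance(s1);   /* thread 1 */
    log_copy(s2); log_advance(s2);   /* thread 2 */
}

/* Failing schedule: thread 2 runs between thread 1's copy and index update. */
static void run_interleaved(const char *s1, const char *s2) {
    log_reset();
    log_copy(s1);                    /* thread 1 */
    log_copy(s2); log_advance(s2);   /* thread 2 writes over thread 1's entry */
    log_advance(s1);                 /* thread 1 */
}
```

With entries "AAA" and "BB", the serialized schedule leaves "AAABB" in the buffer, while the interleaved one leaves "BBA" followed by unwritten bytes, even though idx ends at 5 in both cases.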

9

Diagnosing Concurrency Bug Failure is Challenging

• The failure is non-deterministic and rare
– Programmers have trouble reproducing the failure

• The root cause involves more than one thread

10

Existing work and its limitations

• Failure replay
– High runtime overhead
– Developers need to manually locate faults

• Run-time bug detection
– (mostly) High runtime overhead
– Not guided by the failure
  • Many false positives

How to achieve low-overhead and accurate failure diagnosis?

11

Our work: CCI

• Goal: diagnosing production-run concurrency bug failures
• Major components:
– predicates instrumentor
– sampler
– statistical debugging

[Figure: CCI pipeline. Program source → compiler (inserting predicates, with sampler) → predicate counts and ☺/☹ outcomes → statistical debugging → predictors.]

Predictors: true in most failure runs, false in most correct runs.

12

CCI Overview

• Three different types of predicates.
• Each predicate has its supporting sampling strategy.
• Same statistical debugging as in CBI.
• Experiments show CCI is effective in diagnosing concurrency failures.

[Figure: overhead vs. capability chart positioning FunRe, Havoc, and Prev.]
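To make "true in most failure runs, false in most correct runs" concrete, here is a minimal sketch of the Failure, Context, and Increase scores used in the CBI line of work. The slides only say that CCI reuses CBI's statistical debugging; the struct layout and function names below are assumptions for illustration, and the full ranking procedure is not shown.

```c
#include <assert.h>

/* Per-predicate run counts gathered from many sampled executions. */
struct pred_stats {
    unsigned true_fail;   /* failing runs where the predicate was observed true */
    unsigned true_succ;   /* successful runs where it was observed true */
    unsigned obs_fail;    /* failing runs where the predicate site was observed */
    unsigned obs_succ;    /* successful runs where the site was observed */
};

/* Fraction f / (f + s), or 0 when there are no runs at all. */
static double ratio(unsigned f, unsigned s) {
    return (f + s) ? (double)f / (double)(f + s) : 0.0;
}

/* Failure(P): how often a run fails given that P was observed true. */
static double failure_score(const struct pred_stats *p) {
    return ratio(p->true_fail, p->true_succ);
}

/* Context(P): how often a run fails given that P's site was reached at all. */
static double context_score(const struct pred_stats *p) {
    return ratio(p->obs_fail, p->obs_succ);
}

/* Increase(P) = Failure(P) - Context(P); a large value marks a good predictor. */
static double increase_score(const struct pred_stats *p) {
    assert(p->true_fail <= p->obs_fail && p->true_succ <= p->obs_succ);
    return failure_score(p) - context_score(p);
}
```

For example, a predicate that is true in 9 of 10 observed failing runs but only 1 of 90 observed successful runs has Failure = 0.9 and Context = 0.1, so Increase = 0.8.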

13

Outline

• Motivation
• CCI Overview
• CCI Predicates and Sampling Strategies
– CCI-Prev and its sampling strategy
– CCI-Havoc and its sampling strategy
– CCI-FunRe and its sampling strategy
• Evaluation
• Conclusion

14

CCI-Prev Intuition

[Figure: correct (☺) and failing (☹) interleavings of read(x) and write(x) between thread 1 and thread 2, shown for an atomicity violation and for a data race.]

Just record which thread accessed last time.

15

CCI-Prev Predicate

It tracks whether two successive accesses to a shared memory location were by two distinct threads or were by the same thread.

[Figure: overhead vs. capability chart, with Prev placed.]

16

CCI-Prev Predicate on the Correct Run

Concurrency bug from Apache HTTP Server, correct run (☺): thread 1 and thread 2 each execute log_writer() without interleaving. I marks the instrumented access to idx.

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;                      /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

Predicate counts as the run proceeds:

Predicate   ☺   ☹
remoteI     0   0
localI      0   0

Predicate   ☺   ☹
remoteI     0   0
localI      1   0

Predicate   ☺   ☹
remoteI     0   0
localI      2   0

17

CCI-Prev Predicate on the Failure Run

Concurrency bug from Apache HTTP Server, failing run (☹): thread 2 executes log_writer() between thread 1's memcpy and thread 1's access I.

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;                      /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

Predicate counts as the run proceeds:

Predicate   ☺   ☹
remoteI     0   0
localI      2   0

Predicate   ☺   ☹
remoteI     0   0
localI      2   1

Predicate   ☺   ☹
remoteI     0   1
localI      2   1

18

CCI-Prev Predicate Instrumentation

Concurrency bug from Apache HTTP Server, failing run (☹). The access I in thread 1 is instrumented as follows:

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  lock(glock);
  remote = test_and_insert(&idx, curTid);
  record(I, remote);
  temp = idx;                      /* I */
  unlock(glock);
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

A global hash table maps each address to the ID of the thread that accessed it last.

Before thread 1 executes I:

address   ThreadID
…         …
&idx      2
…         …

Predicate   ☺   ☹
remoteI     0   0
localI      2   1

After thread 1 executes I:

address   ThreadID
…         …
&idx      1
…         …

Predicate   ☺   ☹
remoteI     0   1
localI      2   1
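The instrumentation above can be sketched in plain C. This is an illustrative approximation, not the CCI implementation: the fixed-size linear table stands in for the global hash table, thread IDs are passed explicitly, prev_access() and struct prev_counts are hypothetical names, and a first-ever access is simply reported as local.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 64

/* Global table: address -> ID of the thread that accessed it last. */
struct prev_entry { const void *addr; int tid; };
static struct prev_entry table[TABLE_SIZE];
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Counters for the two Prev predicates at one instrumentation site. */
struct prev_counts { unsigned remote, local; };

/* Record curTid as the last accessor of addr. Returns true if the previous
   accessor was a different thread. The caller must hold glock. */
static bool test_and_insert(const void *addr, int curTid) {
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].addr == addr) {
            bool remote = table[i].tid != curTid;
            table[i].tid = curTid;
            return remote;
        }
        if (table[i].addr == NULL) {      /* first access: claim a free slot */
            table[i].addr = addr;
            table[i].tid = curTid;
            return false;                 /* assumption: treated as local */
        }
    }
    assert(!"table full");
    return false;
}

/* Instrumented access. Pass site == NULL to update the table without
   recording a predicate (an access at some other site). */
static void prev_access(struct prev_counts *site, const void *addr, int curTid) {
    pthread_mutex_lock(&glock);
    bool remote = test_and_insert(addr, curTid);
    if (site != NULL) {
        if (remote) site->remote++; else site->local++;
    }
    /* the real memory access would happen here, still under the lock */
    pthread_mutex_unlock(&glock);
}
```

Replaying the correct schedule followed by the failing one through prev_access() yields two local observations and then one local plus one remote observation at I, matching the counts on the slides.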

19

CCI-Prev Sampling Strategy

[Figure: the failing interleaving of log_writer() in thread 1 and thread 2, with the instrumented access I.]

Does traditional sampling work? NO. Sampling for CCI-Prev has to be:

• Thread-coordinated
• Bursty

20


21

CCI-Havoc Intuition

[Figure: the failing interleaving of log_writer() in thread 1 and thread 2, with the instrumented access I.]

Just record what value was observed during last access.

22

CCI-Havoc Predicate

It tracks whether the value of a given shared location changes between two consecutive accesses by one thread.

Only uses thread-local information.

[Figure: overhead vs. capability chart, with Prev and Havoc placed.]

23

CCI-Havoc Predicate on the Correct Run

Concurrency bug from Apache HTTP Server, correct run (☺): thread 1 and thread 2 each execute log_writer() without interleaving. I marks the instrumented access to idx.

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;                      /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

Predicate counts as the run proceeds:

Predicate    ☺   ☹
unchangedI   0   0
changedI     0   0

Predicate    ☺   ☹
unchangedI   1   0
changedI     0   0

Predicate    ☺   ☹
unchangedI   2   0
changedI     0   0

24

CCI-Havoc Predicate on the Failure Run

Concurrency bug from Apache HTTP Server, failing run (☹): thread 2 executes log_writer() between thread 1's memcpy and thread 1's access I.

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;                      /* I */
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

Predicate counts as the run proceeds:

Predicate    ☺   ☹
unchangedI   2   0
changedI     0   0

Predicate    ☺   ☹
unchangedI   2   1
changedI     0   0

Predicate    ☺   ☹
unchangedI   2   1
changedI     0   1

25

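CCI-Havoc Predicate Instrumentation

Concurrency bug from Apache HTTP Server, failing run (☹). The access I in thread 1 is instrumented as follows:

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;                      /* I */
  changed = test(&idx, temp);
  record(I, changed);
  insert(&idx, temp);
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

Each thread has its own hash table that maps an address to the value this thread observed at its last access.

Hash table for thread 1 before it executes I:

address   value
…         …
&idx      idx
…         …

Predicate    ☺   ☹
unchangedI   2   1
changedI     0   0

Hash table for thread 1 after it executes I (len2 is the length added by thread 2):

address   value
…         …
&idx      idx+len2
…         …

Predicate    ☺   ☹
unchangedI   2   1
changedI     0   1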
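A per-thread value table like the one above can be sketched as follows. This is a simplified illustration rather than the CCI implementation: the table is a small linear array passed in explicitly (a real version would likely be thread-local storage), only reads are instrumented, and havoc_read() and the struct names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 64

/* One table per thread: address -> value this thread saw at its last access. */
struct havoc_entry { const int *addr; int value; };
struct havoc_table { struct havoc_entry e[TABLE_SIZE]; size_t n; };

/* Counters for the two Havoc predicates at one instrumentation site. */
struct havoc_counts { unsigned changed, unchanged; };

static struct havoc_entry *lookup(struct havoc_table *t, const int *addr) {
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].addr == addr) return &t->e[i];
    return NULL;
}

/* True if this thread accessed addr before and the value now differs. */
static bool test(struct havoc_table *t, const int *addr, int value) {
    struct havoc_entry *e = lookup(t, addr);
    return e != NULL && e->value != value;
}

/* Remember the value this thread just observed at addr. */
static void insert(struct havoc_table *t, const int *addr, int value) {
    struct havoc_entry *e = lookup(t, addr);
    if (e == NULL) {
        assert(t->n < TABLE_SIZE);
        e = &t->e[t->n++];
        e->addr = addr;
    }
    e->value = value;
}

/* Instrumented read. Pass site == NULL to update the table without
   recording a predicate (an access at some other site). */
static int havoc_read(struct havoc_table *t, struct havoc_counts *site,
                      const int *addr) {
    int v = *addr;
    if (site != NULL) {
        if (test(t, addr, v)) site->changed++; else site->unchanged++;
    }
    insert(t, addr, v);
    return v;
}
```

Because each table is touched by exactly one thread, no lock is needed, which is the source of Havoc's lower overhead compared with Prev.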

26

CCI-Havoc Sampling Strategy

[Figure: the failing interleaving of log_writer() in thread 1 and thread 2.]

• Bursty
• Thread-independent

27


28

CCI-FunRe Predicate

It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.

[Figure: overhead vs. capability chart, with Prev, Havoc, and FunRe placed.]

29

CCI-FunRe Predicate Example

Failing run (☹): the executions of log_writer() in thread 1 and thread 2 overlap. Correct run (☺): they do not overlap.

log_writer() {
  …
  return SUCCESS;
}

Predicate            ☺   ☹
NonReentlog_writer   2   1
Reentlog_writer      0   1

30

CCI-FunRe Predicate Instrumentation

Failing run (☹): thread 1 and thread 2 both execute the instrumented function, and their executions overlap.

log_writer() {
  oldCount = atomic_inc(Count);
  record("log_writer", oldCount);
  …
  atomic_dec(Count);
  return SUCCESS;
}

A global table keeps one counter per function. Over the run, the counter for log_writer takes the values 0, 1, 2, and finally 0 again:

FuncName     Counter
…            …
log_writer   0
…            …

Predicate counts as the run proceeds:

Predicate            ☺   ☹
NonReentlog_writer   2   0
Reentlog_writer      0   0

Predicate            ☺   ☹
NonReentlog_writer   2   1
Reentlog_writer      0   0

Predicate            ☺   ☹
NonReentlog_writer   2   1
Reentlog_writer      0   1
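The counter-based instrumentation above maps naturally onto C11 atomics. The sketch below is an approximation under stated assumptions: atomic_fetch_add and atomic_fetch_sub stand in for the slide's atomic_inc and atomic_dec, and struct funre_site, funre_enter(), and funre_exit() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Per-function state: live-execution counter plus the two FunRe predicates. */
struct funre_site {
    atomic_int  active;      /* threads currently inside the function */
    atomic_uint reent;       /* entries that overlapped another execution */
    atomic_uint non_reent;   /* entries that did not */
};

/* Zero-initialized state for log_writer(). */
static struct funre_site log_writer_site;

/* Call on function entry: bump the counter and record the predicate. */
static void funre_enter(struct funre_site *f) {
    int oldCount = atomic_fetch_add(&f->active, 1);
    if (oldCount > 0)
        atomic_fetch_add(&f->reent, 1u);
    else
        atomic_fetch_add(&f->non_reent, 1u);
}

/* Call on every return path: undo the entry-side increment. */
static void funre_exit(struct funre_site *f) {
    int oldCount = atomic_fetch_sub(&f->active, 1);
    assert(oldCount > 0);    /* an exit without a matching enter is a bug */
    (void)oldCount;
}
```

Two back-to-back executions produce two NonReent observations; two overlapping executions produce one NonReent and one Reent observation, as in the tables above.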

31

CCI-FunRe Sampling Strategy

log_writer() {
  oldCount = atomic_inc(Count);
  record("log_writer", oldCount);
  …
  atomic_dec(Count);
  return SUCCESS;
}

Function execution accounting (atomic_inc and atomic_dec) is not suitable for sampling, so this part is unconditional.

[Figure: thread 1 and thread 2 executing log_writer() in a failing run (☹), with snapshots of the counter table in which the counter for log_writer reads 0.]

32

CCI-FunRe Sampling Strategy

• Function execution accounting:
– unconditional

• FunRe predicate recording:
– thread-independent
– non-bursty

33


34

Experimental Evaluation

• Implementation
– Static instrumentor based on the CBI framework

• Real-world concurrency bug failures from:
– Apache HTTP server, Cherokee
– Mozilla-JS, PBZIP2
– SPLASH-2: FFT, LU

• Parameter used
– Roughly 1/100 sampling rate

35

Failure Diagnosis Evaluation

• Methodology
– Uses concurrency bug failures that occurred in the real world
– Each application runs 3000 times on a multi-core machine
  • Random sleeps are added to get some failure runs
– Sampling is enabled
– Statistical debugging then returns a list of predictors
  • Which predictor in the list can diagnose the failure?

36

Failure Diagnosis Results (with sampling)

Columns: Program, CCI-Prev, CCI-Havoc, CCI-FunRe. For each program only the non-empty cells are listed, in left-to-right order.

Apache-1       top1  top1  top1
Apache-2       top1  top1
Cherokee       top2
FFT            top1
LU             top1
Mozilla-JS-1   top2  top1
Mozilla-JS-2   top1  top1  top1
Mozilla-JS-3   top2  top1  top1
PBZIP2         top1  top1

[Figure: capability axis ordering FunRe, Havoc, and Prev.]

37

Runtime Overhead

             Prev                     Havoc                    FunRe
Program      No Sampling  Sampling    No Sampling  Sampling    No Sampling  Sampling
Apache-1     62.6%        1.9%        27.4%        2.8%        1.1%         1.8%
Apache-2     8.4%         0.5%        4.2%         0.4%        0.2%         0.2%
Cherokee     19.1%        0.3%        2.1%         0.0%        0.3%         0.4%
FFT          169%         24.0%       33.5%        5.5%        72.8%        30.0%
LU           57857%       949%        1693%        8.9%        1682%        926%
Mozilla-JS   11311%       606%        7587%        356%        123%         97.0%
PBZIP2       0.2%         0.2%        0.2%         0.2%        0.3%         0.2%

[Figure: overhead axis ordering FunRe, Havoc, and Prev.]

38

Conclusion

• CCI is capable of and suitable for diagnosing many production-run concurrency bug failures.
• Future predicates can leverage our effective sampling strategies.
• Experiments confirm the design tradeoff.

[Figure: overhead vs. capability chart positioning Prev, Havoc, and FunRe.]

39

Questions about CCI?

[Figure: overhead vs. capability chart positioning Prev, Havoc, and FunRe.]

41

CBI on Concurrency Bug Failures

Concurrency bug from Apache HTTP Server, failing run (☹): thread 2 executes log_writer() between thread 1's memcpy and thread 1's update of the shared idx.

log_writer() {
  …
  memcpy(&buf[idx], s, strlen(s));
  …
  temp = idx;
  idx = temp + strlen(s);
  …
  return SUCCESS;
}

CBI does not work!

To diagnose production-run concurrency bug failures, interleaving-related events should be tracked!

42

CCI-Prev Predicate Instrumentation with Sampling

if (gsample) {
  lock(glock);
  changed = test_and_insert(&cnt, curTid, &stale);
  record(stale ? P1 : P2, changed);
  temp = cnt;
  unlock(glock);
  lLength++;
  gLength++;
  if ((iset == curTid && lLength > lMAX) || gLength > gMAX) {
    clear();
    iset = unusedTid;
    gsample = false;
  }
} else {
  temp = cnt;
  [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
}

44

CCI-Havoc Predicate Instrumentation with Sampling

if (sample) {
  changed = test(&cnt, cnt, &stale);
  record(stale ? P1 : P2, changed);
  temp = cnt;
  insert(&cnt, cnt);
  length++;
  if (length > lMAX) { clear(); sample = false; }
} else {
  temp = cnt;
  [[ sample = true; length = 0; ]]?
}

No global lock used!
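The control flow of bursty, thread-independent sampling can be isolated from the predicate itself. The sketch below is only an illustration: a deterministic countdown stands in for the probabilistic `[[ … ]]?` step, L_MAX is an arbitrary value, and struct sampler and sampler_step() are hypothetical names. Each thread would own one such sampler, so no lock is involved.

```c
#include <assert.h>
#include <stdbool.h>

#define L_MAX 3   /* assumed burst-length bound, playing the role of lMAX */

/* Per-thread sampling state. */
struct sampler {
    bool     sample;      /* currently inside a sampling burst? */
    unsigned length;      /* accesses instrumented in the current burst */
    unsigned countdown;   /* uninstrumented accesses left before next burst */
    unsigned period;      /* value the countdown is reset to */
};

/* Call once per access. Returns true if this access should be instrumented. */
static bool sampler_step(struct sampler *s) {
    if (s->sample) {
        /* test / record / insert for the predicate would go here */
        s->length++;
        if (s->length > L_MAX) {
            /* a full version would also clear the per-thread table here,
               as clear() does on the slide */
            s->sample = false;
        }
        return true;
    }
    assert(s->countdown > 0);
    if (--s->countdown == 0) {          /* stands in for the [[ ... ]]? step */
        s->countdown = s->period;
        s->sample = true;
        s->length = 0;
    }
    return false;
}
```

Starting from `{ false, 0, 5, 5 }`, the first five accesses run uninstrumented, the next four form one burst, and the tenth is uninstrumented again. Consecutive accesses within a burst are what allow a predicate such as Havoc to see both the earlier and the later access to the same location.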

45

Failure Diagnosis Results (with sampling)

Columns: Program, CBI, CCI-Prev, CCI-Havoc, CCI-FunRe. For each program only the non-empty cells are listed, in left-to-right order.

Apache-1       top1  top1  top1
Apache-2       top1  top1
Cherokee       top2
FFT            top1
LU             top1
Mozilla-JS-1   top2  top1
Mozilla-JS-2   top1  top1  top1
Mozilla-JS-3   top2  top1  top1
PBZIP2         top1  top1

[Figure: capability axis ordering FunRe, Havoc, and Prev.]

46

Failure diagnosis is critical
