Cooperative Concurrency Bug Isolation

Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu
University of Wisconsin–Madison

Instrumentation and Sampling Strategies

• Concurrency bugs are synchronization mistakes in multi-threaded programs.

• Several types:
– Atomicity violation
– Data race
– Deadlock, etc.

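[Figure: thread 1 and thread 2 access a shared variable x with read(x) and write(x) in two different orders; each order is marked with a question of whether the run succeeds.]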

Concurrency bugs are common in the field

• Developers are poor at parallel programming
• Interleaving testing is inefficient
• Applications with concurrency bugs are shipped to users

Concurrency bugs lead to failures in the field

• Disasters in the past
– Therac-25, Northeast blackout of 2003

• More threats in the multi-core era

Failure diagnosis is critical

Concurrency Bug Failure Example

Concurrency Bug from Apache HTTP Server

[Figure: thread 1 and thread 2 both call log_writer(), which executes memcpy(&buf[idx], s, strlen(s)); and updates the shared index with temp = idx; idx = temp + strlen(s); before return SUCCESS;. The panels step through the two threads' executions interleaving.]

• The failure is non-deterministic and rare
– Programmers have trouble reproducing the failure

• The root cause involves more than one thread

Diagnosing Concurrency Bug Failures Is Challenging
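The Apache failure above comes from a non-atomic read-then-write of idx. The following minimal C sketch replays the failing interleaving deterministically in a single thread instead of with real threads, so the lost update can be seen without a race. The function names replay_serial and replay_failing_interleaving, and the len1/len2 parameters, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Shared log-buffer index, as in the log_writer() example. */
static size_t idx;

/* Serial (correct) order: thread 1 finishes its update before thread 2
 * starts, so both appended lengths are reflected in idx. */
size_t replay_serial(size_t start, size_t len1, size_t len2)
{
    idx = start;
    size_t temp = idx;   /* thread 1: temp = idx;             */
    idx = temp + len1;   /* thread 1: idx = temp + strlen(s); */
    temp = idx;          /* thread 2: temp = idx;             */
    idx = temp + len2;   /* thread 2: idx = temp + strlen(s); */
    return idx;
}

/* Failing order: thread 2 runs its whole update between thread 1's read
 * and thread 1's write, so thread 1 writes back a value computed from a
 * stale read and thread 2's update is lost. */
size_t replay_failing_interleaving(size_t start, size_t len1, size_t len2)
{
    idx = start;
    size_t temp1 = idx;  /* thread 1: temp = idx;                       */
    size_t temp2 = idx;  /* thread 2: temp = idx;                       */
    idx = temp2 + len2;  /* thread 2: idx = temp + strlen(s);           */
    idx = temp1 + len1;  /* thread 1: idx = temp + strlen(s); (stale)   */
    return idx;
}
```

With start = 0, len1 = 5, and len2 = 7, the serial replay leaves idx at 12, while the failing interleaving leaves it at 5: thread 2's update of idx is lost.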

Existing Work and Its Limitations

• Failure replay
– High runtime overhead
– Developers need to manually locate faults

• Run-time bug detection
– (Mostly) high runtime overhead
– Not guided by the failure
  • Many false positives

How to achieve low-overhead and accurate failure diagnosis?

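Our work: CCI

[Figure: CCI workflow. Program source is compiled with predicate instrumentation and a sampler; deployed runs report predicate counts together with whether each run succeeded (☺) or failed (☹); statistical debugging turns these reports into predictors, i.e., predicates that are true in most failure runs and false in most correct runs.]

• Goal: diagnosing production-run concurrency bug failures
• Major components:
– Predicate instrumentor
– Sampler
– Statistical debugging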

CCI Overview

• Three different types of predicates
• Each predicate type has its own supporting sampling strategy
• Same statistical debugging as in CBI
• Experiments show CCI is effective in diagnosing concurrency bug failures
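Since CCI reuses CBI's statistical debugging, the ranking step can be illustrated with CBI's Increase score, Failure(P) minus Context(P). This is a simplified sketch under that assumption: it computes only Increase from per-predicate run counts, whereas CBI's full ranking also accounts for how many failing runs a predicate covers. The pred_counts struct and the function names are hypothetical.

```c
#include <assert.h>

/* Per-predicate counts aggregated over many runs. */
typedef struct {
    const char *name;
    unsigned f_true;  /* failing runs in which P was observed true     */
    unsigned s_true;  /* successful runs in which P was observed true  */
    unsigned f_obs;   /* failing runs in which P's site was observed   */
    unsigned s_obs;   /* successful runs in which P's site was observed */
} pred_counts;

/* Failure(P): fraction of runs with P true that failed. */
static double failure_score(const pred_counts *p)
{
    unsigned total = p->f_true + p->s_true;
    return total ? (double)p->f_true / total : 0.0;
}

/* Context(P): failure rate of runs that merely reached P's site. */
static double context_score(const pred_counts *p)
{
    unsigned total = p->f_obs + p->s_obs;
    return total ? (double)p->f_obs / total : 0.0;
}

/* Increase(P) = Failure(P) - Context(P). A large positive value suggests
 * that P being true, not just reaching its site, is tied to failure. */
double increase_score(const pred_counts *p)
{
    assert(p->f_true <= p->f_obs && p->s_true <= p->s_obs);
    return failure_score(p) - context_score(p);
}
```

For instance, a predicate observed true in 10 failing and 0 successful runs, at a site reached in 10 failing and 90 successful runs, scores 1.0 - 0.1 = 0.9, while a predicate true in every run that reaches its site scores 0.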

Outline

• Motivation
• CCI Overview
• CCI Predicates and Sampling Strategies
– CCI-Prev and its sampling strategy
– CCI-Havoc and its sampling strategy
– CCI-FunRe and its sampling strategy
• Evaluation
• Conclusion

CCI-Prev Intuition

[Figure: four interleavings of read(x) and write(x) by thread 1 and thread 2, labeled as examples of an atomicity violation and a data race.]

Just record which thread accessed the location last time.

CCI-Prev Predicate

It tracks whether two successive accesses to a shared memory location were made by two distinct threads or by the same thread.
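The CCI-Prev predicate can be sketched as a table from shared addresses to the last accessing thread, with test_and_insert named after the call in the slides. This minimal version assumes the caller already holds the global lock (glock in the slides); the fixed-size linear table, the prev_result enum, and treating a thread's first-ever access as having no predecessor are assumptions of the sketch, not details from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define PREV_TABLE_SIZE 64  /* hypothetical fixed capacity */

/* Global table: shared address -> ID of the thread that accessed it last.
 * CCI protects this table with a global lock; the caller is assumed to
 * hold that lock. */
typedef struct {
    const void *addr[PREV_TABLE_SIZE];
    int last_tid[PREV_TABLE_SIZE];
    size_t used;
} prev_table;

typedef enum { PREV_NONE, PREV_LOCAL, PREV_REMOTE } prev_result;

/* Reports whether the previous access to addr came from another thread
 * (PREV_REMOTE) or from cur_tid itself (PREV_LOCAL), then records cur_tid
 * as the latest accessor. PREV_NONE means addr had no earlier access. */
prev_result test_and_insert(prev_table *t, const void *addr, int cur_tid)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->addr[i] == addr) {
            prev_result r = (t->last_tid[i] != cur_tid) ? PREV_REMOTE : PREV_LOCAL;
            t->last_tid[i] = cur_tid;
            return r;
        }
    }
    assert(t->used < PREV_TABLE_SIZE);
    t->addr[t->used] = addr;
    t->last_tid[t->used] = cur_tid;
    t->used++;
    return PREV_NONE;
}

/* Counters for the two complementary predicates at one site. */
typedef struct { unsigned remote, local; } prev_site;

void record_prev(prev_site *site, prev_result r)
{
    if (r == PREV_REMOTE)
        site->remote++;
    else if (r == PREV_LOCAL)
        site->local++;
    /* PREV_NONE: no predecessor, so neither predicate is observed. */
}
```

In the failing Apache run, thread 1's read of idx right after thread 2's update returns PREV_REMOTE, which is exactly the access the remote predicate captures.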

CCI-Prev Predicate on the Correct Run

[Figure: thread 1 and thread 2 each execute log_writer() in a correct run.]

Predicate | Correct runs (☺) | Failing runs (☹)
remote (site I) | 0 | 0
local (site I) | 2 | 0

CCI-Prev Predicate on the Failure Run

[Figure: thread 1 and thread 2 each execute log_writer() in a failing run.]

Predicate | Correct runs (☺) | Failing runs (☹)
remote (site I) | 0 | 1
local (site I) | 2 | 1

CCI-Prev Predicate Instrumentation

Thread 1's read of idx (site I) is instrumented as:

    lock(glock);
    remote = test_and_insert(&idx, curTid);
    record(I, remote);
    temp = idx;
    unlock(glock);
    idx = temp + strlen(s);

test_and_insert consults a global hash table that maps each address to the ID of the thread that accessed it last. Before thread 1's read, the entry is &idx → 2 (thread 2 accessed idx last), so the access is remote; afterwards the entry becomes &idx → 1. In the failing run, remote (site I) moves from 0 to 1 in the failing-runs column, while local (site I) stays at 2 correct / 1 failing.

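CCI-Prev Sampling Strategy

[Figure: thread 1 and thread 2 each execute log_writer().]

Does traditional sampling work? No. Traditional sampling decides independently at each site, so it rarely monitors both of two successive accesses to the same location, and those two accesses often come from different threads. CCI-Prev sampling is therefore:

• Thread-coordinated
• Bursty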
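The thread-coordinated, bursty strategy can be sketched from the backup slide's pseudocode (gsample, iset, lLength, gLength, lMAX, gMAX). In this hypothetical version, the random decision to start a burst is passed in as a boolean, the per-thread counter lives in a struct so the logic can be exercised single-threaded, and the synchronization that real multi-threaded code would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define UNUSED_TID (-1)

/* Global sampling state shared by all threads. While gsample is set, every
 * thread monitors its shared-memory accesses (thread-coordinated). */
typedef struct {
    bool gsample;
    int iset;           /* thread that started the current burst        */
    unsigned g_length;  /* accesses monitored in this burst, all threads */
    unsigned l_max;     /* burst limit for the starting thread          */
    unsigned g_max;     /* burst limit across all threads               */
} prev_sampler;

/* Per-thread state; real code would keep l_length in thread-local storage. */
typedef struct {
    int tid;
    unsigned l_length;  /* accesses this thread monitored in the burst */
} prev_thread;

/* Called at every instrumented access. start_burst models the random
 * "[[ gsample = true; ... ]]?" decision taken when no burst is active.
 * Returns true if this access is monitored (test_and_insert + record). */
bool prev_sample_access(prev_sampler *s, prev_thread *th, bool start_burst)
{
    if (!s->gsample) {
        if (start_burst) {
            /* This thread opens a burst that all threads join. */
            s->gsample = true;
            s->iset = th->tid;
            s->g_length = 0;
            th->l_length = 0;
        }
        return false;  /* the triggering access itself is not monitored */
    }

    th->l_length++;
    s->g_length++;
    if ((s->iset == th->tid && th->l_length > s->l_max) ||
        s->g_length > s->g_max) {
        /* End the burst for everyone; CCI also clears its access table. */
        s->iset = UNUSED_TID;
        s->gsample = false;
    }
    return true;
}
```

Because gsample is global, a burst started by one thread makes every thread monitor its accesses, which is what lets both halves of a remote access pair be observed.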

CCI-Havoc Intuition

Just record what value was observed during the last access.

[Figure: thread 1 and thread 2 each execute log_writer().]

CCI-Havoc Predicate

It tracks whether the value of a given shared location changes between two consecutive accesses by one thread.

It only uses thread-local information.
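A CCI-Havoc predicate needs only a per-thread map from addresses to the last value that thread observed, mirroring the test and insert calls in the slides. In this sketch the havoc_table type, the HAVOC_NONE result for a first access, and the fixed capacity are assumptions; values are stored as long for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define HAVOC_TABLE_SIZE 64  /* hypothetical fixed capacity */

/* One table per thread: shared address -> value this thread observed there
 * at its previous access. The table is thread-private, so no lock is used. */
typedef struct {
    const void *addr[HAVOC_TABLE_SIZE];
    long value[HAVOC_TABLE_SIZE];
    size_t used;
} havoc_table;

typedef enum { HAVOC_NONE, HAVOC_UNCHANGED, HAVOC_CHANGED } havoc_result;

/* test(): compare the value seen now with the value remembered for addr. */
havoc_result havoc_test(const havoc_table *t, const void *addr, long current)
{
    for (size_t i = 0; i < t->used; i++)
        if (t->addr[i] == addr)
            return (t->value[i] == current) ? HAVOC_UNCHANGED : HAVOC_CHANGED;
    return HAVOC_NONE;  /* this thread has not accessed addr before */
}

/* insert(): remember the value this thread just read or wrote at addr. */
void havoc_insert(havoc_table *t, const void *addr, long value)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->addr[i] == addr) {
            t->value[i] = value;
            return;
        }
    }
    assert(t->used < HAVOC_TABLE_SIZE);
    t->addr[t->used] = addr;
    t->value[t->used] = value;
    t->used++;
}
```

Because each thread owns its table, the checks need no lock; a thread's own writes must also go through havoc_insert so that they are not later mistaken for another thread's change.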

CCI-Havoc Predicate on the Correct Run

[Figure: thread 1 and thread 2 each execute log_writer() in a correct run.]

Predicate | Correct runs (☺) | Failing runs (☹)
unchanged (site I) | 2 | 0
changed (site I) | 0 | 0

CCI-Havoc Predicate on the Failure Run

[Figure: thread 1 and thread 2 each execute log_writer() in a failing run.]

Predicate | Correct runs (☺) | Failing runs (☹)
unchanged (site I) | 2 | 1
changed (site I) | 0 | 1

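CCI-Havoc Predicate Instrumentation

Thread 1's read of idx (site I) is instrumented as:

    temp = idx;
    changed = test(&idx, temp);
    record(I, changed);
    insert(&idx, temp);
    idx = temp + strlen(s);

test and insert use a hash table private to thread 1 that maps each address to the value thread 1 observed there last. The table holds &idx → idx from thread 1's previous access; because thread 2 has since appended its string, the value read now is idx+len2, so the access is recorded as changed and the entry is updated to idx+len2. In the failing run, changed (site I) moves from 0 to 1 in the failing-runs column, while unchanged (site I) stays at 2 correct / 1 failing.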

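CCI-Havoc Sampling Strategy

[Figure: thread 1 and thread 2 each execute log_writer().]

Because a CCI-Havoc predicate compares two consecutive accesses by the same thread and uses only that thread's own information, its sampling is:

• Bursty
• Thread-independent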
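The bursty, thread-independent strategy for CCI-Havoc can be sketched with thread-local state, following the backup slide's sample/length/lMAX pseudocode. The burst length HAVOC_LMAX and the start_burst parameter standing in for the random sampling decision are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define HAVOC_LMAX 3u  /* hypothetical burst length (lMAX in the slides) */

/* Per-thread sampling state ("sample" and "length" in the slides). */
static _Thread_local bool tl_sample = false;
static _Thread_local unsigned tl_length = 0;

/* Called at each instrumented access. start_burst models the random
 * "[[ sample = true; length = 0; ]]?" decision. Returns true if this
 * access is monitored (test/record/insert would run). */
bool havoc_sample_access(bool start_burst)
{
    if (!tl_sample) {
        if (start_burst) {
            tl_sample = true;
            tl_length = 0;
        }
        return false;  /* the triggering access itself is not monitored */
    }
    tl_length++;
    if (tl_length > HAVOC_LMAX) {
        /* Burst over; CCI also clears this thread's value table here. */
        tl_sample = false;
    }
    return true;
}
```

Since every variable here is _Thread_local, each thread decides on its own when to start and stop monitoring, so no global flag or lock is involved.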

CCI-FunRe Predicate

It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.

CCI-FunRe Predicate Example

[Figure: two panels in which thread 1 and thread 2 each execute log_writer() { … return SUCCESS; }.]

Predicate | Correct runs (☺) | Failing runs (☹)
NonReent (log_writer) | 2 | 1
Reent (log_writer) | 0 | 1

CCI-FunRe Predicate Instrumentation

    log_writer() {
        oldCount = atomic_inc(Count);
        record("log_writer", oldCount);
        …
        atomic_dec(Count);
        return SUCCESS;
    }

A table maps each function name to a counter of its ongoing executions. In the example, one thread enters log_writer and the counter goes from 0 to 1; the other thread then enters while the first is still inside, so the counter goes from 1 to 2, the second thread records a nonzero old count, and Reent (log_writer) moves from 0 to 1 in the failing-runs column. Once both threads return, the counter is back to 0.
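The FunRe instrumentation maps naturally onto C11 atomics: atomic_fetch_add returns the old value, playing the role of oldCount = atomic_inc(Count). This sketch hard-codes one function, log_writer, and splits its entry and exit instrumentation into helper functions so an overlapping execution can be replayed step by step; those helpers and the counter names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Function execution accounting: threads currently inside log_writer. */
static atomic_int log_writer_count = 0;

/* Predicate counters for log_writer: entered while another thread was
 * already inside it (Reent) or not (NonReent). */
static atomic_uint reent_log_writer = 0;
static atomic_uint nonreent_log_writer = 0;

/* record("log_writer", oldCount): a nonzero old count means overlap. */
static void record_funre(int old_count)
{
    if (old_count > 0)
        atomic_fetch_add(&reent_log_writer, 1);
    else
        atomic_fetch_add(&nonreent_log_writer, 1);
}

/* Instrumentation at function entry: oldCount = atomic_inc(Count). */
void log_writer_enter(void)
{
    int old_count = atomic_fetch_add(&log_writer_count, 1);
    record_funre(old_count);
}

/* Instrumentation before each return: atomic_dec(Count). */
void log_writer_exit(void)
{
    int old_count = atomic_fetch_sub(&log_writer_count, 1);
    assert(old_count > 0);  /* every exit must match an earlier entry */
}
```

Entering log_writer while another thread is inside it sees an old count greater than zero and bumps the Reent counter; a non-overlapping entry bumps NonReent.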

CCI-FunRe Sampling Strategy

[Figure: thread 1 and thread 2 each execute log_writer() { … return SUCCESS; }, with the log_writer counter shown as 0.]

Function execution accounting is not suitable for sampling: if some increments or decrements of a function's counter were skipped, the counter would no longer reflect how many threads are inside the function. So this part is unconditional.

• Function execution accounting:
– Unconditional

• FunRe predicate recording:
– Thread-independent
– Non-bursty
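The FunRe sampling split can be sketched as follows: the execution count is always maintained, while the predicate is recorded only on sampled entries, using a per-thread countdown so that each thread decides on its own and records single, isolated entries rather than bursts. The fixed SAMPLE_PERIOD of 100 (approximating the roughly 1/100 rate from the evaluation) replaces the random gaps a real sampler would draw; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SAMPLE_PERIOD 100u  /* hypothetical fixed gap, about 1/100 */

/* Always maintained, never sampled: active executions of the function. */
static atomic_int fn_count = 0;

/* Sampled predicate counts. */
static atomic_uint reent_samples = 0;
static atomic_uint nonreent_samples = 0;

/* Per-thread countdown to the next recorded entry (thread-independent);
 * each hit records one entry only (non-bursty). */
static _Thread_local unsigned countdown = 1;

static bool should_record(void)
{
    if (--countdown == 0) {
        countdown = SAMPLE_PERIOD;
        return true;
    }
    return false;
}

void fn_enter(void)
{
    int old_count = atomic_fetch_add(&fn_count, 1);  /* unconditional */
    if (should_record()) {                           /* sampled */
        if (old_count > 0)
            atomic_fetch_add(&reent_samples, 1);
        else
            atomic_fetch_add(&nonreent_samples, 1);
    }
}

void fn_exit(void)
{
    int old_count = atomic_fetch_sub(&fn_count, 1);  /* unconditional */
    assert(old_count > 0);
}
```

Skipping a recording never touches fn_count, so the counter stays exact even though only about one entry in SAMPLE_PERIOD is recorded.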

Experimental Evaluation

• Implementation
– Static instrumentor based on the CBI framework

• Real-world concurrency bug failures from:
– Apache HTTP Server, Cherokee
– Mozilla-JS, PBZIP2
– SPLASH-2: FFT, LU

• Parameters used
– Roughly 1/100 sampling rate

Failure Diagnosis Evaluation

• Methodology
– Use concurrency bug failures that occurred in the real world
– Run each application 3000 times on a multi-core machine
  • Add random sleeps to get some failure runs
– Sampling is enabled
– Statistical debugging then returns a list of predictors
  • Which predictor in the list can diagnose the failure?

Failure Diagnosis Results (with sampling)

Program  CCI-Prev  CCI-Havoc  CCI-FunRe
Apache-1  top1  top1  top1
Apache-2  top1  top1
Cherokee  top2
FFT  top1
LU  top1
Mozilla-JS-1  top2  top1
Mozilla-JS-2  top1  top1  top1
Mozilla-JS-3  top2  top1  top1
PBZIP2  top1  top1

Runtime Overhead

Program | Prev, no sampling | Prev, sampling | Havoc, no sampling | Havoc, sampling | FunRe, no sampling | FunRe, sampling
Apache-1 | 62.6% | 1.9% | 27.4% | 2.8% | 1.1% | 1.8%
Apache-2 | 8.4% | 0.5% | 4.2% | 0.4% | 0.2% | 0.2%
Cherokee | 19.1% | 0.3% | 2.1% | 0.0% | 0.3% | 0.4%
FFT | 169% | 24.0% | 33.5% | 5.5% | 72.8% | 30.0%
LU | 57857% | 949% | 1693% | 8.9% | 1682% | 926%
Mozilla-JS | 11311% | 606% | 7587% | 356% | 123% | 97.0%
PBZIP2 | 0.2% | 0.2% | 0.2% | 0.2% | 0.3% | 0.2%

Conclusion

• CCI is capable of, and suitable for, diagnosing many production-run concurrency bug failures.

• Future predicates can leverage our effective sampling strategies.

• Experiments confirm the design tradeoffs.

Questions?

CBI on Concurrency Bug Failures

[Figure: the concurrency bug from Apache HTTP Server, with thread 1 and thread 2 each executing log_writer() in a failing run.]

CBI does not work!

To diagnose production-run concurrency bug failures, interleaving-related events must be tracked.

CCI-Prev Predicate Instrumentation with Sampling

    if (gsample) {
        lock(glock);
        changed = test_and_insert(&cnt, curTid, &stale);
        record(stale ? P1 : P2, changed);
        temp = cnt;
        unlock(glock);
        lLength++;
        gLength++;
        if ((iset == curTid && lLength > lMAX) || gLength > gMAX) {
            clear();
            iset = unusedTid;
            gsample = false;
        }
    } else {
        temp = cnt;
        [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
    }

CCI-Havoc Predicate Instrumentation with Sampling

    if (sample) {
        changed = test(&cnt, cnt, &stale);
        record(stale ? P1 : P2, changed);
        temp = cnt;
        insert(&cnt, cnt);
        length++;
        if (length > lMAX) { clear(); sample = false; }
    } else {
        temp = cnt;
        [[ sample = true; length = 0; ]]?
    }

No global lock is used!

Failure Diagnosis Results (with sampling)

Program  CBI  CCI-Prev  CCI-Havoc  CCI-FunRe
Apache-1  top1  top1  top1
Apache-2  top1  top1
Cherokee  top2
FFT  top1
LU  top1
Mozilla-JS-1  top2  top1
Mozilla-JS-2  top1  top1  top1
Mozilla-JS-3  top2  top1  top1
PBZIP2  top1  top1
