diagnosing and fixing concurrency bugs credits to dr. guoliang jin, computer science, nc state...

46
Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Upload: marshall-mathews

Post on 19-Jan-2016

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Diagnosing and FixingConcurrency Bugs

Credits to Dr. Guoliang Jin, Computer Science, NC STATE

Presented by Tao Wang

Page 2: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

2

We need reliable software People’s daily life now depends on reliable software

Software companies spend lots of resources on debugging More than 50% effort on finding and fixing bugs Around $300 billion per year

Page 3: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Concurrency bugs hurt It is an increasingly parallel world

Concurrency bugs in history

3

Page 4: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Multi-threaded program Concurrent programs under the shared-memory model

Programs execute multiple interacting threads in parallel Threads communicate via shared memory Shared-memory accesses should be well-synchronized

Multicore chip

core1

cache

thread1

core2

cache

thread2

core3

cache

thread3

core4

cache

thread4

shared memory

4

Page 5: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

HugeInterleavingspace

An example of concurrency bug

Thread 1if (ptr != NULL) { ptr->field = 1;}

Thread 2

ptr = NULL;

Thread 1

if (ptr != NULL) { ptr->field = 1;}

Thread 2ptr = NULL;

The interleaving space

5

Bad interleavings

Previous research focuses on finding

Thread 1if (ptr != NULL) {

ptr->field = 1;}

Thread 2

ptr = NULL;

Thread 1if (ptr != NULL) {

ptr->field = 1;}

Thread 2

ptr = NULL;

Segmentation Fault

Page 6: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Software quality does not improve until bugs are fixed

Manual concurrency bug fixing is time-consuming: 73 days on average error-prone: 39% patches are buggy in the first release

CFix: automated concurrency-bug fixing [PLDI’11*, OSDI’12] Program behaves correctly if bad interleavings do not occur Fix concurrency bugs by disabling bad interleavings

Bug fixing

6

*SIGPLAN:“one of the first papers to attack the problem of automated bug fixing”

Page 7: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

HugeInterleavingspace

Bad interleavings

Bad interleavings

Bad interleavingsDisabled

The interleaving space (again)

lead to production-run failures

7

Bad interleavings

Bad interleavings

Bad interleavings

Disabled

Page 8: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Failures still happen in production runs The reason behind failure needs to be understood

Tools dealing with production runs demand low overhead Diagnostic information needs to be informative

Production-run concurrency-bug failure diagnosis Design new monitoring schemes and sampling strategies

CCI: a pure software solution [OOPSLA’10]

PBI, LXR: hardware-assisted solutions [ASPLOS’13 & 14]

Failure diagnosis

8

Page 9: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

My work on concurrency bugs

[ASPLOS’11]

Production-RunFailure Diagnosis:

CCI/PBI/LXR[OOPSLA’10,

ASPLOS’13 & 14]9

[PLDI’11*, OSDI’12]*Received a SIGPLAN

CACM nomination

Bug Detection and software testing:

ConSeq

AutomatedConcurrency-Bug

Fixing: CFix

Page 10: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Outline Motivation and Overview Automated Concurrency-Bug Fixing

The problem and idea Overview Internals of CFix Evaluation and summary

10

Page 11: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

What is the correct behavior? Usually requires developers’ knowledge

How to get the correct behavior? Correct program states under bug-triggering inputs No change to program states under other inputs

Automated fixing is difficult

11

Description:SymptomTriggering condition…

Patch:CorrectnessPerformanceSimplicity

�?

Page 12: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

What is the correct behavior? The program state is correct as long as the buggy interleaving does not occur

How to get the correct behavior? Only need to disable failure-inducing interleavings Can leverage well-defined synchronization operations

CFix’ insights

12

Description:SymptomTriggering condition…

Patch:CorrectnessPerformanceSimplicity

?

Page 13: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Description:SymptomTriggering condition…

Description:Interleavings that lead to software failure

13

Patch:CorrectnessPerformanceSimplicity

?atomicity violation detectorsParkASPLOS’09,FlanaganPOPL’04,LuASPLOS’06,ChewEuroSys’10

order violation detectorsZhangASPLOS’10,LuciaMICRO’09,YuISCA’09,GaoASPLOS’11

data race detectorsSenPLDI’08,SavageTOCS’97,YuSOSP’05,EricksonOSDI’10,KasikciASPLOS’10

abnormal data flow detectorsZhangASPLOS’11,ShiOOPSLA’10

p rc

A B

Wb

RWg

I1

I2

How to get a general solution that generates good patches?

Page 14: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

. . .

. . .Patched binaryPatched binary

Patched binaryPatched binary

Merged binary

. . .Selected binary Selected binary

Mutual exclusionOrder

Mutual exclusionOrder

Final patched binary14

Description:Interleavings that lead to software failure

Patch:CorrectnessPerformanceSimplicity

CFix

Run-time Support

PatchMerging

Patch Testing & Selection

Synchronization Enforcement

Fix-Strategy Design

Source codeBug reports

Page 15: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Fix-strategy design: what to fix

Challenges: Huge variety of bugs

15

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 16: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Why these two? Real-world concurrency bug characteristics study[SHAN

ASPLOS’08]: 97% either atomicity violation or order violation Either can be fixed by mutual exclusion or order enforcement

Two types of Concurrency bugs

16

Atomicity violation

Order violation

Page 17: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Fix-strategy design: how to fix

Challenges: Inaccurate root cause

17

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 18: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

atomicity-violation

Thread 1

if (ptr != NULL) {

ptr->field = 1;}

ptr = NULL;

Thread 2

18

P

C

R

Page 19: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Fix-strategy for atomicity-voilation

Thread 1

if (ptr != NULL) {

ptr->field = 1;}

ptr = NULL;

Thread 2

19

Page 20: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

CFix: fix-strategy design

Challenges: Inaccurate root cause Huge variety of bugsSolution: A combination of

mutual exclusion & order relationship enforcement

20

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 21: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Fix-strategies Overview

OV DetectorAV Detector Race Detector DU Detector

I1

I2

A B

p rc

Wb

RWg

21

Page 22: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

CFix: synchronization enforcement

Challenges: Correctness Performance simplicitySolution: Mutual exclusion

enforcement: AFix [PLDI’11] Order relationship

enforcement: OFix [OSDI’12]

22

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 23: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Input: three statements (p, c, r) with contexts

Idea: making the code region from p to c be mutually exclusive with r

Atomicity violation in Fixing

23

Thread 1if (ptr != NULL) {

ptr->field = 1;}

Thread 2

ptr = NULL; rp

c

Page 24: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Approach: lock

Goal: Correctness: paired lock acquisition and release operations Performance: Make the critical section as small as possible

Mutual exclusion enforcement: AFix

p

c

r

24

Page 25: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

A naïve solution Add lock on edges reaching p Add unlock on edges leaving c

Potential new bugs Could lock without unlock Could unlock without lock etc.

A naïve solution

p

c

p

c

p

c

p

c

25

Page 26: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Assume p and c are in the same function f

Step 1: find protected nodes in critical section

Step 2: add lock operations

unprotected node protected node

protected node unprotected node

Avoid those potential bugs mentioned

The AFix solution

p

c

26

Page 27: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

p and c adjustment when they are in different functions Observation: people put lock and unlock in one function Find the longest common prefix of p’s and c’s stack traces Adjust p and c accordingly

Put r into a critical section Do nothing if we can reach r from the p–c critical section

Lock type: Lock with timeout: if critical section has blocking operations Reentrant lock: if recursion is possible within critical section

Subtle details

27

Page 28: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

use read

initialization

destroy

OFix: two order relationshipsAi

A BAj

…… ?

firstA-BallA-B

A1

B

An

A1

B

An

…28

Page 29: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Approach: condition variable and flag Insert signal operations in A-threads Insert wait operation before B

Rules A-thread signals exactly once when it will not execute more A A-thread signals as soon as possible B proceeds when each A-thread has signaled

OFix allA-B enforcement

29

Page 30: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

OFix allA-B enforcement: A sideHow to identify the last A instance in one thread

A

. . .; for (. . .) . . . ; // A . . .;

Each thread that executes A exactly once as soon as it can execute no more A

30

Page 31: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

OFix allA-B enforcement: A sideHow to identify the last thread that executes A

void main() { for (. . .) thread_create(thr_main); . . .;} void ofix_signal() {

mutex_lock(L);

--;

if ( == 0) cond_broadcast(con); mutex_unlock(L);}

void thr_main() { for (. . .) . . . ; // A . . .;}

counter for signal threads

=1

++

thread_create A

31

Page 32: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Safe to execute only when is 0

Give up if OFix knows that it introduces new deadlock Timed wait-operation to mask potential deadlocks

OFix allA-B enforcement: B side

B

void ofix_wait() { mutex_lock(L); if ( != 0) cond_timedwait(con, L, t); mutex_unlock(L);}

32

Page 33: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Basic enforcement

When A may not execute Add a safety-net of signal with allA-B algorithm

OFix firstA-B

BA

33

Page 34: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

CFix: patch testing & selection

Challenge: Multi-thread software

testingSolution: CFix-patch oriented testing

34

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 35: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Patch testing principles Two ideas:

No exhaustive testing, but patch oriented testing Leverage existing testing techniques, with extra heuristics

The work-flow Step 1 Prune incorrect patches

• Patches causing failures due to wrong fix strategies, etc Step 2 Prune slow patches Step 3 Prune complicated patches

35

Page 36: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Run once without external perturbation Reject if there is a time-out or failure

Patches fixing wrong root cause Make software to fail deterministically

Thread 1

ptr->field = 1;

ptr->field = 1;

Thread 2

ptr = NULL;

36

Page 37: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Implicit bad patch A failure in patch_b implies a failure in patch_a

If patch_a is less restrictive than patch_b

Helpful to prune patch_a Traditional testing may not find the failure in patch_a

aMutual Exclusion

b cOrder Relationships

37

Page 38: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Challenge: One single programming

mistake usually leads to multiple bug reports

Solution: Heuristics to merge patches

CFix: patch merging

38

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 39: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

c1

r1

p1

p2

c2, r2

void buf_write() { int tmp = buf_len + str_len; if (tmp > MAX) return;

memcpy(buf[buf_len], str, str_len); buf_len = tmp;}

An example with multiple reports

p1

c1 p2r1 c2, r2

Too many lock/unlock operations Potential new deadlocks May hurt performance and simplicity

39

Page 40: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Related patch: a case of AFix Merge if p, c, or r is in some other patch’s critical sections

lock(L1)p1

lock(L2)p2c1

unlock(L1)c2

unlock(L2)

lock(L1)r1

unlock(L1)

lock(L2)r2

unlock(L2)

lock(L1)p1

p2c1

c2unlock(L1)

lock(L1)r2

unlock(L1)

40

Page 41: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

c1

r1

p1

p2

c2,r2

void buf_write() { int tmp = buf_len + str_len; if (tmp > MAX) { return; }

memcpy(buf[buf_len], str, str_len); buf_len = tmp; }

The merged patch for the example

p1

c1 p2r1 c2, r2

c1,p2

c2,r1,r2

p1

41

Page 42: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

To understand whether there is a deadlock underlying time-out

Low-overhead, and suitable for production runs

CFix: run-time support

42

Run-time Support

Fix-Strategy Design

SynchronizationEnforcement

PatchMerging

Patch Testing& Selection

Page 43: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Evaluation methodologyAPP.

PBZIP2

x264

FFT

HTTrack

Mozilla-1

transmission

ZSNES

Apache

MySQL-1

MySQL-2

Mozilla-2

Cherokee

Mozilla-3

AV Detector OV Detector RA Detector DU Detector

43

Page 44: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Evaluation resultAV Detector OV Detector RA Detector DU Detector

ü ü ü

ü

ü ü ü ü

ü ü ü

ü ü

ü ü ü ü

ü

ü ü

ü ü ü

ü ü û

ü ü

ü ü ü

ü ü ü

# of Ops

5

7

5

2

2

2

3

3

5

9

3

2

5

APP.

PBZIP2

x264

FFT

HTTrack

Mozilla-1

transmission

ZSNES

Apache

MySQL-1

MySQL-2

Mozilla-2

Cherokee

Mozilla-344

Page 45: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Summary Software reliability is critical Fixing Concurrency bugs is costly and error-prone CFix uses some heuristics, with good results in practice

A combination of mutual exclusion and order enforcement Use testing to select the best patch Fix root cause without requiring detectors to report it Small overhead and good simplicity

45

Page 46: Diagnosing and Fixing Concurrency Bugs Credits to Dr. Guoliang Jin, Computer Science, NC STATE Presented by Tao Wang

Questions ?

Thank you

46