fully automatic and precise detection of thread...

36
FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD SAFETY VIOLATIONS PLDI 2012 by Michael Pradel and Thomas R. Gross ETH Zurich presented by Martin Aigner University of Salzburg May 2, 2013

Upload: others

Post on 01-Aug-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD SAFETY

VIOLATIONSPLDI 2012

by

Michael Pradel and Thomas R. GrossETH Zurich

presented by

Martin AignerUniversity of Salzburg

May 2, 2013

Page 2: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

OVERVIEW

• The problem of testing multi-threaded code

• Proposed solution: automatically generate test cases

• Evaluation of the results

Page 3: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TESTING FOR THREAD-SAFETY

• Consider a (wanna-be thread-safe) class in an OO language...

• Try to argue about it’s thread-safety... How do you do that?

• Verify correct concurrent behavior?

• Too expensive in most cases

•Write tests? How? Where to start? Where to end?

• Also expensive

Page 4: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

EXAMPLEclass StringBuffer implements CharSequence {

/* init instance with given string */StringBuffer(String s) {...}

/* modify this holding a lock */synchronized void deleteCharAt (int index) {...}

/* modify this while holding a lock */synchronized void insert (int dstOffset, CharSequence s,int start, int end) {...}

/* convenience overloading */void insert (int dstOffset, CharSequence s) {this.insert(dstOffset, s, 0, s.length());

}}

Page 5: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

class StringBuffer implements CharSequence {

/* init instance with given string */StringBuffer(String s) {...}

/* modify this holding a lock */synchronized void deleteCharAt (int index) {...}

/* modify this while holding a lock */synchronized void insert (int dstOffset, CharSequence s,int start, int end) {...}

/* convenience overloading */void insert (int dstOffset, CharSequence s) {this.insert(dstOffset, s, 0, s.length());

}}

Thread 1 Thread 2sb1.insert(0, sb2); sb1.deleteCharAt(0);

main: StringBuffer sb1 = new StringBuffer(“thread-safe”);main: StringBuffer sb2 = new StringBuffer(“not”);

Result?

Page 6: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

class StringBuffer implements CharSequence {

/* init instance with given string */StringBuffer(String s) {...}

/* modify this holding a lock */synchronized void deleteCharAt (int index) {...}

/* modify this while holding a lock */synchronized void insert (int dstOffset, CharSequence s,int start, int end) {...}

/* convenience overloading */void insert (int dstOffset, CharSequence s) {this.insert(dstOffset, s, 0, s.length());

}}

Thread 1 Thread 2sb1.insert(0, sb1); sb1.deleteCharAt(0);

main: StringBuffer sb1 = new StringBuffer(“thread-safe”);

Result?

Page 7: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

PROBLEMS WITH TESTING

• In general:

• Cover the configuration space

•Observe success of failure

• For concurrent tests:

• Find out if a concurrent execution meets a sequential specification

Page 8: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

PROPOSED SOLUTIONCreate, execute, and validate test cases

automatically

Page 9: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

REQUIREMENTS

•We need:

• The class under test (CUT)

• A set of auxiliary classes the CUT depends on

• A test that maybe triggers a bug (thread-safety violation)

• An execution environment that exposes the bug

• An oracle that recognizes an erroneous execution

Page 10: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

GENERATING CONCURRENT TESTS

•Definitions: call sequence, prefix, suffixes

• A call sequence is a n-tuple of calls (c1, ..., cn) where a call is a method together with input and output parameters

• Prefix: a call sequence that is executed sequentially to “grow” an object

• Suffixes: a set of call sequences that are executed concurrently on an object to trigger a concurrency bug

Page 11: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

GENERATING CONCURRENT TESTS

•Definitions: test

• A test is a triple (p, s1, s2) where p is a prefix and s1 and s2 are suffixes

p1

pn

s11 s21

s1j s2k

Page 12: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

Thread 1 Thread 2sb1.insert(0, sb1); sb1.deleteCharAt(0);

main: StringBuffer sb1 = new StringBuffer(“thread-safe”);

EXAMPLE: CONCURRENT TEST

suffix s1 suffix s2

prefix p

Page 13: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TASKS: CREATING CALL SEQUENCES

• A task takes a call sequence sin = (c1, ..., ci) and returns a call sequence sout = (c1, ..., ci, cj, ..., cn)

• 3 types of tasks:

• instantiateCUTTask: appends a call to instantiate the CUT

• callCUTTask: appends a call to the CUT instance

• parameterTask: provides a typed parameter by either :

• selecting a previous output parameter

• appending a call that returns the parameter

Page 14: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TEST GENERATION ALGORITHM

• Three global variables:

• P: a set of prefixes

•M: a map assigning a prefix to its set of suffixes

• T: a set of already generated tests

Page 15: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TEST GENERATION ALGORITHM

• Step 1: create a new prefix

• invokes instantiateCUTTask to create an instance of CUT

• repeatedly invokes callCUTTask to extend the prefix

• the parameterTask is invoked whenever a call requires a parameter

Page 16: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TEST GENERATION ALGORITHM

• Step 2: create a suffix for a prefix

• repeatedly invokes callCUTTask and use the output parameters of the prefix as input the suffix

• again, the parameterTask is invoked whenever a call requires a parameter

• add the suffix to M for the given prefix

Page 17: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

TEST GENERATION ALGORITHM

• Step 3: test creation

• create tests based on the prefix, the suffix and all it’s other suffixes (obtained from M) and store them in T

• randomly return a test from T

Page 18: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

Algorithm 1 Returns a concurrent test (p, s1, s2)

1: P: set of prefixes B global variables2: M: maps a prefix to suffixes3: T : set of ready-to-use tests4: if |T | > 0 then5: return randRemove(T )6: if |P| < maxPrefixes then B create a new prefix7: p instantiateCUTTask(empty call sequence)8: if p = failed then9: if P = ; then

10: fail(”cannot instantiate CUT”)11: else12: p randTake(P)13: else14: for i 1,maxStateChangerTries do15: p

ext

callCUTTask(p)16: if p

ext

6= failed then17: p p

ext

18: P P [ {p}19: else20: p randTake(P)21: s1 empty call sequence B create a new suffix22: for i 1,maxCUTCallT ries do23: s1,ext callCUTTask(s1, p)24: if s1,ext 6= failed then25: s1 s1,ext

26: M(p) M(p) [ {s1}27: for all s2 2M(p) do B one test for each pair of suffixes28: T T [ {(p, s1, s2)}29: return randRemove(T )

After instantiating the CUT, the algorithm tries to invoke meth-ods on the CUT instance to change the state of the CUT instance.The callCUTTask randomly chooses a method among all meth-ods of C and invokes the parameterTask for each required pa-rameter. The prefix in Figure 1b contains no state changing call.Alternatively to the shown prefix, the test generator could have cre-ated the call sequence

StringBuffer sb = new StringBuffer();sb.append("abc");

where the second call is a state changing call.

Creating Suffixes During the second step (lines 21 to 26), thealgorithm creates a new suffix for the prefix p. Therefore, it repeat-edly invokes the callCUTTask. In addition to the suffix, the callin line 23 passes the prefix to the task to allow for using outputvariables from the prefix as input variables in the suffix. After ap-pending calls to the CUT instance, the new suffix is added to theset of suffixes for the prefix p.

All tasks that append method calls depend on the parameter-Task. This task uses three strategies to make a parameter of aparticular type t available. If the given call sequence already hasone or more output variables of type t or a subtype of t, the taskrandomly chooses between reusing a variable and creating a freshvariable. The rationale for creating fresh variables is to diversifythe generated tests instead of passing the same parameters againand again. To reuse a variable, the task randomly chooses from allavailable variables with matching type. To create a fresh variable,the behavior depends on the type t. If t is a primitive type orString, the task returns a randomly created literal. Otherwise,the tasks randomly chooses a method from all methods in C andA returning t or a subtype of t. If calling this method requiresparameters, the task recursively invokes itself. We limit the number

ArrayList l = new ArrayList();

l.add("a");l.add("b");

l.hashCode();

Thread 1 Thread 2

Result: ConcurrentModificationException in Thread 2

Figure 2: Test execution using a thread-unsafe class.

of recursions to avoid infinite loops and fail the task if the limitis reached. If there is no method providing t, the parameterTask

returns null. Using null as a parameter may be illegal. In this case,executing the sequence can result in an exception, for example, aNullPointerException, so that the candidate is rejected.

Creating Tests The third step of the algorithm (lines 27 to 29)combines the new suffix with each existing suffix for the prefixinto a test. The algorithm stores the created tests in T and onfurther invocations returns a randomly selected test from T untilT becomes empty.

3.3 Beyond This WorkThe tests generated by Algorithm 1 provide input to exercise a classwithout any human effort, and hence, increase the usefulness ofexisting dynamic analyses. For example, executions of generatedtests can be analyzed by existing detectors of data races [15, 32, 42,44] or atomicity violations [16, 21, 28, 38, 45]. Without generatedtests, these analyses are limited to manually created tests and thebugs exposed by these tests.

4. Thread Safety OracleThis section presents an automatic technique to determine whetherthe execution of a concurrent test exposes a thread safety bug.

4.1 Thread SafetyA class is said to be thread-safe if multiple threads can use it with-out synchronization and if the behavior observed by each thread isequivalent to a linearization of all calls that maintains the order ofcalls in each thread [1, 20]. Saying that a class is thread-safe meansthat all its methods are thread-safe. Our approach can also deal withclasses that guarantee thread safety for a subset of all methods byexcluding the unsafe methods when generating suffixes for a test.

Figure 2 is an example for a thread-unsafe class and a test exe-cution that exposes this property. The concurrent use of ArrayListfrom java.util results in an exception that does not occur in anyof the three possible linearizations of the calls:

add! add! hashCode

add! hashCode! add

hashCode! add! add

Therefore, the execution in Figure 2 shows that ArrayList isthread-unsafe, as expected.

A property related to thread safety is atomicity, that is, theguarantee that a sequence of operations performed by a thread ap-pears to execute without interleaved operations by other threads.One way to make a class thread-safe is to guarantee that eachcall to one of its methods appears to be atomic for the callingthread. However, a thread-safe class does not guarantee that mul-tiple calls to a shared instance of the class are executed atomi-cally [45]. For example, consider the use of the thread-safe classjava.util.concurrent.CopyOnWriteArrayList in Figure 3. Exe-cuting the concurrent test has three possible outputs. For all three,

524

Step 1

Step 2

Step 3

the parameters maxPrefixes, maxStateChangerTries and maxCUTCallTries are heuristically defined limits.

Page 19: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

THREAD SAFETY ORACLE

• A class is said to be thread-safe (in Java methodology) if:

•multiple threads can use it without synchronization

• and the observed behavior of a concurrent execution is equivalent to and sequential execution of a linearization of the calls preserving per-thread ordering

Page 20: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

EXAMPLE

Thread 1 Thread 2l.add(“a”); l.add(“b”);

println(l.toString());

main: CopyOnWriteArrayList l = new CopyOnWriteArrayList();

Results: [a], [a,b], or [b,a]

Linearizations:add(“a”); ! println(); ! add(“b”); gives [a]add(“a”); ! add(“b”); ! println(); gives [a, b]add(“b”); ! add(“a”); ! println(); gives [b, a]

Page 21: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

DEFINITIONS

• operator ⊕♁ concatenates call sequences

• For calls c and c’ , the notion c → c’ indicates that call c

precedes c’

Page 22: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

LINEARIZATION

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Intuitively: a linearization of a test (p, s1, s2) appends to p all calls from s1 and from s2 in a way that preserves the order of calls in s1 and s2

Page 23: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

EXECUTION

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Note: due to the non-determinism of concurrent executions a single test can have multiple distinguishable executions

Page 24: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

THREAD SAFETY

Notation: for executions e1 and e2 , the notion e1 ≅ e2 indicates that e1 and e2 are equivalent (discussed later)

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Page 25: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

THREAD SAFETY

Notation: for executions e1 and e2 , the notion e1 ≅ e2 indicates that e1 and e2 are equivalent (discussed later)

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Problem: this is expensive! All executions times all tests times all linearizations... Let’s try the opposite

Page 26: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

THREAD-UNSAFETYWe call a class thread-unsafe if:

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Intuitively: the oracle tries to find a test that exposes behavior not possible with any linearization of the test

Still expensive, but we are lucky this time. We only need to check buggy concurrent executions for equivalence to their linearizations

Page 27: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

EQUIVALENCE OF EXECUTION

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

Page 28: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

CopyOnWriteArrayList l = new CopyOnWriteArrayList();

l.add("a");println(l);

l.add("b");

Thread 1 Thread 2

Result: [a], [a,b], or [b,a]

Figure 3: Test execution using a thread-safe class.

there is a linearization of calls that provides the same result, so thetest execution does not expose a thread safety problem:

add("a")! println()! add("b") gives [a]add("a")! add("b")! println() gives [a,b]add("b")! add("a")! println() gives [b,a]

4.2 DefinitionsThe thread safety oracle answers the question whether a concurrenttest execution exposes a thread safety problem by comparing theconcurrent execution to executions of linearizations of the test. Weuse � to concatenate sequences, and c !

s

c

0 indicates that call ccomes before call c0 in a call sequence s.

Definition 1 (Linearization). For a test (p, s1, s2), let P12 be theset of all permutations of the call sequence s1 � s2. The set oflinearizations of the test is:

L(p,s1,s2) = {p� s12 | s12 2 P12 ^(8c, c0 (c!

s1 c

0 ) c!s12 c

0) ^(c!

s2 c

0 ) c!s12 c

0))}That is, a linearization of a test (p, s1, s2) appends to p all calls

from s1 and from s2 in a way that preserves the order of calls in s1

and s2.

Definition 2 (Execution). For a test (p, s1, s2), we denote the setof all distinguishable executions of this test as E(p,s1,s2). Eache(p,s1,s2) 2 E(p,s1,s2) represents the sequential execution of p

followed by a concurrent execution of s1 and s2. Likewise, wedenote the sequential execution of a call sequence s as e

s

.

A single test can have multiple distinguishable executions be-cause of the non-determinism of concurrent executions.

The following definition of thread safety refers to the equiva-lence e1

⇠= e2 of two executions e1 and e2. We discuss in Sec-tion 4.3 how our oracle decides whether two executions are equiv-alent.

Definition 3 (Thread safety). Let TC

be the set of all possible testsfor a class C. C is thread-safe if and only if:

8(p, s1, s2) 2 TC

8e(p,s1,s2) 2 E(p,s1,s2)

9l 2 L(p,s1,s2) so that e(p,s1,s2) ⇠= e

l

That is, a class is thread-safe if each concurrent test executionhas an equivalent linearization.

4.3 The Test OracleShowing that a class is thread-safe according to Definition 3 isdifficult in practice, because all possible tests and all possibleexecutions of these tests would have to be considered. However,the thread safety oracle can show that a class C is thread-unsafe byshowing:

9(p, s1, s2) 2 TC

9e(p,s1,s2) 2 E(p,s1,s2)

so that 8l 2 L(p,s1,s2) e(p,s1,s2) � e

l

Algorithm 2 Checks whether a test (p, s1, s2) exposes a threadsafety bug

1: repeat2: e(p,s1,s2) execute(p, s1, s2)3: if failed(e(p,s1,s2)) then4: seqFailed false5: for all l 2 L(p, s1, s2) do6: if seqFailed = false then7: e

l

execute(l)8: if failed(e

l

)^ sameFailure(e(p,s1,s2), el) then9: seqFailed true

10: if seqFailed = false then11: report bug e(p,s1,s2) and exit12: until maxConcExecs reached

That is, the thread safety oracle tries to find a test that exposesbehavior not possible with any linearization of the test. To decidewhether two executions are equivalent, the oracle compares theexceptions and deadlocks caused by the executions.

Definition 4 (Equivalence of executions). Two executions e1 ande2 are equivalent if

• neither e1 nor e2 results in an exception or a deadlock, or• both e1 and e2 fail for the same reason (that is, the same type

of exception is thrown or both executions end with a deadlock).

Although this abstraction ignores many potential differencesbetween executions, such as different return values of method calls,it is crucial to ensure that the analysis only reports a class as thread-unsafe if this is definitely true.

Algorithm 2 shows how the analysis checks whether a test(p, s1, s2) exposes a thread safety problem. The algorithm repeat-edly executes the test until a maximum number of concurrent ex-ecutions is reached, or until a bug is found. If the test fails, thatis, it throws an exception or results in a deadlock, the algorithmexecutes all linearizations of the test to check whether the samefailure occurs during a sequential execution of the same calls. If nolinearization exposes the same failure, the algorithm reports a bugbecause the CUT is thread-unsafe.

The thread safety oracle is sound but incomplete.3 Every exe-cution for which the oracle reports a bug is guaranteed to exposea real thread safety problem according to Definition 3, but the or-acle may classify an execution as correct even though it exposesa thread safety problem. The soundness of the oracle ensures thatour approach detects concurrency bugs without reporting false pos-itives.

We build upon two assumptions, which we find true for mostreal-world classes during our evaluation. First, we assume that un-caught exceptions and deadlocks that occur in a concurrent usageof a class but not in a sequential usage are considered a problem.This assumption is in line with the commonly accepted definitionof thread safety [1, 20]. Second, we assume that sequential execu-tions of a call sequence behave deterministically. Sequentially non-deterministic methods, for example, methods that depend on thecurrent system time, should be excluded from our analysis. Alter-natively, our analysis can be combined with a runtime environmentthat ensures deterministic sequential execution [40].

3 We mean soundness and completeness regarding incorrectness [47]. Inother communities, such as type systems, the terms are typically used withrespect to correctness, that is, inverse to the usage here.

525

consider only failed concurrent executions

no concurrency bug if linearization fails for the

same reason

concurrency bug only if the linearizations did not

fail

Page 29: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

LIMITATIONS

•Only exceptions and deadlocks are considered

• Testing is still incomplete

• No semantical equivalence of concurrent and linearized executions

• The proposed approach is based on two assumptions:

• Uncaught exceptions and deadlocks that occur in concurrent executions but not in sequential ones are a problem

• Sequential execution is deterministic

Page 30: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

CONTRIBUTIONS

• Fully automatic

•Only true positives are reported

•No (or very little) human effort for testing and evaluation

• This makes it cheap!

• It does indeed find bugs

Page 31: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

EVALUATION

• Tests were performed on six popular code bases

• Java standard library

• Apache Commons DBCP

• XStream (serialization library)

• LingPipe (text processing toolkit)

• JFreeChart (chart library)

• Joda-Time (library for handling data and time)

Page 32: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

ID Code base Class Declared Found Reason for failing Implicitthread-safe unsafe

Previously unknown bugs:

(1) JDK 1.6.0 27 and 1.7.0 StringBuffer yes yes IndexOutOfBoundsException yes(2) JDK 1.6.0 27 and 1.7.0 ConcurrentHashMap yes yes StackOverflowError yes(3) Commons DBCP 1.4 SharedPoolDataSource yes yes ConcurrentModificationException yes(4) Commons DBCP 1.4 PerUserPoolDataSource yes yes ConcurrentModificationException yes(5) XStream 1.4.1 XStream yes yes NullPointerException yes(6) LingPipe 4.1.0 MedlineSentenceModel yes yes IllegalStateException no

Known bugs:

(7) JDK 1.1 BufferedInputStream yes yes NullPointerException yes(8) JDK 1.4.1 Logger yes yes NullPointerException yes(9) JDK 1.4.2 SynchronizedMap yes yes Deadlock yes

(10) JFreeChart 0.9.8 TimeSeries yes yes NullPointerException yes(11) JFreeChart 0.9.8 XYSeries yes yes ConcurrentModificationException yes(12) JFreeChart 0.9.12 NumberAxis yes yes IllegalArgumentException no(13) JFreeChart 1.0.1 PeriodAxis yes yes IllegalArgumentException no(14) JFreeChart 1.0.9 XYPlot yes yes ConcurrentModificationException yes(15) JFreeChart 1.0.13 Day yes yes NumberFormatException yes

Automatic classification of classes as thread-unsafe:

(16) Joda-Time 2.0 DateTimeFormatterBuilder no yes IndexOutOfBoundsException (10x) yes(17) Joda-Time 2.0 DateTimeParserBucket no yes IllegalArgumentException (9x) no

NullPointerException (1x) yes(18) Joda-Time 2.0 DateTimeZoneBuilder no yes NullPointerException (6x) yes

ArrayIndexOutOfBoundsException (2x) yesIllegalFieldValueException (2x) yes

(19) Joda-Time 2.0 MutableDateTime no yes IllegalFieldValueException (9x) noArithmeticException (1x) yes

(20) Joda-Time 2.0 MutableInterval no yes IllegalArgumentException (10x) no(21) Joda-Time 2.0 MutablePeriod no yes ArithmeticException (10x) yes(22) Joda-Time 2.0 PeriodFormatterBuilder no yes ConcurrentModificationException (5x) yes

IndexOutOfBoundsException (4x) yesIllegalStateException (1x) no

Joda-Time 2.0 ZoneInfoCompiler no no (stopped after 24h) –

Table 1: Summary of results. The last column indicates whether the reason for failing is implicit in the Java runtime environment or explicitlyspecified in the code base under test.

ement to the correctly synchronized put(). As a result, a call tomap.putAll(map) can undo changes done by a concurrently ex-ecuting thread that also modifies the map—a clear thread safetybug. Our analysis generates the test in Figure 4b, which exposesthe problem. Calling hashCode() results in a stack overflow forself-referential collections, because the method recursively callsitself (this is documented behavior). If ConcurrentHashMap werethread-safe, this behavior should not be triggered by the test, be-cause all three possible linearizations of the calls in Figure 4b callclear() before calling hashCode(). However, the test fails with aStackOverflowError, because putAll() undoes the effects of theconcurrently executed clear().

Figure 5 is a previously unknown bug that our analysis detectsin Apache Commons DBCP. The supposedly thread-safe classSharedPoolDataSource provides two methods setDataSource-

Name() and close() that register and unregister the data source viastatic methods of InstanceKeyObjectFactory, respectively. Thefactory class maintains a thread-unsafe HashMap assigning names todata sources. Although registering new instances is synchronized toavoid concurrent accesses to the HashMap, unregistering instancesis not synchronized. The generated test in Figure 5b shows thatthis lack of synchronization leads to an exception when callingsetDataSourceName() and close() concurrently.

To allow for reproducing our results, all analyzed classes, de-scriptions of the bugs, and a generated test to trigger each bug areavailable at http://mp.binaervarianz.de/pldi2012.

6.3 Annotating Classes as Thread-unsafeBeyond finding bugs, our analysis can be used to analyze classeshaving no documentation on their thread safety and to automat-ically annotate these classes as thread-unsafe where appropriate.In a preliminary study for this work, we found that one of themost common concurrency-related questions of Java developersis whether a particular library class is thread-safe. Few librariescome with precise documentation to answer this question. To ad-dress this lack of documentation, our analysis can automaticallyannotate classes as thread-unsafe. Since the oracle is sound, theseannotations are guaranteed to be correct.

To evaluate this usage scenario, we run our analysis on a li-brary that specifies for each class whether it is thread-safe or not.The library (Joda-Time) contains 41 classes, of which 33 are docu-mented as thread-safe and eight are documented as thread-unsafe.The lower part of Table 1 summarizes the results. For seven ofthe eight thread-unsafe classes, our analysis detects a thread safetyproblem. The missing class, ZoneInfoCompiler, reads files fromthe file system, transforms them, and writes other files as output.

527

Page 33: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

COSTS on 8-core 3GHz Xeonsclass SharedPoolDataSource {

void setDataSourceName(String v) {key = InstanceKeyObjectFactory

.registerNewInstance(this);}void close() {

InstanceKeyObjectFactory.removeInstance(key);}

}

class InstanceKeyObjectFactory {static final Map instanceMap = new HashMap();synchronized static String

registerNewInstance(SharedPoolDataSource ds) {// iterate over instanceMap

}static void removeInstance(String key) {

// BUG: unsynchronized access to instanceMap

instanceMap.remove(key);}

}

(a) Supposedly thread-safe class.

SharedPoolDataSource ds1 = new SharedPoolDataSource();ds1.setConnectionPoolDataSource(null);

dataSource.setDataSourceName("a"); dataSource.close();

Thread 1 Thread 2

Result: ConcurrentModificationException in Thread 1

(b) Execution of a generated concurrent test exposing a thread-safety bug.

Figure 5: Concurrency bug in Apache Commons DBCP.

Since the thread-safety oracle does not check the integrity of suchfiles, it cannot detect problems caused by concurrent usages of theclass. For the 33 classes that are documented as thread-safe, noproblems are found after running the analysis for 24 hours.

6.4 Effort of Using the AnalysisUsing our approach involves minimal human effort, because theanalysis requires the source code or byte code of the classes undertest as only input and produces true positives as only output. Expe-rience from applying automated bug finding techniques in industryshows that both properties are important [3].

The computational effort of our implementation is acceptablefor an automatic testing tool. Figure 6a shows how long the analysistakes to trigger the problems from Table 1. The horizontal axisshows the IDs from Table 1, sorting the classes by the average timerequired to find the problem. The vertical axis gives the minimum,average, and maximum time taken to find the problem over tenruns. Most of the problems are found within one hour. For severalproblems, the analysis takes only a few seconds. Other classesrequire several hours of computation time, with up to 8.2 hours forbug 8 (JDK’s Logger). Given that the bug remained unnoticed forseveral years in one of the most popular Java libraries, we considerthis time to be still acceptable. Section 5 outlines ways to reducethe running time of our implementation with additional engineeringeffort. The approach has very moderate memory requirements. Thetest generator selects methods randomly and therefore does notmaintain significant amounts of state. If executing a generated callsequence exceeds the available memory, an exception is thrown andthe sequence is not extended.

0.01

0.1

1

10

100

1000

11 7 16 22 13 10 15 12 20 17 21 18 6 14 1 9 2 19 3 5 4 8

Tim

e (m

in/avg/m

ax m

inutes)

CUTs (sorted by avg. time)

< 1 minute

> 1 hour

(a) Time to trigger a thread-safety problem.

10

100

1000

10000

100000

1e+06

1e+07

1e+08

11 7 16 22 13 10 15 12 20 17 21 18 6 14 1 9 2 19 3 5 4 8

Generated tests (avg.)

CUTs (sorted by avg. time)

(b) Tests generated before triggering a thread-safety problem.

Figure 6: Effort required to trigger a thread-safety problem.

A question related to running time is how many tests the anal-ysis generates and executes before hitting a bug. Figure 6b showsthe average number of generated tests for each bug, listing the bugsin the same order as in Figure 6a. For some bugs, a small num-ber of tests (several hundreds or even less than hundred) suffices toexpose the problem. Other bugs require more tests, up to 17 mil-lion for bug 4. A manual inspection of bugs requiring many testssuggests that the task of executing a bug-exposing test with the“right” thread interleaving dominates over the task of generating abug-exposing test. Combining our work with techniques to controlscheduling may reduce the number of tests for hitting a bug.

7. LimitationsFirst, the approach assumes that exceptions and deadlocks are un-desirable. We found this “implicit specification“ to be true in prac-tice, but in principle a class could throw exceptions as part of itslegal, concurrent behavior. Second, the approach has limited sup-port for multi-object bugs. Although the generated tests combinemultiple classes and objects, they only reveal bugs triggered byconcurrent calls to the same object. Third, the approach is limitedto bugs that manifest through an exception or deadlock, and there-fore it may miss more subtle problems leading to incorrect but notobviously wrong results.

8. Related Work8.1 Finding Concurrency BugsTable 2 compares this work with other bug finding techniques basedon four criteria: whether static or dynamic analysis is used, the

528

Page 34: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

REQUIRED TEST CASES

class SharedPoolDataSource {void setDataSourceName(String v) {

key = InstanceKeyObjectFactory.registerNewInstance(this);

}void close() {

InstanceKeyObjectFactory.removeInstance(key);}

}

class InstanceKeyObjectFactory {static final Map instanceMap = new HashMap();synchronized static String

registerNewInstance(SharedPoolDataSource ds) {// iterate over instanceMap

}static void removeInstance(String key) {

// BUG: unsynchronized access to instanceMap

instanceMap.remove(key);}

}

(a) Supposedly thread-safe class.

SharedPoolDataSource ds1 = new SharedPoolDataSource();ds1.setConnectionPoolDataSource(null);

dataSource.setDataSourceName("a"); dataSource.close();

Thread 1 Thread 2

Result: ConcurrentModificationException in Thread 1

(b) Execution of a generated concurrent test exposing a thread-safety bug.

Figure 5: Concurrency bug in Apache Commons DBCP.

Since the thread-safety oracle does not check the integrity of suchfiles, it cannot detect problems caused by concurrent usages of theclass. For the 33 classes that are documented as thread-safe, noproblems are found after running the analysis for 24 hours.

6.4 Effort of Using the AnalysisUsing our approach involves minimal human effort, because theanalysis requires the source code or byte code of the classes undertest as only input and produces true positives as only output. Expe-rience from applying automated bug finding techniques in industryshows that both properties are important [3].

The computational effort of our implementation is acceptablefor an automatic testing tool. Figure 6a shows how long the analysistakes to trigger the problems from Table 1. The horizontal axisshows the IDs from Table 1, sorting the classes by the average timerequired to find the problem. The vertical axis gives the minimum,average, and maximum time taken to find the problem over tenruns. Most of the problems are found within one hour. For severalproblems, the analysis takes only a few seconds. Other classesrequire several hours of computation time, with up to 8.2 hours forbug 8 (JDK’s Logger). Given that the bug remained unnoticed forseveral years in one of the most popular Java libraries, we considerthis time to be still acceptable. Section 5 outlines ways to reducethe running time of our implementation with additional engineeringeffort. The approach has very moderate memory requirements. Thetest generator selects methods randomly and therefore does notmaintain significant amounts of state. If executing a generated callsequence exceeds the available memory, an exception is thrown andthe sequence is not extended.

0.01

0.1

1

10

100

1000

11 7 16 22 13 10 15 12 20 17 21 18 6 14 1 9 2 19 3 5 4 8

Tim

e (m

in/avg/m

ax m

inutes)

CUTs (sorted by avg. time)

< 1 minute

> 1 hour

(a) Time to trigger a thread-safety problem.

10

100

1000

10000

100000

1e+06

1e+07

1e+08

11 7 16 22 13 10 15 12 20 17 21 18 6 14 1 9 2 19 3 5 4 8

Generated tests (avg.)

CUTs (sorted by avg. time)

(b) Tests generated before triggering a thread-safety problem.

Figure 6: Effort required to trigger a thread-safety problem.

A question related to running time is how many tests the anal-ysis generates and executes before hitting a bug. Figure 6b showsthe average number of generated tests for each bug, listing the bugsin the same order as in Figure 6a. For some bugs, a small num-ber of tests (several hundreds or even less than hundred) suffices toexpose the problem. Other bugs require more tests, up to 17 mil-lion for bug 4. A manual inspection of bugs requiring many testssuggests that the task of executing a bug-exposing test with the“right” thread interleaving dominates over the task of generating abug-exposing test. Combining our work with techniques to controlscheduling may reduce the number of tests for hitting a bug.

7. LimitationsFirst, the approach assumes that exceptions and deadlocks are un-desirable. We found this “implicit specification“ to be true in prac-tice, but in principle a class could throw exceptions as part of itslegal, concurrent behavior. Second, the approach has limited sup-port for multi-object bugs. Although the generated tests combinemultiple classes and objects, they only reveal bugs triggered byconcurrent calls to the same object. Third, the approach is limitedto bugs that manifest through an exception or deadlock, and there-fore it may miss more subtle problems leading to incorrect but notobviously wrong results.

8. Related Work8.1 Finding Concurrency BugsTable 2 compares this work with other bug finding techniques basedon four criteria: whether static or dynamic analysis is used, the

528

Page 35: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

REFERENCEMichael Pradel and Thomas R. Gross. 2012. Fully automatic and precise detection of thread safety violations. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 521-530.

Page 36: FULLY AUTOMATIC AND PRECISE DETECTION OF THREAD …cksystemsteaching.github.io/CMM-Summer-2013/martin-aigner-thread... · the execution of a concurrent test exposes a thread safety

THANKS A LOT