SATCheck: SAT-Directed Stateless Model Checking for SC and TSO


OOPSLA Artifact * Evaluated * AEC: Consistent * Complete * Well Documented * Easy to Reuse

Brian Demsky
University of California, Irvine
[email protected]

Patrick Lam
University of Waterloo
[email protected]

Abstract

Writing low-level concurrent code is well known to be challenging and error prone. The widespread deployment of multi-core hardware and the shift towards using low-level concurrent data structures have moved the problem into the mainstream. Finding bugs in such code may require finding a specific bug-revealing thread interleaving out of a huge space of parallel executions.

Model checking is a powerful technique for exhaustively testing code. However, scaling model checking presents a significant challenge.

In this paper we present a new and more scalable technique for model checking concurrent code, based on concrete execution. Our technique observes concrete behaviors, builds a model of these behaviors, encodes the model in SAT, and leverages SAT solver technology to find executions that reveal new behaviors. It then runs the new execution, incorporates the newly observed behavior, and repeats the process until it has explored all reachable behaviors.

We have implemented a prototype of our approach in the SATCheck tool with support for both the Total Store Order (TSO) and Sequentially Consistent (SC) memory models and use SATCheck to test several concurrent data structure implementations. We compare SATCheck to the original DPOR stateless model checking algorithm implemented in CDSChecker, the source DPOR algorithm implemented in Nidhugg, and CheckFence. Our experiments show that SATCheck scales better than previous approaches while at the same time operating on concrete executions.

1. Introduction

Testing concurrent code can be challenging: exposing bugs often requires driving program execution to a specific interleaving. However, bug-revealing interleavings may account for a vanishingly small proportion of a large interleaving space. Model checking can be an effective way of testing concurrent code, but current model checking approaches are typically best suited for small unit tests. In this work, we present a novel approach that leverages SAT solving technology to explore only the interleavings that expose new concrete behaviors.

Current approaches to model checking concurrent code primarily fall into one of three categories:

• Explicit State: Early work on model checking concurrent code explicitly modeled program state. This approach is not often used directly on software, as encoding the entire state can be problematic. Furthermore, encoded state can lead to a combinatorial state explosion even when all operations commute.

• Stateless: Stateless approaches to model checking eliminate the need to explicitly encode program state and instead explore all interleavings of conflicting (i.e., non-commutative) operations. In the context of stateless model checking, researchers have developed several partial order reduction techniques to reduce exploration of redundant executions [1, 16–18].

  Partial order reduction techniques for stateless model checking reason locally about the commutativity of individual operations and effectively assume that any operation may depend on all previous operations. There remains further potential for improvement in partial order reduction by incorporating reasoning about the program’s dependency structure. To be more precise, consider a program with two threads that both first perform an operation on a concurrent queue and then each perform an independent operation on a concurrent stack. Existing POR approaches will explore every interleaving of the stack operations in the context of every interleaving of the queue operations even though these operations are independent!

• SAT-Based Approaches: Researchers have improved on stateless model checking by developing SAT-based approaches to model checking [9, 28]. The key insight is that, by encoding a program as a SAT formula, the model checker can leverage the SAT solver’s heuristics to avoid wasting time exploring redundant executions.

1 2015/8/12


  Current SAT-based approaches translate the entire program to a SAT formula. This is non-ideal in that many programs contain arithmetic expressions, which significantly complicate the SAT formula and limit the scalability of this approach. Moreover, the approach is not applicable to code that calls library functions whose source is not available.

  A second downside of existing SAT-based approaches is that they are all or nothing: if the generated SAT formula is too complex for the SAT solver to analyze, the model checker provides the developer with no information.

We introduce a new approach to model checking concurrent code that leverages two key insights from the previous approaches:

• Reasoning about dependences is necessary for scalability: Approaches that naïvely run code don’t scale because they waste too much effort exploring redundant executions. Scalable approaches must reason about how operations depend on each other to avoid wasting effort on redundant executions.

• Encoding the entire program in SAT can add significant complexity from the non-concurrent parts of the code: Approaches that try to encode the entire program into SAT incur complexity from the non-concurrent parts of the computation (e.g., arithmetic on integers). In many cases, concurrent executions of a given test case cannot drive arbitrary values through the program, and thus it is not necessary to encode how the computation operates on all values, but rather just the values that actually arise in concurrent executions.

1.1 Our Approach

This paper presents our novel SAT-based approach to model checking for concurrent code. Unlike previous SAT-based approaches, we use the SAT solver to encode the execution, not the program. Our approach was inspired by concolic testing as well as previous work on model checking concurrent code [27]. SATCheck leverages dependency information provided in program instrumentation, and we have developed a compiler frontend that instruments C code for use with SATCheck. SATCheck accepts as input an instrumented program including stores, loads, atomic Read-Modify-Write (RMW) operations, uninterpreted functions, equality comparisons, structured conditional branches, while loops, and phi functions.

Our approach, like similar approaches in stateless model checking (DPOR [16, 35], Chess [23], Source-DPOR [2], Optimal DPOR [1], Maximal Causality Reduction [20], etc.), begins by fixing a set of user-provided inputs and concretely executing the program under test with these inputs. We then use the results of the concrete execution to construct an event graph representation of the program’s observed behaviors. The event graph captures the observed control flow paths of the program; observed input-output relations for uninterpreted functions; observed memory operations, conditional branches, and loops; and dependences between each of these components. SATCheck then translates the event graph into a SAT formula that captures the observed behaviors of the program. SATCheck next adds clauses to this SAT formula that, when satisfied, generate an execution that expands SATCheck’s knowledge of the program’s behavior, either by taking a new (unexplored) direction on a branch, by visiting a novel interleaving, or by learning a new input-output relation for an uninterpreted function. SATCheck uses an off-the-shelf SAT solver to solve the SAT formula. If there is no solution, then no interleaving will yield new behaviors: SATCheck has explored all program behaviors (for the given program input). If there is a solution, SATCheck generates an interleaving from the solution and repeats the process. Once SATCheck has converged, the SAT formula describes all observable behaviors of the program under test for a particular set of inputs to that program.

1.2 Contributions

This paper makes the following contributions:

• Basic Approach: It presents a new technique for model checking concurrent code. The approach learns the behaviors of the program through concrete executions and uses a SAT solver to search for executions that reveal new program behaviors.

• TSO Support: After developing the algorithm for the sequentially consistent memory model, it presents an extension of the algorithm to Total Store Ordering (TSO).

• SATCheck Implementation: It presents SATCheck, a prototype implementation of our model checking technique. Our implementation includes a Clang-based frontend that instruments C programs for use with SATCheck.

• Evaluation: It presents an evaluation of the SATCheck implementation on several concurrent data structures. Our evaluation shows that SATCheck scales to much larger problem sizes and runs more quickly than previous concrete execution-based approaches.

2. Example

We explain how SATCheck works using the simple spin lock example shown in Figure 1. This example implements methods trylock and unlock, which are called by the driver foo method.

    typedef struct lock_t {
        int lock;
    } lock;

    lock a;

    void initlock(lock *l) {
        l->lock = 0;
    }

    bool trylock(lock *l) {
        /* cas yields the value it read; 0 means the lock was free */
        int val = cas(&l->lock, 0, 1);
        return val == 0;
    }

    void unlock(lock *l) {
        store(&l->lock, 0);
    }

    void foo() {
        if (trylock(&a)) {
            unlock(&a);
        }
    }

Figure 1. C spin lock implementation.

Assume a program that creates two threads, both of which call foo. SATCheck starts by concretely executing the program. Assume that in the first execution (1) Thread 1 first executes the CAS operation in its trylock, then (2) Thread 2 executes the CAS operation in its trylock, and finally (3) Thread 1 executes the store in unlock. After SATCheck observes this execution, it constructs the initial event graph in Figure 2.

2.1 Event Graph Construction

Thread 1:
  <tid=1,1>, x0=cas(r(0)w(1))
  <tid=1,2>, x1=f(x0) (0 → 1)
  <tid=1,3>, branch(x1)
    <tid=1,3,br(1),0>, store(0)
  <tid=1,4>, merge

Thread 2:
  <tid=2,1>, y0=cas(r(1)w(1))
  <tid=2,2>, y1=f(y0) (1 → 0)
  <tid=2,3>, branch(y1)
    <tid=2,3,br(0),0>, nop
  <tid=2,4>, merge

(Indented entries are the conditionally executed nodes.)

Figure 2. Event graph summarizes first execution (T1 CAS, T2 CAS, T1 store) with unconditionally executed nodes (connected by solid black arrows) and conditionally executed nodes (dashed blue arrows).

Each operation instance in the execution is assigned a unique execution point (or EP) tuple, shown in angle brackets in Figure 2. EP tuples allow SATCheck to match equivalent operation instances between different executions and reflect the nested structure of the programming language. An EP tuple starts with the thread identifier and includes a sequence of counts. SATCheck maintains per-thread EP counters during the program’s execution. The execution counter for a thread starts as a 2-tuple consisting of the thread identifier along with a single counter set to 1.

Figure 2 shows that Thread 1’s first CAS operation gets EP <tid=1,1>. Since this operation runs first, it reads the value 0 and writes the value 1, as indicated by the label r(0)w(1). Also, SATCheck assigns a unique identifier for the output of each event graph node. Thread 1’s first CAS operation gets the identifier x0.

After a normal operation (e.g., load, store, RMW, or function invocation), SATCheck increments the last element of the EP. Thus, the second operation, an uninterpreted function that evaluates val==0, gets EP <tid=1,2>. The label x1=f(x0) indicates that the function depends on the value x0 produced by the previous CAS operation and that the identifier x1 gets the output of the function. SATCheck remembers the input-output relation of the function: the label 0 → 1 indicates that SATCheck has observed that for an input of 0, function f produces output 1.

The following branch operation gets EP tuple <tid=1,3>. After a branch operation, SATCheck appends the direction of the branch and a new zeroed counter to the EP. Thus, the conditionally executed store operation has EP <tid=1,3,br(1),0>. The fragment br(1) indicates that this operation is only executed when the enclosing conditional branch is taken. We graphically indicate conditionally executed code with a dashed blue arrow. The black arrow shows the next statement to be executed after the code enclosed by the conditional branch finishes.

The event graph also includes information on Thread 2’s execution. The explanation is analogous to that for Thread 1.

From this event graph, SATCheck deduces the possibility of behaviors that it has not yet observed: (1) an execution where Thread 1 skips the body of its if statement; (2) an execution where Thread 2 executes the body of its if statement; and (3) executions where the two uninterpreted functions are evaluated on input values that differ from the current inputs. An execution with a different interleaving will allow SATCheck to evaluate the uninterpreted functions on different inputs.

2.2 Translation to SAT

SATCheck next translates the event graph into a SAT formula that describes the observed behaviors of the program and the execution interleaving. It then adds goal clauses that encode a set of constraints that, when satisfied, ensure that the program exhibits a new behavior. New execution interleavings potentially yield new program behaviors that satisfy these goal clauses; the SAT solver searches for such interleavings that drive the program to produce new behaviors.

Like CheckFence [9], SATCheck represents the execution interleaving by assigning a single boolean interleaving variable to every pair of memory operations from different threads. If the boolean variable is true, the first memory operation appears earlier in the interleaving. If it is false, the second memory operation appears earlier in the interleaving. (Unlike CheckFence, our approach operates on concrete executions.) For our running example, SATCheck generates the SAT variable $v_{<tid=1,1>,<tid=2,1>}$, which is true when operation <tid=1,1> occurs before operation <tid=2,1>. SATCheck also generates the variable $v_{<tid=1,3,br(1),0>,<tid=2,1>}$ for the store in Thread 1. Finally, SATCheck generates constraints between these variables to enforce transitivity properties (to ensure that the interleaving is a total order).
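As a rough illustration of the transitivity requirement, the C sketch below checks whether an assignment to pairwise ordering variables describes a total order. The names `MAX_OPS` and `interleaving_is_total_order` and the dense `before` matrix are assumptions made for this example. SATCheck itself emits SAT constraints rather than checking an assignment after the fact, and it only allocates variables for pairs of operations from different threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound on the number of memory operations in the sketch. */
#define MAX_OPS 8

/* before[i][j] == true means operation i precedes operation j.
 * Returns true when the assignment is antisymmetric, covers every pair,
 * and is transitive, i.e. when it describes a total order. */
bool interleaving_is_total_order(int n, bool before[MAX_OPS][MAX_OPS]) {
    assert(n >= 0 && n <= MAX_OPS);
    /* Exactly one of before[i][j] and before[j][i] must hold for i != j. */
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (before[i][j] == before[j][i]) {
                return false;
            }
        }
    }
    /* Transitivity: i before j and j before k implies i before k. */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                if (i == j || j == k || i == k) {
                    continue;
                }
                if (before[i][j] && before[j][k] && !before[i][k]) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

With operations numbered 0 (Thread 1 CAS), 1 (Thread 1 store), and 2 (Thread 2 CAS), the first execution of the running example sets `before[0][2]`, `before[2][1]`, and `before[0][1]`, which passes the check. An assignment that orders the three operations in a cycle is rejected.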

SATCheck uses a potential value set analysis on the event graph to determine the values that variables (including addresses) may take, in the absence of novel behaviors. Novel behaviors create new, previously unseen values. Potential value sets do not capture all possible values, only the values a variable may have if the execution does not encounter new program behaviors. The potential value set analysis does not need to account for new values; they will be generated in later iterations by concrete program executions. In particular, SATCheck assumes that uninterpreted functions do not generate new values.

On our running example, the analysis tells us that (1) the CAS and store operations only access one address, (2) the values written to that address are 0 and 1, and (3) the input values to the memory operations are fixed. Note that new values are created by generating executions that create these values, and SATCheck generates goal clauses to ensure that it produces all possible values. Future runs of the potential value set analysis would then include those values.

Because the example uses fixed addresses for its memory operations, SATCheck does not need to allocate any SAT variables for addresses. Since the memory operations only store two possible values ({0, 1}), SATCheck encodes them using a single SAT variable to index into the set.

We also allocate value variables for memory operations: two variables per CAS operation (one for the value read, $r_{<tid=1,1>}$ and $r_{<tid=2,1>}$, and one for the value stored, $w_{<tid=1,1>}$ and $w_{<tid=2,1>}$) and one SAT variable $w_{<tid=1,3,br(1),0>}$ for the store, representing the value it stores. For each load operation, we encode the store it reads-from using SAT variables. The CAS operation <tid=1,1> can read-from one of two stores: the CAS operation <tid=2,1> or the initialization store. Thus we can use a SAT variable $\ell_{<tid=1,1>}$ to represent which of these two stores it reads-from. If variable $\ell$ indicates that a load reads-from a given store, then (1) their values must match, (2) the store must be ordered before the load, and (3) there cannot be a conflicting store to the same address ordered between them. SATCheck generates constraints to capture these properties.
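The three reads-from conditions can be sketched as a check over one concrete interleaving. The `mem_op` struct and the `rf_is_valid` function are hypothetical names introduced here, and the array index is assumed to be the operation's position in the interleaving. SATCheck expresses the same conditions as SAT constraints over the interleaving and value variables instead of checking them on a fixed order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one memory operation in a concrete interleaving.
 * A CAS both reads and writes, so it sets both flags. */
typedef struct {
    int addr;
    bool reads;
    int read_value;
    bool writes;
    int written_value;
} mem_op;

/* ops[0..n-1] is listed in interleaving order. Returns true when the
 * operation at index `load` may read from the operation at index `store`. */
bool rf_is_valid(const mem_op *ops, int n, int load, int store) {
    assert(load >= 0 && load < n && store >= 0 && store < n);
    if (!ops[load].reads || !ops[store].writes) {
        return false;
    }
    if (ops[load].addr != ops[store].addr) {
        return false;
    }
    /* (1) The values must match. */
    if (ops[store].written_value != ops[load].read_value) {
        return false;
    }
    /* (2) The store must be ordered before the load. */
    if (store >= load) {
        return false;
    }
    /* (3) No conflicting store to the same address in between. */
    for (int k = store + 1; k < load; k++) {
        if (ops[k].writes && ops[k].addr == ops[load].addr) {
            return false;
        }
    }
    return true;
}
```

For the first execution of the running example (initialization store, Thread 1 CAS, Thread 2 CAS, Thread 1 store), the check accepts Thread 2's CAS reading from Thread 1's CAS and rejects it reading from the initialization store.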

The uninterpreted functions take as input the value returned by the CAS operation (0 or 1 from the potential value set). Consider the uninterpreted function <tid=1,2>, x1=f(x0) (0 → 1). We model this function’s output as either the previously observed value 1 or a special value to represent an as-yet unobserved output. One SAT variable can represent these two cases.

This graph contains two branches. Both branches have taken one direction each; the remaining directions are unexplored. Thus we represent the state of the branch with two values: either the explored direction, or a new direction. We can therefore allocate one SAT variable per branch. From the current execution, we know that if the uninterpreted function returns 1, then the if body will be executed. Thus we can build an implication constraint from the SAT variable for the output of the uninterpreted function to the SAT variable for the branch direction.

The effect of store <tid=1,3,br(1),0>, store(0) must be predicated on taking the branch. Thus the branch SAT variable appears in the constraints for loads to ensure that if some executed load reads from this store, then the store was in fact executed.

Forcing Novel Executions The key novelty of our approach is that we model sets of observed concrete executions of the program (including dependencies between computed values) and iterate to find new, interesting executions. This greatly reduces the state space that must be explored. To support this approach, SATCheck generates a clause whose satisfaction implies the existence of a new execution. We have designed SATCheck’s encoding so that it is simple to generate such a clause. The clause simply needs to evaluate to true when a branch or an uninterpreted function takes on a novel value. These conditions are explicitly represented in our encoding. The interleaving that drives the program to a novel execution can then be recovered from the SAT solution: the truth assignments for the interleaving variables directly encode the desired interleaving.

2.3 Iterating Concrete Executions

Having encoded a SAT formula whose satisfaction implies a novel execution, SATCheck calls the SAT solver to request a satisfying assignment of that formula and converts that assignment back into an execution interleaving. It repeats the execute/encode/solve loop until it exhausts all possible novel behaviors.

In our example, SATCheck might learn from the SAT solver that the branch in Thread 2 would take a different direction if Thread 2 executes trylock first. The satisfying assignment encodes properties of the interleavings which would cause this novel behavior. In one such interleaving, (1) Thread 2 executes the CAS operation in trylock, (2) Thread 2 executes the store in unlock, (3) Thread 1 executes the CAS operation in trylock, and (4) Thread 1 executes the store in unlock. SATCheck then executes the program under this new interleaving and incorporates this new execution into its event graph. Figure 3 presents the resulting event graph.

Thread 1:
  <tid=1,1>, x0=cas(r(0)w(1))
  <tid=1,2>, x1=f(x0) (0 → 1)
  <tid=1,3>, branch(x1)
    <tid=1,3,br(1),0>, store(0)
  <tid=1,4>, merge

Thread 2:
  <tid=2,1>, y0=cas(r(1,0)w(0,1))
  <tid=2,2>, y1=f(y0) (1 → 0, 0 → 1)
  <tid=2,3>, branch(y1)
    <tid=2,3,br(0),0>, nop
    <tid=2,3,br(1),0>, store(0)
  <tid=2,4>, merge

Figure 3. Second execution adds new behaviors for tid=2, including new uninterpreted function outputs and a store node.

Figure 4. Complete event graph after third execution, summarizing all behaviors for given program input.

From this graph, SATCheck observes that it still has not explored an execution in which Thread 1 skips the body of the if statement and may not have evaluated the uninterpreted functions on all possible input values. Thus, SATCheck repeats the process. The new solution from the SAT solver generates the following interleaving: (1) Thread 2 executes the CAS operation in its trylock, (2) Thread 1 executes the failed CAS operation in its trylock, and (3) Thread 2 executes the store in its unlock.

After SATCheck executes that interleaving, it has explored all possible branches in the program. The final event graph shown in Figure 4 reflects the third interleaving and shows that all branches are taken in all directions.

However, it is not immediately obvious from the event graph that SATCheck has evaluated the uninterpreted functions on all possible input values. SATCheck thus generates a new query for the SAT solver. The SAT solver then reports that it is not possible to evaluate the uninterpreted function nodes on any new inputs.

At this point, SATCheck terminates, having explored all possible behaviors for the given program input. Furthermore, SATCheck has produced a SAT model of all possible executions.
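The three interleavings from this walkthrough can be replayed with a small hand-written simulation of the spin lock. This is only a sketch: it involves no SAT solver, the schedules are supplied by hand, and the names `coverage`, `run_schedule`, and `all_directions_seen` are invented for the example. Thread indices 0 and 1 stand for Thread 1 and Thread 2.

```c
#include <assert.h>
#include <stdbool.h>

/* seen[t][d] records that thread t took branch direction d,
 * where d == 1 means the body of the if statement in foo ran. */
typedef struct {
    bool seen[2][2];
} coverage;

/* Replays foo() in two threads. Each schedule entry names the thread that
 * performs its next memory operation (the CAS, then the store if taken). */
void run_schedule(const int *schedule, int len, coverage *cov) {
    int lock = 0;
    int pc[2] = {0, 0};
    bool acquired[2] = {false, false};
    for (int s = 0; s < len; s++) {
        int t = schedule[s];
        assert(t == 0 || t == 1);
        if (pc[t] == 0) {
            /* CAS in trylock: read the lock, write 1 only if it was 0. */
            int val = lock;
            if (val == 0) {
                lock = 1;
            }
            acquired[t] = (val == 0);
            cov->seen[t][acquired[t] ? 1 : 0] = true;
            pc[t] = 1;
        } else if (pc[t] == 1 && acquired[t]) {
            /* Store in unlock, executed only when the branch was taken. */
            lock = 0;
            pc[t] = 2;
        }
    }
}

/* True once both threads have been seen taking both branch directions. */
bool all_directions_seen(const coverage *cov) {
    return cov->seen[0][0] && cov->seen[0][1] &&
           cov->seen[1][0] && cov->seen[1][1];
}
```

Replaying the schedules {0, 1, 0}, {1, 1, 0, 0}, and {1, 0, 1} in that order leaves one direction unseen after the first two runs and covers all four (thread, direction) pairs after the third, which matches the narrative above.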

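\begin{align*}
\mathit{cond\_branch}_n(\langle i_0, \ldots, i_k \rangle) &= \langle i_0, \ldots, i_k, n, 0 \rangle \\
\mathit{merge}_l(\langle i_0, \ldots, i_{l-1}, \ldots, i_k \rangle) &= \langle i_0, \ldots, i_{l-1} + 1 \rangle \\
\mathit{loop\_enter}(\langle i_0, \ldots, i_k \rangle) &= \langle i_0, \ldots, i_k, 0 \rangle \\
\mathit{loop\_exit}_l(\langle i_0, \ldots, i_{l-1}, \ldots, i_k \rangle) &= \langle i_0, \ldots, i_{l-1} + 1 \rangle \\
\mathit{others}(\langle i_0, \ldots, i_k \rangle) &= \langle i_0, \ldots, i_k + 1 \rangle
\end{align*}

Figure 5. Rules for Updating Operation Tuples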

3. Event Graph

As SATCheck executes a program, it dynamically builds an event graph representation for each thread which captures the thread’s behavior over observed program executions. The event graph consists of a set of nodes $N$ and a set of edges $E \subseteq N \times N$. Nodes in the event graph represent dynamically executed program operations: there is a unique node for each dynamic instance of a program operation in an execution (even if nodes correspond to the same static source code operation). The encoding is SSA-like: all nodes (except phi nodes) accept inputs from exactly one node in the event graph.

The nodes represent the following program operations: (1) stores, (2) loads, (3) atomic RMWs, (4) conditional branches, (5) control flow merges from conditional branches, (6) uninterpreted function invocations, (7) equals comparisons, (8) loop entries, (9) loop exits, and (10) phi functions.

When operations in these nodes take values from variables or temporaries, an event graph node records the source node for that value. SATCheck uses phi function nodes to ensure that there is always exactly one source node for each variable for non-phi nodes.

To merge a new execution into the event graph, SATCheck must match equivalent operations from different executions. SATCheck does this by defining execution point (EP) tuples. A thread starts with the EP $\langle tid, 0 \rangle$. Figure 5 presents the rules for updating a thread’s operation tuple during an execution. SATCheck identifies operations at the same EP with the same event graph node, even across executions.
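The update rules of Figure 5 can be sketched in C as follows. The `ep_tuple` type, the `EP_MAX` capacity, and the function names are assumptions made for this illustration, not SATCheck's actual data structures. The text gives the initial counter as 1 in the running example and as 0 in this section, so the sketch takes it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity EP tuple; elem[0] holds the thread id. */
#define EP_MAX 16

typedef struct {
    int elem[EP_MAX];
    size_t len;
} ep_tuple;

/* Initial EP <tid, first>; the caller supplies the first counter value. */
ep_tuple ep_start(int tid, int first) {
    ep_tuple ep = { {0}, 2 };
    ep.elem[0] = tid;
    ep.elem[1] = first;
    return ep;
}

/* others: increment the last counter. */
ep_tuple ep_others(ep_tuple ep) {
    assert(ep.len >= 2);
    ep.elem[ep.len - 1] += 1;
    return ep;
}

/* cond_branch_n: append the branch direction n and a zeroed counter. */
ep_tuple ep_cond_branch(ep_tuple ep, int n) {
    assert(ep.len + 2 <= EP_MAX);
    ep.elem[ep.len++] = n;
    ep.elem[ep.len++] = 0;
    return ep;
}

/* loop_enter: append a zeroed counter. */
ep_tuple ep_loop_enter(ep_tuple ep) {
    assert(ep.len + 1 <= EP_MAX);
    ep.elem[ep.len++] = 0;
    return ep;
}

/* merge_l and loop_exit_l: keep i_0..i_{l-1} and increment i_{l-1}. */
ep_tuple ep_merge(ep_tuple ep, size_t l) {
    assert(l >= 2 && l <= ep.len);
    ep.len = l;
    ep.elem[l - 1] += 1;
    return ep;
}
```

Starting from `ep_start(1, 1)`, two `ep_others` steps reach <tid=1,3>, `ep_cond_branch(ep, 1)` yields <tid=1,3,br(1),0> for the store, and `ep_merge` with l = 2 yields <tid=1,4>, matching the EPs of Thread 1 in Figure 2.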

Event graph nodes record all values that SATCheck has observed as output for the operation that corresponds to the node. For function nodes, SATCheck also records, for each previously observed assignment of values to inputs, the output value of the function.

3.1 Potential Value Set Analysis

The SAT translation process begins with a fixed point computation of potential value sets for each variable over the event graph. These sets enable the translation to fix encodings for the generated SAT instance. As discussed in the previous section, potential value sets do not include all possible values for a variable. They only include all possible values under the current set of executions. SATCheck lazily generates the remaining possible values in future iterations as it encounters new program behaviors that create those values.

Encoding all possible values of program variables would increase the size of the SAT encoding. However, encoding only previously-observed values would affect the SAT solver’s ability to reason about new executions (requiring more iterations). Thus, we compute a potential value set for the operation corresponding to each event graph node that includes (1) values we have observed for that operation from previous executions, (2) values that could be propagated by rearranging the reads-from relation between loads and stores without evaluating uninterpreted functions on new values, and (3) values that could propagate through phi functions. Our computed value sets encoded existing behaviors and did not require too many needless iterations.

• Loads/Stores/CAS: The potential value set for the value read by a load is the union of the potential value sets for the values written by all stores (or RMW operations) that may write to an address that the load may read from. To determine where a load may read from or where a store may write to, we use the potential value sets of the addresses.

• Atomic Add: SATCheck approximates the output values written at an atomic add using the values SATCheck has observed from the atomic add in previous executions. This ensures termination, which would not otherwise be guaranteed by using the potential value set for the add’s inputs. This design choice has a cost: SATCheck must treat new outputs of atomic adds as new behaviors that the SAT solver explicitly searches for executions to generate.

• Functions: The potential value set for the output of a function is the set of values it has generated in previous executions.

• Equals: The potential value set for an equals operation is true or false.

• Phi Functions: The potential value set for a phi function is the union of the potential value sets of all of its inputs.
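Two of these rules, the load rule and the phi rule, are plain set unions and can be sketched with bitmask sets. The `valset` type, the `store_info` struct, and both function names are hypothetical, and the sketch assumes values and addresses small enough to fit in 32 bits of a mask. It shows a single propagation step, not the full fixed point computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical small-set representation: bit v set means v is in the set. */
typedef uint32_t valset;

/* Potential addresses and potential written values of one store or RMW. */
typedef struct {
    valset addrs;
    valset written;
} store_info;

/* Load rule: union of the written-value sets of every store whose
 * potential addresses overlap the load's potential addresses. */
valset load_value_set(valset load_addrs, const store_info *stores, int n) {
    assert(n >= 0);
    valset out = 0;
    for (int i = 0; i < n; i++) {
        if (stores[i].addrs & load_addrs) {
            out |= stores[i].written;
        }
    }
    return out;
}

/* Phi rule: union of the potential value sets of all inputs. */
valset phi_value_set(const valset *inputs, int n) {
    assert(n >= 0);
    valset out = 0;
    for (int i = 0; i < n; i++) {
        out |= inputs[i];
    }
    return out;
}
```

For the spin lock, the initialization store and the unlock store write {0} and the CAS writes {1}, all to one address, so the read side of a CAS gets the set {0, 1}. A store to an unrelated address contributes nothing.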

3.2 Partitioning Memory Operations

SATCheck next partitions load, store, and RMW actions such that if two actions can access the same memory address, they are in the same partition. For this computation, we use the set of addresses computed by the potential value set analysis. All memory operations from the same partition share the same value and address encodings.
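One way to compute such a partition is a union-find pass over the operations' potential address sets. The sketch below is an assumption about how this could be done, not a description of SATCheck's implementation. Address sets are again modeled as 32-bit masks, and `MAX_MEM_OPS`, `find_root`, and `partition_memory_ops` are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bound on the number of memory operations in the sketch. */
#define MAX_MEM_OPS 32

/* Union-find lookup with path halving. */
static int find_root(int *parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* addr_sets[i] is the potential address set of operation i as a bitmask.
 * On return, partition[i] == partition[j] exactly when i and j are linked
 * by a chain of operations with overlapping address sets. */
void partition_memory_ops(const uint32_t *addr_sets, int n, int *partition) {
    assert(n >= 0 && n <= MAX_MEM_OPS);
    int parent[MAX_MEM_OPS];
    for (int i = 0; i < n; i++) {
        parent[i] = i;
    }
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (addr_sets[i] & addr_sets[j]) {
                int ri = find_root(parent, i);
                int rj = find_root(parent, j);
                if (ri != rj) {
                    parent[rj] = ri;
                }
            }
        }
    }
    for (int i = 0; i < n; i++) {
        partition[i] = find_root(parent, i);
    }
}
```

Operations with address sets {a0}, {a0, a1}, and {a1} end up in one partition because the middle operation overlaps both others, while an operation that only touches {a2} stays in its own partition.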

4. Encoding the Event Graph into SAT

We next discuss how SATCheck encodes the event graph into SAT. The encoding has two components: (1) SATCheck first chooses SAT variables to encode executions (Section 4.1); and (2) SATCheck formulates SAT clauses to ensure that executions are valid (Section 4.2).

4.1 Representing Executions with SAT Variables

Encoding Interleavings (Execution Order) For any two memory operations $n_i, n_j \in N_{memory}$, where $n_i$ is performed by thread $i$ and $n_j$ is performed by thread $j$: if $i < j$, and $n_i$ and $n_j$ are not ordered by thread creation or thread joins, then SATCheck creates a SAT variable $v_{n_i,n_j}$ that is true if $n_i$ is executed before $n_j$ (i.e., $n_i \xrightarrow{sc} n_j$) and false if $n_j$ is executed before $n_i$. This SAT interleaving variable effectively encodes the execution interleaving. Note that event graphs describe the behaviors of multiple different executions; a given store may not be executed in a given execution. Our encoding orders all memory operations including those that are not executed. Memory operations that are not executed simply do not have any effects.

Encoding Control Flow. The event graph summarizes all observed executions. For example, if SATCheck has explored both sides of a conditional branch, the event graph will contain the events for both sides. A solution to the SAT formula describes a single execution, and thus the SAT formula must encode all potential executions’ control flows.

In a given execution, a conditional branch operation can either: (1) not be executed, (2) take the same direction as we have observed in a previous execution, or (3) take a new direction down the branch (if there exist unexplored directions for that branch).

Conditional branches in SATCheck are used to model if statements and switch statements, and thus support more than 2 directions. Our encoding uses one state to model the case that the branch was not executed and one state for each of the possible directions the branch has been observed to take. For branches that still have unexplored directions, our encoding also uses one state to model the branch taking a new direction.

Thus, for a conditional branch $br$ with $m$ possible directions of which we have observed $n$ directions, we encode $r = 1 + \min(m, n + 1)$ possible behaviors. SATCheck uses $\lceil \log_2(r) \rceil$ SAT variables to model each of these possible behaviors.
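As a small worked illustration of this count, the following sketch computes $r$ and the number of SAT variables for a branch. The function names are hypothetical, and the code only mirrors the arithmetic stated above.

```c
#include <assert.h>

/* States for a branch with m possible directions, n of them observed:
 * one "not executed" state, one state per observed direction, and one
 * "new direction" state while unexplored directions remain. */
unsigned branch_states(unsigned m, unsigned n)
{
    assert(n <= m);
    unsigned directions = (n + 1 < m) ? n + 1 : m;   /* min(m, n + 1) */
    return 1 + directions;
}

/* ceil(log2(r)) for 1 <= r <= 2^31: SAT variables in a binary encoding. */
unsigned bits_needed(unsigned r)
{
    assert(r >= 1 && r <= (1u << 31));
    unsigned bits = 0;
    while ((1u << bits) < r)
        bits++;
    return bits;
}
```

For an if statement ($m = 2$) with one observed direction, `branch_states(2, 1)` gives 3 states, which need 2 SAT variables; once both directions are observed the count stays at 3 because no "new direction" state is left.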

Values of Program Variables. As mentioned in Section 3.1, encoding all possible values for program variables significantly increases the size of the encoding. Instead, SATCheck first computes the potential values $S_v$ for a variable using the potential value analysis from Section 3.1. It then creates $\lceil \log_2(|S_v|) \rceil$ SAT variables to encode the value of variable $v$. The SAT variables encode the variable $v$’s value as an index into $S_v$.

For the output of uninterpreted functions or atomic adds, we add a special value to $S_v$ to encode new outputs that have not yet been observed.

Encoding Memory Operations. For each memory operation partition $p$, we have both a set of addresses $A_p$ that the operations may access and a set of values $V_p$ that a load may read or that a store may write. SATCheck uses $\lceil \log_2(|A_p|) \rceil$ SAT variables to encode addresses and $\lceil \log_2(|V_p|) \rceil$ SAT variables to encode values. SATCheck encodes addresses and values by the binary encoding of their index in the corresponding value set.

Note that memory operations may operate on SAT variables expressed in different encodings. For example, a store operation may take its input value and address from operations that use a different encoding than the store’s memory partition. SATCheck generates implication constraints to translate between different encodings. Load operations induce a set of SAT variables for values read and addresses read from, each in an appropriate encoding for the operation. Store operations induce SAT variables for values written and addresses written to.

Encoding the Reads-From Relation. For each load $\ell$, SATCheck computes stores $R_\ell$ whose potential address set has a nonempty intersection with the potential address set for load $\ell$. SATCheck then uses $\lceil \log_2(|R_\ell|) \rceil$ SAT variables to encode which store load $\ell$ reads-from. SATCheck encodes the reads-from relation as the binary encoding of the store’s index in the set $R_\ell$.

4.2 SAT Clauses Ensuring Valid Executions

We next describe how SATCheck encodes executions in terms of constraints on the SAT variables described in the previous section.

Branch Constraints. As the event graph summarizes the behavior of all executions SATCheck has previously explored, a given execution will typically not execute all the events in the event graph. We next describe the constraints that capture the path of a given execution through the event graph.

SATCheck assumes that conditional branches have a nested structure. Thus at each event $e$ in the event graph, we can compute a nested stack $s_e$ of conditional branches and directions that were taken to reach the given event node. For each conditional branch $b$, we have an input variable $v_b$, and potentially a required preceding conditional branch $b'$ that took direction $d$ (e.g., branch $b$ is only reachable when branch $b'$ takes direction $d$).

We can encode the behavior of branch $b$ using the following two constraints:

1. If branch $b'$ did not take direction $d$, then branch $b$ was not executed.

2. If branch $b'$ took direction $d$, then branch $b$ takes the direction $v_b$ specified by its input variable.

Transitive Ordering Constraints for Interleavings. We next describe the transitive ordering constraints that ensure that the ordering $\xrightarrow{sc}$, i.e., the execution interleaving, totally orders all memory operations.

For any three memory operations $n_i, n_j, n_k \in N$, SATCheck generates a set of transitive ordering constraints. These constraints capture the following property: $n_i \xrightarrow{sc} n_j \wedge n_j \xrightarrow{sc} n_k \Rightarrow n_i \xrightarrow{sc} n_k$.

Load Read-From Constraints. We next describe several consistency constraints between the reads-from relation, the $\xrightarrow{sc}$ total order, the values read by loads and written by stores, and the addresses accessed by loads and stores. These constraints are the final component of encoding the execution interleaving: they ensure that the behaviors of memory operations are consistent with $\xrightarrow{sc}$ (the execution interleaving).

SATCheck begins by computing a set of stores that each load may potentially read from. For each such store, SATCheck instantiates the following constraints:

1. If load $b$ reads-from store $a$, then store $a$ must be sc ordered before load $b$:

   $a \xrightarrow{rf} b \Rightarrow a \xrightarrow{sc} b$

2. A load must read the same value as the store it reads-from:

   $a \xrightarrow{rf} b \Rightarrow \mathit{value}(a) = \mathit{value}(b)$

3. If a load reads from a store, then both the load and store must access the same memory address:

   $a \xrightarrow{rf} b \Rightarrow \mathit{address}(a) = \mathit{address}(b)$

4. To read from a store, it must have been executed:

   $a \xrightarrow{rf} b \wedge \text{was-executed}(b) \Rightarrow \text{was-executed}(a)$

5. There cannot be a conflicting store between a load and the store it reads from:

   $a \xrightarrow{rf} b \Rightarrow (\forall c \in \mathit{Stores}.\ \neg\text{was-executed}(c) \vee \neg(a \xrightarrow{sc} c) \vee \neg(c \xrightarrow{sc} b) \vee \mathit{address}(c) \neq \mathit{address}(b))$

6. Every executed load must read from some store:

   $\text{was-executed}(b) \wedge \text{is-load}(b) \Rightarrow \exists a.\ a \xrightarrow{rf} b$
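To make the six constraints concrete, the sketch below checks them directly on one candidate execution instead of emitting SAT clauses. The `MemOp` layout is a hypothetical flattened view, the total order is represented by a position number, and unexecuted loads are skipped for simplicity; none of this is SATCheck's actual data structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of one memory operation in an execution. */
typedef struct {
    bool is_load;     /* true for loads, false for stores */
    bool executed;    /* was-executed(op) */
    int  sc_pos;      /* position in the sc total order */
    int  address;
    int  value;       /* value read (load) or written (store) */
    int  reads_from;  /* loads only: index of the store read, or -1 */
} MemOp;

/* True if load b satisfies constraints 1-6 with respect to ops[0..n). */
bool load_is_consistent(const MemOp *ops, size_t n, size_t b)
{
    const MemOp *ld = &ops[b];
    if (!ld->is_load || !ld->executed)
        return true;                                /* nothing to check */
    if (ld->reads_from < 0 || (size_t)ld->reads_from >= n)
        return false;                               /* (6) must read from some store */
    const MemOp *st = &ops[ld->reads_from];
    if (st->is_load)                return false;   /* rf source must be a store */
    if (st->sc_pos >= ld->sc_pos)   return false;   /* (1) store ordered before load */
    if (st->value != ld->value)     return false;   /* (2) same value */
    if (st->address != ld->address) return false;   /* (3) same address */
    if (!st->executed)              return false;   /* (4) store was executed */
    for (size_t c = 0; c < n; c++) {                /* (5) no conflicting store between */
        const MemOp *o = &ops[c];
        if (!o->is_load && o->executed &&
            st->sc_pos < o->sc_pos && o->sc_pos < ld->sc_pos &&
            o->address == ld->address)
            return false;
    }
    return true;
}
```

Because the order is given as positions, transitivity holds by construction here; in the SAT encoding it has to be enforced by the transitive ordering clauses.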


Load/Store Value and Address Encoding. Each load takes an address as an input and each store takes both an address and a value as inputs. SATCheck uses the results of the potential value set analysis to compute the potential values of the input variable (unless a new value is generated via a new behavior). For each potential value of the input variable, SATCheck generates an implication that, if the input variable has value $v$, then the relevant address or value variables for the load or store must also have value $v$. As mentioned above, the encodings for the memory operation and for the partitions need not match.

For example, for a store’s input value variable $v_{input}$, SATCheck generates implications of the form $\mathit{value}(v_{input}) = n \Rightarrow \mathit{value}(\text{store's SAT value variables}) = n$ for all $n$ in $v_{input}$’s potential value set, to convert the encoding of the input variables into the encodings used by the store’s memory partition.
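Since both sides are encoded as indices into their own value sets, each implication amounts to mapping one index to another. The sketch below builds that mapping; the function names are hypothetical, and it assumes that the partition's value set contains every potential value of the input variable.

```c
#include <stddef.h>

/* Index of value v in set[0..len), or -1 if it is absent. */
int index_of(const int *set, size_t len, int v)
{
    for (size_t i = 0; i < len; i++)
        if (set[i] == v)
            return (int)i;
    return -1;
}

/* For each input index i, map[i] is the partition index that the implication
 * "input has index i => store value has index map[i]" must enforce.
 * Returns the number of implications, or -1 if a value is missing from the
 * partition's set (which would violate the assumption stated above). */
int build_translation(const int *input_set, size_t input_len,
                      const int *part_set, size_t part_len, int *map)
{
    for (size_t i = 0; i < input_len; i++) {
        map[i] = index_of(part_set, part_len, input_set[i]);
        if (map[i] < 0)
            return -1;
    }
    return (int)input_len;
}
```

With an input set {7, 3} and a partition set {0, 3, 7, 9}, input index 0 maps to partition index 2 and input index 1 maps to partition index 1.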

Function Constraints. For each uninterpreted function, SATCheck stores the relation between specific input value assignments and the corresponding observed output values. For each known assignment of the input values, SATCheck generates an implication that, if the input variables match the assignment, then the function’s output value matches the previous output.

If an execution generates a new assignment to the inputs of an uninterpreted function, then none of the implications apply. The function’s output can therefore take on any value. Recall that for the output of each uninterpreted function, SATCheck includes a special value that serves as a placeholder for a new output value.
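The effect of these implications can be sketched as a lookup in a table of observed input/output pairs. The two-input `Observation` record and the `NEW_OUTPUT` constant below are hypothetical simplifications; an actual encoding would reserve an index in the output's value set instead of an in-band integer.

```c
#include <stdbool.h>
#include <stddef.h>

#define NEW_OUTPUT (-1)   /* hypothetical placeholder for an unobserved output */

/* One observed behavior of a two-input uninterpreted function. */
typedef struct { int in0, in1, out; } Observation;

/* True if output `out` is permitted for inputs (in0, in1). */
bool output_allowed(const Observation *seen, size_t n,
                    int in0, int in1, int out)
{
    for (size_t i = 0; i < n; i++)
        if (seen[i].in0 == in0 && seen[i].in1 == in1)
            return out == seen[i].out;   /* known inputs force the known output */
    return true;   /* new inputs: no implication applies, any output is allowed */
}
```

A solution that picks `NEW_OUTPUT` for some function therefore has to supply an input assignment that has not been observed before.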

Equals Constraints. One assumption of SATCheck is that either a fixed program input will lead to a small set of inputs for uninterpreted functions over the set of concurrent executions, or that the outputs of the uninterpreted function on a maximal range of inputs are interesting. In our experience, this is generally true, but equality comparisons (if modeled as uninterpreted functions) can sometimes be an exception.

Concurrent data structure implementations often use equality comparisons on counters to see whether anything has changed. While such computations can feed many combinations of values into the comparison, the only interesting information is whether the inputs to the comparison are equal.

We therefore provide a built-in comparison operation which enables SATCheck to avoid generating all possible inputs for the comparison operation.

Constraints for Atomic Add and CAS operations. To simplify SATCheck’s treatment of CAS operations, we model a failed CAS operation as a store of the old value. We then handle CAS operations by combining the techniques we have used for load and store operations. The key difference is that SATCheck generates a constraint that sets the value written by the CAS operation to newvalue if the value read matches the oldvalue, and otherwise simply stores the same value that the CAS read. We assume a strong CAS operation here; it is straightforward to modify the encoding to support weak CAS operations with spurious failures.

Although it is conceptually straightforward to compute the output of an atomic add operation given its inputs during the potential value computation, doing so can prevent the potential value computation from terminating. SATCheck treats atomic add operations as a combination of a load operation, an uninterpreted function invocation, and a store operation.
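The read and write halves of the two read-modify-write operations can be sketched as follows. The `RmwEffect` record and the function names are hypothetical; the atomic add computes its result concretely, as a concrete execution would, whereas the SAT encoding treats that computation as an uninterpreted function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a read-modify-write reads and writes. */
typedef struct {
    uint64_t value_read;     /* observed by the load half */
    uint64_t value_written;  /* written by the store half */
    bool     succeeded;
} RmwEffect;

/* Strong CAS: a failed CAS is modeled as a store of the value it read. */
RmwEffect model_cas(uint64_t in_memory, uint64_t oldvalue, uint64_t newvalue)
{
    RmwEffect e;
    e.value_read    = in_memory;
    e.succeeded     = (in_memory == oldvalue);
    e.value_written = e.succeeded ? newvalue : in_memory;
    return e;
}

/* Atomic add viewed as load + function + store. */
RmwEffect model_atomic_add(uint64_t in_memory, uint64_t addend)
{
    RmwEffect e;
    e.value_read    = in_memory;
    e.value_written = in_memory + addend;   /* the "function" step */
    e.succeeded     = true;
    return e;
}
```

Modeling the failed CAS as a store keeps every CAS in the same shape (one load, one store), so the load and store constraints apply unchanged.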

Yields. Like many model checkers, SATCheck uses yields to avoid the problem of unfair schedules causing the program to loop. Break statements out of conditionally-executed loops complicate our treatment of yields: SATCheck may not be aware of their existence when it discovers a yield. Thus, SATCheck generates a constraint that an execution should not contain a yield unless the thread that calls yield first explores a new branch direction.

4.3 Generating New Behaviors via Goal Expressions

SATCheck iteratively generates complete event graph models by recording past program behaviors in the event graph and generating clauses (goal expressions) that, when true, indicate executions that demonstrate new behaviors. There are two ways that the event graph for a program can be incomplete:

• Untaken Conditional Branch Directions: If no previous program execution has taken a given direction of a conditional branch, the event graph will miss events reachable via that branch direction.

• Unknown Behaviors for Uninterpreted Functions or Atomic Add Operations: If there is an input assignment to an uninterpreted function or atomic add operation that can be generated by an execution and SATCheck has not explored an execution that generates that input assignment, then SATCheck’s model of that uninterpreted function is incomplete.

For conditional branches, SATCheck generates a goal expression that evaluates to true if the branch takes a new direction. For uninterpreted functions, SATCheck generates a goal expression that evaluates to true if the function outputs the unknown-output placeholder.

After exploring the entire event graph, SATCheck generates a SAT formula that is true if at least one goal expression is true.


5. Exploring Concrete Executions

After SATCheck encodes the event graph as a SAT formula, it passes this formula to a SAT solver. If the SAT solver finds that the formula is unsatisfiable, then the event graph is complete: it is impossible to construct an interleaving such that uninterpreted functions see new input value assignments or that branches take new directions. Thus, SATCheck has explored all reachable behaviors of the program.

If the SAT solver finds a solution to the SAT formula, the solution can be converted to an execution exhibiting behavior that is not currently modeled. SATCheck converts the SAT solution into paths through the event graph for each thread. Each path traverses a set of memory operations, and the truth assignments for the interleaving variables specify the execution interleaving that generates the desired new behavior. SATCheck represents the interleaving as a set of wait pairs. A wait pair consists of two memory operations: a stop point and a notify point. A thread’s execution stops at a stop point until its partner thread has reached the notify point.

The SATCheck scheduler then performs a concrete execution using the wait pairs. Note that the interleaving generated by the SAT formula is only guaranteed to be realizable until the point at which the execution deviates from previous behavior. After the execution exhibits a new behavior, there is no guarantee that the execution will continue to follow the path modeled by the SAT solution.

To extend a concrete execution, the SATCheck scheduler executes events from a thread’s execution until that thread reaches a memory operation. At a memory operation, the scheduler may choose a new thread. SATCheck’s scheduler uses a round robin approach to select a new thread for execution, respecting constraints on thread selection imposed by wait pairs.

When an execution deviates from the interleaving specified by the SAT formula due to a new behavior, it is possible that all threads may get stuck waiting at wait pairs. Recall that wait pairs are based on previous executions and may no longer be valid. Hence, if no threads are runnable because of wait pair constraints, SATCheck can ignore these constraints and arbitrarily pick a thread to run.
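The thread-selection policy described above can be sketched as follows. Everything in this block is an assumption made for illustration: the `WaitPair` and `SchedState` layouts, the use of per-thread operation counters to identify stop and notify points, and the fallback choice of the first unfinished thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Hypothetical wait pair: `waiter` stops before its stop_op-th memory
 * operation until `partner` has completed notify_op memory operations. */
typedef struct { int waiter, stop_op, partner, notify_op; } WaitPair;

typedef struct {
    int  nthreads;
    int  ops_done[MAX_THREADS];   /* memory operations completed per thread */
    bool finished[MAX_THREADS];
} SchedState;

/* True if some wait pair currently holds thread t at its stop point. */
static bool blocked(const SchedState *s, const WaitPair *wp, size_t nwp, int t)
{
    for (size_t i = 0; i < nwp; i++)
        if (wp[i].waiter == t && s->ops_done[t] == wp[i].stop_op &&
            s->ops_done[wp[i].partner] < wp[i].notify_op)
            return true;
    return false;
}

/* Round-robin choice starting after thread `last`. Returns -1 once every
 * thread has finished. If all unfinished threads are stuck at wait pairs,
 * the wait pairs are ignored and the first unfinished thread is returned. */
int pick_next_thread(const SchedState *s, const WaitPair *wp, size_t nwp,
                     int last)
{
    int fallback = -1;
    for (int k = 1; k <= s->nthreads; k++) {
        int t = (last + k) % s->nthreads;
        if (s->finished[t])
            continue;
        if (fallback < 0)
            fallback = t;
        if (!blocked(s, wp, nwp, t))
            return t;
    }
    return fallback;
}
```

The fallback path corresponds to the case where wait pairs derived from earlier executions have become stale after a new behavior appears.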

Note that SATCheck does not choose which new behavior to explore first; it explores whichever new behavior the SAT solver discovers. When SATCheck observes a new behavior, it integrates that behavior into the event graph, ensuring that the corresponding goal does not get generated in the future.

6. Extensions

We next describe several extensions we have implemented to the core SATCheck algorithm to improve performance and to support TSO.

6.1 Field Support

The base algorithm can handle loads and stores to fields through uninterpreted functions. Failing to differentiate the objects containing the generated fields will cause SATCheck to explore executions that actually access all of the objects that each load or store can access.

Instead, explicitly modeling field accesses can greatly reduce the number of executions that SATCheck needs to explore. We have therefore added explicit support for fields. This embeds the address computation directly into the SAT encoding for the memory access. SATCheck then only needs to generate executions that access the fields of interesting structures.

6.2 Sharing Between Instances of Uninterpreted Functions

In many programs, the same uninterpreted functions are accessed many times during an execution. The base algorithm doesn’t share information between different dynamic instances of the same uninterpreted function and thus attempts to generate executions that generate the same inputs for different instances of the same uninterpreted function. We have added support for uninterpreted function identifiers that signal to SATCheck that inputs learned from one instance of an uninterpreted function can be shared with other instances.

6.3 Incremental Solving

Many SAT solvers support incremental solving modes in which variations of an initial SAT problem can be solved more efficiently by leveraging clauses learned from the initial problem.

SATCheck can leverage incremental solving capabilities by reusing the same SAT encoding to generate additional executions that achieve goals that were not covered by the previous solutions.

While this optimization often improves performance, it can also harm performance. In general, SATCheck must fail to find a solution to a new encoding before it can terminate, as the reused encodings do not incorporate newly discovered behaviors. If SATCheck encounters hard incremental SAT problems to solve, the effort may have been better spent on solving the updated SAT constraints that incorporate the newly learned behaviors.

6.4 TSO Extension

Modern x86 processors implement the Total Store Ordering memory model. In the TSO memory model, stores are placed in a store buffer before main memory is updated. This allows loads to be reordered above previous stores from the same thread.

We have implemented TSO support in SATCheck. The key idea is to separate store operations into two components: a locally visible store action and a globally visible update that moves the store from the local store buffer to update shared memory. Abdulla et al. [2] use a similar approach to extend DPOR for TSO.

Without loss of generality, SATCheck executes locally visible store actions immediately after the previous load operation from the same thread. Because they are only locally visible, local stores commute with operations from other threads. SATCheck then encodes a search (to determine when the store should become globally visible) into the SAT formula.

Loads may be reordered in front of the update actions for previous stores from the same thread. Thus we introduce ordering variables between loads and the update actions from previous stores in the same thread. We also change the meaning of the existing execution order variables for threads: these variables now model when the update is flushed to shared memory. This does increase the number of potential executions to explore, as a given thread may now have more than one operation it can execute next: either an update from the store buffer or a load operation.

TSO also includes a fence that flushes the store buffer and makes all previous stores globally visible. RMW actions on x86 also have the effect of flushing the store buffer. To account for these, we treat loads slightly differently. If the update action for a store has not been evicted from the store buffer, later loads from the same address from that thread must read from the local store buffer. Fence operations and RMW operations have the effect of flushing the store buffer if executed. We implement this behavior as an implication: if the fence or RMW action is executed, then we force order variables for the updates of previous stores ahead of the fence or RMW to order them before loads after the fence or RMW.
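The split between a locally visible store action and a globally visible update can be illustrated with a small operational model of a per-thread store buffer. This is a generic sketch of TSO store buffering rather than SATCheck's SAT encoding; the types, bounds, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NADDR   4   /* hypothetical number of shared locations */
#define BUF_CAP 8   /* hypothetical store buffer capacity */

typedef struct { int addr, value; } PendingStore;
typedef struct { PendingStore buf[BUF_CAP]; size_t len; } Thread;  /* FIFO, oldest first */
typedef struct { int mem[NADDR]; } Memory;                         /* shared memory */

/* Locally visible store action: the store only enters the buffer. */
void tso_store(Thread *t, int addr, int value)
{
    assert(t->len < BUF_CAP);
    t->buf[t->len].addr = addr;
    t->buf[t->len].value = value;
    t->len++;
}

/* Globally visible update: the oldest buffered store reaches shared memory. */
bool tso_update(Thread *t, Memory *m)
{
    if (t->len == 0)
        return false;
    m->mem[t->buf[0].addr] = t->buf[0].value;
    for (size_t i = 1; i < t->len; i++)
        t->buf[i - 1] = t->buf[i];
    t->len--;
    return true;
}

/* Load: the newest buffered store to addr by this thread wins, else memory. */
int tso_load(const Thread *t, const Memory *m, int addr)
{
    for (size_t i = t->len; i > 0; i--)
        if (t->buf[i - 1].addr == addr)
            return t->buf[i - 1].value;
    return m->mem[addr];
}

/* Fence (or RMW): drain the whole store buffer. */
void tso_fence(Thread *t, Memory *m)
{
    while (tso_update(t, m)) { }
}
```

If two threads each store to a different location and then load the other location before any update happens, both loads see the initial value, an outcome that SC forbids; inserting `tso_fence` between the store and the load rules it out.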

7. Memory Models

Although SATCheck was implemented to support the SC and TSO memory models, its techniques are applicable to more relaxed memory models. It is straightforward to support processor memory models, as long as the outputs of uninterpreted functions depend solely on their inputs. Processors must not reorder a store depending on an uninterpreted function before a load that provides the function’s input.

Handling language memory models is a more complex issue, as the mainstream language memory models are known to be incompatible with formal reasoning [3, 8, 29] due to out-of-thin-air (OOTA) behaviors [7]. With the addition of reasonable constraints that forbid OOTA behaviors, such as those suggested by [6, 7], it should be possible to adapt the techniques in SATCheck for use in checking programs against relaxed language-level memory models.

8. Test Schedule Generation

An advantage of combining concrete execution with a SAT solver to guide exploration of executions is that, even if the resulting program is too complex to fully analyze, the SAT solver will generate schedules that can be useful for testing. Approaches that are solely based on SAT will either provide a complete answer or generate a query that is too complex for the SAT solver to handle. In that case, the query itself is not of independent interest. Stateless model checking approaches based on DPOR do yield vast numbers of executions, but many of these executions are redundant.

On the other hand, each execution of SATCheck explores some new aspect of the input program: by modifying the program’s scheduling, SATCheck either exercises a new control flow path or produces new and potentially interesting inputs to uninterpreted functions. These executions could generate specifications of interesting test cases in an appropriate test case specification language [12].

9. Instrumentation

While SATCheck builds models of program execution by observing and guiding its dynamic behavior, it also requires program instrumentation to propagate dependency information and to identify uninterpreted functions, branches, and loops. We have implemented a Clang-based frontend which accepts C code and produces instrumented C code suitable for use with SATCheck. Our frontend generates all of the instrumentation for the benchmarks that we present in Section 10.

Although the current implementation is a research prototype, it is based on the industrial-strength Clang frontend. We have started with benchmarks in idiomatic C and gotten them through our frontend without difficulty; getting a new benchmark through the frontend may require a modest amount of straightforward benchmark modification or frontend development (if the benchmark uses parts of C that we do not currently handle).

The instrumentation records provenance information for values that come from the heap or depend on shared variables. Before each access to a shared variable, the instrumenter inserts a call to the SATCheck runtime library with identifiers for the access’s input state. The runtime library returns an identifier for the output state. At conditional branches, our instrumenter refactors the condition into a temporary variable if necessary and inserts code to notify the SATCheck runtime library about the condition and the direction that the branch eventually takes.

Most importantly, the instrumentation notifies the runtime library about computations on values that come from shared state. After each computation that depends on shared state, the instrumenter generates an accompanying uninterpreted function notification with the identifiers for the computation’s inputs and outputs, thus enabling the creation of function constraints as described in Section 4.2.

Example. We continue with an example demonstrating the operation of our instrumenter. Figure 6 presents the uninstrumented code for the read routine for the seqlock benchmark in Section 10, while Figure 7 presents the output of our instrumenter. The compiler frontend inserts MCID variables, which are used by the model checker to represent dependences.

In Figure 6, line 5 contains an if statement whose condition needs refactoring. The instrumenter pulls out the condition into variable _cond30 and reports an uninterpreted function to the model checker at line 9 of Figure 7. The instrumenter also adds branch annotations at lines 11 and 15 and merge annotations at lines 13 and 32. Line 16 (and many others) are shared variable access notifications added by the instrumenter. Finally, the instrumenter inserts a custom uninterpreted function, MC2_equals, for the equality condition at line 11 of Figure 6 (line 23 of Figure 7).

     1  int seqlock_read() {
     2    int res;
     3    int old_seq = load_32(&_seq); // acquire
     4
     5    if (old_seq % 2 == 1) {
     6      res = -1;
     7    } else {
     8      res = load_32(&_data);
     9      int seq = load_32(&_seq);
    10
    11      if (seq == old_seq) {
    12        ;
    13      } else {
    14        res = -1;
    15      }
    16    }
    17    return res;
    18  }

Figure 6. Uninstrumented seqlock read.

Implications. Our front-end enables the model checking of realistic concurrent C code to a scale beyond that achieved by previous techniques. It requires that the code to be verified use a fixed set of primitives to perform shared memory operations (loads, stores and rmws, or read-modify-writes). The user must provide a driver that exercises the functionality of interest in the code to be checked. The driver must also supply all needed inputs.

     1  int seqlock_read(MCID *retval) {
     2    MCID _mres; int res;
     3    MCID _mold_seq;
     4    _mold_seq = MC2_nextOpLoad(MCID_NODEP);
     5    int old_seq = load_32(&_seq); // acquire
     6
     7    MCID _br30;
     8    int _cond30 = old_seq % 2 == 1;
     9    MCID _cond30_m = MC2_function_id(31, 1, sizeof(_cond30), _cond30, _mold_seq);
    10    if (_cond30) {
    11      _br30 = MC2_branchUsesID(_cond30_m, 1, 2, true);
    12      res = -1;
    13      MC2_merge(_br30);
    14    } else {
    15      _br30 = MC2_branchUsesID(_cond30_m, 0, 2, true);
    16      _mres = MC2_nextOpLoad(MCID_NODEP),
    17      res = load_32(&_data);
    18      MCID _mseq;
    19      _mseq = MC2_nextOpLoad(MCID_NODEP);
    20      int seq = load_32(&_seq);
    21      MCID _br31;
    22      MCID _cond31_m;
    23      int _cond31 = MC2_equals(_mseq, (uint64_t)seq, _mold_seq, (uint64_t)old_seq, &_cond31_m);
    24      if (_cond31) {
    25        _br31 = MC2_branchUsesID(_cond31_m, 1, 2, true);
    26        MC2_merge(_br31);
    27      } else {
    28        _br31 = MC2_branchUsesID(_cond31_m, 0, 2, true);
    29        res = -1;
    30        MC2_merge(_br31);
    31      }
    32      MC2_merge(_br30);
    33    }
    34    *retval = _mres;
    35    return res;
    36  }

Figure 7. Instrumented seqlock read; instrumentation calls have an MC2_ prefix.

Our use of uninterpreted functions enables the wrapping of arbitrary binary blobs (including library calls). If some code to be verified calls a binary blob, or library function, that is known to be pure, then the insertion of an uninterpreted function after the binary blob will enable the verification of that code.

10. Evaluation

We have implemented SATCheck and plan to make it available as open source. We evaluated the performance of SATCheck on a number of benchmarks and compared it to previous work. Our results (see Table 1) show that SATCheck greatly outperforms previous work: it can explore larger problem sizes and runs more quickly than the related work. We ran our evaluations on identically configured Ubuntu Linux 14.04 machines with Intel Xeon E3-1246 v3 CPUs and 32GB of RAM. We ran each tool on each benchmark with a timeout of 1 hour. We wrote the benchmark drivers so that they would take no inputs; the behavior of some benchmarks does depend on scheduling, i.e., on whether a lock is available or not at the time of the request. Most of the benchmark drivers start 2 threads; exceptions are the CAS spinlock, where we started up to 90 threads, and seqlock, where we used 3 threads. Some runs failed in less than an hour (e.g., due to solver problems). This section presents the results from the sequentially consistent (SC) memory model. Appendix A presents the results from the Total Store Ordering (TSO) model.

We compare SATCheck with:

1. CDSChecker’s implementation [24] of Flanagan and Godefroid’s dynamic partial order reduction algorithm [16] that incorporates support for sleep sets [17] as described in the addendum [15]. We use sequentially consistent atomic operations in CDSChecker, implemented directly using DPOR and sleep sets.

2. Nidhugg’s implementation of source DPOR [2]. We present results for both Nidhugg’s SC and TSO memory models.

3. An enhanced version of CheckFence that has been extended to support atomic addition operations to efficiently support our benchmarks. CheckFence uses an iterative lazy algorithm to determine loop bounds. We configured CheckFence for the SC memory model.

We evaluated SATCheck on these benchmarks:

CAS spinlock. This benchmark (seen in Section 2) uses a compare-and-swap instruction to acquire a lock, and a store instruction to release the lock.

To test this benchmark we create two threads that both attempt to acquire and then release the lock. We vary the number of times each thread attempts to acquire and release the lock from 1 time up to 250 times.
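For readers without Section 2 at hand, a lock of this shape and the per-thread loop of the driver can be sketched with C11 atomics. This is not the benchmark's source code: the `cas_spinlock` type and function names are hypothetical, and the benchmark itself uses the model checker's own memory-operation primitives and runs the loop in two threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int held; } cas_spinlock;   /* 0 = free, 1 = held */

/* Acquire attempt: a single compare-and-swap from 0 to 1. */
bool spin_trylock(cas_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/* Release: a plain atomic store of 0. */
void spin_unlock(cas_spinlock *l)
{
    atomic_store(&l->held, 0);
}

/* Body of one driver thread: n trylock/unlock pairs.
 * Returns how many of the attempts acquired the lock. */
int lock_unlock_n(cas_spinlock *l, int n)
{
    int acquired = 0;
    for (int i = 0; i < n; i++) {
        if (spin_trylock(l)) {
            acquired++;
            spin_unlock(l);
        }
    }
    return acquired;
}
```

Whether a given `spin_trylock` call succeeds depends on how the two threads interleave, which is the scheduling dependence a model checker has to explore.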

Figures 8 and 13 present the results for this benchmark. For SC, SATCheck was able to model check a test in which 2 threads attempted to acquire and release a lock 250 times. We do not report results for CheckFence for more than 60 trylock/unlock pairs as it was unable to analyze the execution for 70 pairs. The DPOR implementations were only able to scale up to 8 trylock/unlock pairs.

We also explored scaling up the number of threads for this benchmark. See Figure 18 in the Appendix for results: SATCheck scales similarly when either data structure ops per thread or number of threads increase. In one hour, SATCheck verified a 90-thread run.

Figure 8. SC CAS spinlock: SATCheck scales to 4× more data structure operations and runs 10×–347× faster than the next best tool, CheckFence. (Lower is better, y-axis is log scale.)

MSQueue. We ported the Michael & Scott lock-free queue [22] from CheckFence’s version to CDSChecker and SATCheck. The benchmark starts two threads. One thread enqueues values, while the other dequeues values. The standard version makes extensive use of pointer arithmetic, which falls outside of CheckFence’s limited support for pointer arithmetic. Although SATCheck supports the standard version, we report results from the ported version, which omits the pointer arithmetic, to enable comparisons with CheckFence.

Figures 9 and 14 present results for MSQueue. SATCheck was able to verify 13 operations per thread within the hour. CDSChecker was able to verify 7 operations per thread and CheckFence and Nidhugg were able to verify 8 operations per thread within the allocated hour.

Figure 9. SC MSQueue: SATCheck scales to 62% more data structure operations in an hour and runs up to 17× faster than Nidhugg. (Lower is better, y-axis is log scale.)

Linux reader-writer lock. A reader-writer lock allows multiple readers or a single writer to hold the lock. No reader can share the lock with a writer. We ported our reader-writer lock benchmark from an implementation in the Linux kernel. However, the kernel implementation is written in assembler for various platforms. We translated the implementation into standard C code.

To test the reader-writer lock, our test driver runs two identical threads, with a single rwlock_t protecting a shared variable v. Each thread repeatedly does a trylock on the lock and then frees it.

CheckFence was unable to run the unmodified version of the Linux reader-writer locks as the bias value used by the locks exceeded the built-in range threshold for CheckFence. The reported results for CheckFence are for a modified benchmark version that uses a smaller bias.

Figures 10 and 15 present the results for this benchmark. SATCheck can analyze 20 pairs. CDSChecker can analyze 7 trylock/unlock pairs, Nidhugg 6, and CheckFence 15 pairs within the allocated hour.

Table 1. SATCheck scales to larger maximum problem sizes than previous tools.

    Max problem sizes for:   CDSChecker    CheckFence       Nidhugg        SATCheck
                                           (SC)  (TSO)    (SC)  (TSO)    (SC)  (TSO)
    CAS Spinlock                  8         40     40       8      8     250    100
    MSQueue                       7          8      8       8      8      13     13
    Linuxrwlock                   7         15     15       6      6      20     20
    Dekker                        5         10     10       5      4      30     40
    Seqlock                       3         25     25       3      3      60     60

Figure 10. SC Linux RW Lock: SATCheck scales to 33% more data structure operations than CheckFence and takes about the same time, except at 15 operations, where SATCheck is 3.8× faster. (Lower is better, y-axis is log scale.)

Dekker. Dekker implements a simple critical section using Dekker’s algorithm [31], where a pair of non-atomic data accesses are protected from concurrent data access. Our driver for this benchmark is simply two threads each repeatedly calling the critical section routine. Figures 11 and 16 present results for Dekker. SATCheck can verify up to 30 operations, while CheckFence only verifies up to 10 operations; beyond that number, it returns “Inconclusive”. CDSChecker and Nidhugg were only able to check up to 5 operations per thread within the allocated hour.

Figure 11. SC Dekker Critical Section: SATCheck scales to 3× more operations than CheckFence and runs 10×–86× faster. (Lower is better, y-axis is log scale.)

Seqlock. Seqlocks are used in Linux to avoid writer starvation. They allow writers to update without worrying about readers and thus allow the kernel to communicate with user-space applications.

To test this benchmark, we run one writing thread and two reading threads. Figures 12 and 17 present the results. The seqlock benchmark scales to fewer data structure operations per thread because there are more threads. SATCheck could verify up to 60 operations per thread. CDSChecker and Nidhugg were only able to verify up to 3 operations per thread, while CheckFence could verify up to 40 operations per thread. At 40 operations per thread, SATCheck was 2.6× faster than CheckFence.
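For readers unfamiliar with seqlocks, the sketch below shows the basic protocol together with a driver of the same shape as the one described above (one writer, two readers). It is a simplified Python model written for this explanation, not the benchmark code: the `SeqLock` class and `run_seqlock` function are hypothetical names, and the model relies on the CPython interpreter executing each thread's statements in program order, whereas a C implementation needs atomics and fences whose placement is exactly what a model checker examines.

```python
import threading


class SeqLock:
    """Single-writer sequence lock: an odd sequence number means a write is in progress."""

    def __init__(self):
        self.seq = 0
        self.a = 0
        self.b = 0

    def write(self, value):
        self.seq += 1   # becomes odd: readers must not trust the data
        self.a = value
        self.b = value
        self.seq += 1   # becomes even: data is consistent again

    def try_read(self):
        start = self.seq
        if start % 2 == 1:
            return None          # a write is in progress
        a, b = self.a, self.b
        if self.seq != start:
            return None          # a write overlapped this read
        return (a, b)

    def read(self):
        """Retry until a consistent snapshot is obtained; readers never block the writer."""
        while True:
            snapshot = self.try_read()
            if snapshot is not None:
                return snapshot


def run_seqlock(writes=200, reads=200):
    """One writer and two readers; returns the final sequence number and any torn reads."""
    lock = SeqLock()
    torn = []

    def writer():
        for value in range(1, writes + 1):
            lock.write(value)

    def reader():
        for _ in range(reads):
            a, b = lock.read()
            if a != b:
                torn.append((a, b))

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock.seq, torn
```

The writer never waits for readers, which is the property that avoids writer starvation; the cost is that a reader may have to retry.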

Discussion. Our results show that SATCheck runs much faster and hence scales better than previous tools. The main reason for its performance is that it uses a concolic execution approach to finding novel behaviors. It therefore does not need to explore redundant executions, nor does it need to encode values that do not occur in observed executions.

13 2015/8/12

Figure 12. SC Linux Seqlock: SATCheck scales to 1.5× more data structure operations than CheckFence and runs 2×–9× faster. (Lower is better; y-axis is log scale.)

Because most of SATCheck’s execution time is spent in the SAT solver, reported running times aren’t perfectly regular. Generally, its performance scales as expected when we increase the number of data structure operations.

11. Related Work

In the Introduction, we described three approaches to model checking concurrent data structures: explicit state, stateless, and SAT-based. Our approach is SAT-based but uses concrete program executions to guide a search for novel behaviors. State-based model checkers such as SPIN [19] can debug concurrent data structure implementations. Inspect combines stateless and stateful model checking to model-check C and C++ code [30, 33, 34]. Related approaches include Chess [23], which finds and reproduces concurrency bugs in C, C++, and C# by systematically exploring thread interleavings. However, it can miss concurrency bugs as it does not explore all thread interleavings.

CheckFence [9] is the most closely related approach to ours. It focuses on verifying concurrent data structure implementations for relaxed memory models, including usage of memory fences. The primary difference between our approach and CheckFence is that our approach uses the SAT solver to guide concrete program executions, while CheckFence encodes entire abstract program executions with SAT. Due to its static approach, CheckFence must lazily unroll loops when the SAT solver indicates that it is possible for a loop to execute more times than the current unrolling. Like us, CheckFence also uses a range analysis to compute ranges for variables. However, because CheckFence’s range analysis operates on the static program, we found that it often computes too large a range for a variable. CheckFence then refuses to analyze the program. Also, as CheckFence must encode the entire program into SAT, it requires source for the entire program, and the program must be amenable to compilation to SAT.

MemSAT [28] uses constraint solvers to reason about programs under weak memory models. It targets very complex memory models (the Java Memory Model) and very simple programs. Our work explores a new approach for handling large test cases.

DPOR [16] and ODPOR [1] are two dynamic approaches to partial order reduction that reduce the number of program executions to be explored by a stateless model checker. These approaches avoid exploring executions that can be generated by reordering commuting operations in some other explored execution. Recent work has extended the DPOR algorithm to handle the TSO and PSO memory models [2, 35]. Our work can be viewed as exposing the interleaving operations to the SAT solver, iteratively building a model of the executions, and letting the SAT solver’s heuristics avoid the redundant behaviors (rather than doing the partial order reduction via reordering ourselves). Alternatively, our work can also be viewed as dynamically generating a SAT formula that describes all possible executions.
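The redundancy that partial order reduction targets can be seen on a tiny example. The Python sketch below enumerates every interleaving of two threads and groups the interleavings that differ only in the order of commuting (non-conflicting) operations. It illustrates the equivalence relation only; it is not the DPOR or ODPOR algorithm, and the operation encoding `(thread, index, kind, location)` and all function names are assumptions made for this example.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every interleaving of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ai, bi = iter(a), iter(b)
        yield tuple(next(ai) if i in chosen else next(bi) for i in range(n))


def conflict(op1, op2):
    """Operations conflict if they access the same location and at least one is a write."""
    return op1[3] == op2[3] and "w" in (op1[2], op2[2])


def signature(trace):
    """The order of every conflicting cross-thread pair.

    Two interleavings with the same signature differ only by swaps of adjacent
    commuting operations, so exploring one of them is enough.
    """
    return frozenset(
        (x, y)
        for i, x in enumerate(trace)
        for y in trace[i + 1:]
        if x[0] != y[0] and conflict(x, y)
    )


# Thread A writes x then y; thread B writes z then reads y.
thread_a = [("A", 0, "w", "x"), ("A", 1, "w", "y")]
thread_b = [("B", 0, "w", "z"), ("B", 1, "r", "y")]

traces = list(interleavings(thread_a, thread_b))
classes = {signature(t) for t in traces}
print(len(traces), "interleavings,", len(classes), "equivalence classes")
```

Only the write and read of `y` conflict here, so the interleavings collapse into two classes, one for each order of that pair.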

Researchers have recently proposed maximal causality reduction to improve on POR [20]. The key insight behind maximal causality reduction is that a thread’s behavior does not depend on the specific stores that the thread’s loads take their values from, but rather on the values that these loads read. This approach uses a constraint solver to generate executions in which the loads of a thread read different combinations of values than previously explored executions. It conservatively assumes that any store may depend on previous loads, and thus must explore vastly more executions than SATCheck. The author has not made an implementation available, but a quick calculation reveals that it will be many orders of magnitude slower than SATCheck. For example, for the 250 pair spinlock example that SATCheck analyzes in 4.5 minutes, the maximal causality reduction approach would need to explore at least 2^250 executions.
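The scale of that lower bound can be checked with a few lines of Python. The execution rate used below is a hypothetical, deliberately generous figure chosen here for illustration; it does not come from the paper or from any measured tool.

```python
# Lower bound on executions quoted above for the 250 pair spinlock example.
executions = 2 ** 250

# Hypothetical rate (an assumption for illustration): one billion executions per second.
rate_per_second = 10 ** 9
seconds_per_year = 365 * 24 * 3600

years = executions // (rate_per_second * seconds_per_year)

print("decimal digits in 2^250:", len(str(executions)))
print("years needed at the assumed rate: about 10^%d" % (len(str(years)) - 1))
```

Even at that assumed rate, enumerating the executions one by one is far out of reach, which is the point of the comparison with a 4.5 minute run.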

Researchers have developed a tool based on Nitpick for translating C/C++11 code to SAT to model check litmus tests [5]. Litmus tests are small tests that contain only a handful of memory operations. Their work focuses on automatically building SAT formulas directly from a formalization of the memory model; it does not focus on tool performance.

Industry tools such as IBM ConTest support testing concurrent software. While such tools may increase the likelihood of finding races, they are not exhaustive. Like the related work in our field, SATCheck exhaustively explores all behaviors for a given input.

Several tools detect data races in code that uses standard lock-based concurrency control [11, 13, 14, 21, 26]. These tools generally take one of two approaches: (1) they verify that all accesses to shared data are protected by a locking discipline or (2) they verify that a happens-before relation separates conflicting accesses. Another way to mitigate potential data races is the approach proposed by stable and deterministic multithreading systems [4, 10, 25, 32], which constrain the allowed interleavings. Work on data race detection and stable multithreading systems is largely orthogonal to SATCheck, since SATCheck seeks to verify data structures that leverage low-level atomics to access memory without the use of locks.
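To make approach (1) concrete, the following Python sketch checks a locking discipline by intersecting, for each shared variable, the sets of locks held at every access. It is a heavily simplified illustration in the spirit of lockset-based detectors and is not the algorithm of any cited tool; the function name and the `(thread, variable, locks_held)` trace format are assumptions made for this example.

```python
def lockset_violations(accesses):
    """Return the variables for which no single lock was held at every access.

    accesses: iterable of (thread, variable, locks_held) tuples.
    """
    candidates = {}
    for _thread, variable, locks_held in accesses:
        held = frozenset(locks_held)
        # Keep only the locks that were held at every access seen so far.
        candidates[variable] = candidates.get(variable, held) & held
    return sorted(v for v, locks in candidates.items() if not locks)


trace = [
    ("T1", "x", {"m"}),
    ("T2", "x", {"m", "n"}),  # x is consistently protected by lock m
    ("T1", "y", {"m"}),
    ("T2", "y", {"n"}),       # y has no common lock
]
print(lockset_violations(trace))
```

A check of this kind has nothing to intersect when the code under test uses low-level atomics instead of locks, which is why this line of work is orthogonal to SATCheck.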

12. Conclusion

Threads commonly communicate with each other through concurrent data structures. Developing correct concurrent data structure implementations is known to be challenging, and testing tools are critical for finding implementation bugs.

SATCheck leverages concrete executions to build an event graph model of concurrent code and uses a SAT solver to guide executions towards discovering novel behaviors. SATCheck scales better than other tools that leverage concrete execution while avoiding the need to compile the entire program to SAT.

Acknowledgments

This project was partly supported by the National Science Foundation under grants CCF-0846195, CCF-1217854, CNS-1228995, and CCF-1319786. We would also like to thank the anonymous reviewers for their helpful comments.

References

[1] P. Abdulla, S. Aronis, B. Jonsson, and K. Sagonas. Optimal dynamic partial order reduction. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2014.

[2] P. A. Abdulla, S. Aronis, M. F. Atig, B. Jonsson, C. Leonardsson, and K. Sagonas. Stateless model checking for TSO and PSO. In Proceedings of the 21st International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2015.

[3] M. Batty, M. Dodds, and A. Gotsman. Library abstraction for C/C++ concurrency. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2013.

[4] E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C++. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, 2009.

[5] J. C. Blanchette, T. Weber, M. Batty, S. Owens, and S. Sarkar. Nitpicking C++ concurrency. In Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practices of Declarative Programming, 2011.

[6] H.-J. Boehm. N3786: Prohibiting “out of thin air” results in C++14. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3786.htm, September 2013.

[7] H.-J. Boehm and B. Demsky. Outlawing ghosts: Avoiding out-of-thin-air results. In Proceedings of the Workshop on Memory Systems Performance and Correctness, 2014.

[8] H.-J. Boehm et al. N3710: Specifying the absence of “out of thin air” results (LWG2265). http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html, August 2013.

[9] S. Burckhardt, R. Alur, and M. M. K. Martin. CheckFence: Checking consistency of concurrent data types on relaxed memory models. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2007.

[10] C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2007.

[11] T. Elmas, S. Qadeer, and S. Tasiran. Goldilocks: A race and transaction-aware Java runtime. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2007.

[12] T. Elmas, J. Burnim, G. Necula, and K. Sen. CONCURRIT: A domain specific language for reproducing concurrency bugs. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013.

[13] D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 2003.

[14] C. Flanagan and S. N. Freund. FastTrack: Efficient and precise dynamic race detection. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009.

[15] C. Flanagan and P. Godefroid. Addendum to dynamic partial-order reduction for model checking software. http://users.soe.ucsc.edu/~cormac/papers/popl05-addendum.pdf.

[16] C. Flanagan and P. Godefroid. Dynamic partial-order reduction for model checking software. In Proceedings of the Symposium on Principles of Programming Languages, January 2005.

[17] P. Godefroid. Partial-order methods for the verification of concurrent systems: An approach to the state-explosion problem. Lecture Notes in Computer Science, 1996.

[18] P. Godefroid. Model checking for programming languages using VeriSoft. In Proceedings of the Symposium on Principles of Programming Languages, 1997.

[19] G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Professional, 1st edition, 2003.

[20] J. Huang. Stateless model checking concurrent programs with maximal causality reduction. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015.

[21] B. Lucia, L. Ceze, K. Strauss, S. Qadeer, and H. Boehm. Conflict exceptions: Simplifying concurrent language semantics with precise hardware exceptions for data-races. In Proceedings of the 37th Annual International Symposium on Computer Architecture, 2010.

[22] M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, 1996.

[23] M. Musuvathi, S. Qadeer, P. A. Nainar, T. Ball, G. Basler, and I. Neamtiu. Finding and reproducing Heisenbugs in concurrent programs. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation, 2008.

[24] B. Norris and B. Demsky. CDSChecker: Checking concurrent data structures written with C/C++ atomics. In Proceedings of the 28th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2013.

[25] M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient deterministic multithreading in software. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009.

[26] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15:391–411, Nov. 1997.

[27] K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005.

[28] E. Torlak, M. Vaziri, and J. Dolby. MemSAT: Checking axiomatic specifications of memory models. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010.

[29] V. Vafeiadis and C. Narayan. Relaxed separation logic: A program logic for C11 concurrency. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications, 2013.

[30] C. Wang, Y. Yang, A. Gupta, and G. Gopalakrishnan. Dynamic model checking with property driven pruning to detect race conditions. ATVA, LNCS, pages 126–140, 2008.

[31] A. Williams. Dekker’s algorithm implementation. http://www.justsoftwaresolutions.co.uk/threading/. Dec. 2012.

[32] J. Yang, H. Cui, J. Wu, Y. Tang, and G. Hu. Making parallel programs reliable with stable multithreading. Communications of the ACM, 57(3), 2014.

[33] Y. Yang, X. Chen, G. Gopalakrishnan, and R. M. Kirby. Efficient stateful dynamic partial order reduction. In Proceedings of the 15th International SPIN Workshop on Model Checking Software, 2008.

[34] Y. Yang, X. Chen, G. Gopalakrishnan, and C. Wang. Automatic discovery of transition symmetry in multithreaded programs using dynamic analysis. In Proceedings of the 16th International SPIN Workshop on Model Checking Software, pages 279–295, 2009.

[35] N. Zhang, M. Kusano, and C. Wang. Dynamic partial order reduction for relaxed memory models. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015.

A. TSO Results

In Section 10 we presented results under the sequentially consistent (SC) memory model. We have also collected results for Total Store Order (TSO) and present them in Figures 13 through 17 of this Appendix.

Figure 13. TSO CAS Spinlock Results (lower is better, y-axis is a log scale)

B. Scaling Threads

In Section 10 we presented results that keep the number of threads fixed and scale the number of operations per thread. We also explored scaling the number of threads for the CAS Spinlock: each thread performs one pair of lock and unlock operations.

Figure 18 presents the result of this experiment. The results show that SATCheck scales well with the addition of extra threads.

Figure 14. TSO MSQueue Results (lower is better, y-axis is a log scale)

Figure 15. TSO Linux RW Lock Results (lower is better, y-axis is a log scale)

Figure 16. TSO Dekker Critical Section Results (lower is better, y-axis is a log scale)

Figure 17. TSO Linux Seqlock Results (lower is better, y-axis is a log scale)

Figure 18. SATCheck analyzes a 90-thread run of Linux Locks (SC) in one hour. (lower is better, y-axis is a log scale)

C. Soundness

Theorem C.1 (Reachability). If there exists some execution e that can reach a statement st, then SATCheck will explore some execution that executes statement st.

Proof Sketch. By contradiction. Suppose that SATCheck does not explore any execution that executes statement st. Consider the execution e. There must exist some conditional branch in e that SATCheck has not explored, or it would have reached the statement st.

Consider the first such unexplored conditional branch b or unexplored input i to an uninterpreted function in execution e. By the design of SATCheck, the branch b or input i is a goal in SATCheck’s SAT formula. Consider the execution prefix e′ of the execution e up to the branch b or input i. This execution prefix satisfies the branch b or input i goal of the generated SAT formula: up until the new event, all of the behavior of e′ is modeled by the event graph. The clauses that model the event graph after the prefix e′ are all structured as implications from past events to future events. If none of the conditions in the implications on a past event are satisfied, the constraint is trivially true.

Thus there must be a solution to the SAT constraints that drives SATCheck to produce the execution prefix e′, and thus SATCheck explores e′ unless it first explores some other execution that reaches b or i. Either way, b or i is explored, which contradicts the assumption that it was unexplored.

Theorem C.2 (Reachability). If there exists some execution e that generates input i to uninterpreted function f, then SATCheck will explore some execution that generates input i to uninterpreted function f.

Proof Sketch. Same proof as Theorem C.1.