sparse record and replay with controlled schedulingafd/homepages/papers/pdfs/2019/pldi.pdf ·...

18
Sparse Record and Replay with Controlled Scheduling Christopher Lidbury Department of Computing Imperial College London London, UK [email protected] Alastair F. Donaldson Department of Computing Imperial College London London, UK [email protected] Abstract Modern applications include many sources of nondetermin- ism, e.g. due to concurrency, signals, and system calls that interact with the external environment. Finding and repro- ducing bugs in the presence of this nondeterminism has been the subject of much prior work in three main areas: (1) controlled concurrency-testing, where a custom scheduler replaces the OS scheduler to find subtle bugs; (2) record and replay, where sources of nondeterminism are captured and logged so that a failing execution can be replayed for debug- ging purposes; and (3) dynamic analysis for the detection of data races. We present a dynamic analysis tool for C++ applications, tsan11rec, which brings these strands of work together by integrating controlled concurrency testing and record and replay into the tsan11 framework for C++11 data race detection. Our novel twist on record and replay is a sparse approach, where the sources of nondeterminism to record can be configured per application. We show that our approach is effective at finding subtle concurrency bugs in small applications; is competitive in terms of performance with the state-of-the-art record and replay tool rr on larger applications; succeeds (due to our sparse approach) in replay- ing the I/O-intensive Zandronum and QuakeSpasm video games, which are out of scope for rr; but (due to limitations of our sparse approach) cannot faithfully replay applications where memory layout nondeterminism significantly affects application behaviour. CCS Concepts Software and its engineering Soft- ware testing and debugging; Concurrent programming struc- tures. Keywords record and replay, controlled concurrency test- ing, data race detection, concurrency Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6712-7/19/06. . . $15.00 hps://doi.org/10.1145/3314221.3314635 ACM Reference Format: Christopher Lidbury and Alastair F. Donaldson. 2019. Sparse Record and Replay with Controlled Scheduling. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’19), June 22–26, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 18 pages. hps://doi.org/10.1145/3314221. 3314635 1 Introduction Controlled concurrency testing has proven successful in find- ing subtle bugs in concurrent programs, by exploring a di- verse set of schedules (see e.g. [33, 62, 64, 78, 87]). However, such techniques are known to be limited by the assumption that the thread scheduler is the only source of nondeter- minism. For example, in an empirical study of systematic scheduling algorithms, many benchmark programs, such as Apache’s httpd, had to be excluded due to their reliance on external factors such as the network [78]. In contrast, record and replay tools aim to capture the external factors that affect the behaviour of a system as the system runs, so that an execution can be faithfully re- played (see e.g. [47, 53, 57, 66]). The degree to which replay is faithful varies, but many systems aim to be extremely thor- ough by monitoring, intercepting and facilitating replay of virtually all sources of nondeterminism. Unlike controlled concurrency testing, these tools typically leave threads to be scheduled by the regular OS scheduler, recording whatever schedule results. This is fine if a bug happens to be triggered, but does not support systematic or controlled-randomized exploration of thread schedules to find subtle bugs. Faith- ful record and replay is also difficult from an engineering perspective, often requiring surgical changes to the OS or underlying hardware, and demanding high resource usage due to the many details that must be kept track of. Our aim in this work is to lift controlled concurrency test- ing so that it can be applied to larger and more realistic settings, by drawing on ideas from record and replay, sacri- ficing faithfulness in order to keep overhead low. To do this, we take a sparse approach: we assume the relevant sources of nondeterminism affecting an application come from (a) the thread scheduler (including the handling of signals), and (b) input from the network, peripherals such as keyboard and mouse, and well-understood system calls that can be configured for particular applications of interest. We present 576

Upload: others

Post on 10-Jan-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled SchedulingChristopher LidburyDepartment of ComputingImperial College London

London UKchristopherlidbury10imperialacuk

Alastair F DonaldsonDepartment of ComputingImperial College London

London UKalastairdonaldsonimperialacuk

AbstractModern applications include many sources of nondetermin-ism eg due to concurrency signals and system calls thatinteract with the external environment Finding and repro-ducing bugs in the presence of this nondeterminism hasbeen the subject of much prior work in three main areas (1)controlled concurrency-testing where a custom schedulerreplaces the OS scheduler to find subtle bugs (2) record andreplay where sources of nondeterminism are captured andlogged so that a failing execution can be replayed for debug-ging purposes and (3) dynamic analysis for the detectionof data races We present a dynamic analysis tool for C++applications tsan11rec which brings these strands of worktogether by integrating controlled concurrency testing andrecord and replay into the tsan11 framework for C++11 datarace detection Our novel twist on record and replay is asparse approach where the sources of nondeterminism torecord can be configured per application We show that ourapproach is effective at finding subtle concurrency bugs insmall applications is competitive in terms of performancewith the state-of-the-art record and replay tool rr on largerapplications succeeds (due to our sparse approach) in replay-ing the IO-intensive Zandronum and QuakeSpasm videogames which are out of scope for rr but (due to limitationsof our sparse approach) cannot faithfully replay applicationswhere memory layout nondeterminism significantly affectsapplication behaviour

CCS Concepts bull Software and its engineering rarr Soft-ware testing anddebuggingConcurrent programming struc-tures

Keywords record and replay controlled concurrency test-ing data race detection concurrency

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copiesare not made or distributed for profit or commercial advantage and thatcopies bear this notice and the full citation on the first page Copyrightsfor components of this work owned by others than the author(s) mustbe honored Abstracting with credit is permitted To copy otherwise orrepublish to post on servers or to redistribute to lists requires prior specificpermission andor a fee Request permissions from permissionsacmorgPLDI rsquo19 June 22ndash26 2019 Phoenix AZ USAcopy 2019 Copyright held by the ownerauthor(s) Publication rights licensedto ACMACM ISBN 978-1-4503-6712-71906 $1500httpsdoiorg10114533142213314635

ACM Reference FormatChristopher Lidbury and Alastair F Donaldson 2019 Sparse Recordand Replay with Controlled Scheduling In Proceedings of the 40thACM SIGPLAN Conference on Programming Language Design andImplementation (PLDI rsquo19) June 22ndash26 2019 Phoenix AZ USAACMNewYork NY USA 18 pages httpsdoiorg10114533142213314635

1 IntroductionControlled concurrency testing has proven successful in find-ing subtle bugs in concurrent programs by exploring a di-verse set of schedules (see eg [33 62 64 78 87]) Howeversuch techniques are known to be limited by the assumptionthat the thread scheduler is the only source of nondeter-minism For example in an empirical study of systematicscheduling algorithms many benchmark programs such asApachersquos httpd had to be excluded due to their reliance onexternal factors such as the network [78]In contrast record and replay tools aim to capture the

external factors that affect the behaviour of a system asthe system runs so that an execution can be faithfully re-played (see eg [47 53 57 66]) The degree to which replayis faithful varies but many systems aim to be extremely thor-ough by monitoring intercepting and facilitating replay ofvirtually all sources of nondeterminism Unlike controlledconcurrency testing these tools typically leave threads to bescheduled by the regular OS scheduler recording whateverschedule results This is fine if a bug happens to be triggeredbut does not support systematic or controlled-randomizedexploration of thread schedules to find subtle bugs Faith-ful record and replay is also difficult from an engineeringperspective often requiring surgical changes to the OS orunderlying hardware and demanding high resource usagedue to the many details that must be kept track of

Our aim in this work is to lift controlled concurrency test-ing so that it can be applied to larger and more realisticsettings by drawing on ideas from record and replay sacri-ficing faithfulness in order to keep overhead low To do thiswe take a sparse approach we assume the relevant sourcesof nondeterminism affecting an application come from (a)the thread scheduler (including the handling of signals) and(b) input from the network peripherals such as keyboardand mouse and well-understood system calls that can beconfigured for particular applications of interest We present

576

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

a record and replay mechanism that captures minimal infor-mation about these sources of nondeterminism that sufficeto enable efficient controlled concurrency testing for a rangeof applications The advantage of this is that execution isefficient and recording overhead is low The price is thatthe tool cannot handle systems whose behaviour is influ-enced significantly by other sources of nondeterminism (egmemory layout) without programmer interventionWe have implemented our approach as a tool tsan11rec

driven by the following objectives to maximise the extent towhich the concurrency semantics of the system under testcan be explored in order to find subtle bugs to integratethe tool with state-of-the-art data race detection capturingan important class of concurrency bugs and to preserve asmuch parallelism in the system under test as possible forefficient record and replay While race detection controlledconcurrency testing and record-and-replay are all valuabletechniques in isolation their combination is potentially ex-tremely valuable to allow the finding of races (thanks to racedetection) that arise under rare thread schedules (thanks tocontrolled concurrency testing) such that the thread sched-ule and environmental factors leading to the race can bereplayed for debugging (thanks to record-and-replay)

We present a large experimental evaluation showing thattsan11rec is effective at reliably finding and replaying weakmemory-related bugs on a set of benchmarks previously usedto evaluate the CDSchecker concurrency testing tool [64]and competitive in terms of performance with the state-of-the-art rr record and replay tool [66] on a number of largerapplications such as Apache httpd PARSEC benchmarksand Pbzip In particular for programs that rely heavily onparallelism our tool often out-performs rr since our approachallows threads to run in parallel as much as possible ratherthan being sequentialized In some cases sparsity turns outto be key to enabling record and replay For example closed-source proprietary device drivers make it very difficult tointercept and control all nondeterminism arising in the exe-cution of video games To find and reproduce bugs relatedeg to gamedisplay communication it would be necessaryto control this nondeterminism However the nondetermin-ism typically has no impact on core game logic To findand replay game logic bugs it is thus profitable to sparselyignore gamedisplay communication when recording andallow this communication to proceed freely during replayWe demonstrate this by applying tsan11rec to the SDL-basedvideo games Zandronum and QuakeSpasm both of whichare out of scope for rr due to communication between thegame and the OpenGL interface through ioctl that rr is un-able to record and replay We show that we are able to recordand replay a historical Zandronum bug related to an errorin client-server communication when the game is played ininternet multi-player modeAfter recapping necessary background (sect2) we present

our contributions via the following paper structure

void T1()

nax = 1

xstore(1 stdmemory_order_release) A

ystore(1 stdmemory_order_release) B

void T2()

if (yload(stdmemory_order_relaxed) == 1 ampamp C

xload(stdmemory_order_relaxed) == 0) D

xstore(2 stdmemory_order_relaxed)

void T3()

if (xload(stdmemory_order_acquire) gt 0) E

print(nax)

Figure 1 A racy C++11 program using atomic operations

Controlled scheduling (sect3) We detail our scheduling ap-proach which enables exploration of the concurrency se-mantics of the original program by allowing a context switchafter every visible operation splitting up high-level opera-tions into multiple irreducible visible operations By buildingon top of tsan11 [52] race detection is seamlessly integratedWe preserve the parallelism as much as possible by onlysequentializing visible operations invisible regions of codein different threads can run in parallel

Sparse record and replay (sect4) We describe our sparse ap-proach to record and replay focusing on the interplay be-tween controlled scheduling and record and replay and inparticular discuss how the granularity at which nondeter-minism is captured and recorded can be adapted on a perapplication basis

Experimental evaluation (sect5) We present a large evalu-ation applying tsan11rec to concurrency benchmarks theApache httpd PARSEC and Pbzip applications and bench-marks and the Zandronum and QuakeSpasm games Ourevaluation includes comparisons with rr

2 BackgroundWe provide relevant background on controlled schedulingrecord and replay and the tsan and tsan11 race detectors

Controlled scheduling Controlled scheduling techniques(sometimes referred to as stateless model checking) overridethe system scheduler in order to perform schedule-spaceexploration for an application eg systematically or in acontrolled randomized fashion [33 62 64 78 87] Exploringinteresting schedules can reveal subtle bugs that the systemscheduler would trigger with low probability and havingcontrol over which schedules are explored is important forreplay of bug-inducing schedules

Scheduling decisions are made at scheduling points whichcorrespond to visible operations a visible operation is anoperation performed by a thread that may influence thebehaviour of other threads As an example consider theprogram fragment shown in Figure 1 A controlled scheduler

577

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

void sig_handler()

quitstore(1)

void Listener()

while (quitload())

int res = poll(ampserver_fds 1 100)

if (res == 0) continueCHECK(res gt 0 ampamp server_fdsrevents == 1 ampamp poll error)

char buf = new char[100]recv(server_fdfd buf 100 0)

stdunique_lockltstdmutexgt lck (mtx)

requestspush(buf)

void Responder()

while (quitload())

stdunique_lockltstdmutexgt lck (mtx)

if (requestssize() == 0) continuechar buf = requestsfront()

requestspop()

lckunlock()

Process(buf)

send(server_fdfd buf 100 0)

delete[] buf

Figure 2 Generic client for processing and returning re-quests sent from some server

will begin by choosing (via some choice strategy) which ofthe visible operations A C or E to schedule first If it choosesA then at the next scheduling point the choice is between BC and E etc The non-atomic increments of nax and nay areinvisible operations and do not constitute scheduling points

Record and replay The ability to record and replay hasmany useful applications notably allowing consist reproduc-tion of bugs in nondeterministic programs In general record-ing and replaying involves identifying relevant sources ofnondeterminism and enforcing the same resolution of thisnondeterminism during replay as was observed while record-ing The granularity at which nondeterminism is controlledvaries between approaches To see how useful record andreplay is consider the example program shown in Figure 2The program receives buffers from a server processes themand then sends them back But what happens if the connec-tion fails or we get a ldquopoll errorrdquo The ability to capturean execution that shows an error by connecting to a realserver and then repeatedly replay the executionwithout hav-ing to connect to a real server allows us to reliably explorethe cause of the error This is particularly useful for largerprograms that utilise complicated communication protocolsand have time consuming setups or hard to find bugs

In general a program can have many such sources of non-determinism Aside from the thread interleaving and valueof atomic reads other sources include interaction with the

file system system calls certain libc functions (eg the con-ditions under which malloc can fail are not deterministic)instructions that query the state of the CPU (such as thex86 RDTSC for reading the processorrsquos time-stamp counter)or in some cases even the value of pointers (eg iteratingthrough an ordered container of pointers)The choice of what nondeterminism to record and the

method of recording and replaying is a substantial area ofresearch In this paper we compare our approach with thecurrent state-of-the-art tool rr [65] which achieves perfor-mance overheads compared with native execution of as lowas 15times for some applications as well as low storage over-heads It generally enforces a priority-based first come firstserved strategy for scheduling with each thread given a timeslice before yielding Execution is sequentialized so that onlyone thread runs at a time We present a detailed discussionof other approaches to record and replay in sect6

C++11 dynamic race detection We present our work inthe context of CC++ though our approach to controlledscheduling and record and replay is conceptually more gen-eral The CC++11 standards define threads atomics andvarious constructs for inter-thread communication togetherwith a description of a memory model that defines how op-erations are ordered across threads when synchronisationoccurs and what values may be read from memory [7]Working with low level atomics and other inter-thread

constructs can be very difficult and their misuse frequentlyleads to buggy concurrent programs ThreadSanitizer (tsan)is an efficient dynamic race analysis tool aimed at C++ pro-grams [74] The tool performs compile-time instrumentationof the source program in which all (atomic and non-atomic)accesses to potentially shared locations as well as fence op-erations are instrumented with calls into a statically linkedrun-time library In general all visible operations includingmost libc functions and system calls are instrumented Thislibrary implements a vector clock algorithm for tracking thehappens-before relation [28 48] using shadow memory tokeep track of accesses to all locations Requiring a simplecompiler flag to be activated tsan is very easy to use Itis also fast with runtime overheads of around 10times to 12times(competitive performance given the nature of the analysis)although memory overhead can be high

C++11 features such as atomics and threads are acceptedby tsan but the standard tool ignores most of the semanticsof the C++11 memory model A recent extension tsan11 hasadded semantics for a large fragment of the C++11 memorymodel allowing it to find races arising due to weak memorybehaviours that regular tsan would miss [52] As an exampleconsider again the program of Figure 1 For the conditionalin thread T2 to pass both the stores A and B in T1 musthave happened but an earlier value of x must be read Theend result is that the print in T 3 will be racy as the the loadin T3 now reads the value stored in T2 This cannot occur

578

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

under sequential consistency but can occur under C++11semantics the race is detected by tsan11 but not tsan

Amajor limitation of tsan and tsan11 is that the executionsexplored by the tool are at the mercy of the OS schedulerBugs that require unusual interleavings to trigger almostnever manifest The tsan11rec tool presented in this workbuilds on tsan11 combining the strengths of the tsan ap-proach with controlled scheduling and record and replayto allow the detection and reproduction of bugs that wouldbe unlikely to be discovered and harder still to repeatedlyreproduce using the OS scheduler alone

3 SchedulingRecall that one of our main aims in the design of tsan11recis to combine controlled scheduling with record and replayin a manner that directly incorporates state-of-the-art racedetection We now describe the mechanics of the schedulerin particular how the scheduler builds on and interacts withthe existing tsan11 instrumentation library and how thescheduler handles nondeterminism The mechanics of thescheduler will be important for describing the design of ourrecord and replay facilities in sect4

Rather than using an overarching scheduler thread detailsof scheduling decisions are stored in a designated piece ofshared state The threads interact indirectly via this sharedstate using a protocol to cooperatively determine when theyshould be scheduled The protocol builds on the tsan11 li-brary and has been designed so that new scheduling strate-gies can be easily added We focus on two strategies in thiswork random and queue With the random strategy the nextthread to schedule is chosen at random at each schedulingpoint via a pseudo-random number generator (PRNG) ini-tialized with some fixed initial seed This provides controlledrandom scheduling similar to that described in [78] Withthe queue strategy threads are scheduled in a first-come-first-served manner

31 Protocol DetailsRecall from sect2 that the tsan tool and its tsan11 extensionperforms compile-time instrumentation of the low-level visi-ble operations including atomic operations and system calls(syscalls) Each time such an operation is reached controljumps into a tsan library function Execution of some of theselibrary functions may be sequentialized but threads execut-ing outside these library functionsmdashie executing invisibleoperationsmdashare free to run in parallelOur tsan11rec tool essentially acts as an additional layer

of interception on top of the existing tsan11 instrumenta-tion On reaching a visible operation a thread jumps intoa tsan11 library function eg __tsan_atomic_load4 for a4-byte atomic load which will additionally call tsan11recfunctions that interact with the scheduler logic to determinewhen the thread can proceed

Figure 3 Sequentialized critical sections and parallel in-visible operations Blue wavy arrows represent scheduler-imposed ordering black arrows represent program order

Communication with the tsan11rec instrumentation layeruses a protocol based on two functions Wait() and Tick()

Wait() ndash Block this thread until the scheduler activates itTick() ndash Choose a thread to activate

A thread enters Wait() right before it executes a visible op-eration Depending on the scheduling strategy used the stateof the scheduler may need to be updated when Wait is calledthe queue strategy requires the thread to enqueue itself therandom strategy requires no action If the thread alreadyhappens to be the next thread due for scheduling Wait()returns without blocking A thread enters Tick() once ithas completed a visible operation By executing Tick() thethread applies the scheduling strategy (random or queue) tochoose the next thread to be scheduled and update the sched-uler state to reflect this In the case of the queue strategy thisinvolves incrementing the current queue position for therandom strategy this involves choosing the next thread id atrandom When the thread returns from the Tick() functionit is free to continue performing invisible operations unhin-dered until it reaches the next visible operation where it willenter Wait() This allows for parallelism between threadsas sections of invisible code are unordered Multiple invisibleoperations that can execute in parallel are illustrated in Fig-ure 3 Not that with the random strategy a thread does nothave to reach Wait() to be in the list of schedulable threadsThe combination of a visible operation and associated

scheduling-related code wrapped in a Wait() and Tick()pair is called a critical section

32 Special CasesThe approach described in sect31 of wrapping the tsan instru-mentation for a visible operation in a critical section worksdirectly for most visible operations We now discuss a num-ber of operations that require extra attention to detail mainlybecause their semantics necessitate specific updates to thescheduler state or because they cannot be represented as asimple critical section

579

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Calls to functions whose names begin with intercept_are already inserted by tsan11 to instrument visible opera-tions We have modified the implementations of these func-tions to accommodate our instrumentation

Thread management Thread creation deletion and join-ing are all treated as visible operations This is because theymust update the state of the scheduler and as such will af-fect the scheduling of threads going forward To handle thiswe introduce three scheduler functions ThreadNew(tid)ThreadJoin(tid) and ThreadDelete()

The ThreadNew and ThreadJoin functions are added dur-ing instrumentation of thread creation and joining primitivesThe ThreadNew(tid) function called by the parent of thenewly created thread enables the new thread within thescheduler The ThreadJoin(tid) function will block itselfuntil thread tid has finished and as such must instead dis-able itself in the scheduler and also mark itself as waitingon tid On completion a thread calls ThreadDelete whichinvolves (a) enabling the parent thread if it is waiting for thisthread to finish and (b) disabling itself in the scheduler Allthree operations are wrapped in a Wait() and Tick() pair

Mutexes The mutex operations trylock lock and unlock areall visible and require instrumentation Trylock can simply bewrapped in a Wait() and Tick() pair as for regular visibleoperations Unlock is similar to thread deletion in that itmust also re-enable threads that were blocked waiting onthe mutex although in this case we only need to re-enableone of the blocked threads the thread that is chosen dependson whether the queue or random strategy is being usedMutex lock poses an interesting issue in that a thread at-

tempting to acquire a mutex will block if the lock operationfails To account for this we modified the instrumented ver-sion of mutex lock to be as shown in Figure 4 This changes itto a trylock loop with a critical section associated with eachattempt at locking Note that this is the native trylock notthe instrumented version The MutexLockFail(m) functionis similar to ThreadJoin a thread calling this function dis-ables itself from the scheduler and informs the scheduler thatit is waiting on m The thread will then reenter Wait but as itis disabled it will block until it is re-enabled and then sched-uled to run The MutexUnlock(m) function is called when athread releases a mutex and will re-enable one thread thatis disabled due to waiting on m

There is no Wait nor Tick inside MutexLockFail(m) norMutexUnlock(m) Another thread can acquire the mutex be-tween a thread being re-enabled and it attempting the trylockThis is OK the thread will simply block itself again

Condition variables Condition variables allow controlover when certain threads will wake up and try to acquirea mutex When a thread initially acquires a mutex it maycheck a condition required for it to proceed and if it failsrelease the mutex and block itself via the condition variable

int intercept_mutex_lock(void m)

int res = EBUSY

while (res == EBUSY)

Wait()

res = trylock(m) native version of trylock

if (res == EBUSY)

MutexLockFail(m)

Tick()

return res

Figure 4 Instrumented mutex lock

This thread will only wake up and try to reacquire the mutexwhen another thread notifies it via the condition variableThe conditional is checked via the conditional wait functionwaking up one or all of the waiting threads is performed viathe signal and broadcast functions respectivelyThe conditional wait accepts a timer determining the

length of time after which the thread unblocks itself Thistimer represents a physical time This is in contrast to theschedulerrsquos ticker which represents a logical time This dif-ference between physical and logical means that from theperspective of the scheduler the conditionalrsquos wakeup timeris nondeterministic Semantically speaking a thread canwake up from the timer and acquire the conditionalrsquos mutexbefore another thread can even if the associated time is verylong We choose to handle this by not disabling the thread ifit calls a conditional wait with a timer Despite not being dis-abled when timed a thread can still eat a conditional signaland so should still mark itself as waiting on the conditionalin the scheduler

For the three conditional functions we provide correspond-ing CondWait(m t) CondSignal(m) and CondBroadcast(m)scheduler functions CondSignal(m) and CondBroadcast(m)simply wake up one thread and all threads waiting on m re-spectively The interaction between the instrumentation andthe scheduler for conditional signal and broadcast is sim-ple with the instrumentation simply calling the schedulerfunction in a Wait() and Tick() pair

Conditional wait is a little more involved and details areshown in Figure 5 Between the Wait and the Tick a threadinforms the scheduler that it is performing a conditional waitvia CondWait This informs the scheduler that the thread iseither blocked waiting for a signal or performing a timedconditional wait so that while not blocked it can neverthelesseat a signal The thread then releases the mutex informingthe scheduler via the MutexUnlock scheduler function de-scribed earlier that this has been done Finally the thread en-ters the intercepted version of mutex_lock described aboveBecause this starts with a Wait in the case of an untimedsignal the thread will block until it is re-enabled by a condi-tional signal or broadcast By using distinct critical sectionsto separate a thread marking itself as being blocked on a

580

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 2: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

a record and replay mechanism that captures minimal infor-mation about these sources of nondeterminism that sufficeto enable efficient controlled concurrency testing for a rangeof applications The advantage of this is that execution isefficient and recording overhead is low The price is thatthe tool cannot handle systems whose behaviour is influ-enced significantly by other sources of nondeterminism (egmemory layout) without programmer interventionWe have implemented our approach as a tool tsan11rec

driven by the following objectives to maximise the extent towhich the concurrency semantics of the system under testcan be explored in order to find subtle bugs to integratethe tool with state-of-the-art data race detection capturingan important class of concurrency bugs and to preserve asmuch parallelism in the system under test as possible forefficient record and replay While race detection controlledconcurrency testing and record-and-replay are all valuabletechniques in isolation their combination is potentially ex-tremely valuable to allow the finding of races (thanks to racedetection) that arise under rare thread schedules (thanks tocontrolled concurrency testing) such that the thread sched-ule and environmental factors leading to the race can bereplayed for debugging (thanks to record-and-replay)

We present a large experimental evaluation showing thattsan11rec is effective at reliably finding and replaying weakmemory-related bugs on a set of benchmarks previously usedto evaluate the CDSchecker concurrency testing tool [64]and competitive in terms of performance with the state-of-the-art rr record and replay tool [66] on a number of largerapplications such as Apache httpd PARSEC benchmarksand Pbzip In particular for programs that rely heavily onparallelism our tool often out-performs rr since our approachallows threads to run in parallel as much as possible ratherthan being sequentialized In some cases sparsity turns outto be key to enabling record and replay For example closed-source proprietary device drivers make it very difficult tointercept and control all nondeterminism arising in the exe-cution of video games To find and reproduce bugs relatedeg to gamedisplay communication it would be necessaryto control this nondeterminism However the nondetermin-ism typically has no impact on core game logic To findand replay game logic bugs it is thus profitable to sparselyignore gamedisplay communication when recording andallow this communication to proceed freely during replayWe demonstrate this by applying tsan11rec to the SDL-basedvideo games Zandronum and QuakeSpasm both of whichare out of scope for rr due to communication between thegame and the OpenGL interface through ioctl that rr is un-able to record and replay We show that we are able to recordand replay a historical Zandronum bug related to an errorin client-server communication when the game is played ininternet multi-player modeAfter recapping necessary background (sect2) we present

our contributions via the following paper structure

void T1()

nax = 1

xstore(1 stdmemory_order_release) A

ystore(1 stdmemory_order_release) B

void T2()

if (yload(stdmemory_order_relaxed) == 1 ampamp C

xload(stdmemory_order_relaxed) == 0) D

xstore(2 stdmemory_order_relaxed)

void T3()

if (xload(stdmemory_order_acquire) gt 0) E

print(nax)

Figure 1 A racy C++11 program using atomic operations

Controlled scheduling (sect3) We detail our scheduling ap-proach which enables exploration of the concurrency se-mantics of the original program by allowing a context switchafter every visible operation splitting up high-level opera-tions into multiple irreducible visible operations By buildingon top of tsan11 [52] race detection is seamlessly integratedWe preserve the parallelism as much as possible by onlysequentializing visible operations invisible regions of codein different threads can run in parallel

Sparse record and replay (sect4) We describe our sparse ap-proach to record and replay focusing on the interplay be-tween controlled scheduling and record and replay and inparticular discuss how the granularity at which nondeter-minism is captured and recorded can be adapted on a perapplication basis

Experimental evaluation (sect5) We present a large evalu-ation applying tsan11rec to concurrency benchmarks theApache httpd PARSEC and Pbzip applications and bench-marks and the Zandronum and QuakeSpasm games Ourevaluation includes comparisons with rr

2 BackgroundWe provide relevant background on controlled schedulingrecord and replay and the tsan and tsan11 race detectors

Controlled scheduling Controlled scheduling techniques(sometimes referred to as stateless model checking) overridethe system scheduler in order to perform schedule-spaceexploration for an application eg systematically or in acontrolled randomized fashion [33 62 64 78 87] Exploringinteresting schedules can reveal subtle bugs that the systemscheduler would trigger with low probability and havingcontrol over which schedules are explored is important forreplay of bug-inducing schedules

Scheduling decisions are made at scheduling points whichcorrespond to visible operations a visible operation is anoperation performed by a thread that may influence thebehaviour of other threads As an example consider theprogram fragment shown in Figure 1 A controlled scheduler

577

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

void sig_handler()

quitstore(1)

void Listener()

while (quitload())

int res = poll(ampserver_fds 1 100)

if (res == 0) continueCHECK(res gt 0 ampamp server_fdsrevents == 1 ampamp poll error)

char buf = new char[100]recv(server_fdfd buf 100 0)

stdunique_lockltstdmutexgt lck (mtx)

requestspush(buf)

void Responder()

while (quitload())

stdunique_lockltstdmutexgt lck (mtx)

if (requestssize() == 0) continuechar buf = requestsfront()

requestspop()

lckunlock()

Process(buf)

send(server_fdfd buf 100 0)

delete[] buf

Figure 2 Generic client for processing and returning re-quests sent from some server

will begin by choosing (via some choice strategy) which ofthe visible operations A C or E to schedule first If it choosesA then at the next scheduling point the choice is between BC and E etc The non-atomic increments of nax and nay areinvisible operations and do not constitute scheduling points

Record and replay The ability to record and replay hasmany useful applications notably allowing consist reproduc-tion of bugs in nondeterministic programs In general record-ing and replaying involves identifying relevant sources ofnondeterminism and enforcing the same resolution of thisnondeterminism during replay as was observed while record-ing The granularity at which nondeterminism is controlledvaries between approaches To see how useful record andreplay is consider the example program shown in Figure 2The program receives buffers from a server processes themand then sends them back But what happens if the connec-tion fails or we get a ldquopoll errorrdquo The ability to capturean execution that shows an error by connecting to a realserver and then repeatedly replay the executionwithout hav-ing to connect to a real server allows us to reliably explorethe cause of the error This is particularly useful for largerprograms that utilise complicated communication protocolsand have time consuming setups or hard to find bugs

In general a program can have many such sources of non-determinism Aside from the thread interleaving and valueof atomic reads other sources include interaction with the

file system system calls certain libc functions (eg the con-ditions under which malloc can fail are not deterministic)instructions that query the state of the CPU (such as thex86 RDTSC for reading the processorrsquos time-stamp counter)or in some cases even the value of pointers (eg iteratingthrough an ordered container of pointers)The choice of what nondeterminism to record and the

method of recording and replaying is a substantial area ofresearch In this paper we compare our approach with thecurrent state-of-the-art tool rr [65] which achieves perfor-mance overheads compared with native execution of as lowas 15times for some applications as well as low storage over-heads It generally enforces a priority-based first come firstserved strategy for scheduling with each thread given a timeslice before yielding Execution is sequentialized so that onlyone thread runs at a time We present a detailed discussionof other approaches to record and replay in sect6

C++11 dynamic race detection We present our work inthe context of CC++ though our approach to controlledscheduling and record and replay is conceptually more gen-eral The CC++11 standards define threads atomics andvarious constructs for inter-thread communication togetherwith a description of a memory model that defines how op-erations are ordered across threads when synchronisationoccurs and what values may be read from memory [7]Working with low level atomics and other inter-thread

constructs can be very difficult and their misuse frequentlyleads to buggy concurrent programs ThreadSanitizer (tsan)is an efficient dynamic race analysis tool aimed at C++ pro-grams [74] The tool performs compile-time instrumentationof the source program in which all (atomic and non-atomic)accesses to potentially shared locations as well as fence op-erations are instrumented with calls into a statically linkedrun-time library In general all visible operations includingmost libc functions and system calls are instrumented Thislibrary implements a vector clock algorithm for tracking thehappens-before relation [28 48] using shadow memory tokeep track of accesses to all locations Requiring a simplecompiler flag to be activated tsan is very easy to use Itis also fast with runtime overheads of around 10times to 12times(competitive performance given the nature of the analysis)although memory overhead can be high

C++11 features such as atomics and threads are acceptedby tsan but the standard tool ignores most of the semanticsof the C++11 memory model A recent extension tsan11 hasadded semantics for a large fragment of the C++11 memorymodel allowing it to find races arising due to weak memorybehaviours that regular tsan would miss [52] As an exampleconsider again the program of Figure 1 For the conditionalin thread T2 to pass both the stores A and B in T1 musthave happened but an earlier value of x must be read Theend result is that the print in T 3 will be racy as the the loadin T3 now reads the value stored in T2 This cannot occur

578

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

under sequential consistency but can occur under C++11semantics the race is detected by tsan11 but not tsan

Amajor limitation of tsan and tsan11 is that the executionsexplored by the tool are at the mercy of the OS schedulerBugs that require unusual interleavings to trigger almostnever manifest The tsan11rec tool presented in this workbuilds on tsan11 combining the strengths of the tsan ap-proach with controlled scheduling and record and replayto allow the detection and reproduction of bugs that wouldbe unlikely to be discovered and harder still to repeatedlyreproduce using the OS scheduler alone

3 SchedulingRecall that one of our main aims in the design of tsan11recis to combine controlled scheduling with record and replayin a manner that directly incorporates state-of-the-art racedetection We now describe the mechanics of the schedulerin particular how the scheduler builds on and interacts withthe existing tsan11 instrumentation library and how thescheduler handles nondeterminism The mechanics of thescheduler will be important for describing the design of ourrecord and replay facilities in sect4

Rather than using an overarching scheduler thread detailsof scheduling decisions are stored in a designated piece ofshared state The threads interact indirectly via this sharedstate using a protocol to cooperatively determine when theyshould be scheduled The protocol builds on the tsan11 li-brary and has been designed so that new scheduling strate-gies can be easily added We focus on two strategies in thiswork random and queue With the random strategy the nextthread to schedule is chosen at random at each schedulingpoint via a pseudo-random number generator (PRNG) ini-tialized with some fixed initial seed This provides controlledrandom scheduling similar to that described in [78] Withthe queue strategy threads are scheduled in a first-come-first-served manner

31 Protocol DetailsRecall from sect2 that the tsan tool and its tsan11 extensionperforms compile-time instrumentation of the low-level visi-ble operations including atomic operations and system calls(syscalls) Each time such an operation is reached controljumps into a tsan library function Execution of some of theselibrary functions may be sequentialized but threads execut-ing outside these library functionsmdashie executing invisibleoperationsmdashare free to run in parallelOur tsan11rec tool essentially acts as an additional layer

of interception on top of the existing tsan11 instrumenta-tion On reaching a visible operation a thread jumps intoa tsan11 library function eg __tsan_atomic_load4 for a4-byte atomic load which will additionally call tsan11recfunctions that interact with the scheduler logic to determinewhen the thread can proceed

Figure 3 Sequentialized critical sections and parallel in-visible operations Blue wavy arrows represent scheduler-imposed ordering black arrows represent program order

Communication with the tsan11rec instrumentation layeruses a protocol based on two functions Wait() and Tick()

Wait() ndash Block this thread until the scheduler activates itTick() ndash Choose a thread to activate

A thread enters Wait() right before it executes a visible op-eration Depending on the scheduling strategy used the stateof the scheduler may need to be updated when Wait is calledthe queue strategy requires the thread to enqueue itself therandom strategy requires no action If the thread alreadyhappens to be the next thread due for scheduling Wait()returns without blocking A thread enters Tick() once ithas completed a visible operation By executing Tick() thethread applies the scheduling strategy (random or queue) tochoose the next thread to be scheduled and update the sched-uler state to reflect this In the case of the queue strategy thisinvolves incrementing the current queue position for therandom strategy this involves choosing the next thread id atrandom When the thread returns from the Tick() functionit is free to continue performing invisible operations unhin-dered until it reaches the next visible operation where it willenter Wait() This allows for parallelism between threadsas sections of invisible code are unordered Multiple invisibleoperations that can execute in parallel are illustrated in Fig-ure 3 Not that with the random strategy a thread does nothave to reach Wait() to be in the list of schedulable threadsThe combination of a visible operation and associated

scheduling-related code wrapped in a Wait() and Tick()pair is called a critical section

32 Special CasesThe approach described in sect31 of wrapping the tsan instru-mentation for a visible operation in a critical section worksdirectly for most visible operations We now discuss a num-ber of operations that require extra attention to detail mainlybecause their semantics necessitate specific updates to thescheduler state or because they cannot be represented as asimple critical section

579

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Calls to functions whose names begin with intercept_are already inserted by tsan11 to instrument visible opera-tions We have modified the implementations of these func-tions to accommodate our instrumentation

Thread management Thread creation deletion and join-ing are all treated as visible operations This is because theymust update the state of the scheduler and as such will af-fect the scheduling of threads going forward To handle thiswe introduce three scheduler functions ThreadNew(tid)ThreadJoin(tid) and ThreadDelete()

The ThreadNew and ThreadJoin functions are added dur-ing instrumentation of thread creation and joining primitivesThe ThreadNew(tid) function called by the parent of thenewly created thread enables the new thread within thescheduler The ThreadJoin(tid) function will block itselfuntil thread tid has finished and as such must instead dis-able itself in the scheduler and also mark itself as waitingon tid On completion a thread calls ThreadDelete whichinvolves (a) enabling the parent thread if it is waiting for thisthread to finish and (b) disabling itself in the scheduler Allthree operations are wrapped in a Wait() and Tick() pair

Mutexes The mutex operations trylock lock and unlock areall visible and require instrumentation Trylock can simply bewrapped in a Wait() and Tick() pair as for regular visibleoperations Unlock is similar to thread deletion in that itmust also re-enable threads that were blocked waiting onthe mutex although in this case we only need to re-enableone of the blocked threads the thread that is chosen dependson whether the queue or random strategy is being usedMutex lock poses an interesting issue in that a thread at-

tempting to acquire a mutex will block if the lock operationfails To account for this we modified the instrumented ver-sion of mutex lock to be as shown in Figure 4 This changes itto a trylock loop with a critical section associated with eachattempt at locking Note that this is the native trylock notthe instrumented version The MutexLockFail(m) functionis similar to ThreadJoin a thread calling this function dis-ables itself from the scheduler and informs the scheduler thatit is waiting on m The thread will then reenter Wait but as itis disabled it will block until it is re-enabled and then sched-uled to run The MutexUnlock(m) function is called when athread releases a mutex and will re-enable one thread thatis disabled due to waiting on m

There is no Wait nor Tick inside MutexLockFail(m) norMutexUnlock(m) Another thread can acquire the mutex be-tween a thread being re-enabled and it attempting the trylockThis is OK the thread will simply block itself again

Condition variables Condition variables allow controlover when certain threads will wake up and try to acquirea mutex When a thread initially acquires a mutex it maycheck a condition required for it to proceed and if it failsrelease the mutex and block itself via the condition variable

int intercept_mutex_lock(void m)

int res = EBUSY

while (res == EBUSY)

Wait()

res = trylock(m) native version of trylock

if (res == EBUSY)

MutexLockFail(m)

Tick()

return res

Figure 4 Instrumented mutex lock

This thread will only wake up and try to reacquire the mutexwhen another thread notifies it via the condition variableThe conditional is checked via the conditional wait functionwaking up one or all of the waiting threads is performed viathe signal and broadcast functions respectivelyThe conditional wait accepts a timer determining the

length of time after which the thread unblocks itself Thistimer represents a physical time This is in contrast to theschedulerrsquos ticker which represents a logical time This dif-ference between physical and logical means that from theperspective of the scheduler the conditionalrsquos wakeup timeris nondeterministic Semantically speaking a thread canwake up from the timer and acquire the conditionalrsquos mutexbefore another thread can even if the associated time is verylong We choose to handle this by not disabling the thread ifit calls a conditional wait with a timer Despite not being dis-abled when timed a thread can still eat a conditional signaland so should still mark itself as waiting on the conditionalin the scheduler

For the three conditional functions we provide correspond-ing CondWait(m t) CondSignal(m) and CondBroadcast(m)scheduler functions CondSignal(m) and CondBroadcast(m)simply wake up one thread and all threads waiting on m re-spectively The interaction between the instrumentation andthe scheduler for conditional signal and broadcast is sim-ple with the instrumentation simply calling the schedulerfunction in a Wait() and Tick() pair

Conditional wait is a little more involved and details areshown in Figure 5 Between the Wait and the Tick a threadinforms the scheduler that it is performing a conditional waitvia CondWait This informs the scheduler that the thread iseither blocked waiting for a signal or performing a timedconditional wait so that while not blocked it can neverthelesseat a signal The thread then releases the mutex informingthe scheduler via the MutexUnlock scheduler function de-scribed earlier that this has been done Finally the thread en-ters the intercepted version of mutex_lock described aboveBecause this starts with a Wait in the case of an untimedsignal the thread will block until it is re-enabled by a condi-tional signal or broadcast By using distinct critical sectionsto separate a thread marking itself as being blocked on a

580

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 3: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

void sig_handler()

quitstore(1)

void Listener()

while (quitload())

int res = poll(ampserver_fds 1 100)

if (res == 0) continueCHECK(res gt 0 ampamp server_fdsrevents == 1 ampamp poll error)

char buf = new char[100]recv(server_fdfd buf 100 0)

stdunique_lockltstdmutexgt lck (mtx)

requestspush(buf)

void Responder()

while (quitload())

stdunique_lockltstdmutexgt lck (mtx)

if (requestssize() == 0) continuechar buf = requestsfront()

requestspop()

lckunlock()

Process(buf)

send(server_fdfd buf 100 0)

delete[] buf

Figure 2 Generic client for processing and returning re-quests sent from some server

will begin by choosing (via some choice strategy) which ofthe visible operations A C or E to schedule first If it choosesA then at the next scheduling point the choice is between BC and E etc The non-atomic increments of nax and nay areinvisible operations and do not constitute scheduling points

Record and replay The ability to record and replay hasmany useful applications notably allowing consist reproduc-tion of bugs in nondeterministic programs In general record-ing and replaying involves identifying relevant sources ofnondeterminism and enforcing the same resolution of thisnondeterminism during replay as was observed while record-ing The granularity at which nondeterminism is controlledvaries between approaches To see how useful record andreplay is consider the example program shown in Figure 2The program receives buffers from a server processes themand then sends them back But what happens if the connec-tion fails or we get a ldquopoll errorrdquo The ability to capturean execution that shows an error by connecting to a realserver and then repeatedly replay the executionwithout hav-ing to connect to a real server allows us to reliably explorethe cause of the error This is particularly useful for largerprograms that utilise complicated communication protocolsand have time consuming setups or hard to find bugs

In general a program can have many such sources of non-determinism Aside from the thread interleaving and valueof atomic reads other sources include interaction with the

file system system calls certain libc functions (eg the con-ditions under which malloc can fail are not deterministic)instructions that query the state of the CPU (such as thex86 RDTSC for reading the processorrsquos time-stamp counter)or in some cases even the value of pointers (eg iteratingthrough an ordered container of pointers)The choice of what nondeterminism to record and the

method of recording and replaying is a substantial area ofresearch In this paper we compare our approach with thecurrent state-of-the-art tool rr [65] which achieves perfor-mance overheads compared with native execution of as lowas 15times for some applications as well as low storage over-heads It generally enforces a priority-based first come firstserved strategy for scheduling with each thread given a timeslice before yielding Execution is sequentialized so that onlyone thread runs at a time We present a detailed discussionof other approaches to record and replay in sect6

C++11 dynamic race detection We present our work inthe context of CC++ though our approach to controlledscheduling and record and replay is conceptually more gen-eral The CC++11 standards define threads atomics andvarious constructs for inter-thread communication togetherwith a description of a memory model that defines how op-erations are ordered across threads when synchronisationoccurs and what values may be read from memory [7]Working with low level atomics and other inter-thread

constructs can be very difficult and their misuse frequentlyleads to buggy concurrent programs ThreadSanitizer (tsan)is an efficient dynamic race analysis tool aimed at C++ pro-grams [74] The tool performs compile-time instrumentationof the source program in which all (atomic and non-atomic)accesses to potentially shared locations as well as fence op-erations are instrumented with calls into a statically linkedrun-time library In general all visible operations includingmost libc functions and system calls are instrumented Thislibrary implements a vector clock algorithm for tracking thehappens-before relation [28 48] using shadow memory tokeep track of accesses to all locations Requiring a simplecompiler flag to be activated tsan is very easy to use Itis also fast with runtime overheads of around 10times to 12times(competitive performance given the nature of the analysis)although memory overhead can be high

C++11 features such as atomics and threads are acceptedby tsan but the standard tool ignores most of the semanticsof the C++11 memory model A recent extension tsan11 hasadded semantics for a large fragment of the C++11 memorymodel allowing it to find races arising due to weak memorybehaviours that regular tsan would miss [52] As an exampleconsider again the program of Figure 1 For the conditionalin thread T2 to pass both the stores A and B in T1 musthave happened but an earlier value of x must be read Theend result is that the print in T 3 will be racy as the the loadin T3 now reads the value stored in T2 This cannot occur

578

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

under sequential consistency but can occur under C++11semantics the race is detected by tsan11 but not tsan

Amajor limitation of tsan and tsan11 is that the executionsexplored by the tool are at the mercy of the OS schedulerBugs that require unusual interleavings to trigger almostnever manifest The tsan11rec tool presented in this workbuilds on tsan11 combining the strengths of the tsan ap-proach with controlled scheduling and record and replayto allow the detection and reproduction of bugs that wouldbe unlikely to be discovered and harder still to repeatedlyreproduce using the OS scheduler alone

3 SchedulingRecall that one of our main aims in the design of tsan11recis to combine controlled scheduling with record and replayin a manner that directly incorporates state-of-the-art racedetection We now describe the mechanics of the schedulerin particular how the scheduler builds on and interacts withthe existing tsan11 instrumentation library and how thescheduler handles nondeterminism The mechanics of thescheduler will be important for describing the design of ourrecord and replay facilities in sect4

Rather than using an overarching scheduler thread detailsof scheduling decisions are stored in a designated piece ofshared state The threads interact indirectly via this sharedstate using a protocol to cooperatively determine when theyshould be scheduled The protocol builds on the tsan11 li-brary and has been designed so that new scheduling strate-gies can be easily added We focus on two strategies in thiswork random and queue With the random strategy the nextthread to schedule is chosen at random at each schedulingpoint via a pseudo-random number generator (PRNG) ini-tialized with some fixed initial seed This provides controlledrandom scheduling similar to that described in [78] Withthe queue strategy threads are scheduled in a first-come-first-served manner

31 Protocol DetailsRecall from sect2 that the tsan tool and its tsan11 extensionperforms compile-time instrumentation of the low-level visi-ble operations including atomic operations and system calls(syscalls) Each time such an operation is reached controljumps into a tsan library function Execution of some of theselibrary functions may be sequentialized but threads execut-ing outside these library functionsmdashie executing invisibleoperationsmdashare free to run in parallelOur tsan11rec tool essentially acts as an additional layer

of interception on top of the existing tsan11 instrumenta-tion On reaching a visible operation a thread jumps intoa tsan11 library function eg __tsan_atomic_load4 for a4-byte atomic load which will additionally call tsan11recfunctions that interact with the scheduler logic to determinewhen the thread can proceed

Figure 3 Sequentialized critical sections and parallel in-visible operations Blue wavy arrows represent scheduler-imposed ordering black arrows represent program order

Communication with the tsan11rec instrumentation layeruses a protocol based on two functions Wait() and Tick()

Wait() ndash Block this thread until the scheduler activates itTick() ndash Choose a thread to activate

A thread enters Wait() right before it executes a visible op-eration Depending on the scheduling strategy used the stateof the scheduler may need to be updated when Wait is calledthe queue strategy requires the thread to enqueue itself therandom strategy requires no action If the thread alreadyhappens to be the next thread due for scheduling Wait()returns without blocking A thread enters Tick() once ithas completed a visible operation By executing Tick() thethread applies the scheduling strategy (random or queue) tochoose the next thread to be scheduled and update the sched-uler state to reflect this In the case of the queue strategy thisinvolves incrementing the current queue position for therandom strategy this involves choosing the next thread id atrandom When the thread returns from the Tick() functionit is free to continue performing invisible operations unhin-dered until it reaches the next visible operation where it willenter Wait() This allows for parallelism between threadsas sections of invisible code are unordered Multiple invisibleoperations that can execute in parallel are illustrated in Fig-ure 3 Not that with the random strategy a thread does nothave to reach Wait() to be in the list of schedulable threadsThe combination of a visible operation and associated

scheduling-related code wrapped in a Wait() and Tick()pair is called a critical section

32 Special CasesThe approach described in sect31 of wrapping the tsan instru-mentation for a visible operation in a critical section worksdirectly for most visible operations We now discuss a num-ber of operations that require extra attention to detail mainlybecause their semantics necessitate specific updates to thescheduler state or because they cannot be represented as asimple critical section

579

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Calls to functions whose names begin with intercept_are already inserted by tsan11 to instrument visible opera-tions We have modified the implementations of these func-tions to accommodate our instrumentation

Thread management Thread creation deletion and join-ing are all treated as visible operations This is because theymust update the state of the scheduler and as such will af-fect the scheduling of threads going forward To handle thiswe introduce three scheduler functions ThreadNew(tid)ThreadJoin(tid) and ThreadDelete()

The ThreadNew and ThreadJoin functions are added dur-ing instrumentation of thread creation and joining primitivesThe ThreadNew(tid) function called by the parent of thenewly created thread enables the new thread within thescheduler The ThreadJoin(tid) function will block itselfuntil thread tid has finished and as such must instead dis-able itself in the scheduler and also mark itself as waitingon tid On completion a thread calls ThreadDelete whichinvolves (a) enabling the parent thread if it is waiting for thisthread to finish and (b) disabling itself in the scheduler Allthree operations are wrapped in a Wait() and Tick() pair

Mutexes The mutex operations trylock lock and unlock areall visible and require instrumentation Trylock can simply bewrapped in a Wait() and Tick() pair as for regular visibleoperations Unlock is similar to thread deletion in that itmust also re-enable threads that were blocked waiting onthe mutex although in this case we only need to re-enableone of the blocked threads the thread that is chosen dependson whether the queue or random strategy is being usedMutex lock poses an interesting issue in that a thread at-

tempting to acquire a mutex will block if the lock operationfails To account for this we modified the instrumented ver-sion of mutex lock to be as shown in Figure 4 This changes itto a trylock loop with a critical section associated with eachattempt at locking Note that this is the native trylock notthe instrumented version The MutexLockFail(m) functionis similar to ThreadJoin a thread calling this function dis-ables itself from the scheduler and informs the scheduler thatit is waiting on m The thread will then reenter Wait but as itis disabled it will block until it is re-enabled and then sched-uled to run The MutexUnlock(m) function is called when athread releases a mutex and will re-enable one thread thatis disabled due to waiting on m

There is no Wait nor Tick inside MutexLockFail(m) norMutexUnlock(m) Another thread can acquire the mutex be-tween a thread being re-enabled and it attempting the trylockThis is OK the thread will simply block itself again

Condition variables Condition variables allow controlover when certain threads will wake up and try to acquirea mutex When a thread initially acquires a mutex it maycheck a condition required for it to proceed and if it failsrelease the mutex and block itself via the condition variable

int intercept_mutex_lock(void m)

int res = EBUSY

while (res == EBUSY)

Wait()

res = trylock(m) native version of trylock

if (res == EBUSY)

MutexLockFail(m)

Tick()

return res

Figure 4 Instrumented mutex lock

This thread will only wake up and try to reacquire the mutexwhen another thread notifies it via the condition variableThe conditional is checked via the conditional wait functionwaking up one or all of the waiting threads is performed viathe signal and broadcast functions respectivelyThe conditional wait accepts a timer determining the

length of time after which the thread unblocks itself Thistimer represents a physical time This is in contrast to theschedulerrsquos ticker which represents a logical time This dif-ference between physical and logical means that from theperspective of the scheduler the conditionalrsquos wakeup timeris nondeterministic Semantically speaking a thread canwake up from the timer and acquire the conditionalrsquos mutexbefore another thread can even if the associated time is verylong We choose to handle this by not disabling the thread ifit calls a conditional wait with a timer Despite not being dis-abled when timed a thread can still eat a conditional signaland so should still mark itself as waiting on the conditionalin the scheduler

For the three conditional functions we provide correspond-ing CondWait(m t) CondSignal(m) and CondBroadcast(m)scheduler functions CondSignal(m) and CondBroadcast(m)simply wake up one thread and all threads waiting on m re-spectively The interaction between the instrumentation andthe scheduler for conditional signal and broadcast is sim-ple with the instrumentation simply calling the schedulerfunction in a Wait() and Tick() pair

Conditional wait is a little more involved and details areshown in Figure 5 Between the Wait and the Tick a threadinforms the scheduler that it is performing a conditional waitvia CondWait This informs the scheduler that the thread iseither blocked waiting for a signal or performing a timedconditional wait so that while not blocked it can neverthelesseat a signal The thread then releases the mutex informingthe scheduler via the MutexUnlock scheduler function de-scribed earlier that this has been done Finally the thread en-ters the intercepted version of mutex_lock described aboveBecause this starts with a Wait in the case of an untimedsignal the thread will block until it is re-enabled by a condi-tional signal or broadcast By using distinct critical sectionsto separate a thread marking itself as being blocked on a

580

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 4: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

under sequential consistency but can occur under C++11semantics the race is detected by tsan11 but not tsan

Amajor limitation of tsan and tsan11 is that the executionsexplored by the tool are at the mercy of the OS schedulerBugs that require unusual interleavings to trigger almostnever manifest The tsan11rec tool presented in this workbuilds on tsan11 combining the strengths of the tsan ap-proach with controlled scheduling and record and replayto allow the detection and reproduction of bugs that wouldbe unlikely to be discovered and harder still to repeatedlyreproduce using the OS scheduler alone

3 SchedulingRecall that one of our main aims in the design of tsan11recis to combine controlled scheduling with record and replayin a manner that directly incorporates state-of-the-art racedetection We now describe the mechanics of the schedulerin particular how the scheduler builds on and interacts withthe existing tsan11 instrumentation library and how thescheduler handles nondeterminism The mechanics of thescheduler will be important for describing the design of ourrecord and replay facilities in sect4

Rather than using an overarching scheduler thread detailsof scheduling decisions are stored in a designated piece ofshared state The threads interact indirectly via this sharedstate using a protocol to cooperatively determine when theyshould be scheduled The protocol builds on the tsan11 li-brary and has been designed so that new scheduling strate-gies can be easily added We focus on two strategies in thiswork random and queue With the random strategy the nextthread to schedule is chosen at random at each schedulingpoint via a pseudo-random number generator (PRNG) ini-tialized with some fixed initial seed This provides controlledrandom scheduling similar to that described in [78] Withthe queue strategy threads are scheduled in a first-come-first-served manner

31 Protocol DetailsRecall from sect2 that the tsan tool and its tsan11 extensionperforms compile-time instrumentation of the low-level visi-ble operations including atomic operations and system calls(syscalls) Each time such an operation is reached controljumps into a tsan library function Execution of some of theselibrary functions may be sequentialized but threads execut-ing outside these library functionsmdashie executing invisibleoperationsmdashare free to run in parallelOur tsan11rec tool essentially acts as an additional layer

of interception on top of the existing tsan11 instrumenta-tion On reaching a visible operation a thread jumps intoa tsan11 library function eg __tsan_atomic_load4 for a4-byte atomic load which will additionally call tsan11recfunctions that interact with the scheduler logic to determinewhen the thread can proceed

Figure 3 Sequentialized critical sections and parallel in-visible operations Blue wavy arrows represent scheduler-imposed ordering black arrows represent program order

Communication with the tsan11rec instrumentation layeruses a protocol based on two functions Wait() and Tick()

Wait() ndash Block this thread until the scheduler activates itTick() ndash Choose a thread to activate

A thread enters Wait() right before it executes a visible op-eration Depending on the scheduling strategy used the stateof the scheduler may need to be updated when Wait is calledthe queue strategy requires the thread to enqueue itself therandom strategy requires no action If the thread alreadyhappens to be the next thread due for scheduling Wait()returns without blocking A thread enters Tick() once ithas completed a visible operation By executing Tick() thethread applies the scheduling strategy (random or queue) tochoose the next thread to be scheduled and update the sched-uler state to reflect this In the case of the queue strategy thisinvolves incrementing the current queue position for therandom strategy this involves choosing the next thread id atrandom When the thread returns from the Tick() functionit is free to continue performing invisible operations unhin-dered until it reaches the next visible operation where it willenter Wait() This allows for parallelism between threadsas sections of invisible code are unordered Multiple invisibleoperations that can execute in parallel are illustrated in Fig-ure 3 Not that with the random strategy a thread does nothave to reach Wait() to be in the list of schedulable threadsThe combination of a visible operation and associated

scheduling-related code wrapped in a Wait() and Tick()pair is called a critical section

32 Special CasesThe approach described in sect31 of wrapping the tsan instru-mentation for a visible operation in a critical section worksdirectly for most visible operations We now discuss a num-ber of operations that require extra attention to detail mainlybecause their semantics necessitate specific updates to thescheduler state or because they cannot be represented as asimple critical section

579

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Calls to functions whose names begin with intercept_are already inserted by tsan11 to instrument visible opera-tions We have modified the implementations of these func-tions to accommodate our instrumentation

Thread management Thread creation deletion and join-ing are all treated as visible operations This is because theymust update the state of the scheduler and as such will af-fect the scheduling of threads going forward To handle thiswe introduce three scheduler functions ThreadNew(tid)ThreadJoin(tid) and ThreadDelete()

The ThreadNew and ThreadJoin functions are added dur-ing instrumentation of thread creation and joining primitivesThe ThreadNew(tid) function called by the parent of thenewly created thread enables the new thread within thescheduler The ThreadJoin(tid) function will block itselfuntil thread tid has finished and as such must instead dis-able itself in the scheduler and also mark itself as waitingon tid On completion a thread calls ThreadDelete whichinvolves (a) enabling the parent thread if it is waiting for thisthread to finish and (b) disabling itself in the scheduler Allthree operations are wrapped in a Wait() and Tick() pair

Mutexes The mutex operations trylock lock and unlock areall visible and require instrumentation Trylock can simply bewrapped in a Wait() and Tick() pair as for regular visibleoperations Unlock is similar to thread deletion in that itmust also re-enable threads that were blocked waiting onthe mutex although in this case we only need to re-enableone of the blocked threads the thread that is chosen dependson whether the queue or random strategy is being usedMutex lock poses an interesting issue in that a thread at-

tempting to acquire a mutex will block if the lock operationfails To account for this we modified the instrumented ver-sion of mutex lock to be as shown in Figure 4 This changes itto a trylock loop with a critical section associated with eachattempt at locking Note that this is the native trylock notthe instrumented version The MutexLockFail(m) functionis similar to ThreadJoin a thread calling this function dis-ables itself from the scheduler and informs the scheduler thatit is waiting on m The thread will then reenter Wait but as itis disabled it will block until it is re-enabled and then sched-uled to run The MutexUnlock(m) function is called when athread releases a mutex and will re-enable one thread thatis disabled due to waiting on m

There is no Wait nor Tick inside MutexLockFail(m) norMutexUnlock(m) Another thread can acquire the mutex be-tween a thread being re-enabled and it attempting the trylockThis is OK the thread will simply block itself again

Condition variables Condition variables allow controlover when certain threads will wake up and try to acquirea mutex When a thread initially acquires a mutex it maycheck a condition required for it to proceed and if it failsrelease the mutex and block itself via the condition variable

int intercept_mutex_lock(void m)

int res = EBUSY

while (res == EBUSY)

Wait()

res = trylock(m) native version of trylock

if (res == EBUSY)

MutexLockFail(m)

Tick()

return res

Figure 4 Instrumented mutex lock

This thread will only wake up and try to reacquire the mutexwhen another thread notifies it via the condition variableThe conditional is checked via the conditional wait functionwaking up one or all of the waiting threads is performed viathe signal and broadcast functions respectivelyThe conditional wait accepts a timer determining the

length of time after which the thread unblocks itself Thistimer represents a physical time This is in contrast to theschedulerrsquos ticker which represents a logical time This dif-ference between physical and logical means that from theperspective of the scheduler the conditionalrsquos wakeup timeris nondeterministic Semantically speaking a thread canwake up from the timer and acquire the conditionalrsquos mutexbefore another thread can even if the associated time is verylong We choose to handle this by not disabling the thread ifit calls a conditional wait with a timer Despite not being dis-abled when timed a thread can still eat a conditional signaland so should still mark itself as waiting on the conditionalin the scheduler

For the three conditional functions we provide correspond-ing CondWait(m t) CondSignal(m) and CondBroadcast(m)scheduler functions CondSignal(m) and CondBroadcast(m)simply wake up one thread and all threads waiting on m re-spectively The interaction between the instrumentation andthe scheduler for conditional signal and broadcast is sim-ple with the instrumentation simply calling the schedulerfunction in a Wait() and Tick() pair

Conditional wait is a little more involved and details areshown in Figure 5 Between the Wait and the Tick a threadinforms the scheduler that it is performing a conditional waitvia CondWait This informs the scheduler that the thread iseither blocked waiting for a signal or performing a timedconditional wait so that while not blocked it can neverthelesseat a signal The thread then releases the mutex informingthe scheduler via the MutexUnlock scheduler function de-scribed earlier that this has been done Finally the thread en-ters the intercepted version of mutex_lock described aboveBecause this starts with a Wait in the case of an untimedsignal the thread will block until it is re-enabled by a condi-tional signal or broadcast By using distinct critical sectionsto separate a thread marking itself as being blocked on a

580

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 5: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Calls to functions whose names begin with intercept_are already inserted by tsan11 to instrument visible opera-tions We have modified the implementations of these func-tions to accommodate our instrumentation

Thread management Thread creation deletion and join-ing are all treated as visible operations This is because theymust update the state of the scheduler and as such will af-fect the scheduling of threads going forward To handle thiswe introduce three scheduler functions ThreadNew(tid)ThreadJoin(tid) and ThreadDelete()

The ThreadNew and ThreadJoin functions are added dur-ing instrumentation of thread creation and joining primitivesThe ThreadNew(tid) function called by the parent of thenewly created thread enables the new thread within thescheduler The ThreadJoin(tid) function will block itselfuntil thread tid has finished and as such must instead dis-able itself in the scheduler and also mark itself as waitingon tid On completion a thread calls ThreadDelete whichinvolves (a) enabling the parent thread if it is waiting for thisthread to finish and (b) disabling itself in the scheduler Allthree operations are wrapped in a Wait() and Tick() pair

Mutexes The mutex operations trylock lock and unlock areall visible and require instrumentation Trylock can simply bewrapped in a Wait() and Tick() pair as for regular visibleoperations Unlock is similar to thread deletion in that itmust also re-enable threads that were blocked waiting onthe mutex although in this case we only need to re-enableone of the blocked threads the thread that is chosen dependson whether the queue or random strategy is being usedMutex lock poses an interesting issue in that a thread at-

tempting to acquire a mutex will block if the lock operationfails To account for this we modified the instrumented ver-sion of mutex lock to be as shown in Figure 4 This changes itto a trylock loop with a critical section associated with eachattempt at locking Note that this is the native trylock notthe instrumented version The MutexLockFail(m) functionis similar to ThreadJoin a thread calling this function dis-ables itself from the scheduler and informs the scheduler thatit is waiting on m The thread will then reenter Wait but as itis disabled it will block until it is re-enabled and then sched-uled to run The MutexUnlock(m) function is called when athread releases a mutex and will re-enable one thread thatis disabled due to waiting on m

There is no Wait nor Tick inside MutexLockFail(m) norMutexUnlock(m) Another thread can acquire the mutex be-tween a thread being re-enabled and it attempting the trylockThis is OK the thread will simply block itself again

Condition variables Condition variables allow controlover when certain threads will wake up and try to acquirea mutex When a thread initially acquires a mutex it maycheck a condition required for it to proceed and if it failsrelease the mutex and block itself via the condition variable

int intercept_mutex_lock(void m)

int res = EBUSY

while (res == EBUSY)

Wait()

res = trylock(m) native version of trylock

if (res == EBUSY)

MutexLockFail(m)

Tick()

return res

Figure 4 Instrumented mutex lock

This thread will only wake up and try to reacquire the mutexwhen another thread notifies it via the condition variableThe conditional is checked via the conditional wait functionwaking up one or all of the waiting threads is performed viathe signal and broadcast functions respectivelyThe conditional wait accepts a timer determining the

length of time after which the thread unblocks itself Thistimer represents a physical time This is in contrast to theschedulerrsquos ticker which represents a logical time This dif-ference between physical and logical means that from theperspective of the scheduler the conditionalrsquos wakeup timeris nondeterministic Semantically speaking a thread canwake up from the timer and acquire the conditionalrsquos mutexbefore another thread can even if the associated time is verylong We choose to handle this by not disabling the thread ifit calls a conditional wait with a timer Despite not being dis-abled when timed a thread can still eat a conditional signaland so should still mark itself as waiting on the conditionalin the scheduler

For the three conditional functions we provide correspond-ing CondWait(m t) CondSignal(m) and CondBroadcast(m)scheduler functions CondSignal(m) and CondBroadcast(m)simply wake up one thread and all threads waiting on m re-spectively The interaction between the instrumentation andthe scheduler for conditional signal and broadcast is sim-ple with the instrumentation simply calling the schedulerfunction in a Wait() and Tick() pair

Conditional wait is a little more involved and details areshown in Figure 5 Between the Wait and the Tick a threadinforms the scheduler that it is performing a conditional waitvia CondWait This informs the scheduler that the thread iseither blocked waiting for a signal or performing a timedconditional wait so that while not blocked it can neverthelesseat a signal The thread then releases the mutex informingthe scheduler via the MutexUnlock scheduler function de-scribed earlier that this has been done Finally the thread en-ters the intercepted version of mutex_lock described aboveBecause this starts with a Wait in the case of an untimedsignal the thread will block until it is re-enabled by a condi-tional signal or broadcast By using distinct critical sectionsto separate a thread marking itself as being blocked on a

580

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 6: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

void intercept_cond_wait(void m bool timed)

Wait()

CondWait(m timed)

mutex_unlock()

MutexUnlock(m)

Tick()

intercept_mutex_lock()

Figure 5 Instrumented conditional wait

signal and attempting to reacquire the mutex we allow thepossibility for another thread to be scheduled in betweenpossibly acquiring the mutexOnce a thread has reacquired the mutex it will typically

recheck the condition it was originally waiting on If it isnot satisfied it will call cond_wait again This is wherethe risk of deadlock comes in as if it was the only threadsignaled and it reenters intercept_cond_wait it will notsignal other threads first and all threads that are blockedby the conditional will remain blocked We want preserveany potential deadlocks in the underlying program and sodo not limit this from happening we are also careful not tointroduce new deadlocks

Signals We briefly describe our treatment of signals andsignal handlers noting that these are distinct from the signalsassociated with conditional wait operations described aboveWe focus on asynchronous signals which can be receivedby processes at any time and thus contribute an additionalsource of nondeterminism This is distinct from synchronoussignals eg SIGSEGV which are raised by the thread as andwhen certain operations are performed Unlike in the case ofeg a memory load operation which has a designated pro-gram point that can be intercepted to facilitate interactionwith the scheduler a signal can arrive at any time The stan-dard also specifies a signal function that binds a handlerfunction to a specific signalScheduling with signals is handled by simply marking

the entrance to the signal handler and the aforementionedsignal function as visible operations From a schedulingperspective besides the arrival of a signal signals are not aproblem Recording and replaying signals is where thingsbecome difficult which we discuss in sect43

33 LivenessWhile the scheduler strives to ensure that all possible pro-gram behaviours can be explored in principle in practicedepending on the strategy this can lead to massive slow-downs in particular cases For example suppose a thread isscheduled and undertakes a vast number of invisible oper-ations or calls a sleep function for some duration beforefinally issuing a visible operation If all other threads end upblocked waiting to perform visible operations and the sched-uler doesnrsquot give them a chance to run the performance of

the program may be drastically impacted This can becomeparticularly problematic when dealing with programs thatrely on responsiveness such as real-time applications

To cope with this we slightly reduce the schedulerrsquos abil-ity to explore any possible schedule by having the schedulerforce a reschedule in such cases By forcing a rescheduleafter n milliseconds the probability of exploring a schedulewhereby a thread performs two visible operations consec-utively separated by more than n milliseconds is greatlyreduced Conveniently for us tsan has a background threadwhich we configure to call our Reschedule() function everyn milliseconds for some given n Because the Reschedule()function relies on physical time it introduces nondetermin-ism into the scheduler

4 Record and ReplayWe now discuss the record and replay mechanism we haveimplemented with the aim of answering the following ques-tions What do we mean by ldquorecordingrdquo an execution andthen replaying said execution later How do we formalise arecording What makes replaying an execution valid Whatshould we record and what should we not record

When a program executes certain visible operations willlead to nondeterminism Recording an execution thereforemeans capturing information about these visible operationsin a form that can be used to reproduce the execution duringreplay We call this captured information the demo file ordemo for short We also refer to an execution that is replay-ing a demo as a replay and that the replay is synchronisedunless something has gone wrong with the replay such thatexecution has diverged from what was recorded in whichcase we say it is desynchronisedThis leaves us with the question of what it means for

a replay to desynchronise A demo is defined as a seriesof constraints arising from the recorded execution whichthe replay is required to satisfy The tool will attempt toenforce these constraints on the program during replay andas long as it can the replay is deemed to be synchronisedIf at any point the tool is unable to enforce a constraint onthe program we say the replay has hard desynchronised inwhich case the tool will abort In some cases the replaymay abide by the constraints but appear to diverge fromthe recorded execution for example by producing consoleoutput in a different order We call this soft desynchronisationTo illustrate an extreme case of this the empty demo istrivially synchronised for any replay but will lead to softdesynchronisation practically everywhere unless the systemunder test is highly deterministic

The record and replay mechanism is built into the sched-uler we discussed in sect3 In cases where a nondeterministicchoice needs to be made that is unrelated to scheduling (andso not handled by the queue nor random strategy) a PRNGis used seeded by two calls to rdtsc()

581

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 7: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

For the most part tsan11rec avoids the need for user an-notation however there are some cases where they are un-avoidable this is shown in sect54

41 Motivating ExampleTo help lay out the reasoning and technical explanation givenin the rest of this section we start off with an example pro-gram The program fragment in Figure 2 (already discussedbriefly in sect2) shows a simple client that repeatedly receives achar buffer from a server applies a transformation to it thensends it back We discuss the behaviours of the applicationthat need to be recorded in order to enable faithful replayvs those features that need not be recorded

What to record The obvious case here is the interleavingof threads Recording this will ensure that the order of oper-ations to the atomic locations quit and mtx and the order ofthe syscalls used throughout will be the same during replayOther operations are invisible and thus will not affect otherthreads or introduce nondeterminismSystem calls that interact with the environment can be

seen as inputs to the program which in this case determineshow many requests to handle and the contents of each re-quest For example poll informs us on whether there is datato be read from the server and thus needs to be recorded asdo the system calls recv and sendThe signal handler in this example is used to trigger the

end of the program The arrival of the signal is asynchronousand comes from outside the program During replay the toolwill need to ensure that the same signal arrives at the samepoint in logical time

What to ignore The first and likely most contentious ele-ment is the layout of memory This will of course depend onthe program in question but in this example and most of theprograms we encounter the position of objects in memorywill have no effect on the rest of the program If the requestqueue was instead an ordered set of char pointers then itwould matter as the pointer values will determine the orderin which requests are considered during iteration

42 InterleavingAs explained in sect41 the ordering of visible operations mustbe preserved during replay We describe how the randomand queue strategies store ordering-related information

To recap the strategies described in sect3 the random sched-uler chooses which thread to allow to run the next visibleoperation randomly after each visible operation has com-pleted while the queue scheduler is first come first serve forwhichever threads attempts to perform a visible operation

For the random strategy the entire thread interleaving isencapsulated in the PRNG Therefore no information besidesthe two seeds used for the PRNG is required

For the queue strategy the ordering during record dependson the order in which threads happen to reach Wait() which

depends on physical timing To encode the order so that itcan be enforced during replay a file called QUEUE is usedThis file records (a) a map specifying for each thread id thefirst tick at which the thread should be scheduled and (b)an ordered list of ticks to be consumed by threads each timethey leave a critical sectionmdashthe tick that a thread consumeson leaving a critical section informs the thread as to the nexttick at which it is to be scheduled Run-length encoding isused to efficiently record the case where a thread is scheduledmultiple times in succession

There is clearly a trade-off between these strategiesWherethe random strategy stores no data the queue strategy mayneed to store data on every visible operation The queuestrategy will be much faster however as it is unlikely to beblocked in Wait() unless another thread is already critical

43 SignalsWe discussed signals briefly in sect32 but deferred discussionto this section as most of the difficulties are in attempting toreplay them Synchronous signals are ignored (eg SIGSEGVSIGPIPE) as these should reoccur at the same point in theexecution without the help of our tool To clarify it is en-tering the signal handler that is the visible operation Wheninside the signal handler a thread cannot interact with therest of the process except though atomic operations whichare themselves visible From this we can say that it doesnot matter at which point between a Tick() and followingWait() pair that the signal handler is entered

In tsan11rec any asynchronous signal that arrived duringrecording becomes a synchronous signal upon replay Whena thread receives a signal it records the value of the tickseen during the most recent Tick() along with the signalvalue in a file called SIGNAL For example consider the casein Figure 2 where the Responder thread T2 has just per-formed the atomic load on tick 5 but has not yet attemptedto acquire the lock It receives the signal and performs aWait() and Tick() so that it can enter the signal handlerThe SIGNAL file will therefore have the line ldquo2 5 15rdquo in-dicating that thread T2 receives signal 15 at tick 5 Duringreplay when the Responder thread calls Tick() during tick5 it will raise signal 15 itself at the end of Tick() It doesnot matter at which precise point between Tick() and thefollowing Wait() that the signal arrived at during recordingit will float to the end of Tick() as shown in Figure 6

44 System CallsAs discussed in sect32 system calls are a significant source ofnondeterminism in an application To ensure that relevantproperties of an application are preserved during replay weneed to record the results of relevant system calls This is afundamental challenge that any record-and-replay systemmust face and state-of-the-art tools such as rr [66] aim tobe as comprehensive as possible in the system calls theysupport so that they can be applied directly to a wide range

582

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 8: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Figure 6 Signals are replayed immediately after the preced-ing tick

of applications In contrast the idea behind our sparse ap-proach to identify a minimal subset of system calls such thatrecording these system calls suffices to enable faithful replayof particular applications of interest At a high level we ap-proached this by incrementally adding support for systemcalls based on a trial-and-error process of first using straceto understand the full set of system calls issued by an appli-cation and then repeatedly attempting to record and replaythe application with tsan11rec incrementally adding supportfor additional system calls through analysis of the sources ofreplay failures We emphasise that this process did requirequite some manual effort and would need to be iteratedfurther to handle applications with significantly differentsystem call requirements compared to our case studies

The term syscall is a bit of a misnomer as both tsan and theinstrumentation detailed in this paper will instead interceptthe glibc wrappers around the syscalls instead of the syscalldirectly These glibc functions are much easier to use onbehalf of the programmer as they will take care of systemspecific details pushing the arguments onto the stack andinterpreting the results returned by the kernel We still usesyscall throughout as it is in the underlying syscall wherethe nondeterministic behaviours occurEach syscall takes a variable number of user-allocated

buffers and fills them with the appropriate data before set-ting errno and returning some value As these are sourcesof nondeterminism the return value errno and any appro-priate buffers will be compressed and stored in a demo filecalled SYSCALL During replay the actual data returned willbe overwritten by the data in SYSCALL Only the interactionwith the SYSCALL file is part of the critical section whichreduces contention in the scheduler

As an example consider the Listener thread in Figure 2performing the poll and recv syscalls in succession The re-turn value error number and two elements in the server_fdsstructure must be stored for poll the return value errornumber and contents of the buffer must be stored for recvThese will be treated as character buffers and have a simplerun length encoding applied

One of the difficulties that arises from adding a syscall isthe knock-on effect it can have with respect to other syscallsConsider for example the syscalls that interact with thefilesystem create open read etc If you intercept open

on the assumption that you may not have a valid file de-scriptor during replay where you did during record you willthen have to intercept every syscall that works with thatfile descriptor What starts out as a single interception be-comes potentially hundreds of intercepted syscalls There isa delicate balance to be struck between those syscalls thatneed to be recorded to make important execution featuresdeterministic during replay vs those syscalls that are bet-ter left un-recorded because (a) determinism of replay doesnot depend on them being recorded and (b) recording themleads to a snowball effect where many other syscalls mustalso be recorded to avoid desynchronisation

We have added syscall support to tsan11rec in a demand-driven manner by using strace to identify important callsfor key applications of interest The current set of syscallssupported includes read write recvmsg recv sendmsgaccept accept4 clock_gettime ioctl select and bindThese have allowed us to get a significant number of appli-cations up and running including the applications studiedin our evaluation modulo a few workarounds (detailed insect5) these applications issue many additional system callsthat we have found it unnecessary to recordmdashunnecessaryin the sense that simply re-issuing the system call during re-play has no observable effect on the applicationrsquos behaviourSometimes whether a call must be recorded depends on thefile descriptors that the call receives For instance for all ofour case studies it never proves necessary to record readand write calls whose file descriptors correspond to filesin the file system but it is necessary to record these callsif the associated file descriptors are associated with pipesused for inter-process communication Rather than this set ofsyscalls being a starting point towards full syscall coverageour view is that efficient record and replay that preserves par-allelism can benefit from selective syscall recording basedon application-specific knowledge and we envision the toolsupporting a core set of essential syscalls and being con-figurable with support for further syscalls to suit particularrecord and replay scenarios For example to handle a pro-gram such as htop would require instrumentation of theinteraction with the proc filesystem but doing this in thegeneral case would be wasteful and maybe even harmful iffuture calls depended on this interaction

45 Asynchronous EventsAsynchronous events are specific events that do not fit inwith any of the other categories discussed An importantcharacteristic is that they are not wrapped in a Wait() andTick() either because it was infeasible to do so duringrecording or because it would create a lot of unnecessaryoverhead These events still need to be replayed to ensurethe replay remains synchronised Currently there are twotypes of events Reschedule and Signal_wakeup

The reschedule event was discussed in sect33 It is necessaryto include this to ensure that the PRNG will be called the

583

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 9: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Figure 7 Right shows how reschedules are floated abovethe Tick()

same number of times in each critical section To see whythe signal wakeup event is necessary consider again a signalbeing received in the context of the example of Figure 2This time suppose the Responder thread receives the signalwhile it is disabled trying to acquire the lock Assume thatthe thread disabled itself on tick 10 the signal arrives andthe thread re-enables itself during tick 12 and then entersthe signal handler on tick 14 It is not OK for the thread tosimply not disable itself on tick 10 during replay as the poolof enabled threads for the scheduler to choose from duringticks 10 and 11 is different between recording and replayingwhich will affect the choice the scheduler will make

As with signals all asynchronous events are replayedsynchronously with all events that occur between a Tick()and the following Wait() floating up to the previous Tick()This is shown in Figure 7 These events are stored in theASYNC file

5 EvaluationTo evaluate the controlled scheduling abilities of tsan11recwe compare the strategies on the CDSchecker benchmarksuite (sect51) We then use larger applications to comparetsan11rec with rr [66] Of the few record-and-replay toolsthat are publicly available we chose to compare with rr be-cause (a) it represents the state-of-the-art (b) it is similar totsan11rec in terms of the kinds of applications it aims to sup-port We consider programs that bring challenges related tonetworking signals IO and real-time constraints Apachersquoshttpd web server (sect52) the PARSEC benchmarks and pbzip(sect53) and two first-person shooter games built on the SDLlibrary (sect54) The SDL case studies showcase applicationsthat tsan11rec can handle that are out of scope for rr dueto communication between the game and the OpenGL in-terface In contrast we also discuss practical limitations oftsan11rec that rr does not face related to the SQLite databaseapplication and Firefoxrsquos SpiderMonkey (sect55)

Throughout we describe howwe evolved the sparse record-ing facilities of tsan11rec and many practical challenges wefaced along the way our experience is that such challengeswhich seem fundamental to record and replay are typicallydescribed only briefly if at all in the literature We hope ourexposition will be valuable to other researchers

Common experimental setup All experiments were rununder Ubuntu 1404 LTS on an Intel i7-4770 8x340GHz plat-form with 16GB RAM The tsan11 and tsan11rec tools werebuilt on top of clang revision 286384 The version of rr usedis 510 As a key goal of our work is to apply race detectionto record and replay with controlled concurrency testingmost of the testing is done with race detection enabled evenwhen using rr We still show times for rr without race detec-tion for reference We use native rr tsan11 tsan11+rr andtsan11rec to refer to a program running without instrumen-tation under rr with tsan11 instrumentation under rr withtsan11 instrumentation and under tsan11rec respectively

51 CDSchecker Litmus TestsOverview The small programs (roughly 100 LOC each)used in prior work to evaluate CDSchecker [64] are useful toassess whether tsan11recrsquos controlled scheduling improveson tsan11rsquos ability to find races (including races related toweak memory) As these programs are closed the schedulerand memory model are the sources of nondeterminism

Experimental setup The experiments are run in the fol-lowing four modes tsan11 where tsan11 (which does notuse controlled concurrency testing) finds races tsan11 + rrwhere tsan11 finds races with rr recording and tsan11recrnd and tsan11rec queue where tsan11rec finds races usingthe random and queue strategies respectively We measurethe runtime of each tool on each benchmark averaged over1000 runs reporting standard deviation and remarking onthe coefficient of variation (CV)mdashthe ratio of the standarddeviation and mean

Results and discussion Table 1 summarises the resultsThe Time columns show mean execution times with stan-dard deviation Because these are short-running tests whosebehaviour depends intimately on themanner inwhich threadsinterleave the variance across runs is fairly high with theCV usually exceeding 1 with the exception of the longer-running results for rr for which the CV is always less than 1and usually less than 05 Rate columns show the percentageof all executions that exposed a data race

Comparing the tsan11rec rnd results with the tsan11 andtsan11rec queue results we see that randomized controlledschedulingmeans tsan11rec findsmore races across all bench-marks except chase-lev-deque and dekker-fences This isbecause tsan11 runs at the mercy of the OS scheduler whichtends to explore similar schedules on repeated runs and inthese small programs typically causes the main thread torun to completion before other threads are scheduled Theprice for this is higher runtime eg mcs-lock and ms-queuesuffer slow-downs of around 2times compared with tsan11 weattribute this to the total ordering of visible operations im-posed by tsan11rec The rr results show huge increases dueto a constant overhead applied to all programs But as rr is

584

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 10: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 1 Comparison over CDSchecker benchmarks between tsan + rr tsan11 and tsan11rec with controlled random andqueue scheduling Each benchmark was executed 1000 times in each mode The ldquoTimerdquo columns shows the mean executiontime (ms) and standard deviation (in parentheses) The ldquoRaterdquo column shows the percentage of runs that exhibited data races

tsan11 + rr tsan11 tsan11rec rnd tsan11rec queueTest Time Rate Time Rate Time Rate Time Rate

barrier 590 (1445) 00 8 (914) 00 4 (523) 375 6 (499) 00chase-lev-deque 579 (16669) 05 0 (194) 59 1 (307) 02 2 (378) 00dekker-fences 1776 (120854) 499 2 (402) 503 4 (483) 387 3 (446) 528linuxrwlocks 579 (12645) 03 2 (418) 01 4 (493) 624 3 (450) 00mcs-lock 574 (1371) 00 3 (445) 00 5 (523) 770 3 (447) 01mpmc-queue 574 (1371) 00 3 (449) 00 5 (500) 605 3 (446) 00ms-queue 3093 (10019) 1000 91 (6413) 1000 93 (8068) 1000 52 (6260) 1000

designed for larger applications this overhead will usuallybecome insignificant in more realistic examples

We examined a trace from chase-lev-deque to understandwhy tsan11rec rnd detects fewer races than tsan11 We foundthat from the creation of thread 2 to the point of the racethread 1 must perform 29 operations before thread 2 per-forms just 4 operations in order for the race to manifest Theprobability of this happening under uniform random sched-uling is very low We were able to coerce a race report outof the program by moving the creation of thread 2 to later inthe program This shows that different scheduling strategieswill affect how effective we are at finding data races andthat probabilistic concurrency testing (PCT) can be effectiveat prying out concurrency bugs [12]

52 httpdOverview Apachersquos httpd [3] is a widely-used modularhttp server that makes heavy use of concurrency to han-dle many simultaneous connections For record and replayit is of further interest due to its dependence on externalnetwork input We were able to handle httpd by capturingthe system calls described in sect44 with one workaroundthe accept system call which listens for incoming connec-tions relies on epoll_wait to listen for events This returnsuser-allocated pointers file descriptors and other data ina union with no easy way of knowing the active membersomething which tsan11rec cannot currently handle Weworked around this by using httpdrsquos option to switch to asimpler but slightly less efficient syscall poll which insteadsimply listens to file descriptors the results presented hereemploy this workaround A strength of rr is that it can han-dle httpd without this workaround due to its non-sparserecord and replay mechanism

Experimental setup We tested httpd version 2428 in single-process-multiple-thread mode using ab an Apache-providedprogram for server stress testing We sent 10000 queriesacross 10 concurrent threads to an httpd server for eachof the setups shown in Table 2 averaging results over 10runs We report on standard deviation and again remark on

the CV In the table rnd and queue refer to configurationsof tsan11rec and the presence or absence + rec indicateswhether recording was enabled

Results and discussion The results are shown in Table 2The columns under Race reports show regular results withrace reporting enabled the data under No reports shows re-sults where race-checking-capable tools do perform racechecking behind the scenes but do not actually emit racereports We make this distinction because tsan11 detectsso many races that the overhead of generating reports no-ticeably affects performance results when fewer races aredetected are more representative of the performance onewould expect using a future version of httpd in which manyraces are fixed The Throughput columns indicate the meannumber of queries the server responds to per second TheRate column is the mean number of race reports generated(only relevant for tsan11-based configurations) For eachstandard deviation is shown in parentheses Variance asmeasured by CV is low below 08 in all cases and usuallyless than 05 The Overhead columns indicate how muchslower performance is compared with native executionWithout reporting tsan11 already incurs a 3times overhead

compared to native Comparing results for native with rndand rnd+rec we find that adding controlled random sched-uling increase this overhead massively to between 79ndash89timesdepending on whether recording is enabled This is in thesame ball park as the overhead associated with rr 61times with-out race checking and 160timeswith tsan11 instrumentation (butstill with the actual reporting of races disabled) In contrastwhen our queue strategy is used the overhead comparedwith native drops to 9times and 21times with recording disabled vsenabled We attribute the gap between rrrnd and queue tohttpdrsquos heavy reliance on parallelism and frequent use ofshared mutexes This parallelism is removed by rr becausethe tool sequentializes the execution of threads while ourrandom scheduler only allows the thread that it has chosento be scheduled next to execute a visible operation even ifmany other threads are ready to execute visible operations

585

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 11: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Table 2 Comparison of throughput and race rate between native tsan11 rr and tsan11rec for Apachersquos httpd Results areaveraged over 10 runs ldquoThroughputrdquo shows mean throughput in queries per second ldquoRaterdquo is the number of races detectedper run (where relevant) Standard deviations are shown (in parentheses) Overhead is computed relative to native throughput

Race reports No reportsSetup Throughput Overhead Rate Throughput Overhead

Native NA NA NA 28895 (462256) 1timesrr NA NA NA 475 (608) 61timestsan11 3687 (29428) 8times 113 (1985) 9824 (143201) 3timestsan11 + rr 86 (6320) 336times 34 (1680) 181 (4637) 160timesrnd 141 (892) 205times 162 (3878) 367 (3326) 79timesqueue 818 (31033) 35times 381 (7362) 3261 (84367) 9timesrnd + rec 142 (1185) 203times 155 (3103) 326 (3894) 89timesqueue + rec 513 (8534) 56times 360 (6428) 1387 (24961) 21times

In contrast the queue strategy allows threads to perform visi-ble operations largely on demand Turning to the results withrace reporting enabled we see that the queue strategy hasthe highest race detection rate improving on uncontrolledtsan11 All other race detecting configurations lower the racedetection rate we believe this is because rr and rnd reducethe number of queries being responded to concurrently

Comparing demo file sizes when recording is enabled thetsan11rec demo files are around 48MB for both strategiesdropping to 48MB when only 1000 queries are issued sug-gesting that demo file size increases linearly with the numberof requests at a rate of around 48KB per request This couldbe reduced further with a more aggressive compression strat-egy but would likely increase the time overhead The demofile for rr is significantly smaller 66MB for 10000 querieswhich goes down to 39MB with 1000 queries implying arate of around 03KB per request plus a constant 36MB

53 PARSEC and pbzipOverview Wenext turn to the PARSEC benchmark suite [11]and pbzip application [68] both widely used for evaluatingconcurrency analysis tools For PARSEC we consider thebenchmarks used to evaluate iReplayer [53] however threeof these would not work on our system (dedup and swaptionsdo not compile and canneal crashes)

Experimental setup Each PARSEC (version 30) bench-mark was run with the lsquosimlargersquo test size shipped with thebenchmarks using 4 threads Pbzip (version 2-1113) wasused to compress a 400MB file with 4 threads We ran eachbenchmark 10 times per tool configuration and report av-erage runtimes We report on standard deviation and againremark on the CV A small number of races were discoveredfor some benchmarks and the race detecting tools largelyagreed on the number of races we do not detail these further

Results and discussion Table 3 shows the average timetaken to run each benchmark with each tool configurationwith standard deviation Variance as measured by CV is

reasonably low (CV is always below 1) For the tsan11rec re-sults + rec indicates whether recording was enabled Table 4is computed from the data of Table 3 and reports the over-head associated with running using each tool configurationcompared with native executionWith the exception of bodytrack and fluidanimate the

overhead tsan11rec brings over that of tsan11 is small and forall benchmarks whether recording is enabled or notmakes lit-tle difference However the overhead associated with tsan11+ rr (ie running tsan11-instrumented code under rr) is signif-icant compared with the tsan11 overhead alone despite thefact that running under rr without race detection is generallyefficient Interestingly rr without race detection performsless well on blackscholes compared with the tsan11rec con-figurations Digging into this we found that the benchmarkdistributes work between threads at the start of executionand then lets threads runwith little interaction This high par-allelismlow communication execution plays to the strengthsof tsan11rec where invisible operations are left to run in par-allel but is bad for rr which forces sequentialization acrossall operations

54 SDL-based GamesOverview Simple DirectMedia Layer (SDL) is a library thatconsolidates input graphics and various other forms of IOunder a single interface [75] and is typically used for gamesOnUbuntu 1604 SDL communicateswith X11 for IO pulseau-dio for sound andOpenGL for displayWe investigated recordand replay for two SDL-based games Zandronum [73] amulti-player Doom port (asymp400kLOC) and QuakeSpasm [71](asymp88kLOC) a port of Quake While these games support cus-tom record and replay by logging high level commands byworking at the threading and system call level tsan11rec canfacilitate record and replay of bugs that rely on low-levelinteractions to manifest We discuss below successful recordand replay of a bug in Zandronum that arises due to commu-nication of game data between the game client and serverwhich is not present in the gamersquos native replay

586

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 12: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 3 Execution times (s) for tool configurations across the pbzip and PARSEC benchmarks averaged across 10 runsStandard deviation is shown (in parentheses)

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 92 (031) 117 (049) 664 (311) 772 (187) 181 (030) 123 (066) 182 (030) 129 (129)blackscholes 04 (003) 08 (007) 10 (001) 20 (000) 07 (007) 07 (006) 07 (007) 07 (007)fluidanimate 08 (004) 160 (127) 21 (001) 385 (049) 464 (339) 390 (267) 504 (209) 398 (196)streamcluster 17 (019) 388 (441) 1133 (313) 1811 (179) 1030 (059) 485 (269) 1028 (018) 416 (261)bodytrack 05 (002) 72 (036) 38 (094) 327 (094) 500 (040) 73 (033) 500 (078) 78 (023)ferret 12 (007) 140 (086) 87 (044) 815 (220) 164 (066) 146 (060) 167 (071) 145 (037)

Table 4Overhead compared with native execution for tool configurations across the pbzip and PARSEC benchmarks computedfrom the results of Table 3

tsan11recProgram native tsan11 rr tsan11 + rr rnd queue rnd + rec queue + recpbzip 10times 13times 72times 84times 20times 13times 20times 14timesblackscholes 10times 20times 27times 53times 19times 19times 19times 18timesfluidanimate 10times 203times 27times 489times 590times 497times 642times 506timesstreamcluster 10times 224times 654times 1045times 595times 280times 593times 240timesbodytrack 10times 135times 72times 614times 938times 138times 939times 147timesferret 10times 119times 74times 695times 140times 125times 143times 124times

Our initial attempts to replay these SDL-based gamesfailed due to communication between the application andthe closed and proprietary NVIDIA OpenGL module on ourexperimental platform via ioctl syscallsWeworked aroundthis by ignoring ioctl during recording and letting it runnatively during replay This works because communicationwith the display driver has no effect on the game logic Dis-play interaction led to further problems with initialization ofthe input module We resorted to adapting the scheduler tolet the application run uninstrumented until SDL module ini-tialization had completed adding a custom scheduler hookto allow the application and scheduler to synchronize relatedto this These problems are not specific to our approach ortoolmdashindeed rr cannot handle these SDL-based games forsimilar reasonsmdashbut are rather a fundamental limitation ofrecording and replaying applications that make heavy useof IO To handle such applications one either needs to fullymock out IO components requiring a tremendous engi-neering effort or carefully determine those components thatshould not be instrumented and specify this via annotationsWith these workarounds we were able to handle both

games such that gameplay is displayed on screen during re-play gameplay would not be visible if the IO subsystem hadbeen mocked out and visibility might be useful in debuggingproblems that manifest as visual artifacts

Experimental setup Measuring game performance in away that allows us to compare the overhead of our sched-uling strategies is non-trivial The only metric we have isthe frame-rate (fps)mdashthe number of frames drawn to thescreen per second QuakeSpasm and Zandronum are capped

at 60 fps and will try to maintain this frame-rate dipping ifthey cannot keep up If the frame-rate is reduced too muchthe games become unplayable As a best-effort evaluationmechanism we report on whether the games are playableunder various tool combinations Additionally we found thatit was possible to remove the frame cap for Quakespasmso we report ball-park figures for the overheads of varioustool configurations when playing this game un-capped (Wecould not find a way to reliably remove the frame cap forZandronum) As mentioned previously we do not comparewith rr as it cannot record or replay the games We used Zan-dronum revision 10013dd3c3b57023f updated to use SDL2QuakeSpasm version 0930 and SDL version 205

Results anddiscussion With the random tsan11rec sched-uler Zandronum was unplayable even with recording dis-abled the frame-rate dropped to below 1 fps This is due tothe random scheduler starving the main thread by frequentlyscheduling other less critical threads (eg the audio thread)In contrast the queue scheduler could maintain the full 60fps with recording enabled for 100 seconds of play the demosize grew to just under 8MB of which 65MB was for syscalls

To test tsan11recrsquos ability to replay network communica-tion we found a previously fixed Zandronum bug [88] thatrelies on an error in this communication to manifest Thisbug involves incorrect game state information being sentfrom the server to the client during a map change We repli-cated the bug with a server and two clients one of whichwas recording After about 12 minutes the bug appearedand resulted in a demo file of 43MB We then replayed thedemo and the bug appeared as expected This demonstrates

587

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 13: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

that our tool can be used to accurately capture and facilitatereplay of bugs in large networked applicationsFor QuakeSpasm we found that it was possible to play

the game without dropping below 60 fps using tsan11 and alltsan11rec configurations To further investigate the overheadof each tool configuration on this case study we removedthe fps cap then played the game 5 times per tool configu-ration for 90 seconds per play enabling a mode where thegamersquos fps is periodically appended to a file We made a besteffort to play the game in a similar manner on each run butinevitably there will still be high variation in game activitybetween plays Indicative results are shown in Table 5 Thernd and queue configurations refer to tsan11rec with the ran-dom and queue strategies respectively and with recordingdisabled while the ldquo+ recrdquo tool configurations are similarbut with recording enabled The ldquoOverheadrdquo column showsthe overhead observed compared with native execution Thetake-away from these results is that the instrumentationoverhead for both tsan11 and tsan11rec is surprisingly mod-est (generally less than 2times) and that the additional overheadassociated with enabling recording in tsan11rec is low

55 Limitations SQLite and SpiderMonkeyA downside of our sparse approach to record and replay isthat different applications may have incompatible require-ments regarding what should be recorded and what mustnot be recorded For example recording memory layout andattempting to enforce the same layout on replay would notonly slow down the SDL games (see sect54) to the point ofbeing unplayable but would also cause problems related tocommunication with the display driver Yet the behaviour ofsome programs will depend on the memory layout such asiterating over an ordered C++ container that holds pointersIn particular we experimented with applying tsan11rec

to the SQLite database management library [76] and to Spi-derMonkey Firefoxrsquos JavaScript management engine [60]While tsan11rec was applicable for controlled schedulingof these applications we found that replay would rapidlydesynchronise due to memory layout nondeterminism caus-ing conditionals that rely on the values of pointers to evaluatedifferently during replay Tools such as rr can handle theseprograms reliably by enforcing the same memory layoutThis is a trade-off the non-sparse approach of rr can leadto higher overheads as demonstrated in sect52 and sect53 Analternative to adapting the record-and-replay tool so thatit always enforces memory layout determinism would beto adapt the application of interest so that default memoryallocation is replaced with a deterministic memory allocator

6 Related WorkControlled scheduling A large amount of work has goneinto the use of scheduling strategies as a form of state spaceexploration (eg [29 30 61 62 78 87]) and on techniques

aimed at reducing the size of the state space such as dynamicpartial-order reduction [31 89]) A particularly notable con-trolled scheduling tool in terms of successful practical ap-plication is Microsoftrsquos CHESS [62] which aims to system-atically explore all interleavings of a test scenario Similarto our approach each visible instruction has an associatedcustom wrapper that intercepts the real instruction callinginto the CHESS schedulerSchedule bounding techniques notably preemption- and

delay-bounding [26 61] have been shown to be successful inprioritising the order in which thread schedules are exploredduring controlled concurrency testing They prioritise ex-ploring schedules that exhibit small numbers of preemptionsbetween threads in line with empirical evidence that bugsrarely require large numbers of preemptions in order to man-ifest [56] Combining such techniques with the tsan11recalgorithm is an appealing idea in principle but is hinderedby the assumption that the program under test takes a fixedinput and that the scheduler is the only source of nonde-terminism This assumption allows running the programagain and again trying different schedules In the contextof tsan11rec which can be used to record and replay ap-plications where the environment presents other forms ofnondeterminism the manner in which the program interactswith its environment is captured with respect to a particularthread schedule and other thread schedules might involvecompletely different environmental interactions We believea more promising approach would be to bring ideas from theprobabilistic concurrency testing (PCT) algorithm [12] to thetsan11rec setting to introduce a degree of skewing to ourrandom strategy so that it explores more diverse schedules

Record and replay Record and replay has been a signifi-cant area of research with many tools being created to facil-itate it [2 9 19 24 32 36 39 40 43ndash45 47 50 53 54 57 5765ndash67 72 79 81] The general premise behind them is simi-lar identify order nondeterminism and input nondeterminismand create techniques to capture them while recording andcontrol them during replayVarious tools extend the OS in some way or require spe-

cific hardware [2 6 9 19 20 24 47 50 79 81] This has thebenefit of giving the tool access to much more of the systemsuch as memory pages and process information For exam-ple Scribe [47] will directly modify the system schedulerinstead of coercing it and achieves slowdowns as low as105times However this severely hits the usability of the toolas it requires the user to deploy a modified OS

Other tools reside entirely in user space [10 15 32 34 43ndash45 49 51 53 54 57 66 67 72 80 84] and trade performancefor usability This is the category that tsan11rec falls intoEase of use is particularly important in persuading users toadopt the tool rr [66] in particular allows the user to recorda program by simply passing the binary to rr as a parameter

588

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 14: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

Table 5 Indicative frames per second (fps) result captured by playing Quakespasm for 90 seconds five times per toolconfiguration and capturing the fps reports recorded by the game Frame-rate results are reported for the minimum maximummedian and mean case as well as the 25th and 75th quartiles

Min 25th Median 75th Max Mean OverheadNative 291 369 400 428 502 4004 10timestsan11 125 193 225 275 431 2338 17timesrnd 128 205 238 281 421 2471 16timesqueue 84 188 233 273 435 2333 17timesrnd + rec 113 178 212 253 366 2168 18timesqueue + rec 86 161 185 227 348 1930 21times

and as such has become the definitive tool for record and re-play We have performed an extensive comparison with rr insect5 and note that while rr outclasses tsan11rec in some appli-cations and can handle applications that are out of scope fortsan11rec (see sect55) rr shows significantly higher overheadcompared with tsan11rec for a number of applications thatrely on a high degree of parallelism for performance Furtherour sparse approach with suitable workarounds enablesrecord and replay for graphical applications (the SDL-basedgames of sect54) that rr cannot currently handleBecause it builds on tsan11 which itself uses compiler

instrumentation and a modified libcxx tsan11rec shares sim-ilarities with tools that depend on language implementationor library-level support [1 13 13 18 36 58] Notable exam-ples here include R2 [36] and IntelliTrace [58]Whole system replay aims to record all system nonde-

terminism [14 19 22 24 25 27 53 77] Among these therecent iReplayer tool [53] performs record and replay in-situavoiding many of the problems (eg memory layout issues)that otherwise come from running the record and replayexecutions under different processes

Some tools focus on the order-nondeterminism allowingthem to retain their parallelism and thus reducing the over-head of multi-threaded application [25 42 59 63 70 86]Castor [57] will provide each thread with its own bufferfor storing information and serialize them at a later timetsan11rec also fits into this category as it will both preserverparallelism of invisible operations and apply a schedulingstrategy to resolve this nondeterminism

An alternative to recording a programrsquos nondeterminismis to remove it making some or all aspects of the programdeterministic [5 8 16 17 21 55] For example Dthreads [55]ensures that memory accesses are deterministic on eachexecution Such approaches can have a significant probeproblem by removing the behaviour necessary for certainbugs to manifest in return for avoiding the performanceoverhead associated with handling order-nondeterminism

Multi-version execution Multi-version (or multi-variant)execution (MVE) is a method for concurrently running mul-tiple processes that are expected to behave in a semanticallysimilar manner [41 46 69 82 83] MVE can be used to detect

security vulnerabilities in applications if a variant divergesthis could indicate that an attacker has modified the processin some way [46 82 83] It can also be used for running dif-ferent analyses on identical processes that would not workwhen run together on the same process such as the clangsanitizers [69] Most MVE systems hinge on a special moni-tor thread that controls the generation an maintenance ofa number of variants Keeping the variants in sync with re-spect to nondeterministic behaviours presents many of thesame problems that are associated with record and replay

7 ConclusionWe have presented tsan11rec which brings together con-trolled scheduling record and replay and dynamic data racedetection for the dynamic analysis of CC++11 applicationsOur experimental evaluation demonstrates that the tool isin many cases competitive in terms of performance withrr a state-of-the-art record and replay tool in some casesout-performing rr due to tsan11recrsquos ability to preserve par-allelism in applications under test to a high degree We havealso shown that our tool is capable of recording and replayingSDL-based video games by exploiting our sparse approachto avoid recording aspects of gamedisplay communicationthat are fundamentally hard to control The flip side of oursparse approach is that by limiting what is recorded ourtool desynchronises on uncontrolled forms of nondetermin-ism such as that related to memory layout by capturingthis form of nondeterminism tools such as rr do not sufferfrom this problem Two exciting avenues for future workinclude investigating a spectrum of recording granularitiesto bridge the gap between our sparse approach and stricterapproaches in a configurable manner and to investigatebug detection using a richer range of scheduling strategiesincluding schedule bounding [26 61 62] and probabilisticconcurrency testing [12]

AcknowledgmentsThis work was supported by a PhD studentship funded byGCHQ and EPSRC projects EPN0263141 and EPR0068651

589

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 15: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

References[1] Bowen Alpern Jong-Deok Choi Ton Ngo Manu Sridharan and

John M Vlissides 2001 A Perturbation-Free Replay Platform forCross-Optimized Multithreaded Applications In Proceedings of the15th International Parallel amp Distributed Processing Symposium (IPDPS-01) San Francisco CA USA April 23-27 2001 IEEE Computer Society23 httpsdoiorg101109IPDPS2001924957

[2] GautamAltekar and Ion Stoica 2009 ODR output-deterministic replayfor multicore debugging In Proceedings of the 22nd ACM Symposiumon Operating Systems Principles 2009 SOSP 2009 Big Sky Montana USAOctober 11-14 2009 Jeanna Neefe Matthews and Thomas E Anderson(Eds) ACM 193ndash206 httpsdoiorg10114516295751629594

[3] Apache Software Foundation 2018 Apache httpd httpshttpdapacheorgdevdevnoteshtml

[4] Remzi H Arpaci-Dusseau and Brad Chen (Eds) 2010 9th USENIXSymposium on Operating Systems Design and Implementation OSDI2010 October 4-6 2010 Vancouver BC Canada Proceedings USENIXAssociation httpwwwusenixorgeventosdi10techfull_papersosdi10_proceedingspdf

[5] Amittai Aviram Shu-Chun Weng Sen Hu and Bryan Ford 2010 Ef-ficient System-Enforced Deterministic Parallelism See [4] 193ndash206httpwwwusenixorgeventsosdi10techfull_papersAvirampdf

[6] David F Bacon and Seth Copen Goldstein 1991 Hardware-assisted Re-play of Multiprocessor Programs In Proceedings of the 1991 ACMONRWorkshop on Parallel and Distributed Debugging (PADD rsquo91) ACMNew York NY USA 194ndash206 httpsdoiorg101145122759122777

[7] Mark Batty Scott Owens Susmit Sarkar Peter Sewell and TjarkWeber2011 Mathematizing C++ concurrency In POPL 55ndash66

[8] Tom Bergan Owen Anderson Joseph Devietti Luis Ceze and DanGrossman 2010 CoreDet a compiler and runtime system for deter-ministic multithreaded execution See [38] 53ndash64 httpsdoiorg10114517360201736029

[9] Tom Bergan Nicholas Hunt Luis Ceze and Steven D Gribble 2010Deterministic Process Groups in dOS See [4] 177ndash191 httpwwwusenixorgeventsosdi10techfull_papersBerganpdf

[10] Sanjay Bhansali Wen-Ke Chen Stuart de Jong Andrew EdwardsRon Murray Milenko Drinić Darek Mihočka and Joe Chau 2006Framework for Instruction-level Tracing and Analysis of Program Ex-ecutions In Proceedings of the 2Nd International Conference on VirtualExecution Environments (VEE rsquo06) ACM New York NY USA 154ndash163httpsdoiorg10114511347601220164

[11] Christian Bienia Sanjeev Kumar Jaswinder Pal Singh and Kai Li 2008The PARSEC benchmark suite characterization and architectural im-plications In 17th International Conference on Parallel Architecture andCompilation Techniques PACT 2008 Toronto Ontario Canada October25-29 2008 Andreas Moshovos David Tarditi and Kunle Olukotun(Eds) ACM 72ndash81 httpsdoiorg10114514541151454128

[12] Sebastian Burckhardt Pravesh Kothari Madanlal Musuvathi and San-tosh Nagarakatte 2010 A randomized scheduler with probabilisticguarantees of finding bugs See [38] 167ndash178 httpsdoiorg10114517360201736040

[13] Brian Burg Richard Bailey Andrew J Ko and Michael D Ernst 2013Interactive recordreplay for web application debugging In The 26thAnnual ACM Symposium on User Interface Software and TechnologyUISTrsquo13 St Andrews United Kingdom October 8-11 2013 ShahramIzadi Aaron J Quigley Ivan Poupyrev and Takeo Igarashi (Eds) ACM473ndash484 httpsdoiorg10114525019882502050

[14] Anton Burtsev David Johnson Mike Hibler Eric Eide and John Regehr2016 Abstractions for Practical Virtual Machine Replay In Proceedingsof the 12th ACM SIGPLANSIGOPS International Conference on VirtualExecution Environments Atlanta GA USA April 2-3 2016 VishakhaGupta-Cledat Donald E Porter and Vivek Sarkar (Eds) ACM 93ndash106httpsdoiorg10114528922422892257

[15] M E Chastain 1999 MEC (January 1999) httpslwnnet19990121amechtml

[16] Heming Cui Jiriacute Simsa Yi-Hong Lin Hao Li Ben Blum Xinan XuJunfeng Yang Garth A Gibson and Randal E Bryant 2013 Parrota practical runtime for deterministic stable and reliable threads InACM SIGOPS 24th Symposium on Operating Systems Principles SOSP rsquo13Farmington PA USA November 3-6 2013 Michael Kaminsky and MikeDahlin (Eds) ACM 388ndash405 httpsdoiorg10114525173492522735

[17] Heming Cui Jingyue Wu John Gallagher Huayang Guo and JunfengYang 2011 Efficient deterministic multithreading through schedulerelaxation See [85] 337ndash351 httpsdoiorg10114520435562043588

[18] P Deva 2018 Chronon (2018) httpchrononsystemscomblogdesign-and-architecture-ofthe-chronon-record-0

[19] David Devecsery Michael Chow Xianzheng Dou Jason Flinn andPeter M Chen 2014 Eidetic Systems In 11th USENIX Symposium onOperating Systems Design and Implementation OSDI rsquo14 BroomfieldCO USA October 6-8 2014 Jason Flinn and Hank Levy (Eds) USENIXAssociation 525ndash540 httpswwwusenixorgconferenceosdi14technical-sessionspresentationdevecsery

[20] Joseph Devietti Brandon Lucia Luis Ceze andMark Oskin 2009 DMPdeterministic shared memory multiprocessing In Proceedings of the14th International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2009 Washington DC USAMarch 7-11 2009 Mary Lou Soffa and Mary Jane Irwin (Eds) ACM85ndash96 httpsdoiorg10114515082441508255

[21] Joseph Devietti Jacob Nelson Tom Bergan Luis Ceze and Dan Gross-man 2011 RCDC a relaxed consistency deterministic computer See[37] 67ndash78 httpsdoiorg10114519503651950376

[22] Brendan Dolan-Gavitt Josh Hodosh Patrick Hulin Tim Leek andRyan Whelan 2015 Repeatable Reverse Engineering with PANDAIn Proceedings of the 5th Program Protection and Reverse EngineeringWorkshop (PPREW-5) ACM New York NY USA Article 4 11 pageshttpsdoiorg10114528438592843867

[23] Richard Draves and Robbert van Renesse (Eds) 2008 8th USENIX Sym-posium on Operating Systems Design and Implementation OSDI 2008December 8-10 2008 San Diego California USA Proceedings USENIXAssociation httpswwwusenixorgpublicationsproceedingsf[0]=im_group_audience3A73

[24] George W Dunlap Samuel T King Sukru Cinar Murtaza A Bas-rai and Peter M Chen 2002 ReVirt Enabling Intrusion AnalysisThrough Virtual-Machine Logging and Replay In 5th Symposium onOperating System Design and Implementation (OSDI 2002) Boston Mas-sachusetts USA December 9-11 2002 David E Culler and Peter Dr-uschel (Eds) USENIX Association httpwwwusenixorgeventsosdi02techdunlaphtml

[25] George W Dunlap Dominic G Lucchetti Michael A Fetterman andPeter M Chen 2008 Execution replay of multiprocessor virtual ma-chines In Proceedings of the 4th International Conference on VirtualExecution Environments VEE 2008 Seattle WA USA March 5-7 2008David Gregg Vikram S Adve and Brian N Bershad (Eds) ACM121ndash130 httpsdoiorg10114513462561346273

[26] Michael Emmi Shaz Qadeer and Zvonimir Rakamaric 2011 Delay-bounded scheduling In Proceedings of the 38th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages POPL 2011 AustinTX USA January 26-28 2011 Thomas Ball and Mooly Sagiv (Eds)ACM 411ndash422 httpsdoiorg10114519263851926432

[27] Jakob Engblom Daniel Aarno and Bengt Werner 2010 Full-SystemSimulation from Embedded to High-Performance Systems Springer USBoston MA 25ndash45 httpsdoiorg101007978-1-4419-6175-4_3

[28] Cormac Flanagan and Stephen N Freund 2009 FastTrack efficientand precise dynamic race detection In Proceedings of the 2009 ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI 2009 Dublin Ireland June 15-21 2009 Michael Hind andAmer Diwan (Eds) ACM 121ndash133 httpsdoiorg1011451542476

590

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 16: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

1542490[29] Cormac Flanagan and Stephen N Freund 2010 Adversarial memory

for detecting destructive races In Proceedings of the 2010 ACM SIGPLANConference on Programming Language Design and Implementation PLDI2010 Toronto Ontario Canada June 5-10 2010 Benjamin G Zornand Alexander Aiken (Eds) ACM 244ndash254 httpsdoiorg10114518065961806625

[30] Cormac Flanagan and Stephen N Freund 2010 The RoadRunnerdynamic analysis framework for concurrent programs In Proceedingsof the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis forSoftware Tools and Engineering PASTErsquo10 Toronto Ontario CanadaJune 5-6 2010 Sorin Lerner and Atanas Rountev (Eds) ACM 1ndash8httpsdoiorg10114518066721806674

[31] Cormac Flanagan and Patrice Godefroid 2005 Dynamic partial-orderreduction for model checking software In Proceedings of the 32nd ACMSIGPLAN-SIGACT Symposium on Principles of Programming LanguagesPOPL 2005 Long Beach California USA January 12-14 2005 JensPalsberg and Martiacuten Abadi (Eds) ACM 110ndash121 httpsdoiorg10114510403051040315

[32] Dennis Geels Gautam Altekar Scott Shenker and Ion Stoica 2006Replay Debugging for Distributed Applications (Awarded Best Paper)In Proceedings of the 2006 USENIX Annual Technical Conference BostonMA USA May 30 - June 3 2006 Atul Adya and Erich M Nahum (Eds)USENIX 289ndash300 httpwwwusenixorgeventsusenix06techgeelshtml

[33] Patrice Godefroid 2005 Software Model Checking The VeriSoftApproach Formal Methods in System Design 26 2 (2005) 77ndash101httpsdoiorg101007s10703-005-1489-x

[34] C Gottbrath 2008 Reverse debugging with the TotalView debugger(May 2008)

[35] David Grove and Steve Blackburn (Eds) 2015 Proceedings of the36th ACM SIGPLAN Conference on Programming Language Design andImplementation Portland OR USA June 15-17 2015 ACM httpdlacmorgcitationcfmid=2737924

[36] Zhenyu Guo Xi Wang Jian Tang Xuezheng Liu Zhilei Xu Ming WuM Frans Kaashoek and Zheng Zhang 2008 R2 An Application-LevelKernel for Record and Replay See [23] 193ndash208 httpwwwusenixorgeventsosdi08techfull_papersguoguopdf

[37] Rajiv Gupta and Todd C Mowry (Eds) 2011 Proceedings of the 16thInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS 2011 Newport Beach CAUSA March 5-11 2011 ACM

[38] James C Hoe and Vikram S Adve (Eds) 2010 Proceedings of the 15thInternational Conference on Architectural Support for Programming Lan-guages and Operating Systems ASPLOS 2010 Pittsburgh PennsylvaniaUSA March 13-17 2010 ACM

[39] Nima Honarmand and Josep Torrellas 2014 RelaxReplay recordand replay for relaxed-consistency multiprocessors In ArchitecturalSupport for Programming Languages and Operating Systems ASPLOSrsquo14 Salt Lake City UT USA March 1-5 2014 Rajeev BalasubramonianAl Davis and Sarita V Adve (Eds) ACM 223ndash238 httpsdoiorg10114525419402541979

[40] Nima Honarmand and Josep Torrellas 2014 Replay debugging Lever-aging record and replay for program debugging In ACMIEEE 41stInternational Symposium on Computer Architecture ISCA 2014 Min-neapolis MN USA June 14-18 2014 IEEE Computer Society 455ndash456httpsdoiorg101109ISCA20146853229

[41] Petr Hosek and Cristian Cadar 2015 VARAN the Unbelievable AnEfficient N-version Execution Framework In Proceedings of the Twenti-eth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems ASPLOS rsquo15 Istanbul Turkey March14-18 2015 Oumlzcan Oumlzturk Kemal Ebcioglu and Sandhya Dwarkadas(Eds) ACM 339ndash353 httpsdoiorg10114526943442694390

[42] Derek Hower and Mark D Hill 2008 Rerun Exploiting Episodes forLightweight Memory Race Recording In 35th International Symposiumon Computer Architecture (ISCA 2008) June 21-25 2008 Beijing China265ndash276 httpsdoiorg101109ISCA200826

[43] Jeff Huang Peng Liu and Charles Zhang 2010 LEAP lightweightdeterministic multi-processor replay of concurrent java programsIn Proceedings of the 18th ACM SIGSOFT International Symposium onFoundations of Software Engineering 2010 Santa Fe NM USA November7-11 2010 Gruia-Catalin Roman and Andreacute van der Hoek (Eds) ACM207ndash216 httpsdoiorg10114518822911882323

[44] Jeff Huang Charles Zhang and Julian Dolby 2013 CLAP recordinglocal executions to reproduce concurrency failures In ACM SIGPLANConference on Programming Language Design and Implementation PLDIrsquo13 Seattle WA USA June 16-19 2013 Hans-Juergen Boehm and Cor-mac Flanagan (Eds) ACM 141ndash152 httpsdoiorg10114524621562462167

[45] Shiyou Huang Bowen Cai and Jeff Huang 2017 Towards Production-Run Heisenbugs Reproduction on Commercial Hardware In 2017USENIX Annual Technical Conference USENIX ATC 2017 Santa ClaraCA USA July 12-14 2017 403ndash415 httpswwwusenixorgconferenceatc17technical-sessionspresentationhuang

[46] Koen Koning Herbert Bos and Cristiano Giuffrida 2016 Secure andEfficient Multi-Variant Execution Using Hardware-Assisted ProcessVirtualization In 46th Annual IEEEIFIP International Conference onDependable Systems and Networks DSN 2016 Toulouse France June28 - July 1 2016 IEEE Computer Society 431ndash442 httpsdoiorg101109DSN201646

[47] Oren Laadan Nicolas Viennot and Jason Nieh 2010 Transparentlightweight application execution replay on commodity multiproces-sor operating systems In SIGMETRICS 2010 Proceedings of the 2010ACM SIGMETRICS International Conference on Measurement and Mod-eling of Computer Systems New York New York USA 14-18 June 2010Vishal Misra Paul Barford and Mark S Squillante (Eds) ACM 155ndash166 httpsdoiorg10114518110391811057

[48] Leslie Lamport 1978 Time Clocks and the Ordering of Events ina Distributed System Commun ACM 21 7 (1978) 558ndash565 httpsdoiorg101145359545359563

[49] Dongyoon Lee Peter M Chen Jason Flinn and Satish Narayanasamy2012 Chimera hybrid program analysis for determinism In ACMSIGPLAN Conference on Programming Language Design and Implemen-tation PLDI rsquo12 Beijing China - June 11 - 16 2012 Jan Vitek Haibo Linand Frank Tip (Eds) ACM 463ndash474 httpsdoiorg10114522540642254119

[50] Dongyoon Lee Benjamin Wester Kaushik Veeraraghavan SatishNarayanasamy Peter M Chen and Jason Flinn 2010 Respec efficientonline multiprocessor replayvia speculation and external determinismSee [38] 77ndash90 httpsdoiorg10114517360201736031

[51] Kyu Hyung Lee Dohyeong Kim and Xiangyu Zhang 2014Infrastructure-Free Logging and Replay of Concurrent Executionon Multiple Cores In ECOOP 2014 - Object-Oriented Programming- 28th European Conference Uppsala Sweden July 28 - August 12014 Proceedings (Lecture Notes in Computer Science) Richard EJones (Ed) Vol 8586 Springer 232ndash256 httpsdoiorg101007978-3-662-44202-9_10

[52] Christopher Lidbury and Alastair F Donaldson 2017 Dynamic racedetection for C++11 In Proceedings of the 44th ACM SIGPLAN Sympo-sium on Principles of Programming Languages POPL 2017 Paris FranceJanuary 18-20 2017 Giuseppe Castagna and Andrew D Gordon (Eds)ACM 443ndash457 httpsdoiorg1011453009837

[53] Hongyu Liu Sam Silvestro Wei Wang Chen Tian and Tongping Liu2018 iReplayer in-situ and identical record-and-replay for multi-threaded applications In Proceedings of the 39th ACM SIGPLAN Confer-ence on Programming Language Design and Implementation PLDI 2018

591

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 17: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

Sparse Record and Replay with Controlled Scheduling PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA

Philadelphia PA USA June 18-22 2018 Jeffrey S Foster and Dan Gross-man (Eds) ACM 344ndash358 httpsdoiorg10114531923663192380

[54] Peng Liu Xiangyu Zhang Omer Tripp and Yunhui Zheng 2015 Lightreplay via tightly bounded recording See [35] 55ndash64 httpsdoiorg10114527379242738001

[55] Tongping Liu Charlie Curtsinger and EmeryD Berger 2011 Dthreadsefficient deterministic multithreading See [85] 327ndash336 httpsdoiorg10114520435562043587

[56] Shan Lu Soyeon Park Eunsoo Seo and Yuanyuan Zhou 2008 Learn-ing from mistakes a comprehensive study on real world concurrencybug characteristics In Proceedings of the 13th International Confer-ence on Architectural Support for Programming Languages and Op-erating Systems ASPLOS 2008 Seattle WA USA March 1-5 2008Susan J Eggers and James R Larus (Eds) ACM 329ndash339 httpsdoiorg10114513462811346323

[57] Ali Joseacute Mashtizadeh Tal Garfinkel David Terei David Maziegraveresand Mendel Rosenblum 2017 Towards Practical Default-On Multi-Core RecordReplay In Proceedings of the Twenty-Second InternationalConference on Architectural Support for Programming Languages andOperating Systems ASPLOS 2017 Xirsquoan China April 8-12 2017 YunjiChen Olivier Temam and John Carter (Eds) ACM 693ndash708 httpsdoiorg10114530376973037751

[58] Microsoft 2018 Understanding IntelliTrace part I What the $ isIntelliTrace (2018) httpsblogsmsdnmicrosoftcomzainnab20130212understanding-intellitrace-part-i-what-the-is-intellitrace

[59] Pablo Montesinos Luis Ceze and Josep Torrellas 2008 DeLoreanRecording and Deterministically Replaying Shared-Memory Multi-processor Execution Effciently In 35th International Symposium onComputer Architecture (ISCA 2008) June 21-25 2008 Beijing China289ndash300 httpsdoiorg101109ISCA200836

[60] Mozilla 2018 SpiderMonkey httpsdevelopermozillaorgen-USdocsMozillaProjectsSpiderMonkey

[61] MadanlalMusuvathi and Shaz Qadeer 2007 Iterative context boundingfor systematic testing of multithreaded programs In Proceedings of theACM SIGPLAN 2007 Conference on Programming Language Design andImplementation San Diego California USA June 10-13 2007 JeanneFerrante and Kathryn S McKinley (Eds) ACM 446ndash455 httpsdoiorg10114512507341250785

[62] Madanlal Musuvathi Shaz Qadeer Thomas Ball Geacuterard Basler Pi-ramanayagam Arumuga Nainar and Iulian Neamtiu 2008 Find-ing and Reproducing Heisenbugs in Concurrent Programs See[23] 267ndash280 httpwwwusenixorgeventsosdi08techfull_papersmusuvathimusuvathipdf

[63] Satish Narayanasamy Gilles Pokam and Brad Calder 2005 BugNetContinuously Recording Program Execution for Deterministic ReplayDebugging In 32st International Symposium on Computer Architecture(ISCA 2005) 4-8 June 2005 Madison Wisconsin USA IEEE ComputerSociety 284ndash295 httpsdoiorg101109ISCA200516

[64] Brian Norris and Brian Demsky 2013 CDSchecker checking concur-rent data structures written with CC++ atomics In Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages amp Applications OOPSLA 2013 part ofSPLASH 2013 Indianapolis IN USA October 26-31 2013 131ndash150

[65] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2016 Lightweight User-Space Record And ReplayCoRR abs161002144 (2016) arXiv161002144 httparxivorgabs161002144

[66] Robert OrsquoCallahan Chris Jones Nathan Froyd Kyle Huey Albert Nolland Nimrod Partush 2017 Engineering Record and Replay for Deploy-ability In 2017 USENIX Annual Technical Conference USENIX ATC 2017Santa Clara CA USA July 12-14 2017 377ndash389 httpswwwusenixorgconferenceatc17technical-sessionspresentationocallahan

[67] Harish Patil Cristiano Pereira Mack Stallcup Gregory Lueck andJames Cownie 2010 PinPlay a framework for deterministic replay

and reproducible analysis of parallel programs In Proceedings of theCGO 2010 The 8th International Symposium on Code Generation andOptimization Toronto Ontario Canada April 24-28 2010 AndreasMoshovos J Gregory Steffan Kim M Hazelwood and David R Kaeli(Eds) ACM 2ndash11 httpsdoiorg10114517729541772958

[68] pbzip2 development team 2018 pbzip2 httpslaunchpadnetpbzip2[69] Luiacutes Pina Anastasios Andronidis and Cristian Cadar 2018 FreeDA

deploying incompatible stock dynamic analyses in production viamulti-version execution In Proceedings of the 15th ACM InternationalConference on Computing Frontiers CF 2018 Ischia Italy May 08-102018 David R Kaeli and Miquel Pericagraves (Eds) ACM 1ndash10 httpsdoiorg10114532032173203237

[70] Gilles Pokam Klaus Danne Cristiano Pereira Rolf Kassa Tim KranichShiliang Hu Justin Emile Gottschlich Nima Honarmand Nathan Daut-enhahn Samuel T King and Josep Torrellas 2013 QuickRec proto-typing an intel architecture extension for record and replay of mul-tithreaded programs In The 40th Annual International Symposiumon Computer Architecture ISCArsquo13 Tel-Aviv Israel June 23-27 2013Avi Mendelson (Ed) ACM 643ndash654 httpsdoiorg10114524859222485977

[71] QuakeSpasm 2018 QuakeSpasm An engine for iD softwarersquos Quakehttpquakespasmsourceforgenet

[72] Yasushi Saito 2005 Jockey a user-space library for record-replaydebugging In Proceedings of the Sixth International Workshop on Auto-mated Debugging AADEBUG 2005 Monterey California USA Septem-ber 19-21 2005 Clinton Jeffery Jong-Deok Choi and Raimondas Lence-vicius (Eds) ACM 69ndash76 httpsdoiorg10114510851301085139

[73] Torr Samaho 2018 Zandronum httpszandronumcom[74] Konstantin Serebryany and Timur Iskhodzhanov 2009 ThreadSani-

tizer Data Race Detection in Practice In WBIA 62ndash71[75] Simple DirectMedia Layer 2018 SDL 20 library httpswwwlibsdl

orgdownload-20php[76] SQLite 2018 SQLite 3240 httpssqliteorgreleaselog3_24_0html[77] Deepa Srinivasan and Xuxian Jiang 2012 Time-Traveling Forensic

Analysis of VM-Based High-Interaction Honeypots In Security and Pri-vacy in Communication Networks Muttukrishnan Rajarajan Fred PiperHaining Wang and George Kesidis (Eds) Springer Berlin HeidelbergBerlin Heidelberg 209ndash226

[78] Paul Thomson Alastair F Donaldson and Adam Betts 2014 Concur-rency testing using schedule bounding an empirical study In ACMSIGPLAN Symposium on Principles and Practice of Parallel ProgrammingPPoPP rsquo14 Orlando FL USA February 15-19 2014 Joseacute E Moreira andJames R Larus (Eds) ACM 15ndash28 httpsdoiorg10114525552432555260

[79] Joseph Tucek Shan Lu Chengdu Huang Spiros Xanthos andYuanyuan Zhou 2007 Triage diagnosing production run failuresat the userrsquos site In Proceedings of the 21st ACM Symposium on Oper-ating Systems Principles 2007 SOSP 2007 Stevenson Washington USAOctober 14-17 2007 Thomas C Bressoud and M Frans Kaashoek (Eds)ACM 131ndash144 httpsdoiorg10114512942611294275

[80] Undo 2018 Reversible debugging tools for CC++ on Linux amp Android(2018) httpundo-softwarecom

[81] Kaushik Veeraraghavan Dongyoon Lee Benjamin Wester JessicaOuyang Peter M Chen Jason Flinn and Satish Narayanasamy 2011DoublePlay parallelizing sequential logging and replay See [37] 15ndash26 httpsdoiorg10114519503651950370

[82] Stijn Volckaert Bart Coppens Bjorn De Sutter Koen De BosscherePer Larsen and Michael Franz 2017 Taming Parallelism in a Multi-Variant Execution Environment In Proceedings of the Twelfth EuropeanConference on Computer Systems EuroSys 2017 Belgrade Serbia April23-26 2017 Gustavo Alonso Ricardo Bianchini and Marko Vukolic(Eds) ACM 270ndash285 httpsdoiorg10114530641763064178

[83] Stijn Volckaert Bart Coppens Alexios Voulimeneas Andrei Home-scu Per Larsen Bjorn De Sutter and Michael Franz 2016 Secure

592

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References
Page 18: Sparse Record and Replay with Controlled Schedulingafd/homepages/papers/pdfs/2019/PLDI.pdf · Controlled concurrency testing has proven successful in find-ing subtle bugs in concurrent

PLDI rsquo19 June 22ndash26 2019 Phoenix AZ USA Christopher Lidbury and Alastair F Donaldson

and Efficient Application Monitoring and Replication In 2016 USENIXAnnual Technical Conference USENIX ATC 2016 Denver CO USA June22-24 2016 Ajay Gulati and Hakim Weatherspoon (Eds) USENIXAssociation 167ndash179 httpswwwusenixorgconferenceatc16technical-sessionspresentationvolckaert

[84] YanWang Harish Patil Cristiano Pereira Gregory Lueck Rajiv Guptaand Iulian Neamtiu 2014 DrDebug Deterministic Replay based CyclicDebugging with Dynamic Slicing In 12th Annual IEEEACM Interna-tional Symposium on Code Generation and Optimization CGO 2014Orlando FL USA February 15-19 2014 David R Kaeli and TippMoseley(Eds) ACM 98 httpsdoiorg10114525441372544152

[85] Ted Wobber and Peter Druschel (Eds) 2011 Proceedings of the 23rdACM Symposium on Operating Systems Principles 2011 SOSP 2011Cascais Portugal October 23-26 2011 ACM httpsdoiorg1011452043556

[86] Min Xu Rastislav Bodiacutek and Mark D Hill 2003 A Flight DataRecorder for Enabling Full-System Multiprocessor Deterministic

Replay In 30th International Symposium on Computer Architecture(ISCA 2003) 9-11 June 2003 San Diego California USA Allan Got-tlieb and Kai Li (Eds) IEEE Computer Society 122ndash133 httpsdoiorg101109ISCA20031206994

[87] Jie Yu Satish Narayanasamy Cristiano Pereira and Gilles Pokam 2012Maple a coverage-driven testing tool for multithreaded programs InProceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications OOPSLA2012 part of SPLASH 2012 Tucson AZ USA October 21-25 2012 Gary TLeavens and Matthew B Dwyer (Eds) ACM 485ndash502 httpsdoiorg10114523846162384651

[88] Zandronum 2015 Zandronum bug tracker httpszandronumcomtrackerviewphpid=2380 Bug 0002380

[89] Naling Zhang Markus Kusano and ChaoWang 2015 Dynamic partialorder reduction for relaxed memory models See [35] 250ndash259 httpsdoiorg10114527379242737956

593

  • Abstract
  • 1 Introduction
  • 2 Background
  • 3 Scheduling
    • 31 Protocol Details
    • 32 Special Cases
    • 33 Liveness
      • 4 Record and Replay
        • 41 Motivating Example
        • 42 Interleaving
        • 43 Signals
        • 44 System Calls
        • 45 Asynchronous Events
          • 5 Evaluation
            • 51 CDSchecker Litmus Tests
            • 52 httpd
            • 53 PARSEC and pbzip
            • 54 SDL-based Games
            • 55 Limitations SQLite and SpiderMonkey
              • 6 Related Work
              • 7 Conclusion
              • Acknowledgments
              • References