CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
Tom Bergan, Owen Anderson, Joe Devietti, Luis Ceze, Dan Grossman
To appear at ASPLOS 2010

Page 2: CoreDet

Goal: execution-level determinism ...
• of arbitrary, unmodified C/C++ pthreads programs
• without special hardware
• without sacrificing scalability

Implementation:
• a compiler pass in LLVM, plus
• a custom runtime library

Page 4: CoreDet

An implementation of DMP in software:
• DMP-Ownership
• DMP-Buffering (a new protocol)

Page 6: CoreDet

An implementation of DMP in software:

• DMP-Ownership
  - straightforward to implement
  - poorer scalability, fewer overheads

• DMP-Buffering (a new protocol)
  - better scalability, more overheads
  - no speculation (easier to implement than TM)
  - key insight: relaxed memory consistency (specifically, Weak Ordering)

Page 7: CoreDet: Implementation

A compiler
• instruments the code with calls to the runtime
• applies static optimizations to remove instrumentation

A runtime library
• schedules threads
• tracks interthread communication
• provides deterministic wrappers for ...
  - pthreads
  - malloc

Page 8: Outline

• DMP-Ownership in Software
• Why not DMP-TM in Software?
• DMP-Buffering
• A Few Implementation Details
• Performance

Page 10: DMP-Serial

Thread 1            Thread 2
a := x              c := x
b := y              d := y
x := a * 2          x := a * 3
y := a + b          y := a - b

[Diagram label: each thread's block of instructions is one quantum.]

Page 11: DMP-Serial

[Diagram: threads T1, T2, T3 on a time axis; each thread runs one quantum in turn until the end of the round.]

Execution is completely serialized.
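The serialized rounds described above can be sketched as a token that is passed between threads in a fixed order. This is a minimal illustration only, not CoreDet's scheduler: the type `serial_sched`, the functions `sched_init`, `sched_may_run`, and `sched_end_quantum`, and the 0-based thread ids are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical deterministic token: only the holder may run its quantum. */
struct serial_sched {
    int nthreads;    /* number of threads taking part in each round */
    int holder;      /* id of the thread whose turn it is */
    unsigned round;  /* number of completed rounds */
};

void sched_init(struct serial_sched *s, int nthreads) {
    assert(nthreads > 0);
    s->nthreads = nthreads;
    s->holder = 0;
    s->round = 0;
}

/* A thread may execute only while it holds the token. */
bool sched_may_run(const struct serial_sched *s, int tid) {
    return s->holder == tid;
}

/* Called when the holder's quantum ends: pass the token to the next
 * thread in id order; after the last thread, a new round begins. */
void sched_end_quantum(struct serial_sched *s) {
    s->holder++;
    if (s->holder == s->nthreads) {
        s->holder = 0;
        s->round++;
    }
}
```

Because the token order depends only on thread ids and quantum boundaries, every run interleaves the threads identically, at the cost of no parallelism at all.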

Page 12: DMP-Ownership

Parallel mode: no communication (can write only to private data)
Serial mode: arbitrary communication

[Diagram: threads T1, T2, T3 on a time axis; each round has a parallel phase followed by a serial phase, up to the end of the round.]

Memory ownership table (MOT):
  x   owned-by T1
  y   shared
  z   owned-by T2
  ::  ::

Page 14: DMP-Ownership in Software

[Diagram: threads T1, T2, T3 and an MOT with entries: x owned-by T1, y shared, z owned-by T2, ...]

Requirements:

Quantum Formation
• Count instructions, e.g. at the end of each basic block

Ownership Tracking
• Instrument every load/store
• Manipulate an MOT (in memory) via a runtime library
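The two requirements above can be sketched in C. This is a rough illustration under stated assumptions, not CoreDet's runtime: the names `mot_init`, `mot_set_owner`, `mot_may_access_in_parallel`, and `quantum_tick`, the table size, the hashing of addresses, and the quantum length are all hypothetical. The slides only say that parallel mode may write private data; allowing any thread to read shared data in parallel mode is an additional assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOT_SHARED   (-1)   /* hypothetical marker: location is shared */
#define MOT_ENTRIES  1024   /* hypothetical table size */
#define QUANTUM_SIZE 1000u  /* hypothetical quantum length in instructions */

/* Memory ownership table: one owner per hashed 8-byte location. */
static int mot[MOT_ENTRIES];

static size_t mot_index(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) % MOT_ENTRIES);
}

void mot_init(void) {
    for (size_t i = 0; i < MOT_ENTRIES; i++) mot[i] = MOT_SHARED;
}

void mot_set_owner(const void *addr, int owner) {
    mot[mot_index(addr)] = owner;
}

/* Would be called before every instrumented load/store. Returns true if
 * the access may proceed in parallel mode; false means the thread has to
 * wait for serial mode. */
bool mot_may_access_in_parallel(int tid, const void *addr, bool is_store) {
    int owner = mot[mot_index(addr)];
    if (owner == tid) return true;             /* private data: read or write */
    if (owner == MOT_SHARED) return !is_store; /* shared data: read only (assumed) */
    return false;                              /* owned by another thread */
}

struct thread_state {
    int tid;
    unsigned executed;  /* instructions counted in the current quantum */
};

/* Would be called at the end of each basic block with that block's
 * instruction count. Returns true when the quantum is finished. */
bool quantum_tick(struct thread_state *t, unsigned block_len) {
    t->executed += block_len;
    if (t->executed >= QUANTUM_SIZE) {
        t->executed = 0;
        return true;
    }
    return false;
}
```

In a real system the compiler pass would insert the calls, and the cost of doing so on every load and store is one source of the overheads mentioned elsewhere in the talk.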

Page 15: DMP-Ownership in Software

We have implemented this in CoreDet ... but the scalability is not that great.

[Figure: scalability plot; series labeled "splash mean" and "parsec mean"]

Page 18: DMP-TM

[Diagram: threads T1, T2, T3 on a time axis; each quantum runs as an implicit transaction, followed by a commit before the end of the round.]

Execution is parallel and transactional.

Page 20: DMP-TM in Software

[Diagram: threads T1, T2, T3]

Requirements:

Quantum Formation
• Count instructions

Transactional Execution
• Instrument every load/store
• Use an off-the-shelf STM with minor modifications

This is really hard!

Page 21: DMP-TM: Why not STM?

STMs make the wrong assumptions:
• Most code is not transactional
• Transactions are short
• Transactions are scoped

An unscoped transaction:

    void foo() {
      ...
      begin_transaction();
      return;
    }

Page 22: DMP-TM: What Can We Learn?

➡ Speculation makes things hard
➡ Good scalability because of versioned memory (versions are modified in parallel)

DMP-Buffering: use versioned memory without speculation

Page 24: DMP-Buffering

Parallel mode: no communication (versioned memory via store buffering)

[Diagram: threads T1, T2, T3 on a time axis in the parallel phase. A thread executing "x := .. y .." is shown with global memory (read only) and its own store buffer.]

Page 25: DMP-Buffering

Parallel mode: no communication (versioned memory via store buffering)
Commit mode: deterministically (serially) publish local store buffers

[Diagram: threads T1, T2, T3 on a time axis; a parallel phase is followed by a commit phase, in which a thread's store buffer (e.g. "x=..") is published to global memory. Annotation: stores are reordered.]

Page 26: DMP-Buffering

Parallel mode: no communication (versioned memory via store buffering)
Commit mode: deterministically (serially) publish local store buffers
Serial mode: used for synchronization (e.g. atomic ops)

[Diagram: threads T1, T2, T3 on a time axis; each round consists of parallel, commit, and serial phases, up to the end of the round. A thread executing "lock(x)" is shown with global memory (read/write) and its store buffer.]

Page 27: DMP-Buffering

[Diagram: threads T1, T2, T3]

Parallel mode: buffer stores locally
• ends at synchronization (atomic ops and fences), and quantum boundaries

Commit mode: publish local store buffers
• happens serially for determinism
• executes in parallel for performance

Serial mode: used for synchronization (e.g. atomic ops)
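The parallel and commit modes above can be sketched with a per-thread store buffer over a small simulated memory. This is an illustrative sketch, not CoreDet's implementation: `global_mem`, `store_buffer`, `sb_store`, `sb_load`, and `commit_all` are hypothetical names, addresses are array indices, and the commit here is strictly serial in thread-id order for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 16  /* hypothetical size of the simulated global memory */
#define SB_MAX    16  /* hypothetical store-buffer capacity */

static int global_mem[MEM_WORDS];

struct store_buffer {
    size_t n;             /* number of buffered stores */
    size_t addr[SB_MAX];  /* word index of each buffered store */
    int    val[SB_MAX];   /* value of each buffered store */
};

/* Parallel mode: a store is recorded locally and stays invisible to
 * other threads until commit. */
void sb_store(struct store_buffer *sb, size_t addr, int v) {
    assert(addr < MEM_WORDS);
    for (size_t i = 0; i < sb->n; i++) {
        if (sb->addr[i] == addr) { sb->val[i] = v; return; }
    }
    assert(sb->n < SB_MAX);
    sb->addr[sb->n] = addr;
    sb->val[sb->n] = v;
    sb->n++;
}

/* Parallel mode: a load sees the thread's own buffered value if there is
 * one, otherwise the (read-only) global memory. */
int sb_load(const struct store_buffer *sb, size_t addr) {
    assert(addr < MEM_WORDS);
    for (size_t i = 0; i < sb->n; i++) {
        if (sb->addr[i] == addr) return sb->val[i];
    }
    return global_mem[addr];
}

/* Commit mode: publish the buffers one thread at a time in a fixed
 * order, so the final memory state does not depend on timing. */
void commit_all(struct store_buffer *bufs, size_t nthreads) {
    for (size_t t = 0; t < nthreads; t++) {
        for (size_t i = 0; i < bufs[t].n; i++)
            global_mem[bufs[t].addr[i]] = bufs[t].val[i];
        bufs[t].n = 0;
    }
}
```

If two threads buffer different values for the same word, each keeps seeing its own value during parallel mode, and after `commit_all` the value from the thread later in the commit order is the one left in memory.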

Page 28: DMP-Buffering

Dekker’s Algorithm (there is a data race)

Thread 1            Thread 2
A = 1               B = 1
if (B == 0)         if (A == 0)
  ...                 ...

Page 30: DMP-Buffering

Thread 1                Thread 2

parallel:
buffer[A] = 1           buffer[B] = 1
if (B == 0)             if (A == 0)
  ...                     ...

commit:
A = buffer[A]           B = buffer[B]

[Annotation: reordered]

This is deterministic . . .

Page 31: DMP-Buffering

Thread 1                Thread 2

parallel:
buffer[A] = 1           buffer[B] = 1
if (B == 0)             if (A == 0)
  ...                     ...

commit:
A = buffer[A]           B = buffer[B]

. . . but not sequentially consistent (cycle in the happens-before graph)
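The outcome above can be reproduced with a small single-threaded simulation. It is only a sketch of the slide's example, with hypothetical names (`dekker_result`, `dekker_one_round`): each thread's store goes to its own buffer during the parallel phase, each load reads global memory, and the buffers are published at commit.

```c
#include <assert.h>
#include <stdbool.h>

struct dekker_result {
    bool t1_enters;  /* Thread 1 saw B == 0 */
    bool t2_enters;  /* Thread 2 saw A == 0 */
    int A, B;        /* global memory after commit */
};

struct dekker_result dekker_one_round(void) {
    int A = 0, B = 0;          /* global memory before the round */
    int buf_A = 0, buf_B = 0;  /* Thread 1 buffers A, Thread 2 buffers B */
    struct dekker_result r;

    /* Parallel phase: stores are buffered, loads read global memory. */
    buf_A = 1;                 /* Thread 1: buffer[A] = 1 */
    r.t1_enters = (B == 0);    /* Thread 1: if (B == 0) */
    buf_B = 1;                 /* Thread 2: buffer[B] = 1 */
    r.t2_enters = (A == 0);    /* Thread 2: if (A == 0) */

    /* Commit phase: buffers are published in a fixed order. */
    A = buf_A;
    B = buf_B;

    r.A = A;
    r.B = B;
    return r;
}
```

Both threads take the branch on every run, which is deterministic. Under sequential consistency at least one of the two loads would have to observe the other thread's store, so this result is not sequentially consistent.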

Page 32: DMP-Buffering

Dekker’s Algorithm (again). Let’s remove the data race . . .

Thread 1            Thread 2
A = 1               B = 1
tmp1 = B            tmp2 = A
if (tmp1 == 0)      if (tmp2 == 0)
  ...                 ...

Page 33: DMP-Buffering

Dekker’s Algorithm (no data race)

Thread 1            Thread 2
lock(L)             lock(L)
A = 1               B = 1
tmp1 = B            tmp2 = A
unlock(L)           unlock(L)
if (tmp1 == 0)      if (tmp2 == 0)
  ...                 ...

Page 35: DMP-Buffering

Thread 1            Thread 2
lock(L)             lock(L)
A = 1               B = 1
tmp1 = B            tmp2 = A
unlock(L)           unlock(L)
if (tmp1 == 0)      if (tmp2 == 0)
  ...                 ...

[Diagram: the timeline alternates between "parallel + commit" phases and "serial" phases.]

Synchronization happens sequentially.

Data race free programs are sequentially consistent (required by C++ and Java memory models).

Page 36: DMP-Buffering: Parallel Commit

For determinism, the commit order must be deterministic, i.e. logically serial.
For performance, the commit must happen in parallel.

Basic idea:
• Publish store buffers in parallel
• Preserve the commit order on collisions

Page 37: DMP-Buffering: Parallel Commit

Basic idea:
• Commit store buffers in parallel
• Preserve the commit order on collisions

Global Memory        Thread 1 store buffer    Thread 2 store buffer
addr   value         addr   value             addr   value
0xA    0             0xA    1                 0xC    2
0xB    0             0xB    1                 0xD    2
0xC    0             0xC    1                 0xE    2
0xD    0
0xE    0

Collision (on 0xC)! Resolve for Thread 2.
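One way to preserve the commit order on collisions is to tag each word with the commit rank of its last writer and let a store land only if its rank is not lower. The sketch below is an assumption-laden illustration, not CoreDet's mechanism: `word`, `pending_store`, `begin_commit`, and `publish` are hypothetical names, it reads "Resolve for Thread 2" as "Thread 2's value must end up in memory", and it runs sequentially (a truly parallel version would need an atomic update of each word).

```c
#include <assert.h>
#include <stddef.h>

#define WORDS 5  /* the five addresses 0xA..0xE from the slide, as indices 0..4 */

struct word {
    int value;
    int writer_rank;  /* commit rank of the last writer this round, -1 if none */
};

struct pending_store {
    size_t addr;
    int value;
};

/* Start of a commit phase: no word has been written yet this round. */
void begin_commit(struct word *mem, size_t words) {
    for (size_t i = 0; i < words; i++) mem[i].writer_rank = -1;
}

/* Publish one thread's buffer. `rank` is the thread's position in the
 * deterministic commit order. A store is applied only if no thread with
 * a higher rank has already written that word, so the result is the same
 * whichever thread happens to publish first. */
void publish(struct word *mem, size_t words, int rank,
             const struct pending_store *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(buf[i].addr < words);
        struct word *w = &mem[buf[i].addr];
        if (rank >= w->writer_rank) {
            w->value = buf[i].value;
            w->writer_rank = rank;
        }
    }
}
```

With the slide's buffers, publishing Thread 2 before Thread 1 or after it leaves the same memory: 0xA and 0xB hold 1, and 0xC, 0xD, and 0xE hold 2.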

Page 42: Remove Instrumentation From ...

• Accesses to thread-local (non-escaping) objects
  [Callout: don’t need to instrument this]

  DMP-Buffering: requires unification-based points-to analysis

      int local;
      int *p = (...) ? &local : &global;
      ...

  [Callout: must access through the store buffer]

• Redundant accesses

      y = ... x ...
      z = ... x ...

  DMP-Buffering: this does not apply
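The thread-local case above can be made concrete with two small functions. This is only an illustration of which accesses a compiler pass could plausibly skip; the function names and the variable `global_counter` are hypothetical, and the comments describe the intended analysis result rather than anything the code itself checks.

```c
#include <assert.h>

int global_counter = 0;  /* visible to every thread */

/* `acc` never has its address taken, so it cannot escape the thread:
 * accesses to it would need no instrumentation. */
int sum_squares(int n) {
    int acc = 0;
    for (int i = 1; i <= n; i++)
        acc += i * i;
    return acc;
}

/* `p` may point to either `local` or `global_counter`, so the store
 * through `p` has to be instrumented. Under a unification-based
 * points-to analysis, `local` and `global_counter` fall into the same
 * class, so with store buffering the later read of `local` would also
 * have to go through the store buffer to see a value written via `p`. */
int pick_and_bump(int use_local) {
    int local = 0;
    int *p = use_local ? &local : &global_counter;
    *p += 1;
    return local;
}
```

Calling `pick_and_bump(1)` returns 1 and leaves `global_counter` alone, while `pick_and_bump(0)` returns 0 and increments `global_counter`, which shows why the analysis cannot treat `local` as private in this function.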

Page 43: External Libraries

We do not instrument external shared libraries, such as the system libc
• External calls must be serialized

Preventing over-serialization:
• We check indirect calls at runtime
• We provide deterministic wrappers for common libc functions, e.g. memcpy and malloc
• We do not serialize pure libc functions, e.g. sqrt

Page 45: Scalability

[Figure: scalability results; series labeled "splash mean" and "parsec mean"]

Page 46: Overheads

[Figure: overhead results; series labeled "splash mean" and "parsec mean"]

Since we preserve scalability, we can overcome overheads by adding cores.

Page 47: Wrap Up

CoreDet
• execution-level determinism in software of arbitrary multithreaded programs

DMP-Buffering
• uses a relaxed memory consistency model
• scales comparably to nondeterministic execution

Thank you!

The CoreDet source code will eventually be released at http://sampa.cs.washington.edu