![Page 1: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/1.jpg)
AUTOMATIC VERIFICATION AND FENCE INFERENCE
FOR RELAXED MEMORY MODELS
Martin VechevIBM Research
Michael KupersteinTechnion
Eran YahavTechnion
(FMCAD’10, PLDI’11)
![Page 2: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/2.jpg)
2
Textbook Example: Peterson’s Algorithm
Specification: mutual exclusion over critical section
Process 0:while(true) { store ent0 = true; store turn = 1; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false;}
Process 1:while(true) { store ent1 = true; store turn = 0; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false;}
![Page 3: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/3.jpg)
3
Beyond Textbooks: Relaxed Memory Models
Specification: mutual exclusion over critical section
Process 0:while(true) { store ent0 = true; store turn = 1; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false;}
Process 1:while(true) { store ent1 = true; store turn = 0; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false;}
Re-ordering of operations Non-atomic stores
![Page 4: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/4.jpg)
4
Memory Fences
Enforce order… at a cost Fences are expensive
10s-100s of cycles collateral damage (e.g., prevent compiler opts)
example: removing a single fence yields 3x speedup in a work-stealing queue
Required fences depend on memory model
Where should I put fences?
![Page 5: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/5.jpg)
5
Where should I put fences?
On the other, synchronization bugs can be very difficult to track down, so memory barriers should be used liberally, rather than relying on complex platform-specific guarantees about limits to memory instruction reordering.
On the one hand, memory barriers are expensive (100s of cycles, maybe more), and should be used only when necessary.
– Herlihy and Shavit, The Art of Multiprocessor Programming.
![Page 6: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/6.jpg)
6
May Seem Easy…
Specification: mutual exclusion over critical section
Process 0:while(true) { store ent0 = true; fence; store turn = 1; fence; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false;}
Process 1:while(true) { store ent1 = true; fence; store turn = 0; fence; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false;}
![Page 7: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/7.jpg)
7
Chase-Lev Work-Stealing Queue1 int take() {2 long b = bottom – 1;3 item_t * q = wsq;4 bottom = b5 long t = top6 if (b < t) {7 bottom = t;8 return EMPTY;9 }10 task = q->ap[b % q->size];11 if (b > t)12 return task13 if (!CAS(&top, t, t+1))14 return EMPTY;15 bottom = t + 1;16 return task;17 }
1 void push(int task) {2 long b = bottom;3 long t = top;4 item_t * q = wsq;5 if (b – t >= q->size – 1) {6 wsq = expand();7 q = wsq;8 }9 q->ap[b % q->size] = task;10 bottom = b + 1;11 }
1 int steal() {2 long t = top;3 long b = bottom;4 item_t * q = wsq;5 if (t >= b)6 return EMPTY;7 task = q->ap[t % q->size];8 if (!CAS(&top, t, t+1))9 return ABORT;10 return task;11 }
Specification: no lost items, no phantom items, memory safety
![Page 8: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/8.jpg)
8
Fender
Help the programmer place fences Find optimal fence placement
Ideal setting for synthesis Restrict non-determinism s.t.
program stays within set of safe executions
![Page 9: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/9.jpg)
9
Goal
P’ satisfies the specification S under M
FENDER
ProgramP
Specification S
MemoryModel
M
Program P’
with Fences
![Page 10: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/10.jpg)
10
Naïve Approach: Recipe
Compute reachable states for the program
Compute constraints on execution that guarantee that all “bad states” are avoided
Implement the constraints with fences
Bad News [Atig et. al POPL’10]
Reachability undecidable for RMO
Non-primitive recursive complexity for TSO/PSO
![Page 11: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/11.jpg)
11
Store Buffers
…P0
MainMemor
y
…P1
……
……
ent0ent1turn
ent0ent1turn
![Page 12: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/12.jpg)
12
Unbounded Store Buffers…
Even for very simple patterns e.g. spin-loops with writes in loop body
flag := true while other_flag = true { flag := false //Do something flag := true }
true
false
true
false
true
false
true
![Page 13: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/13.jpg)
13
What can we do?
Under-approximate Bound the length of execution buffers
[FMCAD’10] Implies a bound on state space
Bound context switches (Ahmed’s talk from yesterday)
Dynamic synthesis of fences (In progress)
Over-approximate Sound abstraction of buffer content [PLDI’11]
![Page 14: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/14.jpg)
14
Abstract Interpretation for RMM Main Idea: Bounded over-
approximation of unbounded buffers
Hierarchy of abstract memory models
Strictly weaker than concrete TSO/PSO Maintain partial-coherence
![Page 15: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/15.jpg)
15
First Attempt: Set Abstraction
Abstract each store buffer as a set
Process 0:while(true) { store ent0 = true; fence; store turn = 1; fence; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false;}
Process 1:while(true) { store ent1 = true; fence; store turn = 0; fence; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false;}
…P0
MainMemor
y
…P1
……
……
ent0ent1turn
ent0ent1turn
{ }{ }{ }
{1}
{true} { }{ false }{ false, true }
{ } { } { } { }
![Page 16: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/16.jpg)
16
Second Attempt: record most recent value
Process 0
X := 1
Process 1
while (X != 1) { nop } X := 2fence e := Xassert e == 2;
Initially X == 0
P0Main
Memory
P1
{ }
{ }
{ 1 }
X = 0X = 1
{2 }X = 2
flush (1st time)
{ }
flush (2nd time)
![Page 17: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/17.jpg)
17
Abstract Memory Models - Requirements
Intra-process coherence: a process should see the most recent value it wrote
Preserve fence semantics
Partial inter-process coherence: preserve as much order information as feasible (bounded) Enable strong flushes
Sound Simple construction!
![Page 18: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/18.jpg)
18
Partial Coherence Abstractions
…P0
MainMemor
y
…P1
……
……
ent0ent1turn
ent0ent1turn
P0
MainMemor
y
P1
ent0
turn
ent0ent1turn
Recent value
Bounded
length kUnordered elements
ent1
Allows precise fence semantics
Allows precise loads from bufferKeeps the analysis precise for “normal” programs
Fallback for soundness
![Page 19: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/19.jpg)
19
Intuition
Making unbounded number of writes to a memory location without a flush? Seems very esoteric Your program is suspicious
flag := true while other_flag = true { flag := false //Do something flag := true }
![Page 20: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/20.jpg)
Adjusted Recipe
Compute (abstract) reachable states for the program using partial-coherence abstraction
Compute constraints on execution that guarantee that all “bad states” are avoided
Implement the constraints with fences
![Page 21: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/21.jpg)
21
Precision/fences tradeoff
Different abstractions lead to different fences More precise abstraction – potentially
fewer fences
In the paper SC-Recoverability PSO as an abstraction of TSO Partially disjunctive abstraction allows
more aggressive merging of buffer, scales better
![Page 22: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/22.jpg)
22
Inference Results
Mostly mutual exclusion primitives Different variations of the abstractionProgram FD k=0 FD k=1 PD k=0 PD k=1 PD k=2Sense0 Pet0 Dek0 Lam0 T/O T/O T/O
Fast0 T/O T/O T/O T/O
Fast1a Fast1b Fast1c T/O T/O
![Page 23: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/23.jpg)
23
Limitations…
More relaxed memory models Need reasonable concrete semantics
first
What if the program is infinite-space? Or simply large Partial-coherence isn’t enough We want to also use “standard” abstract
interpretation
![Page 24: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/24.jpg)
24
Composing Abstractions
Trivial for value abstractions Abstract buffers of abstract values
Challenging for more complex abstractions Heap abstractions Predicate abstractions Buffers for “abstract locations” (?)
![Page 25: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/25.jpg)
25
Summary
Partial-coherence abstractions Verification without context bounds but possibly with false alarms
Automatic fence inference Precision affects quality of results
In progress Combination of abstractions Scalable (dynamic) fence inference
![Page 26: Automatic verification and fence inference for relaxed memory models](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816692550346895dda7248/html5/thumbnails/26.jpg)
26
Invited Questions
1. Do you ever need to use k>1 ?2. In your abstraction, do you ever fall
into the unordered set?3. How far can we expect this to scale?4. Why over-approximation for fence
inference?5. Can you explain how the inference
works?