![Page 1: Fence Complexity in Concurrent Algorithms Petr Kuznetsov TU Berlin/DT-Labs](https://reader035.vdocuments.mx/reader035/viewer/2022062520/5697bf7a1a28abf838c82de7/html5/thumbnails/1.jpg)
Fence Complexity in Concurrent Algorithms
Petr Kuznetsov, TU Berlin/DT-Labs
STM is about ease-of-programming and efficiency
What is “efficient” in a concurrent system?
Cost metrics
Space: used memory
- Cheap
- Advanced garbage collection
Time:
- the number of reads and writes (per operation)
- the number of stalls
Relaxed memory models
Memory is much slower than CPU
- Read: check the cache -> read the memory
- Write: invalidate the caches -> update the memory
To overcome “stalled writes” – reorder operations
Reordering may result in inconsistency
What is inconsistency?
Process P:
Write(X,1)
Read(Y)
Process Q:
Write(Y,1)
Read(X)
[Figure: timelines of P and Q. P: W(X,1), R(Y). Q: W(Y,1), R(X). W(X,1) is drawn a second time.]
Possible outcomes
P: reads before Q writes, or reads after Q writes
Q: reads after P writes, or reads before P writes
Out-of-order: P reads before Q writes and Q reads before P writes
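To make the outcomes concrete, here is a minimal sketch that enumerates every interleaving of P and Q that respects program order and records what each process reads. The encoding of steps as tuples and the helper names `interleavings` and `run` are assumptions made for this illustration. Under this in-order model the pair (0, 0) never shows up, which is why it is the out-of-order outcome.

```python
from itertools import combinations

# Each process is a list of steps in program order (illustrative encoding).
P = [("write", "X"), ("read", "Y")]
Q = [("write", "Y"), ("read", "X")]

def interleavings(p, q):
    """Yield every merge of p and q that keeps each process's own order."""
    n = len(p) + len(q)
    for p_slots in combinations(range(n), len(p)):
        p_iter, q_iter = iter(p), iter(q)
        yield [("P", next(p_iter)) if i in p_slots else ("Q", next(q_iter))
               for i in range(n)]

def run(schedule):
    """Execute one schedule on a shared memory initialised to 0."""
    memory = {"X": 0, "Y": 0}
    result = {}
    for proc, (op, var) in schedule:
        if op == "write":
            memory[var] = 1
        else:
            result[proc] = memory[var]
    return result["P"], result["Q"]

# Pairs are (value P read from Y, value Q read from X).
sc_outcomes = {run(s) for s in interleavings(P, Q)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Both processes reading 0 requires at least one write to take effect after the read that follows it in program order.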
Fixing out-of-order
Memory fences: read-after-write (RAW)
write(X,1)
fence() // enforce the order
read(Y)
[Figure: timelines of P and Q. P: W(X,1), R(Y). Q: W(Y,1), R(X).]
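The effect of the fence can be sketched with a toy store-buffer model. This is an assumption-laden illustration, not a description of any real processor: each process buffers its writes, the buffer drains to memory at arbitrary times, and `fence` may only execute once the process's own buffer is empty. The function `outcomes` and the tuple encoding are hypothetical, and the sketch assumes Q uses the same fence as P.

```python
def outcomes(programs):
    """Return every (P read, Q read) pair reachable in the toy store-buffer model."""
    names = sorted(programs)
    found = set()

    def explore(pc, buffers, memory, reads):
        if all(pc[n] == len(programs[n]) for n in names):
            found.add(tuple(reads[n] for n in names))
            return
        for n in names:
            # Option 1: the memory system drains n's oldest buffered write.
            if buffers[n]:
                (var, val), rest = buffers[n][0], buffers[n][1:]
                explore(pc, {**buffers, n: rest}, {**memory, var: val}, reads)
            # Option 2: process n executes its next instruction.
            if pc[n] == len(programs[n]):
                continue
            op = programs[n][pc[n]]
            next_pc = {**pc, n: pc[n] + 1}
            if op[0] == "write":
                # The write is buffered, not yet visible to the other process.
                explore(next_pc, {**buffers, n: buffers[n] + (op[1:],)}, memory, reads)
            elif op[0] == "read":
                # A read sees the process's own newest buffered write, else memory.
                pending = [val for var, val in buffers[n] if var == op[1]]
                value = pending[-1] if pending else memory[op[1]]
                explore(next_pc, buffers, memory, {**reads, n: value})
            elif op[0] == "fence" and not buffers[n]:
                # The fence completes only after the buffer has drained.
                explore(next_pc, buffers, memory, reads)

    explore({n: 0 for n in names}, {n: () for n in names}, {"X": 0, "Y": 0}, {})
    return found

no_fence = {"P": [("write", "X", 1), ("read", "Y")],
            "Q": [("write", "Y", 1), ("read", "X")]}
with_fence = {"P": [("write", "X", 1), ("fence",), ("read", "Y")],
              "Q": [("write", "Y", 1), ("fence",), ("read", "X")]}

print((0, 0) in outcomes(no_fence))    # True: both reads can overtake the writes
print((0, 0) in outcomes(with_fence))  # False: the fence enforces the order
```

In this model a fence in P alone is not enough, because Q's read can still overtake Q's buffered write.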
Fixing out-of-order
Atomic operations: atomic-write-after-read (AWAR)
atomic {
read(Y)
…
write(X,1)
}
E.g., CAS, TAS, Fetch&Add, …
RAW/AWAR fences take ~60 RMRs
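As a rough illustration of the AWAR pattern, the sketch below models compare-and-swap as a read followed by a dependent write with nothing in between, and builds Fetch&Add on top of it. The class `AtomicCell` is hypothetical, and the `threading.Lock` only stands in for the atomicity that hardware provides; real CAS is a single instruction, not a locked region.

```python
import threading

class AtomicCell:
    """Toy register whose compare_and_swap runs as one atomic step."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # assumption: stands in for hardware atomicity

    def read(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        # AWAR: the read and the write that depends on it are not separable.
        with self._guard:
            if self._value == expected:  # read
                self._value = new        # write, atomically after the read
                return True
            return False

def fetch_and_add(cell, delta):
    """Build Fetch&Add from CAS by retrying until no other thread interfered."""
    while True:
        old = cell.read()
        if cell.compare_and_swap(old, old + delta):
            return old

def worker(cell, rounds):
    for _ in range(rounds):
        fetch_and_add(cell, 1)

cell = AtomicCell(0)
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.read())  # 4000: no increment is lost
```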
Our result
Any concurrent program in a certain class must use RAW/AWARs
What programs?
Concurrent data types:
- queues, counters, hash tables, trees, …
- Non-commutative operations
- Linearizable solo-terminating implementations
Mutual exclusion
Non-commutative operations
Operation A is non-commutative if there exists an operation B such that (applied to some state):
A influences B, and
B influences A
Example: Queue
enq(v) – adds v to the end of the queue
deq() – dequeues the item at the head of the queue
Q=1;2
Q.deq():1;Q.deq():2 vs. Q.deq():2;Q.deq():1
Two deq() operations influence each other
Q.enq(3):ok;Q.deq():1 vs. Q.deq():1;Q.enq(3):ok
enq() is commutative
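The queue example can be checked mechanically. The sketch below is one possible reading of "influences": operation A influences B if B's response changes when A is applied before it. The caller names `p` and `q`, the tuple encoding, and the helpers `apply` and `influences` are assumptions made for this illustration.

```python
from collections import deque

def apply(ops, initial=(1, 2)):
    """Run (caller, operation, args...) tuples on a fresh FIFO queue; return each caller's response."""
    queue = deque(initial)
    responses = {}
    for caller, name, *args in ops:
        if name == "enq":
            queue.append(args[0])
            responses[caller] = "ok"
        else:  # deq
            responses[caller] = queue.popleft() if queue else None
    return responses

def influences(a, b):
    """True if applying a first changes the response that b gets."""
    caller = b[0]
    return apply([a, b])[caller] != apply([b])[caller]

p_deq, q_deq, p_enq = ("p", "deq"), ("q", "deq"), ("p", "enq", 3)

# Two deq() calls on Q=1;2: whichever runs second gets 2 instead of 1.
print(influences(p_deq, q_deq), influences(q_deq, p_deq))  # True True
# enq(3) and deq() on Q=1;2: neither response depends on the order.
print(influences(p_enq, q_deq), influences(q_deq, p_enq))  # False False
```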
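Proof sketch
A non-commutative operation must write
Suppose not: a deq() that writes nothing is invisible to the next deq(), so on Q=1;2 both return 1
[Figure: Q=1;2 with deq():1 followed by deq():1, and a write w marked]
there must be a write!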
Proof sketch
Let w be the first write
Suppose there are no AWARs
[Figure: deq():1 on Q=1;2, with the write w marked]
A(w) - the longest atomic construct containing w
w must be the first base-object event in A(w)!
Proof sketch
Suppose there are no RAWs
[Figure: on Q=1;2, deq():1 with A(w) marked, and a second deq():1]
No RAW - no difference for deq()!
Mutual exclusion
Lock() – acquire the lock
Unlock() – release the lock
(Mutex) No two processes hold the lock at the same time
(Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock
Two Lock() operations influence each other!
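A tiny sequential sketch of why two Lock() operations influence each other: on a free lock, whichever attempt is applied first succeeds, so the order decides each process's outcome. The function `run_locks` is hypothetical, and it reports a failed attempt as `False` where a real Lock() would wait.

```python
def run_locks(order):
    """Apply Lock() attempts in the given order to a free lock; report who acquired it."""
    holder = None
    acquired = {}
    for process in order:
        if holder is None:
            holder = process
            acquired[process] = True
        else:
            acquired[process] = False  # a real Lock() would wait here instead
    return acquired

print(run_locks(["P", "Q"]))  # {'P': True, 'Q': False}
print(run_locks(["Q", "P"]))  # {'Q': True, 'P': False}
```

Swapping the order flips the outcome of both attempts, the same mutual influence that the two deq() operations show on the queue.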
Our result
In any implementation of mutual exclusion or a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or AWAR
Every successful lock acquire incurs a RAW/AWAR fence
Why do we care?
Hardware design: what primitives must be optimized?
API design: returned values matter
- Set with add returning fail vs. returning ok
Verification: early detection of obviously incorrect algorithms
What’s next?
Weaker primitives?
- Idempotent Work Stealing [Michael et al., PPoPP’09]
Tight lower bounds?
- How many RAW/AWAR fences are incurred?
Other patterns
- Read-after-read
- Write-after-write
- Multi-RAW:
write(Xi,1)
collect(X1,..,Xn)
References
H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011
Srivatsan’s talk on STM fence complexity, TR on the way
QUESTIONS?