![Page 1: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/1.jpg)
By Sarita Adve & Kourosh Gharachorloo
Slides by Jim Larson
Shared Memory Consistency Models: A Tutorial
![Page 2: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/2.jpg)
Outline
Concurrent programming on a uniprocessor
The effect of optimizations on a uniprocessor
The effect of the same optimizations on a multiprocessor
Methods for restoring sequential consistency
Conclusion
![Page 4: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/4.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
![Page 5: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/5.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 0
![Page 8: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/8.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 1
![Page 10: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/10.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 1
Critical Section is Protected
Works the same if Process 2 runs first!
Process 2 enters its Critical Section
![Page 11: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/11.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 0
Arbitrary interleaving of Processes
![Page 12: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/12.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 1
Arbitrary interleaving of Processes
![Page 13: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/13.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 1
Both processes can block but the critical section remains protected.
Deadlock can be fixed by extending the algorithm with turn-taking
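The walk-through above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it merges the two processes' steps in every way that keeps each process in program order, runs each merge against a single shared memory, and records which process enters the critical section. The helper names `interleavings` and `run` are illustrative assumptions.

```python
from itertools import combinations

# Each process is two steps in program order: write own flag, then test the other flag.
P1 = [("write", "Flag1"), ("read", "Flag2")]
P2 = [("write", "Flag2"), ("read", "Flag1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list in its own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [(1, next(ia)) if i in slots else (2, next(ib)) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory; return who enters."""
    memory = {"Flag1": 0, "Flag2": 0}
    entered = set()
    for pid, (op, flag) in schedule:
        if op == "write":
            memory[flag] = 1
        elif memory[flag] == 0:  # the If (FlagN == 0) test succeeded
            entered.add(pid)
    return frozenset(entered)

outcomes = [run(s) for s in interleavings(P1, P2)]
both_entered = sum(1 for o in outcomes if o == {1, 2})
neither_entered = sum(1 for o in outcomes if not o)
print(len(outcomes), both_entered, neither_entered)
```

Under these assumptions there are 6 interleavings. In none of them do both processes enter, and in 4 of them neither enters, which is the blocking case noted above.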
![Page 14: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/14.jpg)
Outline
Concurrent Programming on a Uniprocessor
The effect of optimizations on a Uniprocessor
The effect of the same optimizations on a Multiprocessor without Sequential Consistency
Methods for restoring Sequential Consistency
Conclusion
![Page 15: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/15.jpg)
Optimization: Write Buffer with Bypass
Speedup: a write takes 100 cycles, buffering takes 1 cycle. So buffer the write and keep going.
Problem: what happens on a read from a location with a buffered write pending?
(Single Processor Case)
![Page 16: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/16.jpg)
Dekker's Algorithm: Global Flags Init to 0
Write Buffering
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffer: Flag1 = 1
![Page 17: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/17.jpg)
Dekker's Algorithm: Global Flags Init to 0
Write Buffering
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffer: Flag1 = 1, Flag2 = 1
![Page 19: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/19.jpg)
Optimization: Write Buffer with Bypass
Speedup: a write takes 100 cycles, buffering takes 1 cycle.
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
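The rule above can be sketched as a toy model. This is a hedged illustration in Python, assuming a single processor, a FIFO buffer, and a dictionary standing in for memory; the class `WriteBuffer` and its method names are hypothetical and do not come from the slides.

```python
from collections import deque

class WriteBuffer:
    """Toy single-processor write buffer with read bypass (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()   # buffered (address, value) pairs, oldest first
        self.stalls = 0          # number of buffered writes a read had to wait for

    def write(self, addr, value):
        # Rule: buffer the write and keep executing.
        self.pending.append((addr, value))

    def drain_one(self):
        # Retire the oldest buffered write to memory.
        addr, value = self.pending.popleft()
        self.memory[addr] = value

    def read(self, addr):
        # Unless: a write to the same location is still buffered, so wait for it.
        while any(a == addr for a, _ in self.pending):
            self.stalls += 1
            self.drain_one()
        # Reads of any other location bypass the buffered writes.
        return self.memory[addr]

memory = {"Flag1": 0, "Flag2": 0}
cpu = WriteBuffer(memory)
cpu.write("Flag1", 1)
bypassed = cpu.read("Flag2")          # no pending write to Flag2: no stall
stalls_after_bypass = cpu.stalls
own_value = cpu.read("Flag1")         # pending write to Flag1: stall until it completes
print(bypassed, stalls_after_bypass, own_value, cpu.stalls)
```

The read of `Flag2` returns without waiting, while the read of `Flag1` waits for the buffered write and then sees the new value, so a single processor never observes its own writes out of order.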
![Page 20: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/20.jpg)
Dekker's Algorithm: Global Flags Init to 0
Write Buffering
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffer: Flag1 = 1, Flag2 = 1
STALL!
![Page 21: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/21.jpg)
Dekker's Algorithm: Global Flags Init to 0
Write Buffering
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 1, Flag2 = 0
Write buffer: Flag2 = 1
![Page 22: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/22.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Does this work for Multiprocessors?
![Page 23: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/23.jpg)
Outline
Concurrent programming on a uniprocessor
The effect of optimizations on a uniprocessor
The effect of the same optimizations on a multiprocessor
Methods for restoring sequential consistency
Conclusion
![Page 24: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/24.jpg)
Dekker's Algorithm: Global Flags Init to 0
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Does this work for Multiprocessors?
![Page 25: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/25.jpg)
Dekker's Algorithm: Global Flags Init to 0
Multiprocessor Case
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
![Page 26: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/26.jpg)
Dekker's Algorithm: Global Flags Init to 0
Multiprocessor Case
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffers: Flag1 = 1 (Processor 1)
![Page 27: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/27.jpg)
Dekker's Algorithm: Global Flags Init to 0
Multiprocessor Case
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffers: Flag1 = 1 (Processor 1), Flag2 = 1 (Processor 2)
![Page 32: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/32.jpg)
What happens on a Processor stays on that Processor
![Page 33: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/33.jpg)
Dekker's Algorithm: Global Flags Init to 0
Rule: If a WRITE is issued, buffer it and keep executing.
Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Memory: Flag1 = 0, Flag2 = 0
Write buffers: Flag1 = 1 (Processor 1), Flag2 = 1 (Processor 2)
Processor 2 knows nothing about the write to Flag1, so has no reason to stall!
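The failure described above can be reproduced in a small simulation. This is a sketch only, assuming each processor has a private write buffer that it alone consults before a read; the function names `write`, `read`, and `drain` are hypothetical.

```python
memory = {"Flag1": 0, "Flag2": 0}
buffers = {1: [], 2: []}          # one private write buffer per processor
entered = []

def write(pid, addr, value):
    # Buffered: not yet visible to the other processor.
    buffers[pid].append((addr, value))

def drain(pid):
    # Complete every buffered write of this processor.
    for addr, value in buffers[pid]:
        memory[addr] = value
    buffers[pid].clear()

def read(pid, addr):
    # The rule only looks at the reader's OWN buffer for a pending write.
    if any(a == addr for a, _ in buffers[pid]):
        drain(pid)
    return memory[addr]

write(1, "Flag1", 1)              # Process 1: Flag1 = 1 (buffered)
write(2, "Flag2", 1)              # Process 2: Flag2 = 1 (buffered)
if read(1, "Flag2") == 0:         # no pending write to Flag2 on processor 1
    entered.append(1)
if read(2, "Flag1") == 0:         # no pending write to Flag1 on processor 2
    entered.append(2)
drain(1)
drain(2)
print(entered, memory)
```

Both reads bypass the buffers and see 0, so both processes enter the critical section even though each processor followed the uniprocessor rule exactly.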
![Page 34: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/34.jpg)
A more general way to look at the Problem: Reordering of Reads and Writes (Loads and Stores).
![Page 35: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/35.jpg)
Consider the Instructions in these processes.
Process 1::
Flag1 = 1
If (Flag2 == 0) critical section
Process 2::
Flag2 = 1
If (Flag1 == 0) critical section
Simplify as:
Process 1: WX, RY
Process 2: WY, RX
![Page 36: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/36.jpg)
There are 4! or 24 possible orderings. Each column is one ordering, read top to bottom.
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
If either WX<RX or WY<RY, then the Critical Section is protected (Correct Behavior).
![Page 37: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/37.jpg)
There are 4! or 24 possible orderings. Each column is one ordering, read top to bottom.
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
If either WX<RX or WY<RY, then the Critical Section is protected (Correct Behavior).
18 of the 24 orderings are OK.
But the other 6 are trouble!
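The 18/6 split can be recomputed by brute force. The sketch below is an illustration in Python, assuming X stands for Flag1 and Y for Flag2, so that Process 1 performs WX then RY and Process 2 performs WY then RX; the function names are hypothetical.

```python
from itertools import permutations

ops = ("WX", "RY", "WY", "RX")
orderings = list(permutations(ops))

def protected(order):
    """Critical section is protected if either write precedes the matching read."""
    pos = {op: i for i, op in enumerate(order)}
    return pos["WX"] < pos["RX"] or pos["WY"] < pos["RY"]

def program_order(order):
    """True if each process's write precedes its own read (WX<RY and WY<RX)."""
    pos = {op: i for i, op in enumerate(order)}
    return pos["WX"] < pos["RY"] and pos["WY"] < pos["RX"]

ok = [o for o in orderings if protected(o)]
trouble = [o for o in orderings if not protected(o)]
trouble_in_program_order = [o for o in trouble if program_order(o)]
print(len(orderings), len(ok), len(trouble), len(trouble_in_program_order))
```

The count comes out as 24 orderings, 18 protected, and 6 in trouble. None of the 6 keeps both processes in program order, so they can only arise when a read is reordered ahead of the write that precedes it in the program.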
![Page 38: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/38.jpg)
Consider another example...
![Page 39: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/39.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 0, Data = 0
Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Memory Interconnect
![Page 40: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/40.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 0, Data = 0
Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Memory Interconnect: Data = 2000
![Page 41: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/41.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 0, Data = 0
Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Memory Interconnect: Data = 2000, Head = 1
![Page 42: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/42.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 1, Data = 0
Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Memory Interconnect: Data = 2000
![Page 45: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/45.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 1, Data = 2000
Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.
Fix: Write must be acknowledged before another write (or read) from the same processor.
Memory Interconnect
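The effect of the fix can be sketched by comparing arrival orders. This Python illustration assumes that Process 2 reads Data the moment it observes Head == 1, and models acknowledgment simply as the writes arriving in program order; `consumer_values` is a hypothetical helper, not something from the slides.

```python
from itertools import permutations

WRITES = [("Data", 2000), ("Head", 1)]   # Process 1, in program order

def consumer_values(arrival_orders):
    """For each arrival order, the Data value Process 2 reads once Head == 1."""
    seen = set()
    for order in arrival_orders:
        memory = {"Data": 0, "Head": 0}
        for addr, value in order:
            memory[addr] = value
            if memory["Head"] == 1:
                # Process 2 leaves its spin loop and reads Data immediately.
                seen.add(memory["Data"])
                break
    return seen

# No acknowledgment: the interconnect may deliver the two writes in either order.
unordered = consumer_values(permutations(WRITES))
# With acknowledgment: Head = 1 is not issued until Data = 2000 has arrived.
acknowledged = consumer_values([WRITES])
print(sorted(unordered), sorted(acknowledged))
```

Without acknowledgment the consumer can read either 0 or 2000. With it, only 2000 remains possible in this model.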
![Page 46: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/46.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Memory: Head = 0, Data = 0
Memory Interconnect
Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
![Page 49: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/49.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data (0)
Memory: Head = 0, Data = 0
Memory Interconnect
Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
![Page 50: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/50.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data (0)
Memory: Head = 0, Data = 2000
Memory Interconnect
Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
![Page 51: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/51.jpg)
Global Data Initialized to 0
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data (0)
Memory: Head = 1, Data = 2000
Memory Interconnect
Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
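The stale read in this sequence can be shown with a very small timeline model. This is a sketch only: it assumes writes are acknowledged, so Data reaches memory before Head, and it lets Process 2 pick the moment at which each of its two reads is satisfied. The names `timeline` and `process2` are hypothetical.

```python
# Memory contents over time; writes are acknowledged, so Data changes before Head.
timeline = [
    {"Data": 0, "Head": 0},      # t0: nothing written yet
    {"Data": 2000, "Head": 0},   # t1: Data = 2000 has reached memory
    {"Data": 2000, "Head": 1},   # t2: Head = 1 has reached memory
]

def process2(data_read_time, head_read_time):
    """Return (Head seen by the loop test, Data value bound to LocalValue)."""
    head = timeline[head_read_time]["Head"]
    local_value = timeline[data_read_time]["Data"]
    return head, local_value

# Blocking reads: Data is read only after the loop has seen Head == 1.
in_order = process2(data_read_time=2, head_read_time=2)
# Non-blocking reads: the Data read was issued early, at t0, past the pending Head read.
early_read = process2(data_read_time=0, head_read_time=2)
print(in_order, early_read)
```

In the second case the loop exits correctly on Head == 1, yet LocalValue holds the stale 0, which is the outcome marked `Data (0)` on the slides.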
![Page 54: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/54.jpg)
Let's reason about reordering of reads and writes again.
![Page 55: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/55.jpg)
Consider the Instructions in these processes.
Process 1::
Data = 2000;
Head = 1;
Process 2::
While (Head == 0) {;}
LocalValue = Data
Simplify as:
Process 1: WX, WY
Process 2: RY, RX
![Page 56: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/56.jpg)
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
Correct behavior requires WX<RX, WY<RY.
Program requires WY<RX.
=> 6 correct orders out of 24.
![Page 57: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/57.jpg)
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
Correct behavior requires WX<RX, WY<RY.
Program requires WY<RX.
=> 6 correct orders out of 24.
Write Acknowledgment means WX < WY. Does that Help?
Disallows only 12 out of 24.
9 still incorrect!
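These counts can be reproduced by enumeration. The Python sketch below assumes X stands for Data and Y for Head, and takes "correct" to mean both WX<RX and WY<RY, which is the reading under which the figures 6 and 9 come out; the helper names are hypothetical.

```python
from itertools import permutations

orderings = list(permutations(("WX", "WY", "RY", "RX")))

def before(order, a, b):
    """True if operation a occurs earlier than operation b in this ordering."""
    return order.index(a) < order.index(b)

def correct(order):
    # Assumed reading of "correct behavior": both reads see the new values.
    return before(order, "WX", "RX") and before(order, "WY", "RY")

correct_orders = [o for o in orderings if correct(o)]
# Write acknowledgment keeps only the orderings with WX before WY.
writes_in_order = [o for o in orderings if before(o, "WX", "WY")]
disallowed = len(orderings) - len(writes_in_order)
still_incorrect = [o for o in writes_in_order if not correct(o)]
print(len(correct_orders), disallowed, len(still_incorrect))
```

Of the 12 orderings that survive write acknowledgment, only 3 are correct, so ordering the writes alone is not enough; the reads must be ordered as well.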
![Page 58: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/58.jpg)
Outline
Concurrent programming on a uniprocessor
The effect of optimizations on a uniprocessor
The effect of the same optimizations on a multiprocessor
Methods for restoring sequential consistency
Conclusion
![Page 59: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/59.jpg)
Sequential Consistency for Multiprocessors
Why is it surprising that these code examples break on a multi-processor?
What ordering property are we assuming (incorrectly!) that multiprocessors support?
Are we assuming that they are sequentially consistent?
![Page 60: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/60.jpg)
Sequential Consistency
Sequential Consistency requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were interleaved arbitrarily.
... appears as if a memory operation executes atomically or instantaneously with respect to other memory operations
(Hennessy and Patterson, 4th ed.)
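The definition above suggests a direct, if brute-force, test: an outcome is sequentially consistent exactly when some interleaving that keeps each processor in program order produces it. The Python sketch below is an illustration under simplifying assumptions: each operation is atomic, and the spin loop on Head is replaced by a single read. The function names `interleavings` and `sc_results` are hypothetical.

```python
def interleavings(threads):
    """Yield every merge of the threads that preserves each thread's program order."""
    if all(not t for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail

def sc_results(threads, initial):
    """Set of read results reachable under sequential consistency."""
    results = set()
    for schedule in interleavings(threads):
        memory = dict(initial)
        reads = {}
        for op, addr, arg in schedule:
            if op == "W":
                memory[addr] = arg          # ("W", address, value)
            else:
                reads[arg] = memory[addr]   # ("R", address, register name)
        results.add(tuple(sorted(reads.items())))
    return results

producer = [("W", "Data", 2000), ("W", "Head", 1)]
consumer = [("R", "Head", "head"), ("R", "Data", "local")]
results = sc_results([producer, consumer], {"Data": 0, "Head": 0})
stale = (("head", 1), ("local", 0)) in results
print(sorted(results), stale)
```

Under sequential consistency the consumer can never see Head == 1 together with Data == 0. That outcome is precisely what the write by-pass and non-blocking read examples produced.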
![Page 61: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/61.jpg)
Understanding Ordering
Program Order
Compiled Order
Interleaving Order
Execution Order
![Page 62: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/62.jpg)
Reordering
Writes reach memory, and Reads see memory, in an order different from that in the Program.
Caused by Processor
Caused by Multiprocessors (and Cache)
Caused by Compilers
![Page 63: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/63.jpg)
What Are the Choices?
If we want our results to be the same as those of a Sequentially Consistent Model, do we:
Enforce Sequential Consistency at the memory level?
Use Coherent (Consistent) Cache?
Or what?
![Page 64: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/64.jpg)
Enforce Sequential Consistency?
Removes virtually all optimizations => Too Slow!
![Page 65: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/65.jpg)
What Are the Choices?
If we want our results to be the same as those of a Sequentially Consistent Model, do we:
Enforce Sequential Consistency at the memory level?
Use Coherent (Consistent) Caches?
Or what?
![Page 66: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/66.jpg)
Cache Coherence
Multiple processors have a consistent view of memory (e.g. by using the MESI protocol)
But this does not say when a processor must see a value updated by another processor.
Cache coherency does not guarantee Sequential Consistency!
Example: a write-through cache acts just like a write buffer with bypass.
![Page 67: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/67.jpg)
What Are the Choices?
Enforce Sequential Consistency? Too Slow!
Use Coherent (Consistent) Caches? Won't help!
What's left?
![Page 68: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/68.jpg)
Involve the Programmer?
![Page 69: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/69.jpg)
If you don't talk to your CPU about concurrency, who's going to??
![Page 70: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/70.jpg)
What Are the Choices?
Enforce Sequential Consistency? (Too Slow!)
Use Coherent (Consistent) Caches? (Won't help)
Provide Memory Barrier (Fence) Instructions?
![Page 71: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/71.jpg)
Barrier Instructions
Methods for overriding relaxations of the Sequential Consistency Model.
Also known as a Safety Net.
Example: A Fence would require buffered Writes to complete before allowing further execution.
Not Cheap, but not often needed.
Must be placed by the Programmer.
Memory Consistency Model for Processor tells you how.
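The fence example above (buffered writes must complete before execution continues) can be sketched with a small, self-contained toy model. The `Processor` class, its `fence` method, and the `dekker` helper are hypothetical names made up for this illustration; a real fence is a machine instruction, not a Python call, and only one fixed interleaving is replayed.

```python
from collections import deque


class Processor:
    """Toy CPU: FIFO write buffer, reads bypass it, fence() drains it."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()

    def write(self, name, value):
        self.buffer.append((name, value))    # queued, not yet visible

    def read(self, name):
        for buffered_name, value in reversed(self.buffer):
            if buffered_name == name:
                return value                 # forward own pending write
        return self.memory[name]

    def fence(self):
        # Safety net: complete every buffered write before continuing.
        while self.buffer:
            name, value = self.buffer.popleft()
            self.memory[name] = value


def dekker(use_fence):
    """Replay one fixed interleaving; return (p1_enters, p2_enters)."""
    memory = {"Flag1": 0, "Flag2": 0}
    p1, p2 = Processor(memory), Processor(memory)
    p1.write("Flag1", 1)
    p2.write("Flag2", 1)
    if use_fence:
        p1.fence()
        p2.fence()
    return p1.read("Flag2") == 0, p2.read("Flag1") == 0


print(dekker(use_fence=False))   # (True, True): both enter
print(dekker(use_fence=True))    # (False, False): neither enters
```

In this interleaving the fenced run keeps both processes out of the critical section, which is safe; the simplified Dekker fragment on the slides does not address who retries.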
![Page 72: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/72.jpg)
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Consider the Instructions in these processes. Simplify as:
Process 1: WX >>Fence<< RY
Process 2: WY >>Fence<< RX
Fence: WX < RY
Fence: WY < RX
![Page 73: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/73.jpg)
Order: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1st:   WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
2nd:   RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
3rd:   WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
4th:   RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
There are 4! or 24 possible orderings.
If either WX<RX or WY<RY, then the Critical Section is protected (Correct Behavior).
18 of the 24 orderings are OK, but the other 6 are trouble!
Enforce WX<RY and WY<RX.
Only 6 of the 18 good orderings are allowed, but the 6 bad ones are forbidden!
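The counts on this slide can be checked by brute force. The sketch below enumerates all orderings of the four events and applies the two conditions stated above; the helper names (`before`, `is_safe`, `obeys_fences`) are made up for this illustration.

```python
from itertools import permutations

EVENTS = ("WX", "RY", "WY", "RX")


def before(order, a, b):
    """True if event a occurs earlier than event b in this ordering."""
    return order.index(a) < order.index(b)


def is_safe(order):
    # At least one process must observe the other's flag write.
    return before(order, "WX", "RX") or before(order, "WY", "RY")


def obeys_fences(order):
    # Each process's write must complete before its own read.
    return before(order, "WX", "RY") and before(order, "WY", "RX")


orders = list(permutations(EVENTS))
safe = [o for o in orders if is_safe(o)]
unsafe = [o for o in orders if not is_safe(o)]
fenced = [o for o in orders if obeys_fences(o)]

print(len(orders), len(safe), len(unsafe))             # 24 18 6
print(len(fenced), all(is_safe(o) for o in fenced))    # 6 True
```

The second line is the point of the fence: it rules out 12 harmless orderings as well, but none of the 6 that remain lets both processes read 0.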
![Page 74: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/74.jpg)
Dekker's Algorithm: Global Flags Init to 0
Flag2 = 0
Flag1 = 0
Flag1 = 1
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
![Page 75: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/75.jpg)
Dekker's Algorithm: Global Flags Init to 0
Flag2 = 0
Flag1 = 0
Flag1 = 1
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
Flag2 = 1
![Page 76: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/76.jpg)
Dekker's Algorithm: Global Flags Init to 0
Flag2 = 0
Flag1 = 1
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
Flag2 = 1
Flag1 = 1
![Page 77: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/77.jpg)
Dekker's Algorithm: Global Flags Init to 0
Flag2 = 0
Flag1 = 1
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
Flag2 = 1
![Page 78: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/78.jpg)
Dekker's Algorithm: Global Flags Init to 0
Flag2 = 0
Flag1 = 1
Process 1::
  Flag1 = 1
  >>Mem_Bar<<
  If (Flag2 == 0) critical section
Process 2::
  Flag2 = 1
  >>Mem_Bar<<
  If (Flag1 == 0) critical section
Flag2 = 1
Flag1 protects critical section when Process 2 continues at Mem_Bar.
![Page 79: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/79.jpg)
Process 1::
  Data = 2000;
  >>Mem_Bar<<
  Head = 1;
Process 2::
  While (Head == 0) {;}
  >>Mem_Bar<<
  LocalValue = Data
Consider the Instructions in these processes. Simplify as:
Process 1: WX >>Fence<< WY
Process 2: RY >>Fence<< RX
Fence: WX < WY
Fence: RY < RX
![Page 80: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/80.jpg)
Order: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1st:   WX WX WX WX WX WX RY RY WY RX WY RX RY RY WY RX WY RX RY RY WY RX WY RX
2nd:   RY RY WY RX WY RX WX WX WX WX WX WX WY RX RY RY RX WY WY RX RY RY RX WY
3rd:   WY RX RY RY RX WY WY RX RY RY RX WY WX WX WX WX WX WX RX WY RX WY RY RY
4th:   RX WY RX WY RY RY RX WY RX WY RY RY RX WY RX WY RY RY WX WX WX WX WX WX
Correct behavior requires WX<RX, WY<RY. Program requires WY<RX. => 6 correct orders out of 24.
We can require WX<WY and RY<RX. Is that enough?
Program requires WY<RX.
Thus, WX<WY<RY<RX; hence WX<RX and WY<RY.
Only 2 of the 6 good orderings are allowed, but all 18 incorrect orderings are forbidden.
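The same brute-force check can be applied to this producer/consumer example. This is a sketch under one explicit assumption: the spin loop is modeled as WY<RY, meaning RY is the read of Head that ends the loop and therefore saw the value 1. The helper names are invented for this illustration.

```python
from itertools import permutations

# WX: write Data, WY: write Head, RY: read Head, RX: read Data
EVENTS = ("WX", "WY", "RY", "RX")


def before(order, a, b):
    """True if event a occurs earlier than event b in this ordering."""
    return order.index(a) < order.index(b)


def is_correct(order):
    # The reader sees Head == 1 and then the new Data value.
    return before(order, "WX", "RX") and before(order, "WY", "RY")


def obeys_fences(order):
    return before(order, "WX", "WY") and before(order, "RY", "RX")


def loop_exited(order):
    # Assumption: RY is the read that ends the spin loop, so it saw Head == 1.
    return before(order, "WY", "RY")


orders = list(permutations(EVENTS))
correct = [o for o in orders if is_correct(o)]
unfenced_bad = [o for o in orders if loop_exited(o) and not is_correct(o)]
reachable = [o for o in orders if obeys_fences(o) and loop_exited(o)]

print(len(orders), len(correct))                # 24 6
print(len(unfenced_bad))                        # 6
print(all(is_correct(o) for o in reachable))    # True
```

Without the fences, 6 orderings let the loop exit and still read stale Data. With both fences in place, every ordering in which the loop exits also reads the new Data.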
![Page 81: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/81.jpg)
Global Data Initialized to 0
Head = 0  Data = 0
Memory Interconnect
Data = 2000
Process 1::
  Data = 2000;
  >>Mem_Bar<<
  Head = 1;
Process 2::
  While (Head == 0) {;}
  >>Mem_Bar<<
  LocalValue = Data
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
![Page 83: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/83.jpg)
Global Data Initialized to 0
Head = 0  Data = 2000
Memory Interconnect
Data = 2000
Process 1::
  Data = 2000;
  >>Mem_Bar<<
  Head = 1;
Process 2::
  While (Head == 0) {;}
  >>Mem_Bar<<
  LocalValue = Data
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
![Page 84: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/84.jpg)
Global Data Initialized to 0
Head = 1  Data = 2000
Memory Interconnect
Process 1::
  Data = 2000;
  >>Mem_Bar<<
  Head = 1;
Process 2::
  While (Head == 0) {;}
  >>Mem_Bar<<
  LocalValue = Data
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
![Page 85: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/85.jpg)
Global Data Initialized to 0
Head = 1  Data = 2000
Memory Interconnect
Process 1::
  Data = 2000;
  >>Mem_Bar<<
  Head = 1;
Process 2::
  While (Head == 0) {;}
  >>Mem_Bar<<
  LocalValue = Data
Fence: Wait for pending I/O to complete before more I/O (includes cache updates).
When Head reads as 1, Data will have the correct value when Process 2 continues at Mem_Bar.
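The walk-through above can be replayed in a toy model. This is a sketch under stated assumptions: writes in flight are held in a hypothetical `pending` dict, the interconnect is assumed to deliver Head before Data in the worst case, and Process 2's reads happen in program order, so only the writer's fence is modeled.

```python
def run(use_fence):
    """Return the Data value Process 2 reads after it sees Head == 1."""
    memory = {"Data": 0, "Head": 0}
    pending = {}                      # Process 1's writes still in flight

    def fence():
        # Wait for every in-flight write to reach memory.
        memory.update(pending)
        pending.clear()

    # Process 1
    pending["Data"] = 2000
    if use_fence:
        fence()
    pending["Head"] = 1

    # Assumed worst case: the interconnect delivers Head before Data.
    memory["Head"] = pending.pop("Head")

    # Process 2: the spin loop exits because Head is now 1.
    assert memory["Head"] == 1
    local_value = memory["Data"]

    fence()                           # remaining writes arrive eventually
    return local_value


print(run(use_fence=False))   # 0: stale Data
print(run(use_fence=True))    # 2000
```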
![Page 86: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/86.jpg)
Results appear in a Sequentially Consistent manner.
![Page 87: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/87.jpg)
I've never heard of this. Is this for Real??
![Page 88: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/88.jpg)
Memory Ordering in Modern Microprocessors, Part I
Linux provides a carefully chosen set of memory-barrier primitives, as follows:
smp_mb(): “memory barrier” that orders both loads and stores. This means loads and stores preceding the memory barrier are committed to memory before any loads and stores following the memory barrier.
smp_rmb(): “read memory barrier” that orders only loads.
smp_wmb(): “write memory barrier” that orders only stores.
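As a rough aid to choosing among the three primitives, the descriptions above can be written down as a lookup table. This is a deliberately simplified model built only from those three descriptions; the `ORDERED_PAIRS` table and `is_ordered` helper are hypothetical, and the real kernel semantics are more subtle and vary by architecture.

```python
# Which (earlier, later) access pairs each barrier keeps in program order.
ORDERED_PAIRS = {
    "smp_mb": {("load", "load"), ("load", "store"),
               ("store", "load"), ("store", "store")},
    "smp_rmb": {("load", "load")},
    "smp_wmb": {("store", "store")},
}


def is_ordered(barrier, earlier, later):
    """True if `barrier` between the two accesses prevents their reordering."""
    return (earlier, later) in ORDERED_PAIRS[barrier]


# Dekker: Flag1 = 1, then read Flag2 -> a store followed by a load.
print(is_ordered("smp_wmb", "store", "load"))    # False
print(is_ordered("smp_mb", "store", "load"))     # True

# Message passing writer: Data = 2000, then Head = 1 -> two stores.
print(is_ordered("smp_wmb", "store", "store"))   # True
# Message passing reader: read Head, then read Data -> two loads.
print(is_ordered("smp_rmb", "load", "load"))     # True
```

Under this model the Dekker example needs the full barrier, while the Data/Head example can use the cheaper write barrier on the producer and read barrier on the consumer.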
![Page 89: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/89.jpg)
OK, I get it. So what's a Programmer supposed to do??
![Page 90: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/90.jpg)
Words of Advice?
“The difficult problem is identifying the ordering constraints that are necessary for correctness.”
“...the programmer must still resort to reasoning with low level reordering optimizations to determine whether sufficient orders are enforced.”
“...deep knowledge of each CPU's memory-consistency model can be helpful when debugging, to say nothing of writing architecture-specific code or synchronization primitives.”
![Page 91: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/91.jpg)
Memory Consistency Models
Explain what relaxations of Sequential Consistency are implemented.
Explain what Barrier statements are available to avoid them.
Provided for every processor (YMMV).
![Page 92: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/92.jpg)
Memory Consistency Models
![Page 93: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/93.jpg)
Programmer's View
What does a programmer need to do?
How do they know when to do it?
Compilers & Libraries can help, but programmers still need to use primitives in parallel programming (like in a kernel).
Assuming the worst and synchronizing everything results in sequential consistency. Too slow, but a good way to debug.
![Page 94: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/94.jpg)
How to Reason about Sequential Consistency
Applies to parallel programs (Kernel!)
Parallel Programming Language may provide the protection (DoAll loops).
Language may have types to use.
Distinguish Data and Sync regions.
Library may provide primitives (Linux).
How to know if you need synchronization?
![Page 95: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/95.jpg)
Outline
Concurrent programming on a uniprocessor
The effect of optimizations on a uniprocessor
The effect of the same optimizations on a multiprocessor
Methods for restoring sequential consistency
Conclusion
![Page 96: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/96.jpg)
Conclusion
Parallel programming on a Multiprocessor that relaxes the Sequentially Consistent Model presents new challenges.
Know the memory consistency models for the processors you use.
Use barrier (fence) instructions to allow optimizations while protecting your code.
Simple examples were used; there are others much more subtle. The fix is basically the same.
![Page 100: By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649f295503460f94c41b56/html5/thumbnails/100.jpg)
References
Shared Memory Consistency Models: A Tutorial, Sarita Adve & Kourosh Gharachorloo
Vince Shuster, Presentation for CS533, Winter 2010
Memory Ordering in Modern Microprocessors, Part I, Paul E. McKenney, Linux Journal, June 2005
Computer Architecture: A Quantitative Approach, Hennessy and Patterson, 4th Ed., 2007