Implementation and Performance of Munin (Distributed Shared Memory System)
Dongying Li
Department of Electrical and Computer Engineering
University of Toronto
(Original Authors: J. B. Carter, et al.)
ECE 1147, Parallel Computation
Oct. 30, 2006
Distributed Shared Memory
• Shared address space spanning the processors of a distributed memory multiprocessor
[Figure: proc1, proc2, and proc3 each see the shared variable X = 0]
Distributed Shared Memory
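[Figure: processors proc0, proc1, proc2, ..., procN, each with its own memory mem0, mem1, mem2, ..., memN, connected by a network; the memories together are labeled "shared memory"]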
Distributed Shared Memory
• Challenges
  – Good performance, comparable to shared memory programs
  – No significant deviation from the shared memory coding model
  – Low communication and message passing overheads
Munin System
• Characteristic features
  – Software release consistency
  – Multiple consistency protocols
• Deviations from shared memory model
  – Shared variables annotated with their access pattern
  – All synchronization visible to system
Contents
• Basic concepts
  – Shared object
  – Software release consistency
  – Multiple consistency protocols
• Software implementation
  – Prototype overview
  – Execution process
  – Advanced programming features
  – Data object directory and delayed update queue
  – Synchronization
• Performance
• Overview of other DSM systems
• Conclusion
Basic Concepts
Shared Object
[Figure: shared objects x and y placed on three pages, each labeled "8-kilo"]
Software Release Consistency
• Sequential Consistency
  – All processors observe the same order
  – Must correspond to some serial order
  – The only ordering constraint is that the reads/writes of one processor (e.g. P1) appear in its program order; there is no restriction on the relative ordering between processors
• Synchronous read/write
  – Writes must be propagated before moving on to the next operation
Software consistency
• Problems
  – Message passing overhead
  – False sharing
[Figure: timeline of w(x) writes on one processor alongside r(y), r(y), r(x) reads on another]
Weak Consistency
• Data modifications are only propagated at synchronization.
• Works fine if the program is properly synchronized through system primitives.
[Figure: timeline of w(x) writes and r(y), r(y), r(x) reads with a synch point]
Weak Consistency
[Figure: second timeline of w(x), w(x) writes and r(y), r(y), r(x) reads with a synch point]
Software Release Consistency
• Special weak consistency protocol
• Reduction of message passing overhead
• Two categories of shared variable operations
  – Ordinary access
    • Read
    • Write
  – Synchronization access (lock, semaphore, barrier)
    • Acquire
    • Release
Software Release Consistency
• Before an ordinary access (read, write) is allowed, all previous acquires must have been performed
• Before a release is allowed, all previous ordinary accesses must have been performed
• Before an acquire is allowed, all previous releases must have been performed
• Before a release is allowed, all previous acquires must have been performed
• In short, the results of writes made before a release are propagated before the next processor acquires the released lock
Eager Release Consistency
• Writes are propagated at release
Lazy Release Consistency
• Writes are propagated at acquire
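The difference between the eager and lazy variants can be sketched with a toy model. This is only an illustration under stated assumptions: the `Dsm` struct, the three-node setup, and all function names are hypothetical and are not Munin's API. Each node holds a private copy of one shared word; the eager release pushes the new value to every other copy, while the lazy variant defers the transfer until some node acquires.

```c
#define NODES 3

/* Hypothetical model: each node keeps its own copy of one shared word x. */
typedef struct {
    int copy[NODES];    /* copy[i] is node i's local replica of x */
    int last_releaser;  /* node whose copy holds the newest value */
    int messages;       /* update messages sent so far */
} Dsm;

void dsm_init(Dsm *d) {
    for (int i = 0; i < NODES; i++) d->copy[i] = 0;
    d->last_releaser = 0;
    d->messages = 0;
}

/* An ordinary write only touches the writer's local copy. */
void dsm_write(Dsm *d, int node, int value) { d->copy[node] = value; }

/* Eager: at release, push the releaser's value to every other replica. */
void release_eager(Dsm *d, int node) {
    for (int i = 0; i < NODES; i++) {
        if (i != node) {
            d->copy[i] = d->copy[node];
            d->messages++;
        }
    }
    d->last_releaser = node;
}

/* Lazy: the release only records who holds the newest value. */
void release_lazy(Dsm *d, int node) { d->last_releaser = node; }

/* Lazy: at acquire, the acquirer pulls the value from the last releaser. */
void acquire_lazy(Dsm *d, int node) {
    if (node != d->last_releaser) {
        d->copy[node] = d->copy[d->last_releaser];
        d->messages++;
    }
}
```

In this model an eager release by node 0 costs one message per other replica, whereas the lazy path costs one message for the single node that actually acquires next; nodes that never acquire keep a stale copy.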
Multiple Consistency Protocols
• No single consistency protocol is suitable for all parallelization purposes
• Shared variables are accessed in different ways within a single program
• Variable access patterns change during execution
• Multiple protocols allow access-pattern-oriented tuning for different shared variables
Multiple Consistency Protocols
• High-level sharing pattern annotation
  – Specified in shared variable declaration
  – Combinations of low-level protocol parameters
• Low-level protocol parameter
  – Specified in shared variable directory
  – Specific aspect of protocol
Protocol Parameters
• I: invalidate or update?
• R: Replicas allowed?
• D: Delayed operation allowed?
• FO: Having fixed owner?
• M: Multiple writers allowed?
• S: Stable access pattern?
• FL: Flushing changes to owner?
• W: Writable? (write protected?)
Sharing annotations
• Read-only
  – Simplest pattern: once initialized, no further modification
  – Suitable for constants etc.
• Migratory
  – Only one thread can access the variable at any one time
  – Suitable for variables accessed only in a critical section
• Write-shared
  – Can be written concurrently by multiple threads
  – Different threads update different words of the variable
• Producer-consumer
  – Written by only one thread and read by others
  – Replicate and update the object, not invalidate
Sharing annotations
• Example: producer-consumer
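    /* repeat for some number of timesteps/iterations */
    for (t = 0; t < timesteps; t++) {
        /* new value of each interior point is the average of its four neighbours */
        for (i = 1; i < n - 1; i++)
            for (j = 1; j < n - 1; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        /* copy the new values back into the grid */
        for (i = 1; i < n - 1; i++)
            for (j = 1; j < n - 1; j++)
                grid[i][j] = temp[i][j];
    }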
Sharing annotations
• Reduction
  – Accessed via fetch-and-operate (read, write, then release)
  – Example: min(), a++
• Result
  – Phase 1: multiple writers allowed
  – Phase 2: one thread (reading the result) accesses exclusively
• Conventional
  – Conventional update protocol for shared variables
Sharing annotations
Protocol parameters for each sharing annotation:

Sharing annotation   I  R  D  FO  M  S  FL  W
Read-only            N  Y  -  -   -  -  -   N
Migratory            Y  N  -  N   N  -  N   Y
Write-shared         N  Y  Y  N   Y  N  N   Y
Producer-consumer    N  Y  Y  N   Y  Y  N   Y
Reduction            N  Y  N  Y   N  -  N   Y
Result               N  Y  Y  Y   Y  -  Y   Y
Conventional         Y  Y  N  N   N  -  N   Y
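The mapping from a high-level annotation to its low-level parameters can be encoded as a lookup table. The sketch below transcribes the rows of the table on this slide; the type names, field names, annotation strings, and the `lookup_protocol` helper are illustrative assumptions, not Munin's actual data structures.

```c
#include <stddef.h>
#include <string.h>

/* One protocol parameter value: Y, N, or "-" in the slide's table. */
typedef enum { P_NO = 0, P_YES = 1, P_NA = 2 } Param;

/* Hypothetical row layout; fields follow the column order I R D FO M S FL W. */
typedef struct {
    const char *annotation;
    Param invalidate, replicas, delayed, fixed_owner;
    Param multi_writer, stable, flush, writable;
} ProtocolRow;

static const ProtocolRow protocol_table[] = {
    {"read_only",         P_NO,  P_YES, P_NA,  P_NA,  P_NA,  P_NA,  P_NA,  P_NO },
    {"migratory",         P_YES, P_NO,  P_NA,  P_NO,  P_NO,  P_NA,  P_NO,  P_YES},
    {"write_shared",      P_NO,  P_YES, P_YES, P_NO,  P_YES, P_NO,  P_NO,  P_YES},
    {"producer_consumer", P_NO,  P_YES, P_YES, P_NO,  P_YES, P_YES, P_NO,  P_YES},
    {"reduction",         P_NO,  P_YES, P_NO,  P_YES, P_NO,  P_NA,  P_NO,  P_YES},
    {"result",            P_NO,  P_YES, P_YES, P_YES, P_YES, P_NA,  P_YES, P_YES},
    {"conventional",      P_YES, P_YES, P_NO,  P_NO,  P_NO,  P_NA,  P_NO,  P_YES},
};

/* Return the row for an annotation name, or NULL if the name is unknown. */
const ProtocolRow *lookup_protocol(const char *annotation) {
    size_t n = sizeof protocol_table / sizeof protocol_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(protocol_table[i].annotation, annotation) == 0)
            return &protocol_table[i];
    }
    return NULL;
}
```

For example, `lookup_protocol("producer_consumer")` yields a row with `stable == P_YES` and `invalidate == P_NO`, matching the update-based, stable-pattern row of the table.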
Software Implementation
Prototype Overview
• A simple preprocessor that converts the annotations into a suitable format
• A linker that creates the shared memory segment
• Library routines linked into the program
• Operating system support for fault handling and page table manipulation
Execution Process
• Compiling
[Figure: sharing annotations -> Munin preprocessor -> auxiliary file -> linker -> shared data segment and shared data description table]
Execution Process
• Initialization
[Figure: initialization across processors P1, P2, ..., Pn, showing the Munin root thread, the Munin worker threads, User_init(), the user root thread, and the code copy and data segment for each remote processor]
Execution Process
• Synchronization
[Figure: processors P1, P2, ..., Pn, showing a user thread's synchronization operation together with the Munin root thread and a Munin worker thread]
Advanced Programming Features
• Associate data & synch
[Figure: timelines of acq(m), r(x), rel(m), and w(x) operations with the messages (msg) exchanged between processors]
Advanced Programming Features
• PhaseChange()
  – Change the producer-consumer relationship
  – Example: adaptive mesh SOR
• ChangeAnnotation()
  – Change the access pattern during execution
• Invalidate()
• Flush()
• SingleObject()
• PreAcquire()
Data Object Directory
• Start address and size
• Protocol parameters
• Object state (valid, writable, invalid)
• Copyset (which remote processors have copies)
• Synchq (corresponding synchronization object)
• Probable owner
• Home node
• Access control semaphore
• Links
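The directory fields listed above map naturally onto a record type. The following is a minimal sketch, assuming a singly linked list of entries, a 32-bit copyset bitmap, and integer ids for nodes and synchronization objects; none of these representation choices come from the slides, and `dir_find`, `copyset_add`, and `copyset_has` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OBJ_INVALID, OBJ_VALID, OBJ_WRITABLE } ObjectState;

/* Hypothetical directory entry; one per shared data object. */
typedef struct DirEntry {
    void            *start;          /* start address of the object */
    size_t           size;           /* size in bytes */
    uint8_t          params;         /* protocol parameter bits (I, R, D, ...) */
    ObjectState      state;          /* valid, writable, or invalid */
    uint32_t         copyset;        /* bit i set: node i holds a copy (assumes < 32 nodes) */
    int              synchq;         /* id of the associated sync object, -1 if none */
    int              probable_owner; /* best guess at the current owner node */
    int              home_node;      /* node where the object was created */
    int              access_sem;     /* stand-in for the access control semaphore */
    struct DirEntry *next;           /* link to the next entry */
} DirEntry;

/* Find the entry whose address range contains addr, e.g. on an access fault. */
DirEntry *dir_find(DirEntry *head, const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (DirEntry *e = head; e != NULL; e = e->next) {
        uintptr_t lo = (uintptr_t)e->start;
        if (a >= lo && a - lo < e->size)
            return e;
    }
    return NULL;
}

/* Record that a remote node now holds a copy. */
void copyset_add(DirEntry *e, int node) { e->copyset |= (uint32_t)1 << node; }

/* Non-zero if the given node is known to hold a copy. */
int copyset_has(const DirEntry *e, int node) {
    return (e->copyset >> node) & 1u;
}
```

A fault handler in such a design would call `dir_find` with the faulting address and then consult `state`, `params`, and `copyset` to decide which protocol action to take.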
Delayed Update Queue
[Figure: a thread executes acq(m), w(x), w(y), rel(m); the queue holds x after w(x), then x and y after w(y)]
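The queue behaviour in the figure can be sketched as follows. This is a simplified model under assumptions: objects are identified by small integer ids, the queue has a fixed capacity, and "propagation" is reduced to counting the flushed entries. The `Duq` type and its functions are hypothetical names.

```c
#define DUQ_CAP 16

/* Hypothetical delayed update queue: ids of objects written since the last release. */
typedef struct {
    int ids[DUQ_CAP];
    int count;
} Duq;

void duq_init(Duq *q) { q->count = 0; }

/* Called on a write to a shared object: enqueue it once.
   Returns 1 on success (or if already queued), 0 if the queue is full. */
int duq_record_write(Duq *q, int obj_id) {
    for (int i = 0; i < q->count; i++) {
        if (q->ids[i] == obj_id)
            return 1;               /* already pending, nothing to add */
    }
    if (q->count == DUQ_CAP)
        return 0;
    q->ids[q->count++] = obj_id;
    return 1;
}

/* Called at release: propagate every pending object, then empty the queue.
   Returns the number of objects that would be sent. */
int duq_release(Duq *q) {
    int sent = q->count;            /* a real system would send updates here */
    q->count = 0;
    return sent;
}
```

With x as id 1 and y as id 2, the sequence w(x), w(y), w(x) leaves two entries in the queue, and the release flushes both in one step instead of sending a message per write.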
Multiple Writer Handling
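The slide text for this topic did not survive extraction, so the following is only a sketch of the twin-and-diff technique commonly used for multiple-writer objects: each writer saves a copy (twin) of the object before its first write, and at release compares the object with the twin to find the words it changed. The 8-word object size, the `DiffEntry` encoding, and all function names are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define WORDS 8

/* One changed word: its index in the object and its new value. */
typedef struct { int index; int value; } DiffEntry;

/* Save a copy of the object before the first write. Caller frees the result. */
int *make_twin(const int *obj) {
    int *twin = malloc(WORDS * sizeof *twin);
    if (twin != NULL)
        memcpy(twin, obj, WORDS * sizeof *twin);
    return twin;
}

/* Compare the object with its twin; store changed words in out (capacity WORDS).
   Returns the number of entries written. */
int make_diff(const int *obj, const int *twin, DiffEntry *out) {
    int n = 0;
    for (int i = 0; i < WORDS; i++) {
        if (obj[i] != twin[i]) {
            out[n].index = i;
            out[n].value = obj[i];
            n++;
        }
    }
    return n;
}

/* Merge a diff received from another writer into a local copy. */
void apply_diff(int *obj, const DiffEntry *diff, int n) {
    for (int i = 0; i < n; i++)
        obj[diff[i].index] = diff[i].value;
}
```

Because each diff names only the words its writer touched, two writers that update different words of the same object can both be merged into a third copy without overwriting each other.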
Synchronization
• Queue based synchronization
• Request – reply – lock forward mechanism
• AcquireLock(), Unlock(), WaitAtBarrier()
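A queue-based lock with forwarding can be modelled in a few lines. This is a single-process sketch under assumptions, not the implementation behind AcquireLock() and Unlock(): nodes are integer ids, the waiter queue is a fixed-size FIFO, and the `DLock` type and `dlock_*` functions are hypothetical. A request either gets the lock at once or is queued; on unlock, ownership is forwarded to the first waiter.

```c
#define MAX_WAITERS 8

/* Hypothetical distributed lock state kept at the current owner. */
typedef struct {
    int owner;                 /* node holding the lock, -1 if free */
    int waiters[MAX_WAITERS];  /* FIFO ring buffer of waiting nodes */
    int head;                  /* index of the oldest waiter */
    int count;                 /* number of queued waiters */
} DLock;

void dlock_init(DLock *l) {
    l->owner = -1;
    l->head = 0;
    l->count = 0;
}

/* Request the lock. Returns 1 if granted immediately, 0 if queued, -1 if the queue is full. */
int dlock_acquire(DLock *l, int node) {
    if (l->owner < 0) {
        l->owner = node;
        return 1;
    }
    if (l->count == MAX_WAITERS)
        return -1;
    l->waiters[(l->head + l->count) % MAX_WAITERS] = node;
    l->count++;
    return 0;
}

/* Release the lock and forward it to the first waiter.
   Returns the new owner, or -1 if nobody was waiting. */
int dlock_unlock(DLock *l) {
    if (l->count == 0) {
        l->owner = -1;
        return -1;
    }
    l->owner = l->waiters[l->head];
    l->head = (l->head + 1) % MAX_WAITERS;
    l->count--;
    return l->owner;
}
```

If node 0 holds the lock and nodes 2 and 1 request it in that order, successive unlocks hand the lock to node 2 and then to node 1, so waiters are served in arrival order.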
Performance
Matrix Multiply
[Figure: bar chart comparing DM and Munin on 2, 4, 8, and 16 processors (vertical axis 0 to 400), with a second chart of the difference in percent (Diff %, axis 0 to 10)]
Matrix Multiply Optimized
[Figure: bar chart comparing DM and Munin on 2, 4, 8, and 16 processors (vertical axis 0 to 400), with a second chart of the difference in percent (Diff %, axis 0 to 1.5)]
SOR
[Figure: bar chart comparing DM and Munin on 2, 4, 8, and 16 processors (vertical axis 0 to 70), with a second chart of the difference in percent (Diff %, axis 0 to 10)]
Effect of Multiple Protocols
Protocol        Matrix Multiply   SOR
Multiple        72.41             27.64
Write-shared    75.59             64.48
Conventional    75.85             67.64
Overview of Other DSM Systems
• Clouds: per-segment (object) based consistency protocol
• Mirage: per-page based
• Orca: reliable ordered broadcast protocol
• Amber: user responsible for the data distribution among processors
• Linda: shared variables in tuple space; atomic operations: insertion, removal, reading
• Midway: using entry consistency (weaker consistency than release consistency)
• DASH: hardware DSM
Conclusion
• Objective: an efficient DSM system with a programming model similar to shared memory programming and low message passing overhead
• Special features: multiple protocols, software release consistency
• Implementation: synchronization realized by the Munin root thread and Munin worker threads
Thank you