multi processor
TRANSCRIPT
-
7/23/2019 Multi Processor
1/73
Multiprocessor ArchitectureBasics
Companion slides forThe Art of Multiprocessor
Programmingby Maurice Herlihy & Nir Shavit
-
7/23/2019 Multi Processor
2/73
Art of MultiprocessorProgramming
2
Multiprocessor Architecture
Abstract models are (mostly) OK tounderstand algorithm correctness
To understand how concurrentalgorithms perform
You need to understand something
about multiprocessor architectures
-
7/23/2019 Multi Processor
3/73
Art of MultiprocessorProgramming
3
Pieces
Processors
Threads
Interconnect
Memory
Caches
-
7/23/2019 Multi Processor
4/73
Art of MultiprocessorProgramming
4
Design an Urban Messenger
Service in 1980 Downtown Manhattan
Should you use Cars (1980 Buicks, 15 MPG)?
Bicycles (hire recent graduates)?
Better use bicycles
-
7/23/2019 Multi Processor
5/73
Art of MultiprocessorProgramming
5
Technology Changes
Since 1980, car technology haschanged enormously
Better mileage (hybrid cars, 35 MPG) More reliable
Should you rethink your Manhattan
messenger service?
-
7/23/2019 Multi Processor
6/73
Art of MultiprocessorProgramming
6
Processors
Cycle: Fetch and execute one instruction
Cycle times change 1980: 10 million cycles/sec
2005: 3,000 million cycles/sec
-
7/23/2019 Multi Processor
7/73
Art of MultiprocessorProgramming
7
Computer Architecture
Measure time in cycles Absolute cycle times change
Memory access: ~100s of cycles Changes slowly
Mostly gets worse
-
7/23/2019 Multi Processor
8/73
Art of MultiprocessorProgramming
8
Threads
Execution of a sequential program
Software, not hardware
A processor can run a thread
Put it aside Thread does I/O
Thread runs out of time
Run another thread
-
7/23/2019 Multi Processor
9/73
Art of MultiprocessorProgramming
9
Interconnect
Bus Like a tiny Ethernet
Broadcast medium Connects
Processors to memory
Processors to processors
Network Tiny LAN
Mostly used on large machines
-
7/23/2019 Multi Processor
10/73
Art of MultiprocessorProgramming
10
Interconnect
Interconnect is a finite resource
Processors can be delayed if othersare consuming too much
Avoid algorithms that use too muchbandwidth
-
7/23/2019 Multi Processor
11/73
Art of MultiprocessorProgramming11
Analogy
You work in an office When you leave for lunch, someone
else takes over your office. If you dont take a break, a security
guard shows up and escorts you to
the cafeteria. When you return, you may get adifferent office
-
7/23/2019 Multi Processor
12/73
Art of MultiprocessorProgramming12
Processor and Memory are Far
Apart
processor
memory
interconnect
-
7/23/2019 Multi Processor
13/73
Art of MultiprocessorProgramming13
Reading from Memory
address
-
7/23/2019 Multi Processor
14/73
Art of MultiprocessorProgramming14
Reading from Memory
zzz
-
7/23/2019 Multi Processor
15/73
Art of MultiprocessorProgramming15
Reading from Memory
value
-
7/23/2019 Multi Processor
16/73
Art of MultiprocessorProgramming16
Writing to Memory
address, value
-
7/23/2019 Multi Processor
17/73
Art of MultiprocessorProgramming 17
Writing to Memory
zzz
-
7/23/2019 Multi Processor
18/73
Art of MultiprocessorProgramming 18
Writing to Memory
ack
-
7/23/2019 Multi Processor
19/73
Art of MultiprocessorProgramming 19
Remote Spinning
Thread waits for a bit in memory tochange
Maybe it tried to dequeue from anempty buffer
Spins Repeatedly rereads flag bit
Huge waste of interconnectbandwidth
-
7/23/2019 Multi Processor
20/73
Art of MultiprocessorProgramming 20
Analogy
In the days before the Internet Alice is writing a paper on aardvarks
Sources are in university library Request book by campus mail Book arrives by return mail
Send it back when not in use She spends a lot of time in the mail
room
-
7/23/2019 Multi Processor
21/73
Art of MultiprocessorProgramming 21
Analogy II
Alice buys A desk
In her office To keep the books she is using now
A bookcase
in the hall To keep the books she will need soon
-
7/23/2019 Multi Processor
22/73
Art of MultiprocessorProgramming 22
Cache: Reading from Memory
address
cache
-
7/23/2019 Multi Processor
23/73
Art of MultiprocessorProgramming 23
Cache: Reading from Memory
cache
-
7/23/2019 Multi Processor
24/73
Art of MultiprocessorProgramming 24
Cache: Reading from Memory
cache
-
7/23/2019 Multi Processor
25/73
Art of MultiprocessorProgramming25
Cache Hit
cache
?
-
7/23/2019 Multi Processor
26/73
Art of MultiprocessorProgramming26
Cache Hit
cache
Yes!
-
7/23/2019 Multi Processor
27/73
Art of MultiprocessorProgramming27
Cache Miss
address
cache
?No
-
7/23/2019 Multi Processor
28/73
-
7/23/2019 Multi Processor
29/73
-
7/23/2019 Multi Processor
30/73
Art of MultiprocessorProgramming30
Local Spinning
With caches, spinning becomes practical
First time
Load flag bit into cache As long as it doesnt change
Hit in cache (no interconnect used)
When it changes One-time cost See cache coherence below
-
7/23/2019 Multi Processor
31/73
Art of MultiprocessorProgramming31
Granularity
Caches operate at a largergranularity than a word
Cache line: fixed-size block containingthe address
-
7/23/2019 Multi Processor
32/73
-
7/23/2019 Multi Processor
33/73
Art of MultiprocessorProgramming33
Hit Ratio
Proportion of requests that hit in thecache
Measure of effectiveness of cachingmechanism
Depends on locality of application
-
7/23/2019 Multi Processor
34/73
-
7/23/2019 Multi Processor
35/73
Art of MultiprocessorProgramming35
L1 and L2 Caches
L1
L2
Small & fast1 or 2 cycles~16 byte line
-
7/23/2019 Multi Processor
36/73
Art of MultiprocessorProgramming36
L1 and L2 Caches
L1
L2
Larger and slower10s of cycles~1K line size
-
7/23/2019 Multi Processor
37/73
Art of MultiprocessorProgramming37
When a Cache Becomes Full
Need to make room for new entry
By evicting an existing entry
Need a replacement policy Usually some kind of least recently used
heuristic
-
7/23/2019 Multi Processor
38/73
Art of MultiprocessorProgramming38
Fully Associative Cache
Any line can be anywhere in the cache Advantage: can replace any line
Disadvantage: hard to find lines
-
7/23/2019 Multi Processor
39/73
-
7/23/2019 Multi Processor
40/73
Art of MultiprocessorProgramming40
K-way Set Associative Cache
Each slot holds k lines Advantage: pretty easy to find a line
Advantage: some choice in replacing line
-
7/23/2019 Multi Processor
41/73
-
7/23/2019 Multi Processor
42/73
Art of MultiprocessorProgramming42
Contention
Good to avoid memory contention.
Idle processors
Consumes interconnect bandwidth
-
7/23/2019 Multi Processor
43/73
-
7/23/2019 Multi Processor
44/73
-
7/23/2019 Multi Processor
45/73
-
7/23/2019 Multi Processor
46/73
Art of MultiprocessorProgramming46
Cache Coherence
Processor A and B both cacheaddress x
A writes to x Updates cache
How does B find out?
Many cache coherence protocols inliterature
-
7/23/2019 Multi Processor
47/73
-
7/23/2019 Multi Processor
48/73
-
7/23/2019 Multi Processor
49/73
Art of MultiprocessorProgramming
49
MESI
Modified Have modified cached data, must write
back to memory Exclusive
Not modified, I have only copy
Shared Not modified, may be cached elsewhere
-
7/23/2019 Multi Processor
50/73
-
7/23/2019 Multi Processor
51/73
-
7/23/2019 Multi Processor
52/73
-
7/23/2019 Multi Processor
53/73
-
7/23/2019 Multi Processor
54/73
-
7/23/2019 Multi Processor
55/73
-
7/23/2019 Multi Processor
56/73
-
7/23/2019 Multi Processor
57/73
Art of MultiprocessorProgramming
57
Write-Through Caches
Immediately broadcast changes Good
Memory, caches always agree More read hits, maybe
Bad
Bus traffic on all writes Most writes to unshared data For example, loop indexes
(1)
-
7/23/2019 Multi Processor
58/73
-
7/23/2019 Multi Processor
59/73
-
7/23/2019 Multi Processor
60/73
-
7/23/2019 Multi Processor
61/73
Art of MultiprocessorProgramming
61
Multicore Architectures
The university president Alarmed by fall in productivity
Puts Alice, Bob, and Carol in samecorridor Private desks
Shared bookcase Contention costs go way down
-
7/23/2019 Multi Processor
62/73
-
7/23/2019 Multi Processor
63/73
Art of MultiprocessorProgramming
63
cache
BusBus
memory
cachecachecache
Multicore Architecture
-
7/23/2019 Multi Processor
64/73
Art of MultiprocessorProgramming
64
Multicore
Private L1 caches
Shared L2 caches
Communication between same-chipprocessors now very fast
Different-chip processors still not so
fast
-
7/23/2019 Multi Processor
65/73
Art of MultiprocessorProgramming
65
NUMA Architectures
Alice and Bob transfer to NUMAState University
No centralized library Each office basement holds part of
the library
-
7/23/2019 Multi Processor
66/73
-
7/23/2019 Multi Processor
67/73
Art of MultiprocessorProgramming
67
SMP vs NUMA
SMP
memory
NUMA
(1)
SMP: symmetric multiprocessor NUMA: non-uniform memory access CC-NUMA: cache-coherent
-
7/23/2019 Multi Processor
68/73
Art of MultiprocessorProgramming
68
Spinning Again
NUMA without cache OK if local variable
Bad if remote Cc-NUMA
Like SMP
-
7/23/2019 Multi Processor
69/73
Art of MultiprocessorProgramming
69
Relaxed Memory
Remember the flag principle? Alice and Bobs flag variables false
Alice writes true to her flag andreads Bobs
Bob writes true to his flag and reads
Alices One must see the others flag true
-
7/23/2019 Multi Processor
70/73
Art of MultiprocessorProgramming
70
Not Necessarily So
Sometimes the compiler reordersmemory operations
Can improve cache performance
interconnect use
But unexpected concurrentinteractions
-
7/23/2019 Multi Processor
71/73
Art of MultiprocessorProgramming
71
Write Buffers
address
Absorbing
Batching
-
7/23/2019 Multi Processor
72/73
Art of MultiprocessorProgramming
72
Volatile
In Java, if a variable is declaredvolatile, operations wont be
reordered Expensive, so use it only when needed
This work is licensed under a Creative Commons Attribution-
http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/ -
7/23/2019 Multi Processor
73/73
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
to Share to copy, distribute and transmit the work to Remix to adapt the work
Under the following conditions: Attribution. You must attribute the work to The Art of
Multiprocessor Programming (but not in any way thatsuggests that the authors endorse you or your use of the
work). Share Alike. If you alter, transform, or build upon this work,you may distribute the resulting work only under the same,similar or a compatible license.
For any reuse or distribution, you must make clear to others thelicense terms of this work. The best way to do this is with a linkto
http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission
from the copyright holder. Nothing in this license impairs or restricts the author's moral
rights.
http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/http://creativecommons.org/licenses/by-sa/2.5/