Chapter 7 · 2005-12-12 · Computer Architecture CS 35101-002
Chapter 7: Large and Fast: Exploiting Memory Hierarchy
Computer Architecture CS 35101-002 2
Basic Memory Requirements
Users/programmers demand:
• Large computer memory
• Very fast memory access
Technology limitations:
• Large computer memory: relatively slow access
• Small computer memory: relatively fast access
So how do you build a large computer memory with faster access?
Computer Memory Use-Case Scenarios
If a memory item is referenced:
• It will most likely be referenced again soon (temporal locality).
• Its neighbors will tend to be referenced soon (spatial locality).
Basic philosophy: use the basic requirements, the technology limitations, and the stated use-case scenarios to architect the memory.
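As a rough illustration of why both kinds of locality pay off, here is a minimal Python sketch. It assumes a hypothetical, unbounded cache that loads 4-word blocks and never evicts anything, so a miss happens only the first time a block is touched; the helper name count_block_misses is illustrative, not from the course material.

```python
# Illustrative helper, not from the course material.
def count_block_misses(word_addresses, words_per_block):
    """Count misses in an idealized, unbounded cache that loads whole blocks.

    Nothing is ever evicted, so a miss occurs only on the first touch of a block.
    """
    loaded_blocks = set()
    misses = 0
    for address in word_addresses:
        block = address // words_per_block  # neighboring words share a block
        if block not in loaded_blocks:
            misses += 1
            loaded_blocks.add(block)
    return misses


sequential = list(range(16))          # spatial locality: neighbors in order
strided = [4 * i for i in range(16)]  # every access lands in a new block
repeated = sequential * 2             # temporal locality: same words again

print(count_block_misses(sequential, 4))  # 4
print(count_block_misses(strided, 4))     # 16
print(count_block_misses(repeated, 4))    # 4
```

Sequential accesses pay one miss per 4-word block, and the second pass over the same words costs nothing extra. A real cache has limited capacity, so the temporal benefit only holds while the data stays resident.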
Memory Hierarchy Design Philosophy
Build a hierarchy of memories, with fast access close to the CPU, and exploit temporal locality and spatial locality in the design.
[Figure: levels in the memory hierarchy, from the CPU through Level 1, Level 2, ..., Level n. Access time increases with distance from the CPU, as does the size of the memory at each level; speed increases toward the CPU.]
Memory Speed: Technology Trend
Technology | Access time | $ per GB in 2004
SRAM | 0.5 – 5 ns | $4,000 – $10,000
DRAM | 50 – 70 ns | $100 – $200
Magnetic disk | 5,000,000 – 20,000,000 ns | $0.50 – $2
Three-Level Computer Memory Hierarchy
CPU
Cache (SRAM): small memory, fastest
Main memory (DRAM)
Virtual memory (magnetic disk, stores data): biggest and slowest
The fastest memory is closest to the CPU.
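Memory Hierarchy: Upper & Lower Levels
[Figure: the CPU sends data requests to the upper level; data is transferred between the upper and lower levels in blocks.]
Data request: if the requested data appears in a block in the cache, it is a hit; if it is not in the cache, it is a miss.
Memory performance:
• hit rate = hits / memory accesses
• miss rate = 1 - hit rate
• hit time = time to access the cache
• miss penalty = time to access the lower level + time to transfer the block into the upper level + access time of the upper memory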
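The performance definitions above are simple arithmetic; this Python sketch just encodes them. The function names are hypothetical, and the nanosecond figures in the usage line are made-up numbers for illustration only.

```python
# Illustrative helpers, not from the course material.
def hit_and_miss_rates(hits, accesses):
    """hit rate = hits / memory accesses; miss rate = 1 - hit rate."""
    if accesses <= 0:
        raise ValueError("need at least one memory access")
    hit_rate = hits / accesses
    return hit_rate, 1 - hit_rate


def miss_penalty(lower_level_access, block_transfer, upper_level_access):
    """Sum of the three components listed on the slide (any consistent time unit)."""
    return lower_level_access + block_transfer + upper_level_access


# 3 hits out of 8 accesses, as in the direct-mapped trace in this chapter
print(hit_and_miss_rates(3, 8))  # (0.375, 0.625)
print(miss_penalty(60, 20, 2))   # 82  (made-up nanosecond figures)
```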
Basics of Caches
Simple cache:
• The CPU requests Xn.
• Xn is not in the cache, so there is a trip to memory.
• Xn is copied into the cache from memory.
• Xn is returned to the CPU.
[Figure: cache contents before and after the request. Before: X4, X1, Xn-2, Xn-1, X2, X3. After: the same items plus Xn.]
How do we know if a data item is in the cache, and how do we find it?
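Cache Structure: Direct Mapped
Each memory location is mapped directly to a unique location in the cache.
Example mapping scheme: (block address) modulo (number of blocks in the cache)
[Figure: an 8-entry direct-mapped cache (indices 000 to 111) and memory words 0 to 31. The memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001 and 11101 map to cache entries 001 and 101, given by their low-order 3 bits.]
• 8 cache entries (8 = 2^3)
• Block size: 1 word
• The number of entries in a direct-mapped cache is a power of 2, so the modulo operation reduces to taking the low-order log2(number of entries) bits of the block address.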
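The modulo mapping, and why a power-of-2 entry count makes it cheap, can be checked with a short Python sketch. Both function names are hypothetical; the second one assumes the entry count is a power of 2 and then replaces the modulo with a bit mask over the low-order bits.

```python
# Illustrative helpers, not from the course material.
def direct_mapped_index(block_address, num_blocks):
    """Example mapping scheme: (block address) modulo (number of cache blocks)."""
    return block_address % num_blocks


def direct_mapped_index_by_mask(block_address, num_blocks):
    """Same result with a bit mask; only valid when num_blocks is a power of 2."""
    if num_blocks <= 0 or num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of 2")
    return block_address & (num_blocks - 1)  # keep the low-order log2(num_blocks) bits


addresses = [0b00001, 0b00101, 0b01001, 0b01101, 0b10001, 0b10101, 0b11001, 0b11101]
for address in addresses:
    index = direct_mapped_index(address, 8)
    assert index == direct_mapped_index_by_mask(address, 8)
    print(f"{address:05b} -> {index:03b}")
```

Every address in the figure lands in entry 001 or 101, which is why many memory words compete for one cache location.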
Basics of Caches
Cache mapping: many words map to one cache location.
How do we know whether the data in the cache corresponds to a requested word? Add a set of tags to the cache.
• A tag contains the upper bits of the word address that are not used in the indexing.
How do we know if a cache block contains valid information? Cache contents are invalid (empty) when the CPU is initialized.
• Solution: add a valid bit to each entry to indicate that it holds a valid address.
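Putting tags and valid bits together, a lookup is a hit only when both checks pass. This Python sketch assumes 1-word blocks and word addresses, as in the example that follows; CacheEntry and is_hit are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative types, not from the course material.
@dataclass
class CacheEntry:
    valid: bool = False        # every entry starts invalid (empty) at initialization
    tag: Optional[int] = None  # upper address bits not used for the index
    data: Optional[int] = None


def is_hit(entry: CacheEntry, word_address: int, num_blocks: int) -> bool:
    """A hit needs the valid bit set AND a stored tag equal to the address's upper bits."""
    return entry.valid and entry.tag == word_address // num_blocks


empty = CacheEntry()
filled = CacheEntry(valid=True, tag=0b10)  # as if word 10110 had been loaded
print(is_hit(empty, 0b10110, 8))   # False: valid bit not set
print(is_hit(filled, 0b10110, 8))  # True: tag 10 matches
print(is_hit(filled, 0b11110, 8))  # False: same index 110, but tag 11
```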
Accessing a Direct-Mapped Cache
Assume a direct-mapped cache with 1-word blocks.
Request | Decimal ref. address | Binary ref. address | Hit/miss | Assigned cache block
1 | 22 | 10110 | |
2 | 26 | 11010 | |
3 | 22 | 10110 | |
4 | 26 | 11010 | |
5 | 16 | 10000 | |
6 | 3 | 00011 | |
7 | 16 | 10000 | |
8 | 18 | 10010 | |
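Accessing a Cache: State of Cache at Initialization
Index | Valid | Tag | Data
000 | N | - | -
001 | N | - | -
010 | N | - | -
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | N | - | -
111 | N | - | -
The cache is empty.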
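Accessing a Cache: Request #1: Post Memory Reference 10110
The CPU encounters a miss; the cache copies the contents of address 10110 from memory.
Index | Valid | Tag | Data
000 | N | - | -
001 | N | - | -
010 | N | - | -
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -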
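Accessing a Cache: Request #2: Post Memory Reference 11010
The CPU encounters a miss; the cache copies the contents of address 11010 from memory.
Index | Valid | Tag | Data
000 | N | - | -
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -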
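Accessing a Cache: Request #3: Post Memory Reference 10110
The CPU encounters a hit.
Index | Valid | Tag | Data
000 | N | - | -
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -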
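Accessing a Cache: Request #4: Post Memory Reference 11010
The CPU encounters a hit.
Index | Valid | Tag | Data
000 | N | - | -
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -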
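Accessing a Cache: Request #5: Post Memory Reference 10000
The CPU encounters a miss; the cache copies the contents of address 10000 from memory.
Index | Valid | Tag | Data
000 | Y | 10 | Memory contents of 10000
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | N | - | -
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -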
Accessing a Cache: Request #6: Post Memory Reference 00011
The CPU encounters a miss; the cache copies the contents of address 00011 from memory.
Index | Valid | Tag | Data
000 | Y | 10 | Memory contents of 10000
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | Y | 00 | Memory contents of 00011
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -
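Accessing a Cache: Request #7: Post Memory Reference 10000
The CPU encounters a hit.
Index | Valid | Tag | Data
000 | Y | 10 | Memory contents of 10000
001 | N | - | -
010 | Y | 11 | Memory contents of 11010
011 | Y | 00 | Memory contents of 00011
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -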
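Accessing a Cache: Request #8: Post Memory Reference 10010
The CPU encounters a miss; the cache copies the contents of address 10010 from memory. Entry 010 held tag 11 (from 11010), which does not match tag 10, so that entry is replaced.
Index | Valid | Tag | Data
000 | Y | 10 | Memory contents of 10000
001 | N | - | -
010 | Y | 10 | Memory contents of 10010
011 | Y | 00 | Memory contents of 00011
100 | N | - | -
101 | N | - | -
110 | Y | 10 | Memory contents of 10110
111 | N | - | -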
Accessing a Direct-Mapped Cache
Assume a direct-mapped cache with 1-word blocks.
Request | Decimal ref. address | Binary ref. address | Hit/miss | Assigned cache block
1 | 22 | 10110 | Miss (Fig. 7.6b) | 110
2 | 26 | 11010 | Miss (Fig. 7.6c) | 010
3 | 22 | 10110 | Hit | 110
4 | 26 | 11010 | Hit | 010
5 | 16 | 10000 | Miss (Fig. 7.6d) | 000
6 | 3 | 00011 | Miss (Fig. 7.6e) | 011
7 | 16 | 10000 | Hit | 000
8 | 18 | 10010 | Miss (Fig. 7.6f) | 010
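The request table can be replayed with a small Python simulator. This is a sketch under the slide's assumptions (8 entries, 1-word blocks, word addresses); simulate_direct_mapped is a hypothetical name, and data contents are not modeled, only valid bits and tags.

```python
# Illustrative simulator, not from the course material.
def simulate_direct_mapped(word_addresses, num_blocks=8):
    """Direct-mapped cache with 1-word blocks; data contents are not modeled.

    Returns a list of (assigned cache block, "hit" or "miss") per request.
    """
    valid = [False] * num_blocks
    tags = [0] * num_blocks
    results = []
    for address in word_addresses:
        index = address % num_blocks   # low-order bits pick the cache block
        tag = address // num_blocks    # upper bits are stored as the tag
        if valid[index] and tags[index] == tag:
            results.append((index, "hit"))
        else:
            valid[index], tags[index] = True, tag  # miss: copy the word in from memory
            results.append((index, "miss"))
    return results


trace = [22, 26, 22, 26, 16, 3, 16, 18]
for address, (index, outcome) in zip(trace, simulate_direct_mapped(trace)):
    print(f"{address:2d}  {address:05b}  block {index:03b}  {outcome}")
```

The last request, 18, misses because entry 010 still holds tag 11 from address 26.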
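MIPS Direct-Mapped Cache: 32-bit Byte Reference Address
[Figure: the CPU's 32-bit address (bit positions 31 ... 0) is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2) and a 2-bit byte offset (bits 1-0). The index selects one of the cache entries 0, 1, 2, ..., 1021, 1022, 1023; each entry holds a valid bit, a 20-bit tag and 32 bits of data. The stored tag is compared (=) with the address tag and, together with the valid bit, produces Hit; the entry's data is returned.]
• Cache index: bits 2-11
• Byte offset: bits 0-1 (ignored for word accesses)
• Tag field: bits 12-31
Cache size:
• 2^10 blocks
• 1 word per block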
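The field boundaries listed above translate directly into shifts and masks. A minimal Python sketch, assuming the 1024-entry, 1-word-block layout from this slide; split_mips_address is a hypothetical name.

```python
# Illustrative helper, not from the course material.
def split_mips_address(address):
    """Split a 32-bit byte address for a 1024-entry, 1-word-block direct-mapped cache.

    Field layout from the slide: tag = bits 12-31, index = bits 2-11,
    byte offset = bits 0-1.
    """
    if not 0 <= address < 2 ** 32:
        raise ValueError("expected a 32-bit address")
    byte_offset = address & 0b11    # bits 0-1
    index = (address >> 2) & 0x3FF  # bits 2-11: one of 2**10 entries
    tag = address >> 12             # bits 12-31: the 20-bit tag
    return tag, index, byte_offset


tag, index, byte_offset = split_mips_address(0x12345678)
print(hex(tag), index, byte_offset)  # 0x12345 414 0
```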
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
[Figure: cache data organized as 4 blocks with index 11, 10, 01 and 00; each block holds words 3, 2, 1 and 0.]
How do we map a memory address to a direct-mapped cache with four-word blocks?
Four-Word Blocks & Total Size 16 Words
Consider the following memory reference addresses:
Request | Decimal ref. address | Binary ref. address
1 | 22 | 10110
2 | 20 | 10100
3 | 26 | 11010
4 | 23 | 10111
5 | 28 | 11100
6 | 16 | 10000
7 | 3 | 00011
8 | 16 | 10000
9 | 18 | 10010
Show the direct-mapped cache contents at each stage.
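Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Initial content of cache:
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | N | - | - | - | - | -
10 | N | - | - | - | - | -
01 | N | - | - | - | - | -
00 | N | - | - | - | - | -
The cache is empty.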
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #1: Post Memory Reference address 22 (1 01 10)
• Word within the block (trailing 2 bits of 22): 22 mod 4 = 2 = 10two
• Index (next 2 upper bits of 22; 4 blocks need 2 bits): 01
• Tag (remaining bits of 22): 1
Miss: entry 01 is not valid (V = N), so its tag cannot match. Transfer the data to the cache, then set the tag to 1 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | N | - | - | - | - | -
10 | N | - | - | - | - | -
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | N | - | - | - | - | -
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #2: Post Memory Reference address 20 (10100)
• Word within the block: 20 mod 4 = 0 = 00two
• Index: 01
• Tag (remaining bits): 1
Hit: the tag and V bit of entry 01 are already set to 1 and Y, respectively.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | N | - | - | - | - | -
10 | N | - | - | - | - | -
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | N | - | - | - | - | -
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #3: Post Memory Reference address 26 (11010)
• Word within the block: 26 mod 4 = 2 = 10two
• Index: 10
• Tag (remaining bits): 1
Miss: entry 10 has no tag set and its V bit is N. Transfer the data to the cache, then set the tag to 1 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | N | - | - | - | - | -
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | N | - | - | - | - | -
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #4: Post Memory Reference address 23 (10111)
• Word within the block: 23 mod 4 = 3 = 11two
• Index: 01
• Tag (remaining bits): 1
Hit: entry 01 already has its tag set to 1 and its V bit set to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | N | - | - | - | - | -
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | N | - | - | - | - | -
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #5: Post Memory Reference address 28 (11100)
• Word within the block: 28 mod 4 = 0 = 00two
• Index: 11
• Tag (remaining bits): 1
Miss: entry 11 has no tag set and its V bit is N. Transfer the data to the cache, then set the tag to 1 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | Y | 1 | Con(31) | Con(30) | Con(29) | Con(28)
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | N | - | - | - | - | -
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #6: Post Memory Reference address 16 (10000)
• Word within the block: 16 mod 4 = 0 = 00two
• Index: 00
• Tag (remaining bits): 1
Miss: entry 00 has no tag set and its V bit is N. Transfer the data to the cache, then set the tag to 1 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | Y | 1 | Con(31) | Con(30) | Con(29) | Con(28)
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | Y | 1 | Con(19) | Con(18) | Con(17) | Con(16)
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #7: Post Memory Reference address 3 (00011)
• Word within the block: 3 mod 4 = 3 = 11two
• Index: 00
• Tag (remaining bits): 0
Miss: entry 00 is valid, but its tag is set to 1, not 0. Transfer the data to the cache, then set the tag to 0 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | Y | 1 | Con(31) | Con(30) | Con(29) | Con(28)
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | Y | 0 | Con(3) | Con(2) | Con(1) | Con(0)
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #8: Post Memory Reference address 16 (10000)
• Word within the block: 16 mod 4 = 0 = 00two
• Index: 00
• Tag (remaining bits): 1
Miss: entry 00 is valid, but its tag is set to 0, not 1. Transfer the data to the cache, then set the tag to 1 and the V bit to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | Y | 1 | Con(31) | Con(30) | Con(29) | Con(28)
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | Y | 1 | Con(19) | Con(18) | Con(17) | Con(16)
Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words
Request #9: Post Memory Reference address 18 (10010)
• Word within the block: 18 mod 4 = 2 = 10two
• Index: 00
• Tag (remaining bits): 1
Hit: entry 00 already has its tag set to 1 and its V bit set to Y.
Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0
11 | Y | 1 | Con(31) | Con(30) | Con(29) | Con(28)
10 | Y | 1 | Con(27) | Con(26) | Con(25) | Con(24)
01 | Y | 1 | Con(23) | Con(22) | Con(21) | Con(20)
00 | Y | 1 | Con(19) | Con(18) | Con(17) | Con(16)
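The nine-request walk-through can be checked with a short Python sketch of a direct-mapped cache with multi-word blocks. It assumes word addresses, 4 blocks and 4 words per block as on the slides; the function name is hypothetical and only tags are tracked, with None standing for V = N.

```python
# Illustrative simulator, not from the course material.
def simulate_multiword_direct_mapped(word_addresses, num_blocks=4, words_per_block=4):
    """Direct-mapped cache with multi-word blocks; data contents are not modeled.

    Word-address fields, low to high: word within block | index | tag.
    """
    tags = [None] * num_blocks  # None plays the role of V = N
    outcomes = []
    for address in word_addresses:
        block_address = address // words_per_block  # drop the word-within-block bits
        index = block_address % num_blocks          # next 2 bits for 4 blocks
        tag = block_address // num_blocks           # remaining upper bits
        if tags[index] == tag:
            outcomes.append("hit")
        else:
            tags[index] = tag                       # miss: fetch all words of the block
            outcomes.append("miss")
    return outcomes


requests = [22, 20, 26, 23, 28, 16, 3, 16, 18]
results = simulate_multiword_direct_mapped(requests)
for number, (address, outcome) in enumerate(zip(requests, results), start=1):
    print(f"Request #{number}: {address:2d} ({address:05b}) -> {outcome}")
```

Requests 2, 4 and 9 hit only because a neighbor in the same 4-word block was fetched earlier, which is spatial locality at work.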
Handling Hits and Misses: Control Unit
• Read hits: business as usual; this is what we want!
• Read misses: stall the CPU, have the control unit fetch the block from memory and deliver it to the cache, then restart.
• Write hits: either replace the data in both the cache and memory (write-through), or write the data only into the cache and write the cache back to memory later (write-back).
• Write misses: read the entire block into the cache, then write the word.
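To contrast the two write-hit policies, here is a hedged Python sketch of a hypothetical teaching model, OneWordBlockCache, a direct-mapped cache with 1-word blocks backed by a Python list standing in for memory. The per-line dirty flag is an assumption not named on the slide; write-back needs some such marker to know which lines must be written to memory on eviction. Write misses follow the slide: read the block in, then write the word.

```python
# Hypothetical teaching model, not from the course material.
class OneWordBlockCache:
    """Direct-mapped cache with 1-word blocks.

    write_back=False -> write-through: every write also updates memory.
    write_back=True  -> write-back: memory is updated only when a dirty line is evicted.
    """

    def __init__(self, num_blocks, memory, write_back):
        self.num_blocks = num_blocks
        self.memory = memory              # a Python list standing in for main memory
        self.write_back = write_back
        self.lines = [None] * num_blocks  # None = invalid; else [tag, value, dirty]
        self.memory_writes = 0

    def _lookup(self, address):
        index, tag = address % self.num_blocks, address // self.num_blocks
        line = self.lines[index]
        if line is not None and line[0] == tag:
            return line                                   # hit
        if line is not None and line[2]:                  # evicting a dirty line
            self.memory[line[0] * self.num_blocks + index] = line[1]
            self.memory_writes += 1
        line = [tag, self.memory[address], False]         # miss: fetch from memory
        self.lines[index] = line
        return line

    def read(self, address):
        return self._lookup(address)[1]

    def write(self, address, value):
        line = self._lookup(address)  # on a write miss this first reads the block in
        line[1] = value
        if self.write_back:
            line[2] = True            # defer the memory update until eviction
        else:
            self.memory[address] = value
            self.memory_writes += 1


for write_back in (False, True):
    cache = OneWordBlockCache(8, [0] * 32, write_back=write_back)
    for value in range(3):
        cache.write(22, value)  # one write miss, then two write hits
    cache.read(30)              # 30 maps to the same entry as 22, forcing an eviction
    print("write-back" if write_back else "write-through", cache.memory_writes)
```

Under these assumptions write-through performs one memory write per store (3 here), while write-back performs a single write when the dirty line is evicted.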
Final Schedule
Wed. Dec. 14 at 5:45 pm
Extra class: Thursday, Dec. 8, from 5:15 pm in room 108!
No class next Monday, but Dr. Samba will teach next Wednesday!
Other Cache Structures: Reducing Cache Misses
• Fully associative cache: each block in memory can be placed anywhere in the cache.
• Set-associative cache: each block in memory can be placed in a fixed number of locations within a “set”. In a 2-way set-associative cache, each set has 2 elements (cache locations).
Set-Associative Cache: Mapping of Memory Lines
• Each set can hold E lines, typically between 2 and 8.
• A given memory line can map to any entry within its set.
• Eviction policy: which line gets kicked out when a new line is brought in. Commonly either “Least Recently Used” (LRU) or pseudo-random.
• LRU: the least recently accessed (read or written) line gets evicted.
[Figure: set i holds lines 0 through E–1, each with a valid bit, a tag and bytes 0 ... B–1, plus LRU state for the set.]
Indexing into a 2-Way Associative Cache
Use the middle s bits of the physical address to select from among S = 2^s sets.
[Figure: the physical address is split into t tag bits, s set-index bits and b offset bits; sets 0 through S–1 each hold two lines, each with a valid bit, a tag and bytes 0 ... B–1.]
2-Way Associative Cache: Tag Matching (Identifying the Line)
In the selected set:
• One of the tags must match the high-order bits of the address.
• That line must have Valid = 1.
• The lower bits of the address select the byte or word within the cache line.
[Figure: the t tag bits of the physical address are compared (= ?) with the tag of each line in the selected set, and each line's valid bit is checked (= 1?).]
2-Way Set-Associative Simulation
M = 16 addresses, B = 2 bytes/line, S = 2 sets, E = 2 entries/set
Address fields: t = 2 tag bits, s = 1 set bit, b = 1 offset bit
Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]
All five addresses fall in set 0; set 1 stays empty. Contents of set 0 (v, tag, data):
• 0 (miss): (1, 00, m[1] m[0])
• 1 (hit): same block as address 0
• 13 (miss): (1, 00, m[1] m[0]), (1, 11, m[13] m[12])
• 8 (miss, LRU replacement): (1, 10, m[9] m[8]), (1, 11, m[13] m[12])
• 0 (miss, LRU replacement): (1, 10, m[9] m[8]), (1, 00, m[1] m[0])
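The simulation above can be reproduced with a small, general Python sketch of a set-associative cache with LRU replacement. It assumes byte addresses split as tag | s set bits | b offset bits; simulate_set_associative is a hypothetical name, and each set is modeled as an OrderedDict of resident tags ordered from least to most recently used.

```python
from collections import OrderedDict


# Illustrative simulator, not from the course material.
def simulate_set_associative(addresses, s, b, E):
    """Set-associative cache with LRU replacement; data contents are not modeled.

    Address fields, high to low: tag | s set-index bits | b offset bits.
    Returns the per-access outcomes and the final per-set tag order (LRU first).
    """
    sets = [OrderedDict() for _ in range(2 ** s)]
    outcomes = []
    for address in addresses:
        set_index = (address >> b) & ((1 << s) - 1)  # middle s bits
        tag = address >> (s + b)                     # remaining high-order bits
        lines = sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)                   # now the most recently used
            outcomes.append("hit")
        else:
            if len(lines) == E:
                lines.popitem(last=False)            # evict the least recently used
            lines[tag] = True
            outcomes.append("miss")
    return outcomes, [list(lines) for lines in sets]


outcomes, final_sets = simulate_set_associative([0, 1, 13, 8, 0], s=1, b=1, E=2)
print(outcomes)       # ['miss', 'hit', 'miss', 'miss', 'miss']
print(final_sets[0])  # [2, 0] -> tags 10 and 00 remain in set 0
```

With s = 0, b = 1, E = 4 the same function models the fully associative simulation later in this chapter (the second 0 then hits), and with s = 1, b = 0, E = 2 it replays the two-way 0, 8, 0, 6, 8 example.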
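Two-Way Set-Associative Cache: Implementation
• The set index selects a set from the cache.
• The two tags in the set are compared in parallel with the address tag.
• Data is selected based on the tag comparison result.
[Figure: two banks of Valid, Cache Tag and Cache Data columns. The set index reads one cache line from each bank, and two comparators check the address tag (Adr Tag) against the stored tags. Their outputs (Sel1, Sel0) drive a 2-to-1 mux that selects the cache line, and are ORed to produce Hit.]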
Two-Way Set-Associative Cache
Show the cache contents after each request.
Order of requests | Decimal reference address | Binary reference address
1 | 0 | 00000
2 | 8 | 01000
3 | 0 | 00000
4 | 6 | 00110
5 | 8 | 01000
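Two-Way Set-Associative Cache: State of Cache at Initialization
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | N | - | - | N | - | -
1 | N | - | - | N | - | -
We will assume:
1. The set-associative cache has 2 sets (2^1) with two elements per set.
2. The cache block size is 1 byte (2^0).
Number of bits in tag = (bits to represent the address) - (bits to represent the set) - (bits for the block offset)
= (bits to represent the address) - 1 - 0
With the 5-bit addresses used in this example, the tag is 5 - 1 - 0 = 4 bits.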
Two-Way Set-Associative Cache Contents
Request #1: Post Memory Reference address 0 (00000)
• Locate set: 0 mod 2 = 0 = 0two
• Tag bits: 5 - 1 = the leading 4 bits (0000)
• Search: the tag does not exist in set 0.
Miss: transfer the data to the cache, then set the tag to 0000 and the V bit to Y.
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | Y | 0000 | CON[0] | N | - | -
1 | N | - | - | N | - | -
Two-Way Set-Associative Cache Contents
Request #2: Post Memory Reference address 8 (01000)
• Locate set: 8 mod 2 = 0 = 0two
• Tag bits: 5 - 1 = the leading 4 bits (0100)
• Search: the tag does not exist in set 0.
Miss: transfer the data to the cache, then set the tag to 0100 and the V bit to Y.
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | Y | 0000 | CON[0] | Y | 0100 | CON[8]
1 | N | - | - | N | - | -
Two-Way Set-Associative Cache Contents
Request #3: Post Memory Reference address 0 (00000)
• Locate set: 0 mod 2 = 0 = 0two
• Tag bits: 5 - 1 = the leading 4 bits (0000)
• Search the tag bits in set 0 (in parallel) for 0000: the tag exists in set 0.
Hit: send the memory address contents to the CPU.
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | Y | 0000 | CON[0] | Y | 0100 | CON[8]
1 | N | - | - | N | - | -
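Two-Way Set-Associative Cache Contents
Request #4: Post Memory Reference address 6 (00110)
• Locate set: 6 mod 2 = 0 = 0two
• Tag bits: 5 - 1 = the leading 4 bits (0011)
• Search: the tag does not exist in set 0.
Miss. Cache replacement algorithm: least recently used block. Set 0 is full, and CON[8] (last used by request #2) is less recently used than CON[0] (used by request #3), so CON[8] is replaced. Transfer the data to the cache, then set the tag to 0011 and the V bit to Y.
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | Y | 0000 | CON[0] | Y | 0011 | CON[6]
1 | N | - | - | N | - | -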
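Two-Way Set-Associative Cache Contents
Request #5: Post Memory Reference address 8 (01000)
• Locate set: 8 mod 2 = 0 = 0two
• Tag bits: 5 - 1 = the leading 4 bits (0100)
• Search: the tag does not exist in set 0.
Miss. Cache replacement algorithm: least recently used block. CON[0] (last used by request #3) is now less recently used than CON[6] (used by request #4), so CON[0] is replaced. Transfer the data to the cache, then set the tag to 0100 and the V bit to Y.
Set | Element 1: V | Tag | Data | Element 2: V | Tag | Data
0 | Y | 0100 | CON[8] | Y | 0011 | CON[6]
1 | N | - | - | N | - | -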
Fully Associative Cache: Mapping of Memory Lines
• The cache consists of a single set holding E lines.
• A given memory line can map to any line in the set.
• Only practical for small caches.
[Figure: the entire cache as one set of lines 0 through E–1, each with a valid bit, a tag and bytes 0 ... B–1, plus LRU state.]
Fully Associative Cache: Tag Matching (Identifying the Line)
• All of the tags must be checked for a match.
• The matching line must have Valid = 1.
• The lower bits of the address select the byte or word within the cache line.
[Figure: the physical address has only t tag bits and b offset bits; its tag is compared (= ?) with the tag of every line, and each line's valid bit is checked (= 1?).]
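Fully Associative Cache Simulation
M = 16 addresses, B = 2 bytes/line, S = 1 set, E = 4 entries/set
Address fields: t = 3 tag bits, s = 0 set bits, b = 1 offset bit
Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]
Contents of the single set (v, tag, data):
(1) 0 (miss): (1, 000, m[1] m[0])
(2) 13 (miss): (1, 000, m[1] m[0]), (1, 110, m[13] m[12])
(3) 8 (miss): (1, 000, m[1] m[0]), (1, 110, m[13] m[12]), (1, 100, m[9] m[8])
The accesses to 1 and to the final 0 hit, since block m[1] m[0] stays in the cache.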
Set-Associative Cache: Block Replacement Algorithm
• Least Recently Used (LRU): the block replaced is the one that has been unused for the longest time. The usage of each element in a set must be tracked, which becomes expensive as associativity increases (m-way set associative, where m is very large).
• Other policies: Most Recently Used (MRU), Least Frequently Used (LFU), Most Frequently Used (MFU), First In First Out (FIFO).
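The difference between LRU and FIFO is only whether a hit updates the eviction order. This Python sketch models a single cache set; count_misses is a hypothetical name, and the trace reuses the references of the two-way example, all of which map to set 0.

```python
from collections import OrderedDict


# Illustrative helper, not from the course material.
def count_misses(block_trace, ways, policy):
    """Misses in ONE cache set holding `ways` blocks, under "LRU" or "FIFO".

    The OrderedDict keeps resident blocks in eviction order: the first key is
    always the next victim.
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError("policy must be 'LRU' or 'FIFO'")
    resident = OrderedDict()
    misses = 0
    for block in block_trace:
        if block in resident:
            if policy == "LRU":
                resident.move_to_end(block)  # LRU: a hit makes the block most recent
            continue                         # FIFO: a hit does not change the order
        misses += 1
        if len(resident) == ways:
            resident.popitem(last=False)     # evict the block at the front
        resident[block] = True
    return misses


trace = [0, 8, 0, 6, 8]  # references of the two-way example, all in set 0
print(count_misses(trace, 2, "LRU"))   # 4
print(count_misses(trace, 2, "FIFO"))  # 3
```

FIFO happens to do better on this short trace; that is a property of these five references, not a general result.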
Let’s Summarize: M-Way Set-Associative Cache
• 1-way set-associative cache (M = 1) (√): direct mapped
• 2-way set-associative cache (M = 2) (√): cache block replacement: LRU
• 4-way set-associative cache (M = 4) (√): cache block replacement: LRU
• Fully associative cache: a block can be placed in any location in the cache. All entries in the cache must be searched in response to a cache request (expensive).
Direct-Mapped Cache: Cache Size
Let’s assume the index field occupies n bits, so the cache holds $2^n$ blocks, and each block holds $2^m$ words $= 2^{m+5}$ bits.
For a 32-bit byte address (the 2 below is the byte-offset field):
$$\text{Tag bits} = 32 - (n + m + 2)$$
$$\begin{aligned}
\text{Cache size} &= 2^n \times (\text{block size} + \text{tag size} + \text{valid size})\\
&= 2^n \times \left(2^{m+5} + [32 - (n + m + 2)] + 1\right)\\
&= 2^n \times \left(2^m \times 32 + 31 - n - m\right) \text{ bits}
\end{aligned}$$
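The size formula can be evaluated directly. This Python sketch, with the hypothetical name direct_mapped_total_bits, assumes 32-bit words and a 32-bit byte address as on the slide; the usage line checks it against the MIPS cache from this chapter (n = 10, m = 0).

```python
# Illustrative helper, not from the course material.
def direct_mapped_total_bits(n, m, address_bits=32):
    """Total bits for a direct-mapped cache: 2**n blocks, each of 2**m 32-bit words.

    Follows 2**n * (block size + tag size + valid size), with 2 byte-offset
    bits because each word is 4 bytes.
    """
    tag_bits = address_bits - (n + m + 2)
    block_bits = (2 ** m) * 32
    valid_bits = 1
    return (2 ** n) * (block_bits + tag_bits + valid_bits)


# MIPS example from this chapter: 2**10 one-word blocks, 20-bit tags
print(direct_mapped_total_bits(n=10, m=0))  # 54272
```

For the 1024-entry MIPS cache this gives 1024 × (32 + 20 + 1) bits; only 32 Kbits of that is data.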
Exercise
How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
• How large is the cache in terms of words?
• How many blocks does it have?
• How many bits are in the tag field?
• What is the total cache size in bits?