caches hakim weatherspoon cs 3410, spring 2013 computer science cornell university see p&h 5.1,...
TRANSCRIPT
![Page 1: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/1.jpg)
Caches
Hakim WeatherspoonCS 3410, Spring 2013
Computer ScienceCornell University
See P&H 5.1, 5.2 (except writes)
![Page 2: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/2.jpg)
Big Picture: Memory
Write-BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
forwardunit
detecthazard Stack, Data, Code
Stored in Memory
$0 (zero)$1 ($at)
$29 ($sp)$31 ($ra)
Code Stored in Memory(also, data and stack)
![Page 3: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/3.jpg)
Write-BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
Big Picture: Memory
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
forwardunit
detecthazard
Memory: big & slow vs Caches: small & fast
$0 (zero)$1 ($at)
$29 ($sp)$31 ($ra)
Code Stored in Memory(also, data and stack)
Stack, Data, Code Stored in Memory
![Page 4: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/4.jpg)
Write-BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
Big Picture: Memory
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
forwardunit
detecthazard
Memory: big & slow vs Caches: small & fast
$0 (zero)$1 ($at)
$29 ($sp)$31 ($ra)
Code Stored in Memory(also, data and stack)
Stack, Data, Code Stored in Memory
$$$$
![Page 5: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/5.jpg)
Goals for Today: caches
Caches vs memory vs tertiary storage• Tradeoffs: big & slow vs small & fast• working set: 90/10 rule• How to predict future: temporal & spacial locality
Examples of caches:• Direct Mapped• Fully Associative• N-way set associative
Caching Questions• How does a cache work?• How effective is the cache (hit rate/miss rate)?• How large is the cache?• How fast is the cache (AMAT=average memory access time)
![Page 6: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/6.jpg)
PerformanceCPU clock rates ~0.2ns – 2ns (5GHz-500MHz)Technology Capacity $/GB LatencyTape 1 TB $.17 100s of secondsDisk 2 TB $.03 Millions of cycles (ms)SSD (Flash) 128 GB $2 Thousands of cycles (us)DRAM 8 GB $10 (10s of ns)SRAM off-chip 8 MB $4000 5-15 cycles (few ns)SRAM on-chip 256 KB ??? 1-3 cycles (ns)
Others: eDRAM aka 1T SRAM , FeRAM, CD, DVD, …Q: Can we create illusion of cheap + large + fast?
50-300 cycles
![Page 7: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/7.jpg)
Memory Pyramid
Disk (Many GB – few TB)
Memory (128MB – few GB)
L2 Cache (½-32MB)
RegFile100s bytes
Memory Pyramid< 1 cycle access
1-3 cycle access
5-15 cycle access
50-300 cycle access
L3 becoming more common(eDRAM ?)
These are rough numbers: mileage may vary for latest/greatestCaches usually made of SRAM (or eDRAM)
L1 Cache(several KB)
1000000+ cycle access
![Page 8: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/8.jpg)
Memory HierarchyMemory closer to processor • small & fast• stores active data
Memory farther from processor • big & slow• stores inactive data
![Page 9: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/9.jpg)
Memory Hierarchy
Insight for Caches
If Mem[x] is was accessed recently...… then Mem[x] is likely to be accessed soon• Exploit temporal locality:
– Put recently accessed Mem[x] higher in memory hierarchysince it will likely be accessed again soon
… then Mem[x ± ε] is likely to be accessed soon• Exploit spatial locality:
– Put entire block containing Mem[x] and surrounding addresses higher in memory hierarchy since nearby address will likely
be accessed
![Page 10: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/10.jpg)
Memory HierarchyMemory closer to processor is fast but small• usually stores subset of memory farther away
– “strictly inclusive”
• Transfer whole blocks(cache lines):4kb: disk ↔ ram256b: ram ↔ L264b: L2 ↔ L1
![Page 11: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/11.jpg)
Memory HierarchyMemory trace0x7c9a2b180x7c9a2b190x7c9a2b1a0x7c9a2b1b0x7c9a2b1c0x7c9a2b1d0x7c9a2b1e0x7c9a2b1f0x7c9a2b200x7c9a2b210x7c9a2b220x7c9a2b230x7c9a2b280x7c9a2b2c0x0040030c0x004003100x7c9a2b040x004003140x7c9a2b000x004003180x0040031c...
int n = 4;int k[] = { 3, 14, 0, 10 };
int fib(int i) {if (i <= 2) return i;else return fib(i-1)+fib(i-2);
}
int main(int ac, char **av) {for (int i = 0; i < n; i++)
{printi(fib(k[i]));prints("\n");
}}
![Page 12: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/12.jpg)
Cache Lookups (Read)Processor tries to access Mem[x]Check: is block containing Mem[x] in the cache?• Yes: cache hit
– return requested data from cache line
• No: cache miss– read block from memory (or lower level cache)– (evict an existing cache line to make room)– place new block in cache– return requested data and stall the pipeline while all of this happens
![Page 13: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/13.jpg)
Three common designsA given data block can be placed…• … in exactly one cache line Direct Mapped• … in any cache line Fully Associative• … in a small set of cache lines Set Associative
![Page 14: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/14.jpg)
Direct Mapped CacheDirect Mapped Cache• Each block number
mapped to a singlecache line index
• Simplest hardware
line 0line 1
0x0000000x0000040x0000080x00000c0x0000100x0000140x0000180x00001c0x0000200x0000240x0000280x00002c0x0000300x0000340x0000380x00003c0x0000400x0000440x000048
Memory
Cache
2 cachelines4-words per cacheline
![Page 15: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/15.jpg)
Direct Mapped CacheDirect Mapped Cache• Each block number
mapped to a singlecache line index
• Simplest hardware
0x0000000x0000040x0000080x00000c0x0000100x0000140x0000180x00001c0x0000200x0000240x0000280x00002c0x0000300x0000340x0000380x00003c0x0000400x0000440x000048
addr
Memory
line 0line 1
0x0000040x000000 0x000008 0x00000cCache
2 cachelines4-words per cacheline
tag index offset32-addr4-bits1-bits27-bits
![Page 16: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/16.jpg)
Direct Mapped CacheDirect Mapped Cache• Each block number
mapped to a singlecache line index
• Simplest hardware
line 0line 1
0x0000000x0000040x0000080x00000c0x0000100x0000140x0000180x00001c0x0000200x0000240x0000280x00002c0x0000300x0000340x0000380x00003c0x0000400x0000440x000048
0x0000040x000000 0x000008 0x00000c
Memory
Cache
2 cachelines4-words per cacheline
line 0
line 1
tag index offset32-addr4-bits1-bits27-bits
![Page 17: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/17.jpg)
Direct Mapped CacheDirect Mapped Cache• Each block number
mapped to a singlecache line index
• Simplest hardware
0x0000000x0000040x0000080x00000c0x0000100x0000140x0000180x00001c0x0000200x0000240x0000280x00002c0x0000300x0000340x0000380x00003c0x0000400x0000440x000048
tag
Memory
index offset32-addr
Cache
4 cachelines2-words per cacheline
line 0
line 1
line 0line 1line 2line 3
0x0000040x000000
3-bits2-bits27-bits
line 2
line 3
line 0
line 1
line 2
line 3
![Page 18: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/18.jpg)
Direct Mapped Cache
![Page 19: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/19.jpg)
Direct Mapped Cache (Reading)
V Tag
Tag Index Offset
=
hit? dataword select
32bits
0…001000offset
indextag
wordselector
Byte offsetin word
Block
![Page 20: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/20.jpg)
Direct Mapped Cache (Reading)
V Tag
Tag Index Offset
n bit index, m bit offsetQ: How big is cache (data only)?
Block
![Page 21: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/21.jpg)
Direct Mapped Cache (Reading)
V Tag Block
Tag Index Offset
n bit index, m bit offsetQ: How much SRAM is needed (data + overhead)?
![Page 22: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/22.jpg)
Example:A Simple Direct Mapped Cache
110
130
150160
180
200
220
240
0123456789
101112131415
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]
CacheProcessor
tag data
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
4 cache lines2 word block
0
0
0
0
V
Using byte addresses in this example! Addr Bus = 5 bits
![Page 23: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/23.jpg)
Example:A Simple Direct Mapped Cache
110
130
150160
180
200
220
240
0123456789
101112131415
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]
CacheProcessor
tag data
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
4 cache lines2 word block2 bit tag field2 bit index field1 bit block offset
0
0
0
0
V
Using byte addresses in this example! Addr Bus = 5 bits
![Page 24: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/24.jpg)
1st Access
110
130
150160
180
200
220
240
0123456789
101112131415
CacheProcessor
00
tag data
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
100110
110Misses: 1
Hits: 0
Addr: 00001
block offset
1
0
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]
0
0
VM
![Page 25: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/25.jpg)
Misses
Three types of misses• Cold (aka Compulsory)
– The line is being referenced for the first time
• Capacity– The line was evicted because the cache was not large
enough
• Conflict– The line was evicted because of another access whose
index conflicted
![Page 26: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/26.jpg)
MissesQ: How to avoid…Cold Misses• Unavoidable? The data was never in the cache…• Prefetching!
Capacity Misses• Buy more SRAM
Conflict Misses• Use a more flexible cache design
![Page 27: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/27.jpg)
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 8 ]LB $2 M[ 4 ]LB $2 M[ 0 ]
8th and 9th Access
110
130
150160
180
200
220
240
0123456789
101112131415
Processor
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
Misses:
Hits:
Cache
tag data
2
100110
1501401
0
0
VM
![Page 28: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/28.jpg)
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 8 ]LB $2 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 8 ]
10th and 11th Access
110
130
150160
180
200
220
240
0123456789
101112131415
Processor Memory
100
120
140
170
190
210
230
250
Cache
tag data
2
100110
1501401
0
0
VM
Misses:
Hits:
![Page 29: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/29.jpg)
Cache Organization
How to avoid Conflict Misses
Three common designs• Fully associative: Block can be anywhere in the
cache• Direct mapped: Block can only be in one line in the
cache• Set-associative: Block can be in a few (2 to 8)
places in the cache
![Page 30: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/30.jpg)
Fully Associative Cache (Reading)
V Tag Block
word select
hit? data
line select
= = = =
32bits
64bytes
Tag Offset
![Page 31: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/31.jpg)
Fully Associative Cache (Reading)
V Tag Block
Tag Offset
m bit offsetQ: How big is cache (data only)?
, 2n blocks (cache lines)
![Page 32: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/32.jpg)
Fully Associative Cache (Reading)
V Tag Block
Tag Offset
m bit offsetQ: How much SRAM needed (data + overhead)?
, 2n blocks (cache lines)
![Page 33: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/33.jpg)
Example:Simple Fully Associative Cache
110
130
150160
180
200
220
240
0123456789
101112131415
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]
CacheProcessor
tag data
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
4 cache lines2 word block
4 bit tag field1 bit block offset
V
V
V
V
V
Using byte addresses in this example! Addr Bus = 5 bits
![Page 34: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/34.jpg)
0
1st Access
110
130
150160
180
200
220
240
0123456789
101112131415
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]
CacheProcessor
0000
tag data
$0$1$2$3
Memory
100
120
140
170
190
210
230
250
100110
110Misses: 1
Hits: 0
LRU
Addr: 00001 block offset
1
0
0
M
![Page 35: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/35.jpg)
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 8 ]LB $2 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 8 ]
10th and 11th Access
110
130
150160
180
200
220
240
0123456789
101112131415
Processor Memory
100
120
140
170
190
210
230
250
Misses: 4
Hits: 3+2+2
Cache
0000
tag data
0010
100110
150140
1
1
1
01
0110 220230180190
0100
MMHHH
MMHHHH
![Page 36: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/36.jpg)
Eviction
Which cache line should be evicted from the cache to make room for a new line?• Direct-mapped
– no choice, must evict line selected by index• Associative caches
– random: select one of the lines at random– round-robin: similar to random– FIFO: replace oldest line– LRU: replace line that has not been used in the longest
time
![Page 37: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/37.jpg)
Cache TradeoffsDirect Mapped+ Smaller+ Less+ Less+ Faster+ Less+ Very– Lots– Low– Common
Fully AssociativeLarger –More –More –
Slower –More –
Not Very –Zero +High +
?
Tag SizeSRAM OverheadController Logic
SpeedPrice
Scalability# of conflict misses
Hit ratePathological Cases?
![Page 38: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/38.jpg)
Compromise
Set-associative cache
Like a direct-mapped cache• Index into a location• Fast
Like a fully-associative cache• Can store multiple entries
– decreases thrashing in cache• Search in each element
![Page 39: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/39.jpg)
3-Way Set Associative Cache (Reading)
word select
hit? data
line select
= = =
32bits
64bytes
Tag Index Offset
![Page 40: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/40.jpg)
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]
Comparison: Direct Mapped
110
130
150160
180
200
220
240
0123456789
101112131415
Processor Memory
100
120
140
170
190
210
230
250
Misses:
Hits:
Cache
tag data
2
100110
1501401
0
0
4 cache lines2 word block
2 bit tag field2 bit index field1 bit block offset field
Using byte addresses in this example! Addr Bus = 5 bits
![Page 41: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/41.jpg)
LB $1 M[ 1 ]LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]
Comparison: Fully Associative
110
130
150160
180
200
220
240
0123456789
101112131415
Processor Memory
100
120
140
170
190
210
230
250
Misses:
Hits:
Cache
tag data
0
4 cache lines2 word block
4 bit tag field1 bit block offset field
Using byte addresses in this example! Addr Bus = 5 bits
![Page 42: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/42.jpg)
Comparison: 2 Way Set Assoc
110
130
150160
180
200
220
240
0123456789
101112131415
Processor Memory
100
120
140
170
190
210
230
250
Misses:
Hits:
Cache
tag data
0
0
0
0
2 sets2 word block3 bit tag field1 bit set index field1 bit block offset fieldLB $1 M[ 1 ]
LB $2 M[ 5 ]LB $3 M[ 1 ]LB $3 M[ 4 ]LB $2 M[ 0 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]LB $2 M[ 12 ]LB $2 M[ 5 ]
Using byte addresses in this example! Addr Bus = 5 bits
![Page 43: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/43.jpg)
Remaining Issues
To Do:• Evicting cache lines• Picking cache parameters• Writing using the cache
![Page 44: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/44.jpg)
Summary
Caching assumptions• small working set: 90/10 rule• can predict future: spatial & temporal locality
Benefits• big & fast memory built from (big & slow) + (small & fast)
Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate• Fully Associative higher hit cost, higher hit rate• Larger block size lower hit cost, higher miss penalty
Next up: other designs; writing to caches
![Page 45: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/45.jpg)
Administrivia
Upcoming agenda• HW3 was due yesterday Wednesday, March 13th
• PA2 Work-in-Progress circuit due before spring break
• Spring break: Saturday, March 16th to Sunday, March 24th
• Prelim2 Thursday, March 28th, right after spring break
• PA2 due Thursday, April 4th
![Page 46: Caches Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H 5.1, 5.2 (except writes)](https://reader034.vdocuments.mx/reader034/viewer/2022051401/56649ef65503460f94c092f0/html5/thumbnails/46.jpg)
Have a great Spring Break!!!