Fall 2015, Nov 2 . . . ELEC 5200-001/6200-001 Lecture 9
TRANSCRIPT
ELEC 5200-001/6200-001 Computer Architecture and Design
Fall 2015: Memory Organization (Chapter 5)
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/
[email protected]
Types of Computer Memories
From the cover of: A. S. Tanenbaum, Structured Computer Organization, Fifth Edition, Upper Saddle River, New Jersey: Pearson Prentice Hall, 2006.
Random Access Memory (RAM)
Memory cell
array
Address decoder
Read/write circuits
Address bits
Data bits
Six-Transistor SRAM Cell
[Figure: 6T SRAM cell; a word line gates two access transistors connecting the stored values (bit, bit') to complementary bit lines]
Dynamic RAM (DRAM) Cell
[Figure: one-transistor DRAM cell; the word line gates a transistor connecting the storage capacitor to the bit line]
“Single-transistor DRAM cell”: Robert Dennard’s 1967 invention
Electronic Memory Devices

Memory technology | Typical access time | Clock rate    | Cost per GB in 2004
SRAM              | 0.5-5 ns            | 0.2-2.0 GHz   | $4k-$10k
DRAM              | 50-70 ns            | 15-20 MHz     | $100-$200
Magnetic disk     | 5-20 ms             | 50-200 Hz     | $0.5-$2

For more on memories:
Semiconductor Memories: A Handbook of Design, Manufacture and Application, by Betty Prince, Wiley 1996.
Emerging Memories: Technologies and Trends, by Betty Prince, Springer 2002.
Building a Computer with 1GHz Clock and 40GB Memory

Type of memory | Cost     | Clock rate
SRAM           | $160,000 | 0.2-2.0 GHz
DRAM           | $4,000   | 15-20 MHz
Disk           | $20      | 50-200 Hz
Trying to Buy a Laptop Computer? (Three Years Ago)
IBM ThinkPad X Series by Lenovo, 23717GU
1.20 GHz Low Voltage Intel® Pentium® M, 1MB L2 SRAM Cache ~$5
Microsoft® Windows® XP Professional
512 MB DRAM ~$100
40 GB Hard Drive ~$40
2.7 lbs, 12.1" XGA (1024x768)
IBM Embedded Security Subsystem 2.0
Intel PRO/Wireless Network Connection 802.11b, Gigabit Ethernet
Integrated graphics Intel Extreme Graphics 2
No CD/DVD drive PROP, Fixed Bay
Availability**: Within 2 weeks
$2,149.00 IBM web price*
$1,741.65 sale price*
Last year $1,023.75 X61 with Intel Duo 2GHz Processor
2006/07
Choose a Lenovo 3000 V Series to customize & buy
From: $999.00
Sale price: $949.00
Processor: Intel Core 2 Duo T5500 (1.66GHz, 2MB L2, 667MHz FSB)
Total memory: 512MB PC2-5300 DDR2 SDRAM
Hard drive: 80GB, 5400rpm Serial ATA
Weight: 4.0lbs
2007 Executive-Class
The Reserve Edition features a leather exterior handmade by expert Japanese saddle makers.
The classic, award-winning ThinkPad design remains unchanged - why mess with success?
ThinkPad Reserve Verizon Edition or ThinkPad Reserve Cingular Edition, from $5,000
Nicholas Negroponte’s OLPC (One Laptop per Child)
http://www.flickr.com/photos/olpc/3145038187/
X0-1
Manufacturer: Quanta Computer
Type: Subnotebook
Connectivity: 802.11b/g/s wireless LAN, 3 USB 2.0 ports, MMC/SD card slot
Media: 1 GB flash memory
Operating system: Fedora-based (Linux)
Input: Keyboard, Touchpad, Microphone, Camera
Camera: Built-in video camera (640×480; 30 FPS)
Power: NiMH or LiFePO4 removable battery pack
CPU: AMD Geode [email protected] W
Memory: 256 MB DRAM
Display: Dual-mode 19.1 cm/7.5" diagonal TFT LCD, 1200×900
Dimensions: 242 mm × 228 mm × 32 mm
Weight: LiFeP battery: 1.45 kg [3.2 lbs]; NiMH battery: 1.58 kg [3.5 lbs]
Price: $100+

Today’s Laptop
Inspiron 15
Dell Price $379.99
Introducing the new Inspiron™ 15, a 15.6" laptop that gives you the everyday features you need, all at a great value.
• Up to Intel® Core™2 Duo processors
• Entertainment on the go with the HD display
• Personalize with a choice of six vibrant colors or choose from over 200 artist designs with Design Studio
Cache
Processor does all memory operations with cache.
Miss: If the requested word is not in cache, a block of words containing the requested word is brought to cache, and then the processor request is completed.
Hit: If the requested word is in cache, the read or write operation is performed directly in cache, without accessing main memory.
Block: the minimum amount of data transferred between cache and main memory.
[Figure: the processor exchanges words with a small, fast cache; the cache exchanges blocks with a large, inexpensive (slow) main memory]
Inventor of Cache
M. V. Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Transactions on Electronic Computers, vol. EC-14, no. 2, pp. 270-271, April 1965.
Cache Performance
Average access time
= T1 × h + (Tm + T1) × (1 – h)
= T1 + Tm × (1 – h)
where
– T1 = cache access time (small)
– Tm = memory access time (large)
– h = hit rate (0 ≤ h ≤ 1)
Hit rate is also known as hit ratio; miss rate = 1 – hit rate.
[Figure: processor, cache (small, fast memory; access time = T1), and main memory (large, inexpensive, slow; access time = Tm)]
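The average-access-time formula above is easy to check numerically. A minimal sketch (the function name and the sample numbers are assumptions, not from the slides):

```python
def avg_access_time(t1, tm, h):
    """Average access time = T1 + Tm * (1 - h).

    On a hit the access costs t1; on a miss it costs tm + t1,
    so the average is t1*h + (tm + t1)*(1 - h) = t1 + tm*(1 - h).
    """
    return t1 + tm * (1.0 - h)

# Assumed example: 1-cycle cache, 60-cycle memory, 95% hit rate
print(round(avg_access_time(1, 60, 0.95), 2))  # 4.0
```

Even a 5% miss rate quadruples the average access time relative to the cache alone, which is why the next slide cares about keeping the miss rate under 5%.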
Average Access Time
[Plot: access time T1 + Tm × (1 – h) vs. miss rate 1 – h. The line starts at T1 for h = 1 (miss rate 0), rises with slope Tm, and reaches Tm + T1 at h = 0 (miss rate 1)]
Desirable miss rate < 5%
Acceptable miss rate < 10%
Comparing Performance
Ideal processor with 1-cycle memory access, CPI = 1
Processor without cache:
– Assume main memory access time of 10 cycles
– Assume 30% of instructions require memory data access
Processor with cache:
– Assume cache access time of 1 cycle
– Assume hit rate 0.95 for instructions, 0.90 for data
– Assume miss penalty (time to read memory into cache and from cache to processor) is 11 cycles
Comparing times of 100 instructions:
Time without cache / Time with cache
= (100×10 + 30×10) / (100×(0.95×1 + 0.05×11) + 30×(0.9×1 + 0.1×11))
= 6.19
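The 6.19 ratio above can be reproduced directly from the slide's assumptions (variable names are mine):

```python
# 100 instructions, 30 of which also make a data access.
mem_cycles = 10          # main-memory access time, cycles
miss_penalty = 11        # cycles to fill the cache and deliver the word
h_instr, h_data = 0.95, 0.90

time_without = 100 * mem_cycles + 30 * mem_cycles              # 1300 cycles
time_with = (100 * (h_instr * 1 + (1 - h_instr) * miss_penalty)
             + 30 * (h_data * 1 + (1 - h_data) * miss_penalty))  # 210 cycles
print(round(time_without / time_with, 2))  # 6.19
```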
Controlling Miss Rate
Increase cache size
– More blocks can be kept in cache; lower miss rate.
– Larger cache is slower; expensive.
Increase block size
– More data available; may lower miss rate.
– Fewer blocks in cache increase miss rate.
– Larger blocks need more time to swap.
[Figure: a large memory and a cache, with memory blocks mapped into the cache]
Cache Size
[Plot: hit rate and 1/(cycle time) vs. increasing cache size; hit rate rises while 1/(cycle time) falls, giving an optimum cache size]
Block Size
[Plot: hit rate (0.8 to 1.0) vs. increasing block size; hit rate first rises (localized data) then falls (fragmented data), giving an optimum block size]
Increasing Hit Rate
Hit rate increases with cache size.
Hit rate mildly depends on block size.
[Plot: miss rate = 1 – hit rate (0% to 10%, i.e., hit rate 90% to 100%) vs. block size (16B to 256B) for cache sizes 4KB, 16KB, and 64KB. Small blocks have decreasing chances of covering large data locality; large blocks have decreasing chances of getting fragmented data]
The Locality Principle
A program tends to access data that form a physical cluster in the memory; multiple accesses may be made within the same block.
Physical localities are temporal and may shift over longer periods of time; data not used for some time is less likely to be used in the future.
Upon a miss, the least recently used (LRU) block can be overwritten by a new block.
P. J. Denning, “The Locality Principle,” Communications of the ACM, vol. 48, no. 7, pp. 19-24, July 2005.
Data Locality, Cache, Blocks
Increase block size to match locality size.
Increase cache size to include most blocks.
[Figure: data needed by a program clusters in memory (Block 1, Block 2); those blocks are mapped into the cache]
Types of Caches
Direct-mapped cache
– Memory is divided into partitions the size of the cache
– Each partition is subdivided into blocks
Set-associative cache
Direct-Mapped Cache
[Figure: a needed block from memory (Block 1, Block 2) is swapped in to its fixed cache location, swapping out the block already there, even when the least recently used (LRU) block sits elsewhere in the cache]
Set-Associative Cache
[Figure: the needed block is swapped in while the LRU block is swapped out; blocks from memory (Block 1, Block 2) may occupy any block of their set in the cache]
Direct-Mapped Cache
[Figure: a 32-word word-addressable main memory and a cache of 8 blocks, block size = 1 word. The 5-bit memory address, e.g. 11 101, splits into a cache address: tag = 11 (upper bits) and index = 101 (local address within the cache)]
Direct-Mapped Cache
[Figure: the same 32-word word-addressable memory with a cache of 4 blocks, block size = 2 words. The memory address 11 10 1 splits into tag = 11, index = 10, and block offset = 1 (word within the block)]
Number of Tag and Index Bits
Main memory size = W words; cache size = w words.
Each word in cache has a unique index (local address): number of index bits = log2(w).
Index bits are shared with the block offset when a block contains more than one word.
Assume partitions of w words each in the main memory. There are W/w such partitions, each identified by a tag: number of tag bits = log2(W/w).
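The two formulas above can be sketched as a short function (the name is mine; block size = 1 word is assumed, as in the running example):

```python
from math import log2

def index_and_tag_bits(W, w):
    """Index bits = log2(w), tag bits = log2(W/w) for a W-word memory
    and a w-word direct-mapped cache with 1-word blocks."""
    return int(log2(w)), int(log2(W // w))

# The slides' running example: 32-word memory, 8-block cache
print(index_and_tag_bits(32, 8))  # (3, 2): 3 index bits, 2 tag bits
```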
Direct-Mapped Cache (Byte Address)
[Figure: a 32-word byte-addressable memory (7-bit byte address) and a cache of 8 blocks, block size = 1 word. The memory address 11 101 00 splits into tag = 11, index = 101, and byte offset = 00]
Finding a Word in Cache
[Figure: cache lookup for the 32-word byte-addressable memory, cache size 8 words, block size = 1 word. The address b6 b5 b4 b3 b2 b1 b0 splits into a 2-bit tag, a 3-bit index (000 to 111), and a byte offset. The index selects an entry (valid bit, 2-bit tag, data); the stored tag is compared with the address tag: 1 = hit, 0 = miss]
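The tag/index/offset split used in this lookup is plain bit manipulation. A sketch with the 2/3/2 split above (the function name is an assumption):

```python
def split_address(addr, tag_bits=2, index_bits=3, offset_bits=2):
    """Split a 7-bit byte address into (tag, index, byte offset),
    matching the figure: b6 b5 | b4 b3 b2 | b1 b0."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Address 11 101 00 from the slide:
print(split_address(0b1110100))  # (3, 5, 0): tag 11, index 101, offset 00
```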
How Many Bits Does the Cache Have?
Consider a main memory:
– 32 words; byte address is 7 bits wide: b6 b5 b4 b3 b2 b1 b0
– Each word is 32 bits wide
Assume that the cache block size is 1 word (32 bits of data) and that the cache contains 8 blocks.
The cache requires, for each word: a 2-bit tag and one valid bit.
Total storage needed in cache
= #blocks in cache × (data bits/block + tag bits + valid bit)
= 8 × (32 + 2 + 1) = 280 bits
Physical storage/Data storage = 280/256 = 1.094
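The arithmetic above, as a check (variable names are mine):

```python
# Per block: 32 data bits, a 2-bit tag, and 1 valid bit, for 8 blocks.
blocks, data_bits, tag_bits, valid = 8, 32, 2, 1
total = blocks * (data_bits + tag_bits + valid)
ratio = total / (blocks * data_bits)
print(total)           # 280 bits
print(round(ratio, 3)) # 1.094 overhead ratio
```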
A More Realistic Cache
Consider a 4 GB, byte-addressable main memory:
– 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
– Each word is 32 bits wide
Assume that the cache block size is 1 word (32 bits of data) and that the cache contains 64 KB of data, or 16K words, i.e., 16K blocks.
Number of cache index bits = 14, because 16K = 2^14
Tag size = 32 – byte offset – #index bits = 32 – 2 – 14 = 16 bits
The cache requires, for each word: a 16-bit tag and one valid bit.
Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bit)
= 2^14 × (32 + 16 + 1) = 16×2^10×49 = 784×2^10 bits = 784 Kb = 98 KB
Physical storage/Data storage = 98/64 = 1.53
But the block size needs to be increased to match the size of locality.
Data Organization in Cache
[Figure: data organization with 4-word blocks. A 3-bit block index (000 to 111) selects a row; a 2-bit block offset (00 to 11) selects the word within the block; each row holds a valid bit and a 4-bit tag. Memory address of a word in cache: 1011 011 10 00 = tag 1011, block index 011, block offset 10, byte offset 00. A 4-bit tag means the memory is 16 times larger than the cache; tags and valid bits are the address-mapping overhead]
Cache Bits for 4-Word Block
Consider a 4 GB, byte-addressable main memory:
– 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
– Each word is 32 bits wide
Assume that the cache block size is 4 words (128 bits of data) and that the cache contains 64 KB of data, or 16K words, i.e., 4K blocks.
Number of cache index bits = 12, because 4K = 2^12
Tag size = 32 – byte offset – #block offset bits – #index bits = 32 – 2 – 2 – 12 = 16 bits
The cache requires, for each block: a 16-bit tag and one valid bit.
Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bit)
= 2^12 × (4×32 + 16 + 1) = 4×2^10×145 = 580×2^10 bits = 580 Kb = 72.5 KB
Physical storage/Data storage = 72.5/64 = 1.13
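The storage totals on this slide and the 1-word-block slide before it follow from one formula, sketched here as a function (the name is mine; 32-bit words and one valid bit per block assumed, as in the slides):

```python
def cache_storage_bits(num_blocks, words_per_block, tag_bits):
    """Total cache storage = blocks * (data bits + tag bits + valid bit)."""
    return num_blocks * (32 * words_per_block + tag_bits + 1)

# 1-word blocks, 16K blocks, 16-bit tag (previous slide): 784 Kb = 98 KB
print(cache_storage_bits(2**14, 1, 16) // 1024)  # 784
# 4-word blocks, 4K blocks, 16-bit tag (this slide): 580 Kb = 72.5 KB
print(cache_storage_bits(2**12, 4, 16) // 1024)  # 580
```

Larger blocks amortize the tag and valid bit over more data, which is why the overhead ratio drops from 1.53 to 1.13.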
Using Larger Cache Block (4 Words)
[Figure: lookup hardware for the 4 GB (1G-word) byte-addressable memory, cache size 16K words, block size = 4 words. The address b31…b15 b14…b4 b3 b2 b1 b0 splits into a 16-bit tag, a 12-bit index (4K indexes, 0000 0000 0000 to 1111 1111 1111), a 2-bit block offset, and a byte offset. Each entry holds a valid bit, a 16-bit tag, and 128 bits of data (4 words); the tag comparison gives 1 = hit, 0 = miss, and a multiplexer selects the word by block offset]
Handling a Miss
A miss occurs when data at the required memory address is not found in cache.
Controller actions:
– Stall pipeline
– Freeze contents of all registers
– Activate a separate cache controller
– If cache is full: select the least recently used (LRU) block in cache for overwriting; if the selected block has inconsistent data, take proper action
– Copy the block containing the requested address from memory
– Restart instruction
Miss During Instruction Fetch
Send the original PC value (PC – 4) to the memory.
Instruct main memory to perform a read and wait for the memory to complete the access.
Write the cache entry.
Restart the instruction whose fetch failed.
Writing to Memory
Cache and memory become inconsistent when data is written into cache but not to memory: the cache coherence problem.
Strategies to handle inconsistent data:
– Write-through: always write to memory and cache simultaneously. Writing to memory is ~100 times slower than writing to cache.
– Write buffer: write to cache and to a buffer for writing to memory. If the buffer is full, the processor must wait.
Writing to Memory: Write-Back
Write-back (or copy-back) writes only to cache, but sets a “dirty bit” in the block where the write is performed.
When a block with its dirty bit “on” is to be overwritten in the cache, it is first written to the memory.
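The dirty-bit mechanism can be sketched in a few lines. This is a minimal illustration only (class and method names are mine; direct-mapped, block size = 1 word, read path omitted):

```python
class WriteBackCache:
    """Minimal write-back sketch: a write sets the dirty bit; a dirty
    block is flushed to memory only when it is about to be overwritten."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory          # backing store: a list of words
        self.lines = {}               # index -> (tag, data, dirty)

    def write(self, addr, value):
        index, tag = addr % self.num_blocks, addr // self.num_blocks
        line = self.lines.get(index)
        if line is not None and line[0] != tag and line[2]:
            # Evicting a dirty block: write it back to memory first.
            old_tag, old_data, _ = line
            self.memory[old_tag * self.num_blocks + index] = old_data
        self.lines[index] = (tag, value, True)   # dirty bit set

    def flush(self):
        for index, (tag, data, dirty) in self.lines.items():
            if dirty:
                self.memory[tag * self.num_blocks + index] = data
                self.lines[index] = (tag, data, False)

memory = [0] * 32
cache = WriteBackCache(8, memory)
cache.write(0, 42)   # memory[0] is still 0: the write went only to cache
cache.write(8, 7)    # evicts the dirty block for address 0,
                     # so 42 is now written back to memory[0]
```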
AMD Opteron Microprocessor
L1 (split, 64KB each), block 64B, write-back
L2, 1MB, block 64B, write-back
Interleaved Memory
Reduces miss penalty.
Memory designed to read the words of a block simultaneously in one read operation.
Example:
– Cache block size = 4 words
– Interleaved memory with 4 banks
– Suppose memory access takes ~15 cycles
– Miss penalty = 1 cycle to send address + 15 cycles to read a block + 4 cycles to send data to cache = 20 cycles
– Without interleaving, miss penalty = 65 cycles
[Figure: processor and cache exchanging words; cache exchanging blocks with a main memory built from memory banks 0 to 3]
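The 20-cycle and 65-cycle figures above follow from simple cycle counting (variable names are mine): with interleaving the four banks read in parallel, without it the four words are read one after another.

```python
# Miss penalty for a 4-word block with ~15-cycle memory access
addr_cycles, access_cycles, words = 1, 15, 4

interleaved = addr_cycles + access_cycles + words          # banks in parallel
sequential = addr_cycles + words * access_cycles + words   # one word at a time
print(interleaved, sequential)  # 20 65
```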
Cache Hierarchy
Average access time
= T1 + (1 – h1) × [ T2 + (1 – h2) × Tm ]
where
– T1 = L1 cache access time (smallest)
– T2 = L2 cache access time (small)
– Tm = memory access time (large)
– h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)
Average access time reduces by adding a cache.
[Figure: processor connected to an L1 cache (SRAM, access time = T1), then an L2 cache (DRAM, access time = T2), then a main memory (large, inexpensive, slow; access time = Tm)]
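The two-level formula can be sketched as a function (the name is mine; the sample values are those used on the two-level performance slide later):

```python
def avg_access_time_2level(t1, t2, tm, h1, h2):
    """Two-level hierarchy: T1 + (1 - h1) * (T2 + (1 - h2) * Tm).

    An L1 miss pays the L2 access; an L2 miss additionally pays
    the main-memory access.
    """
    return t1 + (1 - h1) * (t2 + (1 - h2) * tm)

# T1 = 1, T2 = 25, Tm = 500 cycles, h1 = 0.95, h2 = 0.90
print(round(avg_access_time_2level(1, 25, 500, 0.95, 0.90), 2))  # 4.75
```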
Average Access Time
[Plot: access time T1 + (1 – h1) × [ T2 + (1 – h2) × Tm ] vs. L1 miss rate 1 – h1, with T1 < T2 < Tm. All curves start at T1 for h1 = 1; at h1 = 0 the access time is T1 + T2 + Tm for h2 = 0, T1 + T2 + Tm/2 for h2 = 0.5, and T1 + T2 for h2 = 1]
Processor Pipeline Performance Without Cache
5GHz processor, cycle time = 0.2ns
Memory access time = 100ns = 500 cycles
Ignoring memory access, CPI = 1
Assuming no memory data access:
CPI = 1 + # stall cycles = 1 + 500 = 501
Performance with 1-Level Cache
Assume hit rate h1 = 0.95
L1 access time = 0.2ns = 1 cycle
CPI = 1 + # stall cycles = 1 + 0.05×500 = 26
Processor speed increase due to cache = 501/26 = 19.3
Performance with 2-Level Caches
Assume:
– L1 hit rate h1 = 0.95
– L2 hit rate h2 = 0.90
– L2 access time = 5ns = 25 cycles
CPI = 1 + # stall cycles = 1 + 0.05 × (25 + 0.10×500) = 1 + 3.75 = 4.75
Processor speed increase due to 2-level caches = 501/4.75 = 105.5
Speed increase due to L2 cache = 26/4.75 = 5.47
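The CPI figures from the three preceding slides can be reproduced in a few lines (variable names are mine; instruction fetch only, as the slides assume):

```python
mem = 500               # memory access time, cycles
l2 = 25                 # L2 access time, cycles
h1, h2 = 0.95, 0.90     # L1 and L2 hit rates

cpi_no_cache = 1 + mem                             # 1 + 500 = 501
cpi_l1 = 1 + (1 - h1) * mem                        # 1 + 0.05*500 = 26
cpi_l1_l2 = 1 + (1 - h1) * (l2 + (1 - h2) * mem)   # 1 + 3.75 = 4.75

print(cpi_no_cache, round(cpi_l1), round(cpi_l1_l2, 2))  # 501 26 4.75
print(round(cpi_no_cache / cpi_l1, 1))    # 19.3 speedup from L1
print(round(cpi_no_cache / cpi_l1_l2, 1)) # 105.5 speedup from L1 + L2
```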
Miss Rate of Direct-Mapped Cache
[Figure: the 32-word byte-addressable memory and 8-block cache, block size = 1 word; address 11 101 00 = tag, index, byte offset. The needed block maps onto an occupied cache block and must swap it out, even though the least recently used (LRU) block sits elsewhere in the cache]
Fully-Associative Cache (8-Way Set-Associative)
[Figure: the same memory and 8-block cache, fully associative. The memory address 11101 00 is just a 5-bit tag plus byte offset; there is no index, so a new block can replace the LRU block anywhere in the cache]
Miss Rate of Direct-Mapped Cache
[Figure: memory references to word addresses 0, 8, 0, 6, 8, 16 in the 8-block direct-mapped cache. Addresses 0, 8, and 16 all map to index 000 and keep evicting one another, so all six references miss]
Miss Rate: Fully-Associative Cache
[Figure: the same references 0, 8, 0, 6, 8, 16 in the fully-associative 8-block cache: 0 miss, 8 miss, 0 hit, 6 miss, 8 hit, 16 miss; 4 misses and 2 hits in all]
Finding a Word in Associative Cache
[Figure: lookup for the 32-word byte-addressable memory, cache size 8 words, block size = 1 word. The address b6 b5 b4 b3 b2 b1 b0 is a 5-bit tag plus byte offset; with no index, the address tag must be compared with all tags in the cache: any match gives 1 = hit, 0 = miss]
Eight-Way Set-Associative Cache
[Figure: the 5-bit tag is compared in parallel against all eight (valid, tag, data) entries; an 8-to-1 multiplexer passes the data of the matching entry; 1 = hit, 0 = miss]
Two-Way Set-Associative Cache
[Figure: the cache of 8 blocks organized as four sets (00, 01, 10, 11) of two blocks each. A memory address such as 111 01 00 splits into a 3-bit tag (111), a 2-bit set index (01), and a 2-bit byte offset. On a miss, the needed block replaces the LRU block within the indexed set.]
Miss Rate: Two-Way Set-Associative Cache
[Figure: the same four-set, two-way cache replaying the reference stream, with LRU replacement within each set.]
Memory references to word addresses: 0, 8, 0, 6, 8, 16
1. miss, 2. miss, 3. hit, 4. miss, 5. hit, 6. miss
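Between the direct-mapped and fully-associative extremes, set associativity can be sketched the same way (again an illustrative model, not the lecture's own code): the word address selects a set, and LRU replacement operates only within that set.

```python
def set_associative_lru(refs, blocks=8, ways=2):
    sets = blocks // ways
    cache = [[] for _ in range(sets)]   # each set: blocks ordered LRU-first
    out = []
    for word in refs:
        s, tag = word % sets, word // sets
        line = cache[s]
        if tag in line:
            out.append("hit")
            line.remove(tag)            # re-appended below as most recently used
        else:
            out.append("miss")
            if len(line) == ways:
                line.pop(0)             # evict the LRU block of this set only
        line.append(tag)
    return out

print(set_associative_lru([0, 8, 0, 6, 8, 16]))  # miss, miss, hit, miss, hit, miss
```

With ways=1 this degenerates to a direct-mapped cache (every reference above misses), and with ways=blocks it becomes fully associative.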
Two-Way Set-Associative Cache
[Figure: lookup hardware for the two-way organization (cache size 8 words, block size = 1 word, 32-word byte-addressed memory). The memory address b6 b5 b4 b3 b2 b1 b0 splits into a 3-bit tag, a 2-bit index selecting one of the four sets (00, 01, 10, 11), and a 2-bit byte offset. Each set holds two ways of valid bit | tag | data; two comparators check both ways' tags (1 = hit, 0 = miss) and a 2-to-1 multiplexer selects the matching way's data.]
Cache Miss and Page Fault
[Figure: Processor ↔ Cache ↔ MMU ↔ Main Memory ↔ Disk. The disk holds all data, organized in pages (~4 KB) accessed by physical addresses; main memory holds the cached pages and the page table, written back to disk on replacement (write-back, same as in a cache).]
Cache miss: a required block is not found in the cache.
Page fault: a required page is not found in main memory.
A "page fault" in virtual memory is thus similar to a "miss" in a cache.
Virtual vs. Physical Address
The processor assumes a virtual memory addressing scheme:
  The disk is a virtual memory (large, slow).
  A block of data is called a virtual page.
  An address is called a virtual (or logical) address (VA).
Main memory may have a different addressing scheme:
  Physical memory consists of caches.
  A memory address is called a physical address.
  The MMU translates virtual addresses to physical addresses.
  The complete address translation table is large and is kept in main memory.
  The MMU contains a TLB (translation-lookaside buffer), which keeps a record of recent address translations.
Memory Hierarchy
[Figure: Registers ↔ Cache (1 or more levels) ↔ Main memory (physical) ↔ Virtual memory (disk). Words are transferred between registers and cache via load/store; blocks are transferred automatically upon a cache miss; pages are transferred automatically upon a page fault.]
Memory Hierarchy Example
32-bit address (byte addressing).
4 GB virtual main memory (disk space), page size = 4 KB:
  Number of virtual pages = 4×2^30 / (4×2^10) = 1M
  Bits for virtual page number = log2(1M) = 20
128 MB physical main memory, page size = 4 KB:
  Number of physical pages = 128×2^20 / (4×2^10) = 32K
  Bits for physical page number = log2(32K) = 15
The page table contains 1M records specifying where each virtual page is located.
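The page counts above follow directly from the sizes; a quick arithmetic check (illustrative only, not part of the lecture):

```python
import math

PAGE_BYTES = 4 * 2**10                 # 4 KB page
VIRTUAL_BYTES = 4 * 2**30              # 4 GB virtual memory (disk)
PHYSICAL_BYTES = 128 * 2**20           # 128 MB physical main memory

virtual_pages = VIRTUAL_BYTES // PAGE_BYTES     # 2**20 = 1M virtual pages
physical_pages = PHYSICAL_BYTES // PAGE_BYTES   # 2**15 = 32K physical pages
vpn_bits = int(math.log2(virtual_pages))        # 20 bits of virtual page number
ppn_bits = int(math.log2(physical_pages))       # 15 bits of physical page number

print(virtual_pages, physical_pages, vpn_bits, ppn_bits)   # 1048576 32768 20 15
```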
Page Table
[Figure: page table organization. A page table register holds the address of the page table in main memory. The table has one entry for each of the 1M virtual page numbers; each entry holds a valid bit, other flags (e.g., dirty bit, LRU reference bit), and the page location. Entries with the valid bit set point to pages in physical main memory (e.g., locations 2, 1, 3, 0 for pages 0–3); entries with the valid bit clear mean the page resides only on disk (virtual main memory).]
32-bit Virtual Address (4 KB Page)
[Figure: a virtual page contains 4 KB, i.e., 1K words of 32 bits (4 bytes) each. The 32-bit virtual address splits into a 20-bit virtual page number, a 10-bit word number within the page, and a 2-bit byte offset.]
Virtual to Physical Address Translation
Virtual address:   20-bit virtual page number | 12-bit byte offset within page
        ↓ (address translation)
Physical address:  15-bit physical page number | 12-bit byte offset within page
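The translation above can be sketched as a function (the helper name and single-entry table here are hypothetical, not from the slides): shift out the 12-bit offset, look up the 20-bit virtual page number in the page table, and splice the 15-bit physical page number onto the unchanged offset.

```python
def translate(va, page_table):
    vpn = va >> 12                 # 20-bit virtual page number
    offset = va & 0xFFF            # 12-bit byte offset, passed through unchanged
    ppn = page_table[vpn]          # 15-bit physical page number from the table
    return (ppn << 12) | offset

# Example with a made-up single-entry page table:
pt = {0x12345: 0x001A}             # virtual page 0x12345 -> physical page 0x1A
print(hex(translate(0x12345ABC, pt)))   # 0x1aabc
```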
Virtual Memory System
[Figure: the processor issues a virtual or logical address (VA) to the MMU (memory management unit, with TLB), which returns the physical address from the page table entry (PTE). The physical address is presented to the cache (SRAM) and, on a miss, to main memory (DRAM), which also holds the page table; data flows back to the processor. Pages move between main memory and disk by DMA (direct memory access).]
TLB: Translation-Lookaside Buffer
A processor request requires two accesses to main memory:
  Access the page table to get the physical address.
  Access the physical address.
The TLB acts as a cache of the page table:
  It holds recent virtual-to-physical page translations.
  It eliminates one main memory access whenever the requested virtual page address is found in the TLB (a hit).
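The two-access cost and the TLB shortcut can be modeled in a few lines (a toy sketch with dict-based tables, not the hardware on the slides): a TLB hit returns the translation without touching main memory; a miss pays one extra main-memory access to read the page table entry, then fills the TLB.

```python
def lookup(vpn, tlb, page_table):
    # TLB hit: translation found without a main-memory access.
    if vpn in tlb:
        return tlb[vpn], "TLB hit"
    # TLB miss: one extra main-memory access to read the page table entry,
    # then cache the translation for next time.
    ppn = page_table[vpn]
    tlb[vpn] = ppn
    return ppn, "TLB miss"

tlb, pt = {}, {7: 3}
print(lookup(7, tlb, pt))   # first access: (3, 'TLB miss')
print(lookup(7, tlb, pt))   # repeat access: (3, 'TLB hit')
```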
TLB Organization
[Figure: the TLB alongside the page table organization shown earlier. The page table (1M virtual page numbers, reached via the page table register) maps valid pages to physical main memory and the rest to disk. The TLB caches a few recent page table entries. TLB data, per entry:
  V: valid bit
  D: dirty bit
  R: reference bit (LRU)
  Tag: index in the page table
  Physical page address
Example TLB contents: V=1, D=1, R=1, Tag=4, physical page 3; and V=1, D=0, R=1, Tag=1, physical page 2.]
Typical TLB Characteristics
TLB size: 16 – 512 entries
Block size: 1 – 2 page table entries of 4 – 8 bytes each
Hit time: 0.5 – 1 clock cycle
Miss penalty: 10 – 100 clock cycles
Miss rate: 0.01% – 1%
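Plugging mid-range values from the table into the standard effective-access-time formula shows why such small miss rates make the TLB worthwhile (the specific numbers below are illustrative assumptions, not measurements from the lecture):

```python
def tlb_avg_cycles(hit_time, miss_rate, miss_penalty):
    # Effective access time: every access pays the hit time; a fraction
    # miss_rate of accesses additionally pays the miss penalty.
    return hit_time + miss_rate * miss_penalty

# 1-cycle hit, 0.1% miss rate, 50-cycle miss penalty (assumed mid-range values):
print(tlb_avg_cycles(1.0, 0.001, 50))   # 1.05 cycles on average
```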