Chapter 2: Memory Hierarchy Design (transcript of lecture slides; source: cs.sjtu.edu.cn/shen-yy/cs359/slides/5_memory_a.pdf)
Slide 1: CS359: Computer Architecture
Memory Hierarchy Design (Appendix B and Chapter 2)
Yanyan Shen, Department of Computer Science and Engineering
Slide 2: Roadmap
- 2.1 Cache Organization
- Virtual Memory
- Six Basic Cache Optimizations
- Ten Advanced Optimizations of Cache Performance
- Memory Technology and Optimizations
- Virtual Memory and Protection
- Protection: Virtual Memory and Virtual Machines
Slide 3: Computer Memory
- Memory is a large linear array of bytes.
- Each byte has a unique address (location), e.g., the bytes of data at addresses 0x100 and 0x101.
- Most computers support byte (8-bit) addressing.
- Data may have to be aligned on a word (4-byte) or double-word (8-byte) boundary:
  - an int is 4 bytes
  - a double-precision floating-point number is 8 bytes
- 32-bit vs. 64-bit addresses: we will assume 32-bit addresses for the rest of the course, unless otherwise stated.
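The alignment rule above is just modular arithmetic on the byte address; a minimal sketch (the example addresses are made up for illustration):

```python
def is_aligned(addr: int, size: int) -> bool:
    """An address is aligned for a size-byte item when it is a multiple of size."""
    return addr % size == 0

# A 4-byte int must sit on a word boundary; an 8-byte double on a
# double-word boundary. 0x100 = 256 is divisible by both 4 and 8.
print(is_aligned(0x100, 4))   # aligned for an int
print(is_aligned(0x102, 4))   # 258 % 4 == 2: misaligned
print(is_aligned(0x100, 8))   # also aligned for a double
```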
Slide 4: Our Naïve View of Memory
What issues do we need to worry about in implementing the memory system?
Slide 6: Performance Gap between Memory and Processor
(figure-only slide)
Slide 7: A Typical Memory Hierarchy
Take advantage of the principle of locality to present the user with (1) as much memory as is available in the cheapest technology, (2) but at the speed offered by the fastest technology.
[figure: processor chip (control, datapath, on-chip components: register file, instruction cache, data cache, ITLB, DTLB), second-level cache (SRAM), main memory (DRAM), secondary memory (disk)]

                  registers  L1 cache  L2 cache  main memory  disk
Access time (ns): 0.1        1         10        100          10,000
Size (bytes):     100        10K       M         G            T
Cost:             highest  ........................................  lowest
Slide 8: SRAM and DRAM
Caches use SRAM for speed and technology compatibility:
- fast (typical access times of 0.5 to 2.5 ns)
- low density (6-transistor cells), higher power, expensive ($2000 to $5000 per GB in 2008)
- static: content will last "forever" (as long as power is left on)
Main memory uses DRAM for size (density):
- slower (typical access times of 50 to 70 ns)
- high density (1-transistor cells), lower power, cheaper ($20 to $75 per GB in 2008)
- dynamic: needs to be "refreshed" regularly (~ every 8 ms)
  - refresh consumes 1% to 2% of the active cycles of the DRAM
- addresses divided into 2 halves (row and column)
  - RAS (Row Access Strobe) triggers the row decoder
  - CAS (Column Access Strobe) triggers the column selector
SRAM: 6 transistors per cell. DRAM: 1 transistor per cell.
Slide 9: Locality
Principle of Locality: programs tend to reuse data and instructions near those they have used recently, or those that were recently referenced themselves.
- Temporal locality: recently referenced items are likely to be referenced again in the near future.
- Spatial locality: items with nearby addresses tend to be referenced close together in time.
Slide 10: The Memory Hierarchy: How Does it Work?
- Temporal locality (locality in time): if a memory location is referenced, then it will tend to be referenced again soon.
  ⇒ Keep most recently accessed data items closer to the processor.
- Spatial locality (locality in space): if a memory location is referenced, the locations with nearby addresses will tend to be referenced soon.
  ⇒ Move blocks consisting of contiguous words closer to the processor.
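Why moving whole blocks pays off can be seen with a toy simulation; a minimal sketch, where the cache geometry (64-byte blocks, 8 block frames, direct-mapped) and the two access patterns are assumptions chosen for illustration:

```python
def count_misses(addresses, block_size=64, num_blocks=8):
    """Count misses in a toy direct-mapped cache of whole blocks.
    cache[index] holds the number of the memory block currently resident there."""
    cache = [None] * num_blocks
    misses = 0
    for addr in addresses:
        block = addr // block_size           # which memory block this byte is in
        index = block % num_blocks           # which frame that block maps to
        if cache[index] != block:
            misses += 1
            cache[index] = block
    return misses

# Sequential walk over 1 KB: 1024 accesses, but only one miss per 64-byte block.
seq = list(range(1024))
# Stride-64 walk repeated 64 times: also 1024 accesses, but every access
# lands in a new (or already-evicted) block, so every access misses.
strided = list(range(0, 1024, 64)) * 64
print(count_misses(seq))       # 16 misses: spatial locality exploited
print(count_misses(strided))   # 1024 misses: no reuse within a block
```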
Slide 11: Overview
[figure: processor (control, datapath) connected to cache, main memory, secondary memory (disk), and input/output devices]
Slide 12: Real Organization of Hierarchy
(figure-only slide)
Slide 13: Characteristics of the Memory Hierarchy
Processor → L1$ → L2$ → Main Memory → Secondary Memory, with increasing distance from the processor in access time and increasing (relative) size of the memory at each level.
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory.
Transfer units between levels:
- processor ↔ L1$: 4-8 bytes (word)
- L1$ ↔ L2$: 8-32 bytes (block)
- L2$ ↔ main memory: 1 to 4 blocks
- main memory ↔ secondary memory: 1,024+ bytes (disk sector = page)
Slide 14: Example
(figure-only slide)
Slide 15: Terminology
- Block (or line): the minimum unit of information that is present (or not) in a cache.
- Hit Rate: the fraction of memory accesses found in a level of the memory hierarchy.
- Hit Time: time to access that level, which consists of (1) time to access the block + (2) time to determine hit/miss.
- Miss Rate: the fraction of memory accesses not found in a level of the memory hierarchy ⇒ 1 - (Hit Rate).
- Miss Penalty: time to replace a block in that level with the corresponding block from a lower level, which consists of (1) time to access the block in the lower level + (2) time to transmit that block to the level that experienced the miss + (3) time to insert the block in that level + (4) time to pass the block to the requestor.
Hit Time << Miss Penalty
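These terms combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate × Miss Penalty. A minimal sketch; the numeric values are illustrative, not taken from the slides:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate of accesses additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(1.0, 0.05, 100.0))  # 6.0 ns on average
```

Because Hit Time << Miss Penalty, even a small change in miss rate dominates the average: dropping the miss rate from 5% to 2% here cuts AMAT from 6.0 ns to 3.0 ns.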
Slide 17: How is the Hierarchy Managed?
- registers ↔ memory: by the compiler
- cache ↔ main memory: by the cache controller hardware
- main memory ↔ disks: by the operating system (virtual memory); physical address mapping assisted by the hardware (TLB)
Slide 18: Four Memory Hierarchy Questions
- Q1 (block placement): where can a block be placed in the cache?
- Q2 (block identification): how is a block found if it is in the cache?
- Q3 (block replacement): which block should be replaced on a miss?
- Q4 (write strategy): what happens on a write?
Slide 19: Q1: Block Placement
[figure: a cache of 8 blocks and a memory of 32 blocks]
Slide 20: Q2: Block Identification
- Caches have an address tag on each block frame that gives the block address.
- The tag is checked to see if it matches the block address from the processor.
- Each cache block has a valid bit to indicate if the entry has a valid address.
[figure: the 32-bit address, bits 31..0]
Slide 21:
Two questions to answer (in hardware):
- Question 1: If a data item is in the cache, how do we find it?
- Question 2: How do we know if a data item is in the cache?
Slide 22: Direct Mapped
- Each memory block is mapped to exactly one block in the cache, so many memory blocks must share each cache block.
- Address mapping (to answer Q1): (block address) modulo (# of blocks in the cache).
- A tag associated with each cache block contains the address information (the upper portion of the address) required to identify the block (to answer Q2).
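The mapping and the tag check are both simple integer arithmetic on the byte address; a minimal sketch, assuming a 1024-block cache with one-word (4-byte) blocks (the geometry of the direct-mapped example later in the slides; the addresses are made up):

```python
BLOCK_SIZE = 4      # bytes per block: one 32-bit word
NUM_BLOCKS = 1024   # 1K-word direct-mapped cache

def split_address(addr: int):
    """Break a byte address into (tag, index, byte offset) for a direct-mapped cache."""
    offset = addr % BLOCK_SIZE                # which byte within the word
    block_addr = addr // BLOCK_SIZE           # memory block number
    index = block_addr % NUM_BLOCKS           # which cache block frame (answers Q1)
    tag = block_addr // NUM_BLOCKS            # upper address bits stored for Q2
    return tag, index, offset

# Two addresses exactly 4 KB apart share an index, so they conflict:
print(split_address(0x1234))   # (tag 1, index 141, offset 0)
print(split_address(0x2234))   # (tag 2, index 141, offset 0): same frame, different tag
```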
Slide 23: Caching: A Simple First Example
[figure: a direct-mapped cache of four one-word blocks (indexes 00, 01, 10, 11), each with a valid bit, tag, and data, backed by a byte-addressable main memory of 16 words (addresses 0000xx through 1111xx; the two low-order bits define the byte in the 32-bit word)]
- Q1: How do we find it? Use the next 2 low-order memory address bits, the index, to determine which cache block, i.e., (block address) modulo (# of blocks in the cache).
- Q2: Is it there? Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache.
Slide 24: Direct Mapped Cache Example
One-word blocks, cache size = 1K words (or 4KB).
[figure: the 32-bit address (bits 31..0) splits into a 20-bit tag (bits 31..12), a 10-bit index (bits 11..2), and a 2-bit byte offset; the index selects one of 1024 entries (valid bit, tag, data), the stored tag is compared against the address tag to generate Hit, and the 32-bit data word is read out]
Slide 25: Multiword Block Direct Mapped Cache
Four words/block, cache size = 1K words.
[figure: the 32-bit address (bits 31..0) splits into a 20-bit tag, an 8-bit index (selecting one of 256 entries), a 2-bit block offset (selecting the word within the block), and a 2-bit byte offset; a hit requires the valid bit set and a tag match, and the block offset selects which of the four 32-bit data words is returned]
Slide 26: Set Associative Mapping: Allow More Flexible Block Placement
- In a direct-mapped cache, a memory block maps to exactly one cache block.
- At the other extreme, a memory block could be allowed to map to any cache block: a fully associative cache.
- A compromise is to divide the cache into sets, each of which consists of n "ways" (n-way set associative). A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices):
  (block address) modulo (# sets in the cache)
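The set lookup can be sketched with a toy 2-set, 2-way cache (the parameters, the dict representation, and the FIFO eviction used as a stand-in for a real replacement policy are all assumptions for illustration):

```python
NUM_SETS = 2     # toy cache: 2 sets
NUM_WAYS = 2     # 2-way set associative
BLOCK_SIZE = 4   # one-word blocks

def access(cache: dict, addr: int) -> str:
    """Look up addr; cache maps set index -> list of tags resident in that set."""
    block_addr = addr // BLOCK_SIZE
    set_index = block_addr % NUM_SETS        # (block address) modulo (# sets)
    tag = block_addr // NUM_SETS
    ways = cache.setdefault(set_index, [])
    if tag in ways:                          # compare against every way in the set
        return "hit"
    if len(ways) == NUM_WAYS:
        ways.pop(0)                          # evict oldest way (FIFO for simplicity)
    ways.append(tag)
    return "miss"

cache = {}
# Blocks 0 and 2 both map to set 0, but the two ways let them coexist:
print(access(cache, 0x00))   # miss (cold)
print(access(cache, 0x08))   # miss, placed in the other way of set 0
print(access(cache, 0x00))   # hit: no conflict eviction occurred
```

In a direct-mapped cache of the same size, the third access would have missed, since block 2 would have evicted block 0.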
Slide 27: Set Associative Cache Example
[figure: a 2-way set associative cache with two sets (0 and 1), each way holding a valid bit, tag, and one-word data block, backed by a byte-addressable main memory of 16 words (addresses 0000xx through 1111xx; the two low-order bits define the byte in the 32-bit word)]
- Q1: How do we find it? Use the next 1 low-order memory address bit to determine which cache set, i.e., (block address) modulo (number of sets in the cache).
- Q2: Is it there? Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache.
Slide 28: Four-Way Set Associative Cache
2^8 = 256 sets, each with four ways (each way holding one block).
[figure: the 32-bit address (bits 31..0) splits into a 22-bit tag, an 8-bit index, and a 2-bit byte offset; the index selects one set, all four ways' stored tags are compared in parallel against the address tag, and a 4-to-1 selector steers the matching way's 32-bit data word to the output along with Hit]