s10 Memory System 5
TRANSCRIPT
-
7/31/2019 s10 Memory System 5
CMPE 421 Parallel Computer Architecture
PART 5
More Elaborations with Cache & Virtual Memory
-
Cache Optimization into Categories
Reducing Miss Penalty
- Multilevel caches
- Critical word first: don't wait for the full block to be loaded before sending the requested word and restarting the CPU.
- Read miss before write miss: this optimization serves reads before writes have been completed. Consider:
  SW R2, 512(R0)   ; M[512] <- R2  (cache index 0)
  LW R1, 1024(R0)  ; R1 <- M[1024] (cache index 0)
  LW R2, 512(R0)   ; R2 <- M[512]  (cache index 0)
  If the write buffer hasn't completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2.
- Victim caches
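The hazard above can be avoided by checking the write buffer on a read miss. A minimal sketch (names and structures are assumptions for illustration, not a real hardware design):

```python
# Sketch: a read miss consults the write buffer before reading memory,
# so a pending (not-yet-drained) store cannot be missed.

class WriteBuffer:
    def __init__(self):
        self.pending = {}              # address -> value waiting to reach memory

    def add(self, addr, value):
        self.pending[addr] = value

    def drain_one(self, memory):
        """Write one buffered entry to memory (the background drain)."""
        if self.pending:
            addr, value = self.pending.popitem()
            memory[addr] = value

def read_with_buffer_check(addr, write_buffer, memory):
    """Serve a read miss, forwarding from the write buffer if it still
    holds a value for this address (avoids reading stale memory)."""
    if addr in write_buffer.pending:
        return write_buffer.pending[addr]   # forward the newest value
    return memory[addr]

memory = {512: 0, 1024: 7}
wb = WriteBuffer()
wb.add(512, 42)                                 # SW R2, 512(R0) still buffered
r1 = read_with_buffer_check(1024, wb, memory)   # LW R1, 1024(R0)
r2 = read_with_buffer_check(512, wb, memory)    # LW R2, 512(R0) -> 42, not 0
```

Without the buffer check, the last load would have returned the stale value 0 from memory.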
-
Victim Caches
- One approach to lowering miss penalty is to remember what was discarded, in case it is needed again.
- A victim cache contains only blocks that are discarded from a cache because of a miss (the "victims"); it is checked on a miss to see if it has the desired data before going to the next lower-level memory.
- The AMD Athlon has a victim cache with eight entries.
- Jouppi [1990] found that victim caches of one to five entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.
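The mechanism can be sketched as a tiny direct-mapped cache backed by a small fully associative victim cache (sizes and names here are illustrative assumptions, not the Athlon design):

```python
# Sketch: direct-mapped cache whose evicted blocks fall into a small
# victim cache, which is checked before going to the next memory level.
from collections import OrderedDict

class DirectMappedWithVictim:
    def __init__(self, num_sets=4, victim_entries=4):
        self.num_sets = num_sets
        self.sets = {}                 # set index -> resident block address
        self.victim = OrderedDict()    # recently evicted blocks (FIFO-ish)
        self.victim_entries = victim_entries

    def access(self, block_addr):
        """Return 'hit', 'victim_hit', or 'miss' for a block address."""
        idx = block_addr % self.num_sets
        if self.sets.get(idx) == block_addr:
            return "hit"
        if block_addr in self.victim:
            # Swap the victim block back into the main cache.
            self.victim.pop(block_addr)
            evicted = self.sets.get(idx)
            if evicted is not None:
                self._put_victim(evicted)
            self.sets[idx] = block_addr
            return "victim_hit"
        evicted = self.sets.get(idx)
        if evicted is not None:
            self._put_victim(evicted)
        self.sets[idx] = block_addr
        return "miss"

    def _put_victim(self, block_addr):
        self.victim[block_addr] = True
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)   # drop the oldest victim

cache = DirectMappedWithVictim()
# Blocks 0 and 4 conflict in set 0 of the direct-mapped cache; the
# victim cache turns the ping-pong conflict misses into victim hits.
results = [cache.access(a) for a in [0, 4, 0, 4]]
```

Without the victim cache, all four accesses would miss; with it, only the first two do.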
-
Cache Optimization into Categories
Reducing the Miss Rate
- Larger block size
- Larger cache size
- Higher associativity
- Way prediction and pseudo-associativity: in way prediction, extra bits are kept in the cache to predict the way (block within the set) of the next cache access.
- Compiler optimizations
Reducing the Time to Hit in the Cache
- Small and simple caches
- Avoiding address translation
- Pipelined cache access
-
Cache Optimization
Compiler-based cache optimization reduces the miss rate without any hardware change.
For instructions:
- Reorder procedures in memory to reduce conflicts.
- Use profiling to determine likely conflicts among groups of instructions.
For data:
- Merging arrays: improve spatial locality by using a single array of compound elements instead of two arrays.
- Loop interchange: change the nesting of loops to access data in the order it is stored in memory.
- Loop fusion: combine two independent loops that have the same looping and some overlapping variables.
- Blocking: improve temporal locality by accessing blocks of data repeatedly instead of going down whole columns or rows.
-
Examples
- Merging arrays reduces misses by improving spatial locality: arrays that are accessed simultaneously are combined into one.
- Loop interchange gives sequential accesses instead of striding through memory every 100 words, improving spatial locality.
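The two transformations above can be written out as follows (a sketch with assumed illustrative data; in compiled languages like C the layout effect is what matters, the transformation shape is the same):

```python
# Sketch of two compiler cache optimizations: array merging and
# loop interchange.

N = 100

# --- Merging arrays: one array of compound elements instead of two
# parallel arrays that are always accessed together.
key = [0] * N                                       # before: two arrays
val = [0] * N
merged = [{"key": 0, "val": 0} for _ in range(N)]   # after: one compound array

# --- Loop interchange: traverse a 2-D array in the order it is stored
# (row-major), instead of striding N elements between accesses.
x = [[i * N + j for j in range(N)] for i in range(N)]

total_strided = 0
for j in range(N):          # before: column-major walk, stride of N
    for i in range(N):
        total_strided += x[i][j]

total_sequential = 0
for i in range(N):          # after: row-major walk, consecutive elements
    for j in range(N):
        total_sequential += x[i][j]
```

Both loop nests compute the same sum; only the access order (and hence the spatial locality) differs.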
-
Examples
- Some programs have separate sections of code that access the same arrays (performing different computations on common data).
- Fusing multiple loops into a single loop allows the data in the cache to be used repeatedly before being swapped out.
- Loop fusion reduces misses through improved temporal locality (rather than the spatial locality improved by array merging and loop interchange).
- Accessing arrays a and c would have caused twice the number of misses without loop fusion.
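A sketch of the fusion itself (array contents are assumed illustrative values):

```python
# Sketch of loop fusion: two loops with the same bounds that share
# array a are combined, so a[i] is reused while still in the cache.

N = 1000
b = [float(i) for i in range(N)]

# Before fusion: by the time the second loop reads a[i], it may have
# been evicted from the cache.
a = [0.0] * N
c = [0.0] * N
for i in range(N):
    a[i] = 1.0 / (b[i] + 1.0)
for i in range(N):
    c[i] = a[i] + b[i]

# After fusion: one loop, a[i] is consumed right after it is produced.
a2 = [0.0] * N
c2 = [0.0] * N
for i in range(N):
    a2[i] = 1.0 / (b[i] + 1.0)
    c2[i] = a2[i] + b[i]
```

The fused version computes exactly the same results with roughly half the misses on a and c.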
-
Blocking Example
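The blocking example's code is not reproduced in the transcript; a minimal sketch of blocked (tiled) matrix multiply, the classic instance of this technique, is:

```python
# Sketch: blocked matrix multiply. B is the blocking factor; iterating
# over B x B tiles keeps a submatrix of Y cache-resident while it is
# reused, instead of streaming whole columns through the cache.

def matmul_blocked(X, Y, B=4):
    n = len(X)
    Z = [[0.0] * n for _ in range(n)]
    for jj in range(0, n, B):            # tile over columns of Z
        for kk in range(0, n, B):        # tile over the reduction axis
            for i in range(n):
                for j in range(jj, min(jj + B, n)):
                    s = Z[i][j]
                    for k in range(kk, min(kk + B, n)):
                        s += X[i][k] * Y[k][j]
                    Z[i][j] = s
    return Z

X = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]  # identity
Y = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
Z = matmul_blocked(X, Y)   # identity * Y, so Z should equal Y
```

The arithmetic is identical to the naive triple loop; only the traversal order changes, which is why blocking is a pure locality optimization.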
-
Example
- B is called the blocking factor.
- Conflict misses can go down too.
- Blocking is also useful for register allocation.
-
Summary of performance equations
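The equation figure is not reproduced in the transcript. As a sketch, the standard cache performance equations such a summary covers are:

AMAT = Hit time + Miss rate x Miss penalty

CPU time = IC x (CPI_execution + Memory stall cycles per instruction) x Clock cycle time

Memory stall cycles per instruction = Misses per instruction x Miss penalty
                                    = Memory accesses per instruction x Miss rate x Miss penalty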
-
VIRTUAL MEMORY
- You're running a huge program that requires 32 MB, but your PC has only 16 MB available... Rewrite your program so that it implements overlays:
  - Execute the first portion of code (fit it in the available memory).
  - When you need more memory, find some memory that isn't needed right now, save it to disk, and use that memory for the latter portion of code. And so on...
- Memory is to disk as registers are to memory: the disk acts as an extension of memory.
- Main memory can act as a cache for the secondary storage (magnetic disk).
-
A Memory Hierarchy
- Extend the hierarchy with the disk.
- Main memory acts like a cache for the disk.
- (Figure: hierarchy levels with per-level costs; cache is about $20/MByte.)
-
Virtual Memory
Idea: keep only the portions of a program (code, data) that are currently needed in main memory.
- Currently unused data is saved on disk, ready to be brought in when needed.
- Memory appears as a very large virtual memory (limited only by the disk size).
Advantages:
- Programs that require large amounts of memory can be run (as long as they don't need it all at once).
- Multiple programs can be in virtual memory at once; only active programs will be loaded into memory.
- A program can be written (linked) to use whatever addresses it wants to; it doesn't matter where it is physically loaded. When a program is loaded, it doesn't need to be placed in contiguous memory locations.
Disadvantages:
- The memory a program needs may all be on disk.
- The operating system has to manage virtual memory.
-
Virtual Memory
- We will focus on using the disk as a storage area for chunks of main memory that are not being used.
- The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as being the memory.
- Only a few programs are active at a time, and an active program might not need all the memory that has been reserved for it (the rest is stored on the hard disk).
-
The Virtual Memory Concept
- Virtual memory space: all possible memory addresses (4 GB in 32-bit systems). All that can be conceived.
- Disk swap space: area on the hard disk that can be used as an extension of memory (typically comparable to the RAM size). All that can be used.
- Main memory: physical memory (typically 1 GB). All that physically exists.
-
The Virtual Memory Concept
Figure: three example addresses in the virtual memory space.
- An address that can be conceived of but doesn't correspond to any memory: accessing it will produce an error.
- An address that can be accessed but currently exists only on disk (e.g., disk address 58984, not in main memory): it must be read into main memory before being used. A table maps its virtual address to the disk location.
- An address that can be accessed immediately since it is already in memory (e.g., physical address 883232, with back-up disk address 322321): a table maps its virtual address to its physical address, and there is also a back-up location on disk.
-
The Process
The CPU deals with virtual addresses. Steps to accessing memory with a virtual address:
1. Convert the virtual address to a physical address.
   - This needs a special table (virtual address -> physical address).
   - The table may indicate that the desired address is on disk but not in physical memory; in that case, read the location from the disk into memory (this may require moving something else out of memory to make room).
2. Do the memory access using the physical address.
   - Check the cache first (note: this cache uses only physical addresses).
   - Update the cache if needed.
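The two steps above can be sketched in software (the page-table layout, frame pool, and disk structures here are hypothetical simplifications; eviction on a full memory is not modeled):

```python
# Sketch: step 1 translates a virtual address via a page table, faulting
# a page in from disk if needed; step 2 accesses memory physically.

PAGE_SIZE = 4096

def translate_and_access(vaddr, page_table, frames, disk, free_frames):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table[vpn]
    if not entry["valid"]:                       # desired page is on disk
        ppn = free_frames.pop()                  # (no eviction modeled here)
        frames[ppn] = list(disk[entry["disk_addr"]])
        entry["valid"], entry["ppn"] = True, ppn
    paddr = entry["ppn"] * PAGE_SIZE + offset    # step 1: translation
    return frames[entry["ppn"]][offset], paddr   # step 2: physical access

disk = {77: [0] * PAGE_SIZE}
disk[77][5] = 99                                 # a byte sitting on disk
page_table = {0: {"valid": False, "ppn": None, "disk_addr": 77}}
frames, free_frames = {}, [3]

value, paddr = translate_and_access(5, page_table, frames, disk, free_frames)
```

The first access to virtual address 5 faults, pulls disk block 77 into frame 3, and then completes with physical address 3 * PAGE_SIZE + 5.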
-
Structure of Virtual Memory
Figure: the processor sends a virtual address to the address translator, which sends a physical address to memory; on a page fault, an elaborate software page-fault handling algorithm takes over.
Returning to our library analogy: a virtual address is like the title of a book, and a physical address is like the location of that book in the library.
-
Translation
- Hardware translates virtual addresses to physical addresses: since the hardware accesses memory, we need to convert from a logical address to a physical address in hardware.
- The Memory Management Unit (MMU) provides this functionality.
Figure: the CPU issues a virtual (logical) address; the MMU produces the physical (real) address, which indexes physical memory (addresses 0 to 2^n - 1).
-
Address Translation
In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).
-
Page Faults
If the valid bit for a virtual page is off, a page fault occurs and the operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.
-
Terminology
- page: the unit of memory transferred between disk and main memory.
- page fault: when a program accesses a virtual memory location that is not currently in main memory.
- address translation: the process of finding the physical address that corresponds to a virtual address.

Cache term        -> Virtual memory term
Block             -> Page
Cache miss        -> Page fault
Block addressing  -> Address translation
-
Differences between virtual and cache memory
- The miss penalty is huge (millions of cycles).
  - Solution: increase the block size (page size) to around 8 KB, because transfers have a large startup time but data transfer is relatively fast once started.
- Even on faults (misses), the VM system must provide the disk location, so the VM system must have an entry for all possible locations.
- When there is a hit, the VM system provides the physical address in memory, not the actual data (in a cache we have the data itself). This saves room: one address rather than 8 KB of data.
- Since the miss penalty is so large, VM systems typically have a miss (page fault) rate of 0.00001-0.0001%.
-
In Virtual Memory Systems
- Pages should be large enough to amortize the high access time (4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB).
- Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).
- A sophisticated LRU replacement policy is preferable.
- Page faults can be handled in software.
- Write-back is used (a write-through scheme does not work): we need a scheme that reduces the number of disk writes.
-
Keeping Track of Pages: The Page Table
- All programs use the same virtual addressing space, so each program must have its own memory mapping: each program has its own page table to map virtual addresses to physical addresses.
- The page table resides in memory and is pointed to by the page table register.
- The page table has an entry for every possible page (in principle, not in practice...), so no tags are necessary.
- A valid bit indicates whether the page is in memory or on disk.
-
Virtual to Physical Mapping
Example: 4 GB (32-bit) virtual address space, 16 MB (24-bit) physical address space, 8 KB (13-bit) page (block) size.
- Virtual address: bits 31-13 are the virtual page number, bits 12-0 are the page offset.
- Physical address: bits 23-13 are the physical page number, bits 12-0 are the page offset.
Translation:
- A 32-bit virtual address is given to the VM hardware. Both virtual and physical addresses are broken down into a page number and a page offset.
- The virtual page number (the index) is derived by removing the page (block) offset.
- The virtual page number is looked up in a page table, which is stored in main memory (no tag is needed, since all entries are unique). When found, the entry is either:
  - the physical page number, if the page is in memory (valid bit = 1), or
  - the disk address, if it is not in memory (valid bit = 0): a page fault, which may involve reading from disk.
- If not found, the address is invalid.
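With the bit widths above, the split and recombination are plain shifts and masks (the example VPN/PPN values below are assumptions for illustration):

```python
# Sketch: address splitting for the slide's parameters -- 32-bit virtual
# address, 24-bit physical address, 8 KB pages (13-bit page offset).

OFFSET_BITS = 13

def split_virtual(vaddr):
    """Return (virtual page number, page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & ((1 << OFFSET_BITS) - 1)

def make_physical(ppn, offset):
    """Recombine a physical page number with the unchanged page offset."""
    return (ppn << OFFSET_BITS) | offset

vaddr = (5 << OFFSET_BITS) | 100     # VPN 5, offset 100 (illustrative)
vpn, offset = split_virtual(vaddr)
paddr = make_physical(3, offset)     # suppose the page table maps VPN 5 -> PPN 3
```

The page offset passes through translation untouched; only the page number changes.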
-
Virtual Memory (32-bit system): 8 KB pages, 16 MB memory
Figure: the 32-bit virtual address splits into a 19-bit index (bits 31-13, the virtual page number) and a 13-bit page offset (bits 12-0); the 24-bit physical address splits into an 11-bit physical page number (bits 23-13) and the same 13-bit page offset. The page table has 4 GB / 8 KB = 2^19 = 512K entries, each holding a valid bit (V) and either a physical page number or a disk address.
-
Virtual Memory Consists of
- Bits for the page offset (address within a page)
- Bits for the virtual page number / number of virtual pages
- Entries in the page table
- Bits for the physical page number / number of physical pages
- Bits per page table line
- Total page table size
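These quantities can be computed for the running example (32-bit virtual address, 24-bit physical address, 8 KB pages); the per-entry status bits below are an assumed minimal layout (valid, dirty, reference), since real entries also carry protection bits:

```python
# Sketch: sizing the page table for a 32-bit VA, 24-bit PA, 8 KB pages.

va_bits, pa_bits, page_bits = 32, 24, 13   # 8 KB page = 2^13 bytes

vpn_bits = va_bits - page_bits             # bits for the virtual page number
num_virtual_pages = 1 << vpn_bits          # = entries in the page table
ppn_bits = pa_bits - page_bits             # bits for the physical page number
num_physical_pages = 1 << ppn_bits

# Assumed entry layout: valid + dirty + reference bits, plus the PPN.
bits_per_entry = 3 + ppn_bits
table_bits = num_virtual_pages * bits_per_entry    # total page table size
```

For these parameters the table has 512K entries of 14 bits each, roughly 7 Mbit (about 900 KB) per process, which is why real systems use multi-level or hashed page tables.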
-
Write Issues
Write-through: update both disk and memory.
+ Easy to implement
- Requires a write buffer
- Requires a separate disk write for every write to memory
- A write miss requires reading in the page first, then writing back the single word
Write-back: write only to main memory; write to the disk only when the block is replaced.
+ Writes are fast
+ Multiple writes to a page are combined into one disk write
- Must keep track of when a page has been written (dirty bit)
-
Page Replacement Policy
- Exact least recently used (LRU) is possible but expensive, so use approximate LRU: a use bit (or reference bit) is added to every page table line.
  - On a hit, the PPN is used to form the address and the reference bit is turned on, so the bit is set at every access.
  - The OS periodically clears all use bits.
  - The page to replace is chosen among the ones with their use bit at zero; one such entry is chosen as the victim at random.
- If the OS chooses to replace a page, the dirty bit indicates whether the page must be written out before its location in memory can be given to another page.
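The approximate-LRU steps above can be sketched directly (the page numbers and seed are illustrative assumptions):

```python
# Sketch: approximate LRU with reference (use) bits. Hardware sets the
# bit on every access; the OS periodically clears all bits and evicts
# randomly among pages whose bit is still zero.
import random

random.seed(0)                  # fixed seed so the sketch is repeatable

use_bit = {vpn: 0 for vpn in range(4)}

def touch(vpn):
    use_bit[vpn] = 1            # set at every access (done by hardware)

def clear_all():
    for vpn in use_bit:         # done periodically by the OS
        use_bit[vpn] = 0

def pick_victim():
    candidates = [v for v, bit in use_bit.items() if bit == 0]
    return random.choice(candidates)   # random among not-recently-used pages

clear_all()
touch(1)
touch(3)                        # only pages 1 and 3 used since the clear
victim = pick_victim()          # victim is 0 or 2, never 1 or 3
```

This is the interval-based approximation the slide describes; it never evicts a page touched since the last clearing pass, at far lower cost than exact LRU.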
-
Virtual Memory Example
System with a 20-bit virtual address, 16 KB pages, and 256 KB of physical memory. The page offset takes 14 bits, leaving 6 bits for the virtual page number (V.P.N.) and 4 bits for the physical page number (P.P.N.).

Page table:
Virt. Pg. # (index) | V | Phys. Page # / Disk address
000000              | 1 | 1001
000001              | 0 | sector 5000...
000010              | 1 | 0010
000011              | 0 | sector 4323...
000100              | 1 | 1011
000101              | 1 | 1010
000110              | 0 | sector 1239...
000111              | 1 | 0001

Access to 0000 1000 1100 1010 1010:
- VPN = 000010, valid, so PPN = 0010.
- Physical address: 0010 001100 1010 1010.

Access to 0001 1001 0011 1100 0000:
- VPN = 000110, valid bit 0: page fault to sector 1239...
- Pick a page to kick out of memory (use LRU); assume the LRU page is VPN 000101 for this example.
- VPN 000101's entry becomes V = 0, disk address sector xxxx...
- Read the data from sector 1239 into PPN 1010; VPN 000110's entry becomes V = 1, PPN 1010.
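The example can be replayed in code (the fault-handling choices mirror the slide; the dictionary layout is an assumed simplification):

```python
# Sketch: the slide's worked example -- 20-bit VA, 16 KB pages (14-bit
# offset, 6-bit VPN), 256 KB physical memory (4-bit PPN).

OFFSET_BITS = 14

page_table = {
    0b000000: {"valid": 1, "ppn": 0b1001, "disk": None},
    0b000001: {"valid": 0, "ppn": None, "disk": 5000},
    0b000010: {"valid": 1, "ppn": 0b0010, "disk": None},
    0b000011: {"valid": 0, "ppn": None, "disk": 4323},
    0b000100: {"valid": 1, "ppn": 0b1011, "disk": None},
    0b000101: {"valid": 1, "ppn": 0b1010, "disk": None},
    0b000110: {"valid": 0, "ppn": None, "disk": 1239},
    0b000111: {"valid": 1, "ppn": 0b0001, "disk": None},
}

def translate(vaddr):
    """Return ('hit', physical address) or ('fault', disk sector)."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    entry = page_table[vpn]
    if not entry["valid"]:
        return ("fault", entry["disk"])
    return ("hit", (entry["ppn"] << OFFSET_BITS) | offset)

kind1, result1 = translate(0b0000_1000_1100_1010_1010)   # VPN 000010: hit
kind2, result2 = translate(0b0001_1001_0011_1100_0000)   # VPN 000110: fault

# Handle the fault as on the slide: evict the LRU page (VPN 000101),
# then load sector 1239 into the freed frame (PPN 1010).
if kind2 == "fault":
    freed = page_table[0b000101]["ppn"]
    page_table[0b000101].update(valid=0, ppn=None)
    page_table[0b000110].update(valid=1, ppn=freed, disk=None)

kind3, result3 = translate(0b0001_1001_0011_1100_0000)   # now a hit
```

After the fault is serviced, re-running the faulting access hits in PPN 1010 with the original 14-bit offset unchanged.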