s10 memory system 5

Upload: djrive

Post on 04-Apr-2018


  • 7/31/2019 s10 Memory System 5

    1/31


    CMPE 421 Parallel Computer Architecture

PART 5

    More Elaborations with Cache & Virtual Memory

  • Slide 2/31

    Cache Optimizations by category

    Reducing Miss Penalty

    Multilevel caches

Critical word first: Don't wait for the full block to be loaded before sending the requested word and restarting the CPU

Read miss before write miss: this optimization serves reads before writes have been completed.

SW R2, 512(R0)   ; M[512] <- R2   (cache index 0)

    LW R1, 1024(R0)  ; R1 <- M[1024]  (cache index 0)

    LW R2, 512(R0)   ; R2 <- M[512]   (cache index 0)

- If the write buffer hasn't completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2.

    Victim Caches


  • Slide 3/31

    Victim Caches

    One approach to lower miss penalty is to remember what was discarded in case it is needed again.

    This victim cache contains only blocks that are discarded from a cache because of a miss ("victims") and are checked on a miss to see if they have the desired data before going to the next lower-level memory.

    The AMD Athlon has a victim cache with eight entries.

    Jouppi [1990] found that victim caches of one to five entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.

  • Slide 4/31

    Cache Optimizations by category

    Reducing the miss rate

    Larger block size,

    Larger cache size,

Higher associativity,

    Way prediction and pseudo-associativity,

    - In way prediction, extra bits are kept in the cache to predict the set of the next cache access.

    Compiler optimizations

    Reducing the time to hit in the cache

    Small and simple caches,

    avoiding address translation,

    and pipelined cache access.

  • Slide 5/31

    Cache Optimization

    Compiler-based cache optimizations reduce the miss rate without any hardware change.

    For Instructions

    Reorder procedures in memory to reduce conflict misses

    Use profiling to determine likely conflicts among groups of instructions

For Data

    Merging Arrays: improve spatial locality by using a single array of compound elements instead of two separate arrays

Loop Interchange: change the nesting of loops to access data in the order it is stored in memory

Loop Fusion: combine two independent loops that have the same looping structure and some overlapping variables

Blocking: improve temporal locality by accessing blocks of data repeatedly vs. going down whole columns or rows

  • Slide 6/31

    Examples

    Reduces misses by improving spatial locality through combined arrays that are accessed simultaneously

    Sequential accesses instead of striding through memory every 100 words; improved spatial locality

  • Slide 7/31

    Examples

    Some programs have separate sections of code that access the same arrays (performing different computations on common data)

    Fusing multiple loops into a single loop allows the data in the cache to be used repeatedly before being swapped out

    Loop fusion reduces misses through improved temporal locality (rather than the spatial locality improved by array merging and loop interchange)

Accessing arrays a and c would have caused twice the number of misses without loop fusion

  • Slide 8/31

    Blocking Example

  • Slide 9/31

    Example

    B is called the Blocking Factor

    Conflict misses can go down too

    Blocking is also useful for register allocation

  • Slide 10/31

    Summary of performance equations
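    The equation figure for this slide did not survive the transcript. The standard cache performance equations that the preceding lectures build on are (a reconstruction, not the slide's own figure):

```latex
\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}

\text{CPU time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}

\text{Memory stall cycles} = \text{IC} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
```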


  • Slide 11/31

    VIRTUAL MEMORY

You're running a huge program that requires 32 MB

    Your PC has only 16 MB available... Rewrite your program so that it implements overlays

    Execute the first portion of code (fit it in the available memory)

    When you need more memory... Find some memory that isn't needed right now

    Save it to disk

    Use that memory for the latter portion of code

    And so on... Memory is to disk as registers are to memory

    Disk as an extension of memory

    Main memory can act as a cache for secondary storage (magnetic disk)

  • Slide 12/31

    A Memory Hierarchy

Disk

    Extend the hierarchy: main memory acts like a cache for the disk

    (Figure: memory hierarchy with per-level costs, e.g. Cache: about $20/MByte)

  • Slide 13/31

Virtual Memory

    Idea: keep only the portions of a program (code, data) that are currently needed in main memory

    Currently unused data is saved on disk, ready to be brought in when needed

    Appears as a very large virtual memory (limited only by the disk size)

    Advantages:

    Programs that require large amounts of memory can be run (as long as they don't need it all at once)

    Multiple programs can be in virtual memory at once; only active programs will be loaded into memory

    A program can be written (linked) to use whatever addresses it wants to! It doesn't matter where it is physically loaded!

    When a program is loaded, it doesn't need to be placed in contiguous memory locations

    Disadvantages:

    The memory a program needs may all be on disk

    The operating system has to manage virtual memory

  • Slide 14/31

    Virtual Memory

We will focus on using the disk as a storage area for chunks of main memory that are not being used.

    The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as being the memory.

    Only a few programs are active at once.

    An active program might not need all the memory that has been reserved for it (the rest is stored on the hard disk).

  • Slide 15/31

    The Virtual Memory Concept

Virtual Memory Space: all possible memory addresses (4 GB in 32-bit systems). All that can be conceived of.

    Disk Swap Space: area on the hard disk that can be used as an extension of memory (typically equal to RAM size). All that can be used.

    Main Memory: physical memory (typically 1 GB). All that physically exists.

  • Slide 16/31

    The Virtual Memory Concept

(Figure: three nested regions, Virtual Memory Space, Disk Swap Space, and Main Memory, with one example address in each.)

    An address outside the swap space can be conceived of, but doesn't correspond to any memory. Accessing it will produce an error. (Error)

    An address in the swap space can be accessed. However, it currently is only on disk and must be read into main memory before being used. A table maps from its virtual address to the disk location. (Disk Address: 58984, not in main memory)

    An address in main memory can be accessed immediately since it is already in memory. A table maps from its virtual address to its physical address. There will also be a back-up location on disk. (Physical Address: 883232, Disk Address: 322321)

  • Slide 17/31

    The Process

The CPU deals with virtual addresses.

    Steps to accessing memory with a virtual address:

    1. Convert the virtual address to a physical address

    Needs a special table (virtual address -> physical address)

    The table may indicate that the desired address is on disk, but not in physical memory: read the location from the disk into memory (this may require moving something else out of memory to make room)

    2. Do the memory access using the physical address

    Check the cache first (note: the cache uses only physical addresses)

    Update the cache if needed

  • Slide 18/31

Structure of Virtual Memory

    (Figure: the Address Translator takes a Virtual Address from the processor and produces a Physical Address to memory; a page fault is handled using an elaborate software page-fault handling algorithm.)

    Returning to our library analogy: a virtual address is like the title of a book; a physical address is like the location of that book in the library.

  • Slide 19/31

Translation (hardware that translates virtual addresses to physical addresses)

    Since the hardware accesses memory, we need to convert from a logical address to a physical address in hardware.

    The Memory Management Unit (MMU) provides this functionality.

    (Figure: the CPU issues a virtual (logical) address; the MMU translates it into a physical (real) address in physical memory, which spans addresses 0 to 2^n - 1.)

  • Slide 20/31

    Address Translation

In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).


  • Slide 21/31

If the valid bit for a virtual page is off, a page fault occurs. The operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.

    Page Faults


  • Slide 22/31

    Terminology

page: the unit of memory transferred between the disk and main memory.

    page fault: when a program accesses a virtual memory location that is not currently in main memory.

    address translation: the process of finding the physical address that corresponds to a virtual address.

    Cache term       | Virtual memory term
    Block            | Page
    Cache miss       | Page fault
    Block addressing | Address translation

  • Slide 23/31

Differences between virtual and cache memory

    The miss penalty is huge (millions of cycles)

    Solution: increase the block size (page size) to around 8 KB

    - Because transfers have a large startup time, but data transfer is relatively fast once started

    Even on faults (misses), the VM system must provide info on the disk location

    The VM system must have an entry for all possible locations

    When there is a hit, the VM system provides the physical address in memory (not the actual data; in the cache we have the data itself)

    - Saves room: one address rather than 8 KB of data

    Since the miss penalty is very high, VM systems typically have a miss (page fault) rate of 0.00001% - 0.0001%

  • Slide 24/31

    In Virtual Memory Systems

Pages should be large enough to amortize the high access time (4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB).

    Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g. fully associative).

    A sophisticated LRU replacement policy is preferable.

    Page faults can be handled in software.

    Write-back (a write-through scheme does not work): we need a scheme that reduces the number of disk writes.

  • Slide 25/31

    Keeping track of pages: The page table

All programs use the same virtual addressing space

    Each program must have its own memory mapping

    Each program has its own page table to map virtual addresses to physical addresses

    virtual address -> physical address

    The page table resides in memory, and is pointed to by the page table register

    The page table has an entry for every possible page (in principle, not in practice...), so no tags are necessary.

    A valid bit indicates whether the page is in memory or on disk.

  • Slide 26/31

Virtual to Physical Mapping

    (Figure: the 32-bit virtual address splits into a Virtual Page Number (bits 31-13) and a Page Offset (bits 12-0); the 24-bit physical address splits into a Physical Page Number (bits 23-13) and the same Page Offset (bits 12-0).)

    Example: 4 GB (32-bit) virtual address space, 16 MB (24-bit) physical address space, 8 KB (13-bit) page size (block size)

    Translation

    A 32-bit virtual address is given to the V.M. hardware

    The virtual page number (index) is derived from it by removing the page (block) offset

    The Virtual Page Number is looked up in a page table (stored in main memory; no tag needed, since all entries are unique). When found, the entry is either:

    The physical page number, if in memory (V = 1)

    The disk address, if not in memory, i.e. a page fault (V = 0)

    If not found, the address is invalid

    Note: translation may involve reading from disk

    Both the virtual and physical address are broken down into a page number and a page offset

  • Slide 27/31

Virtual Memory (32-bit system): 8 KB page size, 16 MB memory

    (Figure: a page table indexed by Virtual Page Number (0, 1, 2, ..., 512K-1), each entry holding a valid bit V, a Physical Page Number, and a Disk Address. The virtual address has a 19-bit index (bits 31-13) and a 13-bit page offset (bits 12-0); the physical address has an 11-bit physical page number (bits 23-13) and the same offset.)

    4 GB / 8 KB = 512K entries (2^19 = 512K)

  • Slide 28/31

Virtual Memory Consists of:

    Bits for page address (offset)

    Bits for virtual page number

    Number of virtual pages

    Entries in the page table

    Bits for physical page number

    Number of physical pages

    Bits per page table line

    Total page table size


  • Slide 29/31

    Write issues

Write Through - update both disk and memory

    + Easy to implement

    - Requires a write buffer

    - Requires a separate disk write for every write to memory

    - A write miss requires reading in the page first, then writing back the single word

    Write Back - write only to main memory. Write to the disk only when the page is replaced.

    + Writes are fast

    + Multiple writes to a page are combined into one disk write

    - Must keep track of when a page has been written (dirty bit)

  • Slide 30/31

    Page replacement policy

Exact Least Recently Used (LRU) is possible, but it is expensive.

    So, use approximate LRU: a use bit (or reference bit) is added to every page table line

    If there is a hit, the PPN is used to form the address and the reference bit is turned on, so the bit is set at every access

    The OS periodically clears all use bits

    The page to replace is chosen among the ones with their use bit at zero

    Choose one such entry as a victim randomly

    If the OS chooses to replace a page, the dirty bit indicates whether the page must be written out before its location in memory can be given to another page

  • Slide 31/31

Virtual memory example

    System with 20-bit V.A., 16 KB pages, 256 KB of physical memory

    Page offset takes 14 bits; 6 bits for V.P.N. and 4 bits for P.P.N.

    Page Table:

    Virt. Page # (index) | Valid | Phys. Page # / Disk address
    000000               | 1     | 1001
    000001               | 0     | sector 5000...
    000010               | 1     | 0010
    000011               | 0     | sector 4323...
    000100               | 1     | 1011
    000101               | 1     | 1010
    000110               | 0     | sector 1239...
    000111               | 1     | 0001

    Access to: 000010 00110010101010

    V.P.N. 000010 is valid, so PPN = 0010

    Physical Address: 0010 00110010101010

    Access to: 000110 01001111000000

    V.P.N. 000110 is not valid: page fault to sector 1239...

    Pick a page to kick out of memory (use LRU). Assume the LRU page is VPN 000101 for this example.

    VPN 000101's entry becomes: valid 0, disk address sector xxxx...

    Read data from sector 1239 into PPN 1010; VPN 000110's entry becomes: valid 1, PPN 1010
