s10 memory system 5

Upload: djrive

Post on 04-Apr-2018


  • 7/31/2019 s10 Memory System 5

    1/31


    CMPE 421 Parallel Computer Architecture

PART 5

    More Elaborations with Cache & Virtual Memory

  • Slide 2/31

    Cache Optimizations by category

    Reducing Miss Penalty

    Multilevel caches

Critical word first: Don't wait for the full block to be loaded before sending the requested word and restarting the CPU

Read miss before write miss: this optimization serves reads before writes have been completed.

SW R2, 512(R0)   ; M[512] <- R2   (cache index 0)

    LW R1, 1024(R0)  ; R1 <- M[1024]  (cache index 0)

    LW R2, 512(R0)   ; R2 <- M[512]   (cache index 0)

- If the write buffer hasn't completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2.

    Victim Caches


  • Slide 3/31

    Victim Caches

    One approach to lower miss penalty is to remember what was discarded in case it is needed again.

    This victim cache contains only blocks that are discarded from a cache because of a miss ("victims") and are checked on a miss to see if they have the desired data before going to the next lower-level memory.

    The AMD Athlon has a victim cache with eight entries.

    Jouppi [1990] found that victim caches of one to five entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.

  • Slide 4/31

    Cache Optimizations by category

    Reducing the miss rate

    Larger block size,

    Larger cache size,

Higher associativity,

    Way prediction and pseudo-associativity,

    - In way prediction, extra bits are kept in the cache to predict the set of the next cache access.

    Compiler optimizations

    Reducing the time to hit in the cache

    Small and simple caches,

    avoiding address translation,

    and pipelined cache access.

  • Slide 5/31

    Cache Optimization

    Compiler-based cache optimizations reduce the miss rate without any hardware change.

    For Instructions

    Reorder procedures in memory to reduce conflict misses

    Use profiling to determine likely conflicts among groups of instructions

For Data

    Merging Arrays: improve spatial locality by using a single array of compound elements instead of two separate arrays

Loop Interchange: change the nesting of loops to access data in the order it is stored in memory

Loop Fusion: combine two independent loops that have the same looping structure and some overlapping variables

Blocking: improve temporal locality by accessing blocks of data repeatedly vs. going down whole columns or rows

  • Slide 6/31

    Examples

    Reduces misses by improving spatial locality through combined arrays that are accessed simultaneously

    Sequential accesses instead of striding through memory every 100 words; improved spatial locality

  • Slide 7/31

    Examples

    Some programs have separate sections of code that access the same arrays (performing different computations on common data)

    Fusing multiple loops into a single loop allows the data in the cache to be used repeatedly before being swapped out

    Loop fusion reduces misses through improved temporal locality (rather than the spatial locality improved by array merging and loop interchange)

Accessing arrays a and c would have caused twice the number of misses without loop fusion

  • Slide 8/31

    Blocking Example

  • Slide 9/31

    Example

    B is called the Blocking Factor

    Conflict misses can go down too

    Blocking is also useful for register allocation

  • Slide 10/31

    Summary of performance equations
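    The equation figure for this slide did not survive the transcript. The standard cache performance equations that the preceding lectures build on are (a reconstruction, not the slide's own figure):

```latex
\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}

\text{CPU time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}

\text{Memory stall cycles} = \text{IC} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
```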


  • Slide 11/31

    VIRTUAL MEMORY

You're running a huge program that requires 32 MB

    Your PC has only 16 MB available... Rewrite your program so that it implements overlays

    Execute the first portion of code (fit it in the available memory)

    When you need more memory... Find some memory that isn't needed right now

    Save it to disk

    Use that memory for the latter portion of code

    And so on... Memory is to disk as registers are to memory

    Disk as an extension of memory

    Main memory can act as a cache for secondary storage (magnetic disk)

  • Slide 12/31

    A Memory Hierarchy

Disk

    Extend the hierarchy: main memory acts like a cache for the disk

    (Figure: memory hierarchy with per-level costs, e.g. Cache: about $20/MByte)

  • Slide 13/31

Virtual Memory

    Idea: keep only the portions of a program (code, data) that are currently needed in main memory

    Currently unused data is saved on disk, ready to be brought in when needed

    Appears as a very large virtual memory (limited only by the disk size)

    Advantages:

    Programs that require large amounts of memory can be run (as long as they don't need it all at once)

    Multiple programs can be in virtual memory at once; only active programs will be loaded into memory

    A program can be written (linked) to use whatever addresses it wants to! It doesn't matter where it is physically loaded!

    When a program is loaded, it doesn't need to be placed in contiguous memory locations

    Disadvantages:

    The memory a program needs may all be on disk

    The operating system has to manage virtual memory

  • Slide 14/31

    Virtual Memory

We will focus on using the disk as a storage area for chunks of main memory that are not being used.

    The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as being the memory.

    Only a few programs are active at once.

    An active program might not need all the memory that has been reserved for it (the rest is stored on the hard disk).

  • Slide 15/31

    The Virtual Memory Concept

Virtual Memory Space: all possible memory addresses (4 GB in 32-bit systems). All that can be conceived of.

    Disk Swap Space: area on the hard disk that can be used as an extension of memory (typically equal to RAM size). All that can be used.

    Main Memory: physical memory (typically 1 GB). All that physically exists.

  • Slide 16/31

    The Virtual Memory Concept

(Figure: three nested regions, Virtual Memory Space, Disk Swap Space, and Main Memory, with one example address in each.)

    An address outside the swap space can be conceived of, but doesn't correspond to any memory. Accessing it will produce an error. (Error)

    An address in the swap space can be accessed. However, it currently is only on disk and must be read into main memory before being used. A table maps from its virtual address to the disk location. (Disk Address: 58984, not in main memory)

    An address in main memory can be accessed immediately since it is already in memory. A table maps from its virtual address to its physical address. There will also be a back-up location on disk. (Physical Address: 883232, Disk Address: 322321)

  • Slide 17/31

    The Process

The CPU deals with virtual addresses.

    Steps to accessing memory with a virtual address:

    1. Convert the virtual address to a physical address

    Needs a special table (virtual address -> physical address)

    The table may indicate that the desired address is on disk, but not in physical memory: read the location from the disk into memory (this may require moving something else out of memory to make room)

    2. Do the memory access using the physical address

    Check the cache first (note: the cache uses only physical addresses)

    Update the cache if needed

  • Slide 18/31

Structure of Virtual Memory

    (Figure: the Address Translator takes a Virtual Address from the processor and produces a Physical Address to memory; a page fault is handled using an elaborate software page-fault handling algorithm.)

    Returning to our library analogy: a virtual address is like the title of a book; a physical address is like the location of that book in the library.

  • Slide 19/31

Translation (hardware that translates virtual addresses to physical addresses)

    Since the hardware accesses memory, we need to convert from a logical address to a physical address in hardware.

    The Memory Management Unit (MMU) provides this functionality.

    (Figure: the CPU issues a virtual (logical) address; the MMU translates it into a physical (real) address in physical memory, which spans addresses 0 to 2^n - 1.)

  • Slide 20/31

    Address Translation

In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).


  • Slide 21/31

If the valid bit for a virtual page is off, a page fault occurs. The operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.

    Page Faults


  • Slide 22/31

    Terminology

page: the unit of memory transferred between the disk and main memory.

    page fault: when a program accesses a virtual memory location that is not currently in main memory.

    address translation: the process of finding the physical address that corresponds to a virtual address.

    Cache term       | Virtual memory term
    Block            | Page
    Cache miss       | Page fault
    Block addressing | Address translation

  • Slide 23/31

Differences between virtual and cache memory

    The miss penalty is huge (millions of cycles)

    Solution: increase the block size (page size) to around 8 KB

    - Because transfers have a large startup time, but data transfer is relatively fast once started

    Even on faults (misses), the VM system must provide info on the disk location

    The VM system must have an entry for all possible locations

    When there is a hit, the VM system provides the physical address in memory (not the actual data; in the cache we have the data itself)

    - Saves room: one address rather than 8 KB of data

    Since the miss penalty is very high, VM systems typically have a miss (page fault) rate of 0.00001% - 0.0001%

  • Slide 24/31

    In Virtual Memory Systems

Pages should be large enough to amortize the high access time (4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB).

    Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g. fully associative).

    A sophisticated LRU replacement policy is preferable.

    Page faults can be handled in software.

    Write-back (a write-through scheme does not work): we need a scheme that reduces the number of disk writes.

  • Slide 25/31

    Keeping track of pages: The page table

All programs use the same virtual addressing space

    Each program must have its own memory mapping

    Each program has its own page table to map virtual addresses to physical addresses

    virtual address -> physical address

    The page table resides in memory, and is pointed to by the page table register

    The page table has an entry for every possible page (in principle, not in practice...), so no tags are necessary.

    A valid bit indicates whether the page is in memory or on disk.

  • Slide 26/31

Virtual to Physical Mapping

    (Figure: the 32-bit virtual address splits into a Virtual Page Number (bits 31-13) and a Page Offset (bits 12-0); the 24-bit physical address splits into a Physical Page Number (bits 23-13) and the same Page Offset (bits 12-0).)

    Example: 4 GB (32-bit) virtual address space, 16 MB (24-bit) physical address space, 8 KB (13-bit) page size (block size)

    Translation

    A 32-bit virtual address is given to the V.M. hardware

    The virtual page number (index) is derived from it by removing the page (block) offset

    The Virtual Page Number is looked up in a page table (stored in main memory; no tag needed, since all entries are unique). When found, the entry is either:

    The physical page number, if in memory (V = 1)

    The disk address, if not in memory, i.e. a page fault (V = 0)

    If not found, the address is invalid

    Note: translation may involve reading from disk

    Both the virtual and physical address are broken down into a page number and a page offset

  • Slide 27/31

Virtual Memory (32-bit system): 8 KB page size, 16 MB memory

    (Figure: a page table indexed by Virtual Page Number (0, 1, 2, ..., 512K-1), each entry holding a valid bit V, a Physical Page Number, and a Disk Address. The virtual address has a 19-bit index (bits 31-13) and a 13-bit page offset (bits 12-0); the physical address has an 11-bit physical page number (bits 23-13) and the same offset.)

    4 GB / 8 KB = 512K entries (2^19 = 512K)

  • Slide 28/31

Virtual Memory Consists of:

    Bits for page address (offset)

    Bits for virtual page number

    Number of virtual pages

    Entries in the page table

    Bits for physical page number

    Number of physical pages

    Bits per page table line

    Total page table size


  • Slide 29/31

    Write issues

Write Through - update both disk and memory

    + Easy to implement

    - Requires a write buffer

    - Requires a separate disk write for every write to memory

    - A write miss requires reading in the page first, then writing back the single word

    Write Back - write only to main memory. Write to the disk only when the page is replaced.

    + Writes are fast

    + Multiple writes to a page are combined into one disk write

    - Must keep track of when a page has been written (dirty bit)

  • Slide 30/31

    Page replacement policy

Exact Least Recently Used (LRU) is possible, but it is expensive.

    So, use approximate LRU: a use bit (or reference bit) is added to every page table line

    If there is a hit, the PPN is used to form the address and the reference bit is turned on, so the bit is set at every access

    The OS periodically clears all use bits

    The page to replace is chosen among the ones with their use bit at zero

    Choose one such entry as a victim randomly

    If the OS chooses to replace a page, the dirty bit indicates whether the page must be written out before its location in memory can be given to another page

  • Slide 31/31

Virtual memory example

    System with 20-bit V.A., 16 KB pages, 256 KB of physical memory

    Page offset takes 14 bits; 6 bits for V.P.N. and 4 bits for P.P.N.

    Page Table:

    Virt. Page # (index) | Valid | Phys. Page # / Disk address
    000000               | 1     | 1001
    000001               | 0     | sector 5000...
    000010               | 1     | 0010
    000011               | 0     | sector 4323...
    000100               | 1     | 1011
    000101               | 1     | 1010
    000110               | 0     | sector 1239...
    000111               | 1     | 0001

    Access to: 000010 00110010101010

    V.P.N. 000010 is valid, so PPN = 0010

    Physical Address: 0010 00110010101010

    Access to: 000110 01001111000000

    V.P.N. 000110 is not valid: page fault to sector 1239...

    Pick a page to kick out of memory (use LRU). Assume the LRU page is VPN 000101 for this example.

    VPN 000101's entry becomes: valid 0, disk address sector xxxx...

    Read data from sector 1239 into PPN 1010; VPN 000110's entry becomes: valid 1, PPN 1010
