Cache Management
Presented By:
Babar Shahzaad
14F-MS-CP-12
Department of Computer Engineering,
Faculty of Telecommunication and Information Engineering,
University of Engineering and Technology (UET), Taxila
Outline
What is the Problem
What is a Cache
What is Locality
Cache Hit/Miss
Types of Cache Miss
Memory Mapping Techniques
  Direct Mapping
  Fully-Associative Mapping
  Set-Associative Mapping
Summary of Memory Mapping Techniques
Methods to Overcome Cache Misses
  Miss Cache
  Victim Caches
  Stream Buffers
Summary of Cache Miss Techniques
What is the Problem
What is a Cache
A cache is small, fast storage used to improve the average access time to slower memory.
It exploits spatial and temporal locality.
In computer architecture, almost everything is a cache!
Registers are "a cache" on variables (software managed)
The first-level cache is "a cache" on the second-level cache
The second-level cache is "a cache" on memory
Memory is "a cache" on disk (virtual memory)
The Translation Lookaside Buffer (TLB) is "a cache" on the page table
Branch prediction is "a cache" on prediction information
What is a Cache (Cont…)
[Memory hierarchy diagram: Processor/Registers, L1-Cache, L2-Cache, Memory, Disk/Tape, etc. Each level going down is bigger but slower; each level going up is smaller but faster.]
What is Locality
Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Spatial Locality (locality in space/distance): if an item is referenced, its neighboring addresses will tend to be referenced soon.
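Both kinds of locality show up together in even the simplest loop. A minimal Python sketch (the function name is made up for illustration):

```python
# Illustrative sketch: a simple array sum exhibits both kinds of locality.
def sum_array(data):
    total = 0            # 'total' is reused every iteration -> temporal locality
    for x in data:       # elements are visited in address order -> spatial locality
        total += x
    return total

print(sum_array([1, 2, 3, 4]))  # -> 10
```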
Cache Hit/Miss
Cache Hit: the data is found in the cache. This results in a data transfer at maximum speed.
Cache Miss: the data is not found in the cache. The processor loads the data from memory and copies it into the cache. This incurs an extra delay, called the miss penalty.
Types of Cache Miss
There are four types of cache miss, known as the 4 C's: compulsory, capacity, conflict, and coherence misses.
Memory Mapping Techniques
Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place.
The address is split into two parts: the least significant w bits identify a unique word/byte within a block, and the most significant s bits specify one memory block.
The MSBs are further split into a cache line field of r bits and a tag of s − r bits (the most significant).
Direct Mapping (Cont…)
Pros and Cons
Simple and inexpensive: no need for an expensive associative search
Fixed location for a given block
The miss rate may go up due to an increase in mapping conflicts: if a program repeatedly accesses 2 blocks that map to the same line, the rate of cache misses (conflict misses) is very high
Cache Line Table
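The tag / line / word split described above can be sketched in a few lines of Python. This is an illustrative sketch with made-up parameters (w = 2, r = 3), not a real cache geometry:

```python
# Illustrative sketch (parameters are made up): split an address into
# tag / line / word fields for a direct-mapped cache.
WORD_BITS = 2          # w: 4 words per block
LINE_BITS = 3          # r: 8 cache lines

def direct_map(addr):
    word = addr & ((1 << WORD_BITS) - 1)                  # lowest w bits
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)   # next r bits
    tag  = addr >> (WORD_BITS + LINE_BITS)                # remaining s - r bits
    return tag, line, word

# Two addresses one "cache-worth" apart land on the same line with different
# tags -- repeatedly alternating between them causes conflict misses.
print(direct_map(0b1_101_10))   # -> (1, 5, 2)
print(direct_map(0b10_101_10))  # -> (2, 5, 2)  same line, different tag
```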
Fully-Associative Mapping
A main memory block can load into any line of the cache.
The memory address is interpreted as a tag and a word: the tag uniquely identifies a block of memory.
Every line's tag is examined for a match.
Cache searching becomes expensive in hardware and power consumption because of the parallel tag comparison.
Fully-Associative Mapping (Cont…)
Pros and Cons
There is flexibility as to which block to replace when a new block is read into the cache
No restriction on mapping from memory to cache
The associative search of tags is expensive, so it is feasible only for very small caches
The complex circuitry required for parallel tag comparison is a major disadvantage
Set-Associative Mapping
The cache is divided into a number of sets, each containing a number of lines.
A given block maps to any line in a given set, e.g. block B can be in any line of set i.
With 2 lines per set, this is 2-way set-associative mapping: a given block can be in one of the 2 lines of exactly one set.
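The set selection can be sketched as follows; this is an illustrative Python sketch with made-up parameters (4 sets, 2 ways), not a real cache configuration:

```python
# Illustrative sketch (parameters are made up): in a 2-way set-associative
# cache, the block address selects one set; the block may then occupy either
# line (way) of that set, so two conflicting blocks can coexist.
NUM_SETS = 4
WAYS = 2

def set_map(block_addr):
    set_index = block_addr % NUM_SETS    # which set the block belongs to
    tag = block_addr // NUM_SETS         # distinguishes blocks within the set
    return set_index, tag

# Blocks 3, 7, 11 all map to set 3; with 2 ways, any two of them can coexist
# instead of evicting each other as in a direct-mapped cache.
print([set_map(b) for b in (3, 7, 11)])  # -> [(3, 0), (3, 1), (3, 2)]
```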
Set-Associative Mapping (Cont…)
Pros and Cons
Most commercial caches are 2-, 4-, or 8-way set-associative
Cheaper than a fully-associative cache
Lower miss ratio than a direct-mapped cache
A direct-mapped cache is the fastest
Simulating the hit ratio of direct-mapped and (2-, 4-, 8-way) set-associative caches shows a significant difference in performance at least up to a cache size of 64 KB, with set-associative being the better one.
Beyond that, however, the complexity of the cache increases in proportion to the associativity, and both mappings give approximately similar hit ratios.
Summary of Memory Mapping Techniques
Number of misses: Direct Mapping > Set-Associative Mapping > Fully-Associative Mapping
Access latency: Direct Mapping < Set-Associative Mapping < Fully-Associative Mapping
Methods to Overcome Cache Misses
Miss Cache
Fully-associative cache inserted between L1 and L2
Contains between 2 and 5 cache lines of data
Aims to reduce conflict misses
Miss Cache Operation
On a miss in L1, we check the miss cache.
If the block is there, we bring it into L1, so the penalty of an L1 miss is just a few cycles, possibly as few as one.
Otherwise, we fetch the block from the lower levels and also store the retrieved value in the miss cache.
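The lookup order above can be sketched in Python. This is a hypothetical, simplified model (class and method names are made up; a real miss cache is hardware between L1 and L2), with L1 modeled as a plain set and FIFO replacement among the few miss-cache lines:

```python
from collections import OrderedDict

# Hypothetical sketch of miss-cache operation (names are made up).
# On an L1 miss we first probe the small fully-associative miss cache;
# only on a miss there do we go to L2, filling BOTH L1 and the miss cache.
class MissCache:
    def __init__(self, lines=4):
        self.lines = lines
        self.data = OrderedDict()     # FIFO replacement among the few lines

    def access(self, block, l1, l2_fetch):
        if block in l1:
            return "L1 hit"
        if block in self.data:
            l1.add(block)             # only a few cycles of penalty
            return "miss-cache hit"
        l2_fetch(block)               # full miss penalty
        l1.add(block)
        self.data[block] = True       # block is now duplicated in L1 and here
        if len(self.data) > self.lines:
            self.data.popitem(last=False)
        return "L2 fetch"
```

Note the duplication in the last branch: every fetched block lives in both L1 and the miss cache, which is the wasted space the victim cache later eliminates.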
Miss Cache Performance
What if we doubled the size of the L1 cache instead? That gives a 32% decrease in cache misses, but only 0.13% decrease per additional line.
Note, however, that every block is stored twice: once in L1 and once in the miss cache...
Victim Cache
Motivation: can we improve on the miss rates of miss caches by modifying the replacement policy, i.e. can we do something about the wasted (duplicated) space in the pure miss-caching scheme?
Fully-associative cache inserted between L1 and L2
Victim Cache Operation
On a miss in L1, we check the victim cache.
If the block is there, we bring it into L1 and swap the value ejected from L1 into the victim cache. Misses caught by the victim cache are still cheap, and space is utilized better because blocks are no longer duplicated.
Otherwise, we fetch the block from the lower levels.
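The key difference from the miss cache is what gets inserted: evicted victims rather than fetched blocks. A hypothetical Python sketch (names are made up, same simplified L1-as-set model as before):

```python
# Hypothetical sketch of victim-cache operation (names are made up).
# Unlike the miss cache, the victim cache holds only blocks EVICTED from L1,
# so nothing is stored twice; a hit swaps the victim back into L1.
class VictimCache:
    def __init__(self, lines=4):
        self.lines = lines
        self.data = []                # FIFO of recently evicted L1 blocks

    def on_l1_evict(self, block):
        self.data.append(block)       # the L1 victim lands here
        if len(self.data) > self.lines:
            self.data.pop(0)

    def access(self, block, l1, l2_fetch):
        if block in l1:
            return "L1 hit"
        if block in self.data:
            self.data.remove(block)   # swap: the victim moves back to L1
            l1.add(block)
            return "victim-cache hit"
        l2_fetch(block)               # full miss penalty; victim cache untouched
        l1.add(block)
        return "L2 fetch"
```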
Victim Cache Performance
Even better than Miss Cache!
Smaller L1 caches benefit more from victim caches
Wait A Minute!
What about compulsory and capacity misses?
What about instruction misses?
The Victim Cache and Miss Cache are most helpful when temporal locality can be exploited
Pre-fetching techniques can help:
Pre-fetch Always
Pre-fetch on Miss
Tagged Pre-fetch
Can we improve on these techniques?
Stream Buffers
A Stream Buffer is a FIFO queue placed between L1 and L2
Stream Buffers Operation
When a miss occurs in L1, say at address A, the stream buffer immediately starts to pre-fetch elements starting at A+1
Subsequent accesses check the head of the stream buffer before going to L2
Note that non-sequential misses cause the buffer to restart pre-fetching (e.g. misses at A+2 and then A+4 will each restart the pre-fetching process, even if A+4 was already in the stream)
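The head-only check and the restart-on-nonsequential-miss behavior can be sketched as follows. This is a hypothetical Python model (names and the buffer depth are made up):

```python
# Hypothetical sketch of a stream buffer (names are made up): a FIFO that,
# after a miss at address A, prefetches A+1, A+2, ... ahead of demand.
class StreamBuffer:
    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = []

    def restart(self, addr):
        # Any miss that does not hit the head discards the stream and
        # restarts prefetching from the missing address.
        self.fifo = [addr + i for i in range(1, self.depth + 1)]

    def access(self, addr):
        if self.fifo and self.fifo[0] == addr:
            self.fifo.pop(0)                    # hit at the head of the FIFO
            self.fifo.append(self.fifo[-1] + 1) # keep the stream topped up
            return "stream hit"
        self.restart(addr)                      # non-sequential: restart at addr+1
        return "stream miss"
```

Only the head is compared, so skipping even one address (A+2 then A+4) discards the buffered stream and restarts it, exactly the limitation noted above.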
Stream Buffer Performance
72% of instruction misses removed
25% of data misses removed
Summary of Cache Miss Techniques
Miss Caches: small caches between L1 and L2. Whenever a miss occurs in L1, they receive a copy of the data from the lower-level cache.
Victim Cache: an enhancement of miss caches. Instead of copying data from the lower levels on a miss, they take whatever block has been evicted from L1.
Stream Buffers: simple FIFO buffers that are immediately pre-fetched into on a miss.