Cache Management
Presented By:
Babar Shahzaad
14F-MS-CP-12
Department of Computer Engineering,
Faculty of Telecommunication and Information Engineering,
University of Engineering and Technology (UET), Taxila
Outline
What is the Problem
What is a Cache
What is Locality
Cache Hit/Miss
Types of Cache Miss
Memory Mapping Techniques
  Direct Mapping
  Fully-Associative Mapping
  Set-Associative Mapping
Summary of Memory Mapping Techniques
Methods to Overcome Cache Misses
  Miss Cache
  Victim Caches
  Stream Buffers
Summary of Cache Miss Techniques
What is the Problem
What is a Cache
A cache is small, fast storage used to improve the average access time to slower memory.
It exploits spatial and temporal locality.
In computer architecture, almost everything is a cache!
Registers are "a cache" on variables (software managed)
The first-level cache is "a cache" on the second-level cache
The second-level cache is "a cache" on memory
Memory is "a cache" on disk (virtual memory)
The Translation Lookaside Buffer (TLB) is "a cache" on the page table
Branch prediction is "a cache" on prediction information
What is a Cache (Cont…)
[Memory hierarchy diagram: Processor/Registers, L1-Cache, L2-Cache, Memory, Disk/Tape, etc. Each level going down is bigger but slower; each level going up is smaller but faster.]
What is Locality
Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Spatial Locality (locality in space/distance): if an item is referenced, its neighboring addresses will tend to be referenced soon.
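Both kinds of locality show up together in even the simplest loop. A minimal Python sketch (the function name is made up for illustration):

```python
# Illustrative sketch: a simple array sum exhibits both kinds of locality.
def sum_array(data):
    total = 0            # 'total' is reused every iteration -> temporal locality
    for x in data:       # elements are visited in address order -> spatial locality
        total += x
    return total

print(sum_array([1, 2, 3, 4]))  # -> 10
```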
Cache Hit/Miss
Cache Hit: the data is found in the cache. This results in a data transfer at maximum speed.
Cache Miss: the data is not found in the cache. The processor loads the data from memory and copies it into the cache. This incurs an extra delay, called the miss penalty.
Types of Cache Miss
There are four types of cache miss, known as the 4 C's: compulsory, capacity, conflict, and coherence misses.
Memory Mapping Techniques
Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place.
The address is split into two parts: the least significant w bits identify a unique word/byte within a block, and the most significant s bits specify one memory block.
The MSBs are further split into a cache line field of r bits and a tag of s − r bits (the most significant).
Direct Mapping (Cont…)
Pros and Cons
Simple and inexpensive: no need for an expensive associative search
Fixed location for a given block
The miss rate may go up due to an increase in mapping conflicts: if a program repeatedly accesses 2 blocks that map to the same line, the rate of cache misses (conflict misses) is very high
Cache Line Table
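The tag / line / word split described above can be sketched in a few lines of Python. This is an illustrative sketch with made-up parameters (w = 2, r = 3), not a real cache geometry:

```python
# Illustrative sketch (parameters are made up): split an address into
# tag / line / word fields for a direct-mapped cache.
WORD_BITS = 2          # w: 4 words per block
LINE_BITS = 3          # r: 8 cache lines

def direct_map(addr):
    word = addr & ((1 << WORD_BITS) - 1)                  # lowest w bits
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)   # next r bits
    tag  = addr >> (WORD_BITS + LINE_BITS)                # remaining s - r bits
    return tag, line, word

# Two addresses one "cache-worth" apart land on the same line with different
# tags -- repeatedly alternating between them causes conflict misses.
print(direct_map(0b1_101_10))   # -> (1, 5, 2)
print(direct_map(0b10_101_10))  # -> (2, 5, 2)  same line, different tag
```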
Fully-Associative Mapping
A main memory block can load into any line of the cache.
The memory address is interpreted as a tag and a word: the tag uniquely identifies a block of memory.
Every line's tag is examined for a match.
Cache searching becomes expensive in hardware and power consumption because of the parallel tag comparison.
Fully-Associative Mapping (Cont…)
Pros and Cons
There is flexibility as to which block to replace when a new block is read into the cache
No restriction on mapping from memory to cache
The associative search of tags is expensive, so it is feasible only for very small caches
The complex circuitry required for parallel tag comparison is a major disadvantage
Set-Associative Mapping
The cache is divided into a number of sets, each containing a number of lines.
A given block maps to any line in a given set, e.g. block B can be in any line of set i.
With 2 lines per set, this is 2-way set-associative mapping: a given block can be in one of the 2 lines of exactly one set.
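The set selection can be sketched as follows; this is an illustrative Python sketch with made-up parameters (4 sets, 2 ways), not a real cache configuration:

```python
# Illustrative sketch (parameters are made up): in a 2-way set-associative
# cache, the block address selects one set; the block may then occupy either
# line (way) of that set, so two conflicting blocks can coexist.
NUM_SETS = 4
WAYS = 2

def set_map(block_addr):
    set_index = block_addr % NUM_SETS    # which set the block belongs to
    tag = block_addr // NUM_SETS         # distinguishes blocks within the set
    return set_index, tag

# Blocks 3, 7, 11 all map to set 3; with 2 ways, any two of them can coexist
# instead of evicting each other as in a direct-mapped cache.
print([set_map(b) for b in (3, 7, 11)])  # -> [(3, 0), (3, 1), (3, 2)]
```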
Set-Associative Mapping (Cont…)
Pros and Cons
Most commercial caches are 2-, 4-, or 8-way set-associative
Cheaper than a fully-associative cache
Lower miss ratio than a direct-mapped cache
A direct-mapped cache is the fastest
Simulating the hit ratio of direct-mapped and (2-, 4-, 8-way) set-associative caches shows a significant difference in performance at least up to a cache size of 64 KB, with set-associative being the better one.
Beyond that, however, the complexity of the cache increases in proportion to the associativity, and both mappings give approximately similar hit ratios.
Summary of Memory Mapping Techniques
Number of misses: Direct Mapping > Set-Associative Mapping > Fully-Associative Mapping
Access latency: Direct Mapping < Set-Associative Mapping < Fully-Associative Mapping
Methods to Overcome Cache Misses
Miss Cache
Fully-associative cache inserted between L1 and L2
Contains between 2 and 5 cache lines of data
Aims to reduce conflict misses
Miss Cache Operation
On a miss in L1, we check the miss cache.
If the block is there, we bring it into L1, so the penalty of an L1 miss is just a few cycles, possibly as few as one.
Otherwise, we fetch the block from the lower levels and also store the retrieved value in the miss cache.
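The lookup order above can be sketched in Python. This is a hypothetical, simplified model (class and method names are made up; a real miss cache is hardware between L1 and L2), with L1 modeled as a plain set and FIFO replacement among the few miss-cache lines:

```python
from collections import OrderedDict

# Hypothetical sketch of miss-cache operation (names are made up).
# On an L1 miss we first probe the small fully-associative miss cache;
# only on a miss there do we go to L2, filling BOTH L1 and the miss cache.
class MissCache:
    def __init__(self, lines=4):
        self.lines = lines
        self.data = OrderedDict()     # FIFO replacement among the few lines

    def access(self, block, l1, l2_fetch):
        if block in l1:
            return "L1 hit"
        if block in self.data:
            l1.add(block)             # only a few cycles of penalty
            return "miss-cache hit"
        l2_fetch(block)               # full miss penalty
        l1.add(block)
        self.data[block] = True       # block is now duplicated in L1 and here
        if len(self.data) > self.lines:
            self.data.popitem(last=False)
        return "L2 fetch"
```

Note the duplication in the last branch: every fetched block lives in both L1 and the miss cache, which is the wasted space the victim cache later eliminates.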
Miss Cache Performance
What if we doubled the size of the L1 cache instead? That gives a 32% decrease in cache misses, but only 0.13% decrease per additional line.
Note, however, that every block is stored twice: once in L1 and once in the miss cache...
Victim Cache
Motivation: can we improve on the miss rates of miss caches by modifying the replacement policy, i.e. can we do something about the wasted (duplicated) space in the pure miss-caching scheme?
Fully-associative cache inserted between L1 and L2
Victim Cache Operation
On a miss in L1, we check the victim cache.
If the block is there, we bring it into L1 and swap the value ejected from L1 into the victim cache. Misses caught by the victim cache are still cheap, and space is utilized better because blocks are no longer duplicated.
Otherwise, we fetch the block from the lower levels.
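The key difference from the miss cache is what gets inserted: evicted victims rather than fetched blocks. A hypothetical Python sketch (names are made up, same simplified L1-as-set model as before):

```python
# Hypothetical sketch of victim-cache operation (names are made up).
# Unlike the miss cache, the victim cache holds only blocks EVICTED from L1,
# so nothing is stored twice; a hit swaps the victim back into L1.
class VictimCache:
    def __init__(self, lines=4):
        self.lines = lines
        self.data = []                # FIFO of recently evicted L1 blocks

    def on_l1_evict(self, block):
        self.data.append(block)       # the L1 victim lands here
        if len(self.data) > self.lines:
            self.data.pop(0)

    def access(self, block, l1, l2_fetch):
        if block in l1:
            return "L1 hit"
        if block in self.data:
            self.data.remove(block)   # swap: the victim moves back to L1
            l1.add(block)
            return "victim-cache hit"
        l2_fetch(block)               # full miss penalty; victim cache untouched
        l1.add(block)
        return "L2 fetch"
```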
Victim Cache Performance
Even better than Miss Cache!
Smaller L1 caches benefit more from victim caches
Wait A Minute!
What about compulsory and capacity misses?
What about instruction misses?
The Victim Cache and Miss Cache are most helpful when temporal locality can be exploited
Pre-fetching techniques can help:
Pre-fetch Always
Pre-fetch on Miss
Tagged Pre-fetch
Can we improve on these techniques?
Stream Buffers
A Stream Buffer is a FIFO queue placed between L1 and L2
Stream Buffers Operation
When a miss occurs in L1, say at address A, the stream buffer immediately starts to pre-fetch elements starting at A+1
Subsequent accesses check the head of the stream buffer before going to L2
Note that non-sequential misses cause the buffer to restart pre-fetching (e.g. misses at A+2 and then A+4 will each restart the pre-fetching process, even if A+4 was already in the stream)
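The head-only check and the restart-on-nonsequential-miss behavior can be sketched as follows. This is a hypothetical Python model (names and the buffer depth are made up):

```python
# Hypothetical sketch of a stream buffer (names are made up): a FIFO that,
# after a miss at address A, prefetches A+1, A+2, ... ahead of demand.
class StreamBuffer:
    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = []

    def restart(self, addr):
        # Any miss that does not hit the head discards the stream and
        # restarts prefetching from the missing address.
        self.fifo = [addr + i for i in range(1, self.depth + 1)]

    def access(self, addr):
        if self.fifo and self.fifo[0] == addr:
            self.fifo.pop(0)                    # hit at the head of the FIFO
            self.fifo.append(self.fifo[-1] + 1) # keep the stream topped up
            return "stream hit"
        self.restart(addr)                      # non-sequential: restart at addr+1
        return "stream miss"
```

Only the head is compared, so skipping even one address (A+2 then A+4) discards the buffered stream and restarts it, exactly the limitation noted above.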
Stream Buffer Performance
72% of instruction misses removed
25% of data misses removed
Summary of Cache Miss Techniques
Miss Caches: small caches between L1 and L2. Whenever a miss occurs in L1, they receive a copy of the data from the lower-level cache.
Victim Cache: an enhancement of miss caches. Instead of copying data from the lower levels on a miss, they take whatever block has been evicted from L1.
Stream Buffers: simple FIFO buffers that are immediately pre-fetched into on a miss.