112/04/20 \course\cpeg324-08F\Topic7c 1
Cache Parameters
• Cache size: Scache (lines)
• Number of sets: N (sets)
• Lines per set: K (lines/set)
Scache = KN (lines)
       = KN * L (bytes), where L is the line size in bytes.
K-way set-associative.
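The relations above can be checked with a short sketch; the function and parameter names follow the slide's symbols, and the example values are illustrative, not from the slides:

```python
# Sketch: total cache size from the three slide parameters
# (N sets, K lines per set, L bytes per line).
def cache_size(N, K, L):
    lines = K * N            # Scache in lines
    size_bytes = lines * L   # Scache in bytes
    return lines, size_bytes

# Example: a 4-way set-associative cache with 128 sets and 64-byte lines.
lines, size_bytes = cache_size(N=128, K=4, L=64)
print(lines, size_bytes)     # 512 32768 (i.e., 512 lines, 32 KiB)
```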
Trade-offs in Set-Associativity
Fully-associative:
- Higher hit ratio, concurrent search, but slow access when
associativity is large.
Direct mapping:
- Fast access (on a hit) and simple comparison.
- Trivial replacement algorithm.
Problem with hit ratio. In the extreme case, if two blocks that map to
the same cache block frame are accessed alternately, "thrashing" may
occur.
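The alternating-block worst case can be sketched as follows; the helper name and the 8-frame cache are illustrative assumptions:

```python
# Sketch: thrashing in a direct-mapped cache. Two blocks that map to the
# same frame are accessed alternately, so every access misses.
def direct_mapped_misses(block_addrs, num_frames):
    frames = [None] * num_frames
    misses = 0
    for b in block_addrs:
        f = b % num_frames          # direct mapping: exactly one candidate frame
        if frames[f] != b:
            misses += 1
            frames[f] = b           # evict whatever was there
    return misses

# Blocks 0 and 8 both map to frame 0 of an 8-frame cache: 100% misses.
trace = [0, 8] * 4
print(direct_mapped_misses(trace, 8))  # 8 misses for 8 accesses
```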
Note
Main memory size: Smain (blocks)
Cache memory size: Scache (blocks)
Let P = Smain / Scache. Since P >> 1, you need to search:
the average search length is much greater than 1.
• Set-associativity provides a trade-off between:
  - Concurrency in search.
  - Average search/access time per block.
[Figure: spectrum of the number of sets N, 1 <= N <= Scache.
Fully associative (N = 1) -- set-associative -- direct-mapped (N = Scache).
Horizontal axis: number of sets.]
Important Factors in Cache Design
• Address partitioning strategy
  (3 dimensions of freedom).
• Total cache size / memory size.
• Workload.
Address Partitioning
• Byte addressing mode
• Cache memory size (data part) = NKL (bytes)
• Directory size (per entry):
  M - log2(N) - log2(L) (bits)
• Reduce clustering (randomize accesses).

[Address layout, M bits total:
 tag (M - log2(N) - log2(L) bits) | set number (log2(N) bits) | byte address in a line (log2(L) bits)]
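The field widths of the address split can be computed directly; a minimal sketch with an illustrative function name and example sizes:

```python
# Sketch: splitting an M-bit byte address into tag (directory entry),
# set index (log2 N bits), and byte-in-line offset (log2 L bits).
from math import log2

def address_fields(M, N, L):
    offset_bits = int(log2(L))                 # byte address within a line
    index_bits = int(log2(N))                  # set number
    tag_bits = M - index_bits - offset_bits    # directory size per entry
    return tag_bits, index_bits, offset_bits

# Example: 32-bit addresses, 128 sets, 64-byte lines.
print(address_fields(32, 128, 64))  # (19, 7, 6)
```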
[Figure: general curve describing cache behavior — miss ratio (vertical
axis, roughly 0.1 to 1.0) versus cache size (horizontal axis).
Note: there exists a knee in the curve.]
…the data are sketchy and highly dependent on the
method of gathering...
… designer must make critical choices using a
combination of “hunches, skills, and experience” as
supplement…
“a strong intuitive feeling concerning a future event or
result.”
Basic Principle
• Typical workload study + intelligent estimate of others
• Good Engineering: small degree over-design
• “30% rule”:
  - Each doubling of the cache size reduces misses by about 30%.
    (Alan J. Smith, “Cache Memories,” ACM Computing Surveys,
    Vol. 14, No. 3, Sep. 1982.)
  - It is a rough estimate only.
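The rule gives a quick back-of-the-envelope estimate; a sketch under the rule's own assumption of a 0.7 factor per doubling (function name is illustrative):

```python
# Sketch: applying the rough "30% rule" -- each doubling of cache size
# multiplies the miss count by ~0.7.
def misses_after_doublings(base_misses, doublings, factor=0.7):
    return base_misses * factor ** doublings

# Quadrupling the cache (two doublings) leaves roughly half the misses.
print(misses_after_doublings(1000, 2))  # ~490
```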
K: Associativity
• Larger K: better (lower) miss ratio.
• Smaller K is better in:
  - Speed (faster).
  - Cost (cheaper).
  - Simplicity (simpler).
• K = 4 ~ 8 obtains most of the miss-ratio benefit.
L : Line Size
• Atomic unit of transmission
• Affects the miss ratio.
• Smaller L:
  - Larger average delay.
  - Less traffic.
  - Larger average hardware cost for associative search.
  - Larger possibility of “line crossers” (memory references spanning
    the boundary between two cache lines).
• Workload dependent.
• Typical range: 16 ~ 128 bytes.
Cache Replacement Policy
• FIFO (first-in, first-out): replace the block loaded furthest in the
  past.
• LRU (least-recently used): replace the block used furthest in the
  past.
• OPT (optimal): replace the block that will be used furthest in the
  future; that is, do not retain lines whose next occurrence is in the
  most distant future.
Note: LRU performance is close to OPT for frequently encountered
program structures.
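That note can be checked directly; a sketch comparing LRU with Belady's OPT on a single fully associative set (helper names and the small traces are illustrative):

```python
# Sketch: LRU vs. OPT for one fully associative set of `capacity` lines.
def lru_misses(trace, capacity):
    cache, misses = [], 0
    for b in trace:
        if b in cache:
            cache.remove(b)              # hit: refresh recency
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)             # evict least-recently used
        cache.append(b)                  # most-recently used at the tail
    return misses

def opt_misses(trace, capacity):
    cache, misses = set(), 0
    for i, b in enumerate(trace):
        if b not in cache:
            misses += 1
            if len(cache) == capacity:
                rest = trace[i + 1:]
                # Evict the line whose next use is furthest in the future
                # (a line never used again counts as infinitely far).
                victim = max(cache,
                             key=lambda x: rest.index(x) if x in rest else len(trace))
                cache.remove(victim)
            cache.add(b)
    return misses

trace = [0, 8, 0, 6, 8]
print(lru_misses(trace, 4), opt_misses(trace, 4))  # 3 3 -- LRU matches OPT here
```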
Example: Misses and Associativity
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, and 8.
A. Direct-mapped cache.
(Blue text: data used at time t. Black text: data used at time t-1.)
5 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, and 8.
B. Two-way set-associative cache, LRU replacement policy.
(Blue text: data used at time t. Black text: data used at time t-1.)
4 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, and 8.
C. Fully associative cache.
- Any memory block can be stored in any cache block.
(Blue text: data used at time t. Black text: data used at time t-1.
Red text: data used at time t-2.)
3 misses for the 5 accesses.
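All three cases can be reproduced with one parameterized simulator (N sets, K ways, LRU within each set); a sketch with an illustrative helper name:

```python
# Sketch: set-associative cache simulator with LRU replacement per set,
# reproducing the three miss counts from the example slides.
def set_assoc_misses(trace, num_sets, ways):
    sets = [[] for _ in range(num_sets)]   # each set: lines in LRU order
    misses = 0
    for b in trace:
        s = sets[b % num_sets]             # set index = block mod num_sets
        if b in s:
            s.remove(b)                    # hit: refresh recency
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                   # evict least-recently used line
        s.append(b)
    return misses

trace = [0, 8, 0, 6, 8]
print(set_assoc_misses(trace, num_sets=4, ways=1))  # direct-mapped:     5
print(set_assoc_misses(trace, num_sets=2, ways=2))  # two-way LRU:       4
print(set_assoc_misses(trace, num_sets=1, ways=4))  # fully associative: 3
```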
Program Structure
for i = 1 to n
for j = 1 to n
endfor
endfor
The last-in-first-out feature of nested loops makes the recent past
resemble the near future.
Problem with LRU
• Not good at mimicking sequential/cyclic access patterns.
Example
ABCDEF ABC…… ABC……
Exercise: With a set size of 3, what is the miss ratio, assuming all 6
addresses map to the same set?
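One way to check your answer to the exercise is a small LRU simulator; an illustrative sketch, assuming the pattern A B C D E F repeats cyclically:

```python
# Sketch: LRU over one set of 3 lines with a cyclic 6-address pattern.
# Under LRU the line needed next is always the one just evicted.
def lru_miss_ratio(trace, capacity):
    cache, misses = [], 0
    for b in trace:
        if b in cache:
            cache.remove(b)          # hit: refresh recency
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)         # evict least-recently used
        cache.append(b)
    return misses / len(trace)

print(lru_miss_ratio(list("ABCDEF") * 4, 3))  # 1.0 -- every access misses
```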
Performance Evaluation Methods for Workload
• Analytical modeling.
• Simulation.
• Measurement.
Cache Analysis Methods
• Hardware monitoring:
  - Fast and accurate.
  - Still not fast enough for high-performance machines.
  - Cost.
  - Limited flexibility/repeatability.
Cache Analysis Methods
• Address traces and machine simulator:
- Slow.
- Accuracy/fidelity.
- Cost advantage.
- Flexibility/repeatability.
- OS/other impacts: how to include them?
cont’d
Trace Driven Simulation for Cache
• Workload dependence:
- Difficulty in characterizing the load.
- No generally accepted model.
• Effectiveness:
- Possible simulation for many parameters.
- Repeatability.
Problem in Address Traces
• Representativeness of the actual workload (hard):
  - Traces cover only a small fraction of the real workload.
  - Diversity of user programs.
• Initialization transient
- Use long enough traces to absorb the impact
of cold misses
• Inability to properly model multiprocessor effects