TRANSCRIPT
Learning Cache Models by Measurements
Jan Reineke
joint work with Andreas Abel
Uppsala University
December 20, 2012
Jan Reineke, Saarland 2
The Timing Analysis Problem
Embedded Software + Microarchitecture → Timing Requirements?
What does the execution time of a program depend on?
Input-dependent control flow
Microarchitectural state: pipeline, memory hierarchy, interconnect
Example of Influence of Microarchitectural State
Motorola PowerPC 755
Typical Structure of Timing Analysis Tools
ISA-specific analysis parts
Microarchitecture-specific part
Construction of Timing Models
Needs to accurately model all aspects of the microarchitecture that influence execution time.
Construction of Timing Models Today
Construction of Timing Models Tomorrow
Cache Models
The cache model is an important part of the timing model.
Caches have a simple interface: loads + stores.
They can be characterized by a few parameters:
ABC: associativity, block size, capacity
Replacement policy
High-level Approach
1) Generate memory access sequences.
2) Determine the number of cache misses on these sequences, using performance counters or by measuring execution times.
3) Infer the property from the measurement results.
Warm-up I: Inferring the Cache Capacity
Notation: the slides distinguish preparatory accesses from the accesses to measure.
Basic idea: access a set of memory blocks A once. Then access A again and count cache misses. If A fits into the cache, no misses will occur.
Measure with sets A of increasing size.
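The basic idea above can be sketched against a simulated set-associative LRU cache; the simulator is an assumption standing in for real hardware, where the miss counts would come from performance counters or timing instead.

```python
class SimulatedCache:
    """Toy set-associative LRU cache; capacity and block_size in bytes."""
    def __init__(self, capacity, associativity, block_size):
        self.assoc = associativity
        self.block_size = block_size
        self.num_sets = capacity // (associativity * block_size)
        self.sets = [[] for _ in range(self.num_sets)]  # MRU block first
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        s = self.sets[block % self.num_sets]
        if block in s:
            s.remove(block)            # hit: promote to MRU
        else:
            self.misses += 1
            if len(s) == self.assoc:   # set full: evict the LRU block
                s.pop()
        s.insert(0, block)

def second_pass_misses(cache, size):
    """Access blocks 0..size once (preparatory), then again (measured)."""
    for addr in range(0, size, cache.block_size):
        cache.access(addr)
    before = cache.misses
    for addr in range(0, size, cache.block_size):
        cache.access(addr)
    return cache.misses - before

def infer_capacity(make_cache, max_size=1 << 24):
    """Double the size of A until the second pass starts missing; the
    previous size was then the largest one that still fit entirely."""
    size = 1024
    while size <= max_size:
        if second_pass_misses(make_cache(), size) > 0:
            return size // 2
        size *= 2
    return None
```

For a 32 KB, 8-way cache with 64-byte lines this sketch reports 32 KB; real measurements are noisier and sample the sizes more finely.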
Example: Intel Core 2 Duo E6750, L1 Data Cache
Capacity = 32 KB
Way Size = 4 KB
[Plot: number of misses as a function of the size of the accessed set A]
Warm-up II: Inferring the Block Size
Given: way size W and associativity A
Wanted: block size B
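The slides leave the method implicit; one plausible scheme, sketched here against a toy fully-associative LRU simulation standing in for hardware, is: starting from an empty cache, access address 0 and then address Δ for growing powers of two. The second access hits exactly as long as both addresses fall into the block at address 0, so the smallest Δ that causes two misses is the block size B.

```python
class TinySet:
    """Toy fully-associative LRU cache used as a stand-in for hardware."""
    def __init__(self, num_blocks=512, block_size=64):
        self.num_blocks, self.block_size = num_blocks, block_size
        self.blocks, self.misses = [], 0

    def access(self, addr):
        b = addr // self.block_size
        if b in self.blocks:
            self.blocks.remove(b)          # hit: promote to MRU
        else:
            self.misses += 1
            if len(self.blocks) == self.num_blocks:
                self.blocks.pop()          # evict the LRU block
        self.blocks.insert(0, b)

def infer_block_size(make_cache, max_block=1 << 16):
    delta = 1
    while delta <= max_block:
        cache = make_cache()               # start from an empty cache
        cache.access(0)                    # loads the block containing 0
        cache.access(delta)                # hit iff delta < block size
        if cache.misses == 2:
            return delta                   # first address outside block 0
        delta *= 2
    return None
```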
Jan Reineke, Saarland 14
Inferring the Replacement Policy
There are infinitely many conceivable replacement policies…
For any set of observations, multiple policies remain possible.
We need to make some assumption on the possible policies to render inference possible.
A Class of Replacement Policies: Permutation Policies
Permutation policies:
Maintain a total order on the blocks in each cache set.
Evict the greatest block in the order.
Update the order based on the position of the accessed block in the order.
Can be specified by A+1 permutations: associativity-many “hit” permutations and one “miss” permutation.
Examples: LRU, FIFO, PLRU, …
Permutation Policy LRU
LRU (least recently used):
order on blocks based on recency of last access
Example (most-recently-used block on the left, least-recently-used on the right):
[a b c d] --access c (hit, “hit” permutation 3)--> [c a b d] --access e (“miss” permutation)--> [e c a b]
Permutation Policy FIFO
FIFO (first-in first-out):
order on blocks based on order of cache misses
Example (last-in on the left, first-in on the right):
[a b c d] --access c (hit, “hit” permutation 3: no reordering)--> [a b c d] --access e (“miss” permutation)--> [e a b c]
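Both examples can be replayed with a small simulator of permutation policies. This is a sketch of the definition above, not the authors' implementation; positions are 0-indexed here, and each permutation maps the old position of a block to its new position.

```python
class PermutationPolicy:
    """One cache set under a permutation policy of associativity A.

    hit_perms[i] is applied after a hit at position i; miss_perm is
    applied after the victim at the last position has been replaced.
    Position A-1 is the next victim."""
    def __init__(self, hit_perms, miss_perm):
        self.assoc = len(hit_perms)
        self.hit_perms, self.miss_perm = hit_perms, miss_perm
        self.state = [None] * self.assoc

    def _apply(self, perm):
        new = [None] * self.assoc
        for old_pos, new_pos in enumerate(perm):
            new[new_pos] = self.state[old_pos]
        self.state = new

    def access(self, block):
        if block in self.state:                    # hit
            i = self.state.index(block)
            self._apply(self.hit_perms[i])
            return 'hit'
        self.state[self.assoc - 1] = block         # miss: replace the victim
        self._apply(self.miss_perm)
        return 'miss'

def lru(assoc):
    # Hit at position i: move the block to the front, shift 0..i-1 down.
    hit_perms = [[0 if old == i else old + 1 if old < i else old
                  for old in range(assoc)] for i in range(assoc)]
    # The miss permutation equals the hit permutation at the last position.
    return PermutationPolicy(hit_perms, hit_perms[assoc - 1])

def fifo(assoc):
    # Hits do not reorder; a miss inserts the new block at the front.
    identity = list(range(assoc))
    miss = [old + 1 for old in range(assoc - 1)] + [0]
    return PermutationPolicy([identity] * assoc, miss)
```

Accessing d, c, b, a from an empty LRU set yields state [a, b, c, d]; a further access to c gives [c, a, b, d] and a miss on e gives [e, c, a, b], matching the slides.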
Inferring Permutation Policies
Strategy to infer permutation i:
1) Establish a known cache state s
2) Trigger permutation i
3) “Read out” the resulting cache state s’. Deduce permutation from s and s’.
1) Establishing a known cache state
Assume the “miss” permutation shared by FIFO, LRU, and PLRU: the new block enters at the front, every other block shifts down one position. Accessing d, c, b, a (all misses) then yields a known state:
[? ? ? ?] --d--> [d ? ? ?] --c--> [c d ? ?] --b--> [b c d ?] --a--> [a b c d]
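Under that assumed miss permutation, the construction is just a fold over the reversed block sequence, sketched here:

```python
def establish_state(blocks):
    """Access the desired blocks in reverse order, all misses: each miss
    inserts the new block at the front and shifts the rest down, so the
    final state is exactly `blocks`.  Assumes the common "miss"
    permutation of FIFO/LRU/PLRU."""
    assoc = len(blocks)
    state = ['?'] * assoc              # unknown initial contents
    for b in reversed(blocks):         # e.g. access d, c, b, a
        state = [b] + state[:-1]       # effect of one miss
    return state
```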
2) Triggering permutation i
Simply access the ith block in the cache state. E.g., for i = 3, access c:
[a b c d] --access c--> [? ? ? ?] (the resulting state s’ is not directly observable)
3) Reading out a cache state
Exploit the “miss” permutation to determine the position of each of the original blocks: if the position is j, then A-j+1 misses will evict the block, but A-j misses will not.
Example (c at position 3, A = 4):
[? ? c ?] --1 miss--> [? ? ? c]: c is still cached. --2 misses--> [? ? ? ?]: c is not cached.
Inferring Permutation Policies:Example of LRU, Permutation 3
1) Establish known state: [a b c d] (most-recently-used first)
2) Trigger permutation 3: access c
3) Read out resulting state: [c a b d]
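The three steps can be put together against a simulated LRU set standing in for hardware; `is_cached` would be a timed access on a real machine, and every trial is a fresh run.

```python
class LRUSet:
    """Ground-truth cache set, hidden from the inference code below."""
    def __init__(self, assoc):
        self.assoc, self.state = assoc, []
    def access(self, block):
        if block in self.state:
            self.state.remove(block)   # hit: promote to MRU
        elif len(self.state) == self.assoc:
            self.state.pop()           # miss: evict the LRU block
        self.state.insert(0, block)
    def is_cached(self, block):
        return block in self.state

def new_position(make_set, blocks, trigger, block):
    """Position of `block` after establishing `blocks` and accessing
    `trigger` once; read out by counting fresh misses until eviction."""
    assoc = len(blocks)
    for n in range(1, assoc + 1):
        s = make_set()
        for x in reversed(blocks):     # 1) establish the known state
            s.access(x)
        s.access(trigger)              # 2) trigger the permutation
        for k in range(n):             # 3) n accesses to fresh blocks
            s.access(('fresh', k))
        if not s.is_cached(block):
            return assoc + 1 - n       # evicted by n misses, not by n-1
    return None

def infer_hit_permutation(make_set, assoc, i):
    """Hit permutation i as a list: entry j is the new (1-indexed)
    position of the block that was at position j+1 before the access."""
    blocks = [('blk', k) for k in range(1, assoc + 1)]
    return [new_position(make_set, blocks, blocks[i - 1], b) for b in blocks]
```

For LRU with A = 4 and i = 3 this yields [2, 3, 1, 4]: the accessed block moves to the front, the two blocks in front of it shift down, and the last block stays, matching the [a b c d] → [c a b d] example above.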
Implementation Challenges
Interference
Prefetching
Instruction caches
Virtual memory
L2+L3 caches: strictly-inclusive, non-inclusive, exclusive
Shared caches: coherence
Experimental Results
Undocumented variant of LRU
Nehalem introduced“L0” micro-op buffer
Exclusive hierarchy
Surprising measurements ;-)
L2 Caches on some Core 2 Duos
[Plot: number of misses vs. n, the number of cache sets (up to 4096), for several associativities]
Core 2 Duo E6300: 2 MB, 8 ways
Core 2 Duo E6750: 4 MB, 16 ways
Core 2 Duo E8400: 6 MB, 24 ways
Replacement Policy of the Intel Atom D525
Discovered a, to our knowledge, undocumented “hierarchical” policy:
Example (6 lines in 3 groups of 2; most-recently-used group and line first):
[a b][c d][e f] --access d (hit)--> [d c][a b][e f] --access x (miss, evicts f)--> [x e][d c][a b]
ATOM = LRU(3, LRU(2))
PLRU(2k) = LRU(2, PLRU(k))
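The notation LRU(3, LRU(2)) suggests three groups of two lines, with LRU among groups and LRU within each group. The exact hardware semantics are an assumption here; this minimal sketch implements one plausible reading that reproduces the slide's example.

```python
class HierarchicalLRU:
    """Sketch of LRU(3, LRU(2)): 3 groups of 2 lines each.  A hit promotes
    both the line within its group and the group itself; a miss evicts the
    LRU line of the LRU group.  Most-recently-used group/line first.
    (One plausible reading of the talk's notation, not a verified model
    of the Atom D525.)"""
    def __init__(self, num_groups=3, lines_per_group=2):
        self.groups = [[None] * lines_per_group for _ in range(num_groups)]

    def access(self, block):
        for gi, group in enumerate(self.groups):
            if block in group:                     # hit
                group.remove(block)
                group.insert(0, block)             # MRU line in its group
                self.groups.insert(0, self.groups.pop(gi))  # MRU group
                return 'hit'
        victim_group = self.groups.pop()           # miss: LRU group
        victim_group.pop()                         # evict its LRU line
        victim_group.insert(0, block)
        self.groups.insert(0, victim_group)
        return 'miss'
```

Replaying the slide's example: from [a b][c d][e f], a hit on d yields [d c][a b][e f], and a miss on x evicts f and yields [x e][d c][a b].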
Conclusions and Future Work
Inference of cache models: done.
Future work:
More general classes of replacement policies, e.g. by inferring canonical register automata.
Shared caches in multicores, coherency protocols, etc.
Deal with other architectural features: translation lookaside buffers, branch predictors, prefetchers.
Infer abstractions instead of “concrete” models.
Questions?
Measurement-based Modeling of the Cache Replacement Policy. A. Abel and J. Reineke. RTAS, 2013 (to appear).