Skewed Compressed Cache
MICRO 2014
Somayeh Sardashti, David A. Wood
Computer Sciences Department University of Wisconsin-Madison
SCC
• Off-chip accesses are costly in latency, bandwidth (BW), and power.
• The LLC is already large => increase its effective capacity instead.
Cache Compression
• Observation: many cache lines hold low-dynamic-range data.
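To make the low-dynamic-range observation concrete, here is a minimal sketch of a base+delta style size check on a 64-byte line viewed as eight 8-byte words. The function names and the layout (one 8-byte base plus one fixed-width signed delta per word) are hypothetical, chosen only for illustration; SCC itself does not depend on any particular compression algorithm.

```python
import struct

LINE_BYTES = 64
WORD_BYTES = 8  # hypothetical: treat the line as eight 64-bit words


def line_words(line: bytes) -> list[int]:
    """Split a 64-byte line into eight little-endian unsigned 64-bit words."""
    if len(line) != LINE_BYTES:
        raise ValueError("expected a 64-byte line")
    return list(struct.unpack("<8Q", line))


def base_delta_size(line: bytes) -> int:
    """Compressed size under a hypothetical base+delta layout.

    The first word is the base; every word is stored as a signed delta
    from it, using the smallest width (1, 2, or 4 bytes) that fits all
    deltas. Returns LINE_BYTES if no width fits (store uncompressed).
    """
    words = line_words(line)
    base = words[0]
    deltas = [w - base for w in words]
    for width in (1, 2, 4):
        lo = -(1 << (8 * width - 1))
        hi = (1 << (8 * width - 1)) - 1
        if all(lo <= d <= hi for d in deltas):
            return WORD_BYTES + width * len(words)
    return LINE_BYTES
```

A line of nearby pointers (for example 0x7F0000001000, 0x7F0000001008, ...) has deltas of 0 to 56, fits in 1-byte deltas, and shrinks to 16 bytes under this layout, while a line of unrelated values stays at 64 bytes.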
SCC
Designing a Compressed Cache
• (1) a compression algorithm to compress blocks
• (2) a compaction mechanism to fit compressed blocks in the cache.
*In general, SCC is independent of the compression algorithm in use.
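Because compression and compaction are separate, the compaction side only needs each block's compressed size. A minimal sketch, assuming sizes are rounded up to the 8B/16B/32B/64B classes shown later in the deck and that the 2-bit compression factor (CF) counts halvings of the 64-byte block (CF = 0 for 64B up to CF = 3 for 8B); this CF encoding is an assumption for illustration, not a statement of SCC's exact encoding.

```python
BLOCK_BYTES = 64
SIZE_CLASSES = (8, 16, 32, 64)  # allocation granularities, smallest first


def size_class(compressed_bytes: int) -> int:
    """Round a compressed size up to the smallest class that holds it."""
    if not 0 < compressed_bytes <= BLOCK_BYTES:
        raise ValueError("compressed size must be in 1..64 bytes")
    return next(c for c in SIZE_CLASSES if compressed_bytes <= c)


def compression_factor(compressed_bytes: int) -> int:
    """Hypothetical 2-bit CF = log2(64 / size class): 64B->0, 32B->1, 16B->2, 8B->3."""
    return (BLOCK_BYTES // size_class(compressed_bytes)).bit_length() - 1
```

Any compressor that reports a byte count can feed this step, which is the sense in which the compaction mechanism is independent of the algorithm.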
Motivation
• How can we design a compressed cache?
(Design goals)
• 1. Tightly compact variable-size compressed blocks.
• 2. Keep tag and other metadata overhead low.
• 3. Allow fast lookups.
=> Previous compressed cache designs failed to achieve all of these goals.
Compressed Cache Taxonomy
1) How to provide additional tags
2) How to find the corresponding block given a matching tag
SCC
• Key observations:
• 1) Spatial locality: neighboring blocks tend to reside in the cache at the same time.
• 2) Compression locality: neighboring blocks tend to compress similarly.
SCC
[Figure: SCC organization. 48-bit physical address; tag and data arrays with superblock tags; a 64-byte block (16 words) is stored as 64B, 32B, 16B, or 8B, encoded by a 2-bit compression factor (CF = 2'b00 to 2'b11).]
• 1 Superblock = 8 contiguous blocks = 64 Bytes x 8 = 512 B
• Address fields: byte select (6 bits -> 64 B block), block ID (3 bits -> 8 blocks per superblock), superblock tag (upper address bits).
[Figure: way-group selection. Address bits are XORed to select a way group; the 16-way set-associative cache is split into 4-way set-associative way groups, with a worked example.]
[Figure: 2-way skewed cache.]
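The 2-way skewed cache is the background idea SCC builds on: each way is indexed by a different hash of the address, so blocks that conflict in one way usually do not conflict in the other. A minimal sketch with hypothetical hash functions (way 0 uses the plain index bits, way 1 XORs them with the next higher bits) and an assumed round-robin victim choice; these are not SCC's actual index functions.

```python
NUM_SETS = 8      # hypothetical: 8 sets per way
INDEX_BITS = 3    # log2(NUM_SETS)
OFFSET_BITS = 6   # 64-byte blocks


def skew_index(addr: int, way: int) -> int:
    """Hypothetical per-way index function for a 2-way skewed cache."""
    block = addr >> OFFSET_BITS
    low = block & (NUM_SETS - 1)
    high = (block >> INDEX_BITS) & (NUM_SETS - 1)
    return low if way == 0 else low ^ high


class SkewedCache2Way:
    """Tiny 2-way skewed cache that tracks block addresses only (no data)."""

    def __init__(self) -> None:
        self.ways = [[None] * NUM_SETS for _ in range(2)]
        self.victim_way = 0  # assumed round-robin replacement

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, install the block."""
        block = addr >> OFFSET_BITS
        for w in range(2):
            if self.ways[w][skew_index(addr, w)] == block:
                return True
        # Miss: use an empty candidate frame if one exists.
        for w in range(2):
            idx = skew_index(addr, w)
            if self.ways[w][idx] is None:
                self.ways[w][idx] = block
                return False
        # Otherwise evict from the round-robin victim way.
        w = self.victim_way
        self.ways[w][skew_index(addr, w)] = block
        self.victim_way ^= 1
        return False
```

Blocks 0, 8, and 16 all map to set 0 of way 0, so a conventional 2-way cache could hold only two of them; here way 1 scatters them to sets 0, 1, and 2, and all three stay resident.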
SCC
• 16-way cache with 8 cache sets, divided into 4 way groups.
• 64-byte cache blocks and 8-block superblocks; a 64-byte data entry holds 1, 2, 4, or 8 (compressed) subblocks.
• Separate sparse superblock tags.
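The address split and entry sharing above can be sketched as follows. The field widths come from the deck (6-bit byte select, 8 blocks per superblock); the placement rule, where a data entry holding 2^CF blocks is shared by aligned neighbors of the same superblock, and the CF = log2(blocks per entry) encoding are assumptions for illustration. The XOR-based way-group function is not spelled out in these slides, so it is omitted.

```python
from dataclasses import dataclass

OFFSET_BITS = 6     # byte select: 64-byte blocks
BLOCK_ID_BITS = 3   # 8 blocks per superblock (512 B)


@dataclass(frozen=True)
class SccAddress:
    superblock: int  # address bits above the block ID
    block_id: int    # which of the 8 blocks within the superblock
    byte: int        # byte offset within the 64-byte block


def split_address(pa: int) -> SccAddress:
    """Split a 48-bit physical address into superblock, block ID, and byte."""
    if not 0 <= pa < (1 << 48):
        raise ValueError("expected a 48-bit physical address")
    return SccAddress(
        superblock=pa >> (OFFSET_BITS + BLOCK_ID_BITS),
        block_id=(pa >> OFFSET_BITS) & ((1 << BLOCK_ID_BITS) - 1),
        byte=pa & ((1 << OFFSET_BITS) - 1),
    )


def entry_slot(block_id: int, cf: int) -> tuple[int, int]:
    """Hypothetical placement: 2**cf blocks share one 64-byte data entry.

    Returns (entry index within the superblock, slot within that entry).
    """
    per_entry = 1 << cf  # 1, 2, 4, or 8 blocks per entry
    return block_id // per_entry, block_id % per_entry
```

Under this rule, a lookup needs only the address and the CF to compute where a block must be, which is how a design of this kind can avoid per-block pointers.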
SCC
• 97% of updated blocks still fit in their original location.
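One way to read the 97% figure is as the fraction of writes whose new compressed size still fits in the space the block already occupies. A minimal sketch, assuming allocation in the 8B/16B/32B/64B classes; the function names and sample sizes are hypothetical, and this is not the paper's measurement method.

```python
SIZE_CLASSES = (8, 16, 32, 64)  # bytes


def allocated_bytes(compressed_bytes: int) -> int:
    """Smallest size class that holds a block of this compressed size."""
    return next(c for c in SIZE_CLASSES if compressed_bytes <= c)


def fits_in_place(old_compressed: int, new_compressed: int) -> bool:
    """True if the updated block fits in the space it already occupies."""
    return new_compressed <= allocated_bytes(old_compressed)


def in_place_fraction(updates: list[tuple[int, int]]) -> float:
    """Fraction of (old size, new size) updates that need no relocation."""
    return sum(fits_in_place(old, new) for old, new in updates) / len(updates)
```

Updates that do not fit must be relocated, so a high in-place fraction keeps write handling cheap.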
Area Overhead
• Baseline: conventional 16-way 8MB LLC
• FixedC: doubles the number of tags; compresses blocks only to half size
• VSC: 0-4 16B subblocks
• DCC4-16: 0-4 16B subblocks
• SCC8-8: 0-8 8B subblocks
Methodology
• GEMS simulator; CACTI 6.5 (area and power at 32nm)
• Multi-programmed workload mixes drawn from memory-bound and compute-bound SPEC CPU 2006 benchmarks
Baseline: conventional 16-way 8MB LLC
2X Baseline: conventional 32-way 16MB LLC
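The configurations above imply the following cache geometries. A quick arithmetic check, assuming 64-byte blocks in every configuration and modeling FixedC's doubled tags as two tags per 64-byte frame; the `geometry` helper is hypothetical.

```python
BLOCK_BYTES = 64
MB = 1 << 20


def geometry(capacity_bytes: int, ways: int, tags_per_frame: int = 1) -> dict[str, int]:
    """Frames, sets, and tags of a set-associative cache with 64-byte blocks."""
    frames = capacity_bytes // BLOCK_BYTES
    return {"frames": frames, "sets": frames // ways, "tags": frames * tags_per_frame}


baseline = geometry(8 * MB, 16)                   # 131072 frames, 8192 sets
two_x = geometry(16 * MB, 32)                     # 262144 frames, 8192 sets
fixedc = geometry(8 * MB, 16, tags_per_frame=2)   # 262144 tags, 8MB of data
```

Doubling both capacity and associativity leaves the number of sets unchanged, and FixedC reaches the 2X Baseline tag count while keeping only half of its data array.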
Evaluation: MPKI
• 2X Baseline: 15% average MPKI improvement
• SCC: 13% average MPKI improvement
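MPKI is misses per thousand instructions, and the improvement is the relative reduction versus the baseline. A minimal sketch; the slides do not say how the average is taken, so the arithmetic mean of per-workload reductions used here is an assumption, and any sample numbers are hypothetical.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand instructions."""
    return 1000.0 * misses / instructions


def mpki_reduction(base_mpki: float, new_mpki: float) -> float:
    """Fractional MPKI improvement over the baseline (0.15 means 15%)."""
    return 1.0 - new_mpki / base_mpki


def mean_reduction(pairs: list[tuple[float, float]]) -> float:
    """Arithmetic mean of per-workload reductions (assumed averaging method)."""
    return sum(mpki_reduction(b, n) for b, n in pairs) / len(pairs)
```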
Evaluation: Energy
• SCC reduces system energy by up to 20% (6% on average).
Conclusion
• SCC achieves performance comparable to that of a conventional cache with twice the capacity and associativity, at only 1.5% area overhead (vs. 6.8% for DCC).
• Lower design complexity: the replacement mechanism is simpler than DCC's.
FixedC
VSC
DCC
SSC
Sector Cache
2-way Skewed Cache.
Cache Compression
[Goals]
• Fast (low decompression latency)
• Simple (avoid complex hardware changes)
• Effective (good compression ratio)
Motivation
• Off-chip memory latency is high.
  -> A larger cache reduces misses, at the cost of more area and power.
• Off-chip memory accesses consume significant energy.
  -> A larger cache reduces accesses to off-chip memory.
• Off-chip interconnect bandwidth is limited.
  -> larger cache