Skewed Compressed Cache MICRO 2014. Somayeh Sardashti, David A. Wood Computer Sciences Department University of Wisconsin-Madison



Skewed Compressed Cache (MICRO 2014)

Somayeh Sardashti, David A. Wood

Computer Sciences Department University of Wisconsin-Madison


SCC

• Off-chip accesses cost latency, bandwidth, and power.

• LLC size is already large => increase effective capacity instead.


Cache Compression

• Observation: many cache lines hold data with a low dynamic range.
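
To make "low dynamic range" concrete, here is a minimal sketch that tests whether every 8-byte word in a 64-byte line lies within a small signed delta of the first word. This base-plus-delta check is only an illustrative assumption, not the compressor used in the talk (the slides note that SCC is independent of the compression algorithm). The names `fits_base_delta` and `compressed_size` are hypothetical.

```python
import struct

WORD_BYTES = 8  # assumed word size for this illustration


def fits_base_delta(line: bytes, delta_bytes: int = 1) -> bool:
    """True if every 8-byte word is within a signed delta of the first word."""
    count = len(line) // WORD_BYTES
    words = struct.unpack(f"<{count}Q", line)
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)  # signed range of one delta
    return all(-limit <= word - base < limit for word in words)


def compressed_size(line: bytes, delta_bytes: int = 1) -> int:
    """Bytes needed: one base plus one delta per word, else the full line."""
    if fits_base_delta(line, delta_bytes):
        return WORD_BYTES + (len(line) // WORD_BYTES) * delta_bytes
    return len(line)
```

A line of eight nearby pointers would shrink to one 8-byte base plus eight 1-byte deltas under this scheme, while a line of widely scattered values would stay uncompressed.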


SCC: Designing a Compressed Cache

• (1) a compression algorithm to compress blocks

• (2) a compaction mechanism to fit compressed blocks in the cache.

*In general, SCC is independent of the compression algorithm in use.


Motivation.

• How can we design a compressed cache?

(Design goals)
• 1. Tightly compact variable-size compressed blocks.
• 2. Keep tag and other metadata overhead low.
• 3. Allow fast lookups.
=> Previous compressed cache designs failed to achieve all of these goals.


Compressed Cache Taxonomy

1) How to provide additional tags
2) How to find the corresponding block given a matching tag


SCC

• Key observations:
• 1) Spatial locality (neighboring blocks tend to reside in the cache at the same time)
• 2) Compression locality (neighboring blocks tend to compress similarly)


SCC

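[Figure: a 48-bit physical address (PA) with a superblock tag; tag and data arrays; a 64-byte (16-word) block with compressed sizes of 32B, 16B, and 8B, using 8B sub-blocks; CF codes 2'b00, 2'b01, 2'b10, and 2'b11]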
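
The figure pairs four 2-bit CF codes with four block sizes, but the transcript does not preserve which code goes with which size. The sketch below assumes CF is a compression factor with CF = 0 for an uncompressed 64-byte block, CF = 1 for 32 bytes or less, CF = 2 for 16 bytes or less, and CF = 3 for 8 bytes or less. That mapping and the function names are assumptions for illustration.

```python
BLOCK_BYTES = 64  # uncompressed block size from the slides


def compression_factor(compressed_bytes: int) -> int:
    """Largest CF whose size class still holds the block (assumed mapping)."""
    for cf in (3, 2, 1):
        if compressed_bytes <= BLOCK_BYTES >> cf:
            return cf
    return 0  # not compressible to 32 bytes or less


def stored_bytes(cf: int) -> int:
    """Space reserved for a block with this CF: 64, 32, 16, or 8 bytes."""
    return BLOCK_BYTES >> cf


def blocks_per_entry(cf: int) -> int:
    """How many blocks of this CF fit in one 64-byte data entry."""
    return 1 << cf
```

Under this assumption a block that compresses to 20 bytes gets CF = 1 and is stored in a 32-byte slot, so two such blocks share one 64-byte entry.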


SCC


SuperBlock Cache


[Figure labels: 16-way set-associative cache; 4-way set-associative; cache block; sub-block; 48-bit address (bits 47..0)]


SuperBlock

1 Superblock = 8 contiguous blocks = 64Bytes x 8 = 512B

[Figure: 48-bit address, bits 47..0. Byte select: bits 5..0 (6 bits -> 64B). Block ID: bits 8..6. Bits 9, 10, and 11 are also marked.]
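
The field boundaries follow from the sizes on this slide: 64-byte blocks need 6 byte-select bits, and 8 blocks per superblock need 3 block-ID bits. A minimal sketch of that split is below; the `AddressFields` type and `split_address` function are hypothetical names, and everything above bit 8 is lumped together as the superblock address.

```python
from typing import NamedTuple

BYTE_SELECT_BITS = 6  # 64-byte block
BLOCK_ID_BITS = 3     # 8 blocks per superblock


class AddressFields(NamedTuple):
    superblock: int   # address bits 47..9
    block_id: int     # address bits 8..6
    byte_select: int  # address bits 5..0


def split_address(addr: int) -> AddressFields:
    """Split a physical address into superblock, block ID, and byte select."""
    byte_select = addr & ((1 << BYTE_SELECT_BITS) - 1)
    block_id = (addr >> BYTE_SELECT_BITS) & ((1 << BLOCK_ID_BITS) - 1)
    superblock = addr >> (BYTE_SELECT_BITS + BLOCK_ID_BITS)
    return AddressFields(superblock, block_id, byte_select)
```

All eight blocks of one 512-byte superblock share the same `superblock` value and differ only in `block_id`, which is what lets a single superblock tag cover them.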


[Figure: the 48-bit address layout (bits 47..0, with bits 5, 6, 8, 9, 10, and 11 marked); labels: xor, way group selection, superblock tag]
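
The transcript keeps only the labels "xor" and "way group selection", so the exact inputs to the XOR are not recoverable here. One plausible reading, sketched below as an assumption, is that two address bits just above the block ID (bits 10..9) are XORed with the 2-bit CF to pick one of four way groups. The function name `way_group` is hypothetical.

```python
WAY_GROUP_SHIFT = 9     # assumed: use address bits 10..9
WAY_GROUP_MASK = 0b11   # four way groups


def way_group(addr: int, cf: int) -> int:
    """Pick a way group by XORing two address bits with the 2-bit CF (assumed)."""
    address_bits = (addr >> WAY_GROUP_SHIFT) & WAY_GROUP_MASK
    return address_bits ^ (cf & WAY_GROUP_MASK)
```

With this choice, a given address maps to a different way group for each of the four CF values, so a lookup that does not yet know a block's compressibility can probe one candidate per way group.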


Example:

[Figures: worked examples on the 48-bit address layout; the diagrams are not recoverable from this transcript]


2-way Skewed Cache.
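
In a skewed-associative cache, each way is indexed with a different hash of the address, so two blocks that collide in one way usually land in different sets in the other. The sketch below illustrates this for two ways and eight sets. The two index functions are hypothetical choices made for this example, not the hashes used in the talk.

```python
NUM_SETS = 8  # illustrative size


def index_way0(block_addr: int) -> int:
    """Conventional index: low-order bits of the block address."""
    return block_addr % NUM_SETS


def index_way1(block_addr: int) -> int:
    """Skewed index: fold higher bits into the low bits before indexing."""
    return (block_addr ^ (block_addr >> 3)) % NUM_SETS


def candidate_sets(block_addr: int) -> tuple:
    """The one set to probe in each of the two ways."""
    return (index_way0(block_addr), index_way1(block_addr))
```

Block addresses 0 and 8 conflict in way 0 (both index set 0) but map to sets 0 and 1 in way 1, so both can stay resident.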


SCC

• A 16-way cache with 8 cache sets, divided into 4 way groups.
• 64-byte cache blocks and 8-block superblocks (1, 2, 4, or 8 sub-blocks).
• Separate sparse superblock tags.


SCC

• 97% of updated blocks still fit in their original location.


Area Overhead

• Baseline: conventional 16-way 8MB LLC
• FixedC: doubles the number of tags; compresses blocks only to half size
• VSC: 0-4 16B sub-blocks
• DCC4-16: 0-4 16B sub-blocks
• SCC8-8: 0-8 8B sub-blocks


Methodology

• GEMS simulator; CACTI 6.5 (area and power at 32nm)
• Mixes of multi-programmed workloads drawn from memory-bound and compute-bound SPEC CPU 2006 benchmarks

Baseline: conventional 16-way 8MB LLC

2X Baseline: conventional 32-way 16MB LLC


Evaluation-MPKI

• 2X Baseline: 15% average improvement

• SCC: 13% average improvement


Evaluation-Energy

• SCC improves system energy by up to 20%.
• Average improvement: 6%


Conclusion

• SCC achieves performance comparable to that of a conventional cache with twice the capacity and associativity, with lower area overhead: 1.5% for SCC vs. 6.8% for DCC.

• Lower design complexity: the replacement mechanism is simpler than DCC's.


FixedC


VSC


DCC


SCC


Sector Cache


2-way Skewed Cache.


Cache Compression

[Goals]
• Fast (low decompression latency)
• Simple (avoid complex hardware changes)
• Effective (good compression ratio)


Motivation

• Off-chip memory latency is high.
• -> A larger cache reduces misses at the cost of greater area and power.
• Off-chip memory accesses require high energy.
• -> A larger cache reduces accesses to off-chip memory.
• Off-chip interconnect bandwidth is limited.
• -> larger cache