![Page 1: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/1.jpg)
Evaluating the Reuse Cache for mobile processors
Lokesh Jindal, Urmish Thakker, Swapnil HariaCS-752 Fall 2014
University of Wisconsin-Madison
1
![Page 2: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/2.jpg)
Executive SummaryProblem : Mobile SOCs – Area is money!Cache Area = 20-30% of SOC die area
Solutions?Technology Scaling
Expensive, diminishing returnsReuse Caches
Reduced cache size, comparable performance
Results=>50% Area reductions for 5% performance loss
![Page 3: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/3.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 4: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/4.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 5: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/5.jpg)
Caches in mobile SOCs
![Page 6: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/6.jpg)
Area-Performance Tradeoffs*Pe
rfor
man
ce
Cache Size
Conventional Cache
* Representative Graph
![Page 7: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/7.jpg)
Area-Performance Tradeoffs*Pe
rfor
man
ce
Cache Size
Conventional CacheReuse Cache
* Representative Graph
![Page 8: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/8.jpg)
Motivation
Jorge Albericio, Pablo Ibáñez, Víctor Viñals, and José M. Llabería. 2013. The reuse cache: downsizing the shared last-level cache. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46).
![Page 9: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/9.jpg)
Reuse Cache : Original idea
Jorge Albericio, Pablo Ibáñez, Víctor Viñals, and José M. Llabería. 2013. The reuse cache: downsizing the shared last-level cache. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46).
![Page 10: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/10.jpg)
Jorge Albericio, Pablo Ibáñez, Víctor Viñals, and José M. Llabería. 2013. The reuse cache: downsizing the shared last-level cache. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46).
![Page 11: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/11.jpg)
Reuse Cache : good idea for desktop processors (seemingly)
but what about for mobile processors?
![Page 12: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/12.jpg)
Revisiting Reuse Caches for the Mobile Environment
• Questions– Memory characteristics of mobile workloads?– Spatial/Temporal/Reuse locality at L2 level?– Reuse Cache performance for smaller, simpler caches?
![Page 13: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/13.jpg)
Low temporal locality
% Dead Lines (on Fetch) =
% Dead Lines (on Eviction) =
90.38 90.89 90.15 90.34 90.25 90.7785.83 86.45 85.58 85.78 85.72 86.26
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
80.00
90.00
100.00
baidumap bbench frozenbubble k9mail kingsoftoffice netease
% o
f Dea
d Li
nes
AsimBench Benchmarks
Dead Lines (On Load)
Dead Lines (On Eviction)
Fetched Lines #Reused Lines # - 1
Evicted Lines # Lines Evicted Unused#
![Page 14: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/14.jpg)
Low temporal locality
% Dead Lines (on Fetch) =
% Dead Lines (on Eviction) =
90.38 90.89 90.15 90.34 90.25 90.7785.83 86.45 85.58 85.78 85.72 86.26
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
80.00
90.00
100.00
baidumap bbench frozenbubble k9mail kingsoftoffice netease
% o
f Dea
d Li
nes
AsimBench Benchmarks
Dead Lines (On Load)
Dead Lines (On Eviction)
Fetched Lines #Reused Lines # - 1
Evicted Lines # Lines Evicted Unused#
85-90% dead lines!
![Page 15: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/15.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 16: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/16.jpg)
Organization
• Tag Array– Inclusive (aids coherence)– Set associative– (Forward) Pointer to data array
entry
8 ways….
![Page 17: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/17.jpg)
Organization
• Tag Array– Inclusive (aids coherence)– Set associative– (Forward) Pointer to data array
entry
• Data Array– Set associative– Stores reused lines– (Reverse) Pointer to tag array
entry
8 ways….
8 ways….
![Page 18: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/18.jpg)
Coherence
• TO-MOESI protocol– Small extension to the MOESI protocol
• New Tag-Only state• Tag+Data states : Modified, Shared, Owned, Exclusive• Transitions triggered by data insertions/evictions• Read the paper!
![Page 19: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/19.jpg)
Replacement
• Independent replacement policies for Tag and Data arrays
• Tag Array– Not Recently Reused (NRR)
• Data Array– Not Recently Used (NRU) + Shadow directory
![Page 20: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/20.jpg)
Shadow Directory• Pomerene et al. proposed shadow directory
– Transient lines that must be flushed quickly – Lines that become active after long periods of inactivity.
Prefetching mechanism for a high speed buffer store, James Herbert Pomerene, Thomas Roberts Puzak, Rudolph Nathan Rechtschaffen, Frank John Sparacio, 1984, Patent EP 0157175 A2
Tag
Tag+Data
Tag
Tag+Data
No Tag
1
2
3
4
![Page 21: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/21.jpg)
Shadow Directory• Pomerene et al. proposed shadow directory
– Transient lines that must be flushed quickly – Lines that become active after long periods of inactivity.
• Shadow bit added in reuse cache to break inefficient cycles
Shadow bit set here
Least priority for eviction
Tag
Tag+Data
Tag
Tag+Data
No Tag
1
2
3
4
![Page 22: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/22.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 23: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/23.jpg)
Methodology
Baseline• 32KB/4way L1 I$ + 32KB/4way L1 D$ • 4MB/8way L2$ (Shared LLC)• 8 wide OOO pipeline• 1.4GHZ | 32nm | Low power device (Based on ARM A15 specifications)
Workloads• AsimBench• SPEC CPU 2006• Self-written microbenchmarks(functional verification)
![Page 24: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/24.jpg)
Workload – AsimBench*
• Popular Android applications• 11 diverse applications
• 6 most memory-intensive apps selected for performance evaluation– BBench (Web Browser): Load web pages– K9Mail (Email): Load/Show emails– NeteaseNews (News): Check and load news– KingsoftOffice (Document Process): Open doc/xls/ppt file– BaiduMap (Map): Load map information of a specific area– FrozenBubble (Game): Load game
* Yongbing Huang, Zhongbin Zha, Mingyu Chen, Lixin Zhang. AsimBench: A Mobile Benchmark Suite for Architectural Simulators. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, CA, March 2014.
![Page 25: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/25.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 26: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/26.jpg)
Performance - AsimBench
Simulation length of 5 Billion instructions after boot in Full System mode
0
10
20
30
40
50
60
70
80
90
100
baidumap bbench frozenbubble k9mail kingsoftoffice netease
Nor
mal
ized
IPC
Reuse Cache (4M+1M)
Conventional Cache (4M + 4M)
Average performance loss of 4.16% (not bad!)
![Page 27: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/27.jpg)
Performance – SPEC CPU2006
(Simulation length of 10B instructions in System Emulation mode)
0
10
20
30
40
50
60
70
80
90
100N
orm
aliz
ed IP
C
Reuse Cache(4M+1M)
Conventional Cache (4M+4M)
Average performance drop of 7.17 %
![Page 28: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/28.jpg)
Configuration Exploration (WIP)
1.101.00
1.050.98
1.49
0.00
0.20
0.40
0.60
0.80
1.00
1.20
1.40
1.60
Nor
mal
ized
IPC
Reuse 4M + 2M
Reuse 4M + 2M
![Page 29: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/29.jpg)
Multi-core Performance
0
0.5
1
1.5
2
2.5
3
IPC
Reuse Cache (4M + 1M)
Conventional Cache (4M + 4M)
SPEC benchmarks simulated on 4 cores, running 10 Billion instructions each(Full system + multi-core + AsimBench == Too slow)
~15% Degradation
![Page 30: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/30.jpg)
Area Improvements
Conventional (4M + 4M) Reuse Cache (4M + 1M) Reuse Cache (4M + 2M)
Area (in mm2) 2.37 1.18 1.5
Area savings of 50% for the 4+1 reuse cache, 37% for the 4+2 cache(Data generated using Cacti v6.5)
![Page 31: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/31.jpg)
% Improvement in Power
0
2
4
6
8
10
12
14
% Im
prov
emen
t in
Pow
er
![Page 32: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/32.jpg)
% Improvement in Power~10 % Improvement
0
2
4
6
8
10
12
14
% Im
prov
emen
t in
Pow
er
+ Reduced Leakage power- Increased DRAM power
![Page 33: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/33.jpg)
Live-ness Analysis - AsimBench
• % Live lines increased from 9.54 % to 47.95%
0
20
40
60
80
100
% L
ive
Line
s
Conventional Cache (4M+4M)
Reuse Cache (4M+1M)
![Page 34: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/34.jpg)
Pictures!
• AsimBench apps running on gem5 …
![Page 35: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/35.jpg)
![Page 36: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/36.jpg)
![Page 37: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/37.jpg)
![Page 38: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/38.jpg)
![Page 39: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/39.jpg)
Outline
• Motivation• Implementation• Methodology• Performance Evaluation• Future Directions and Conclusion
![Page 40: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/40.jpg)
Future Directions
• Replacement Policies (NRR + Shadow) for Fully Associative data array
• Explore allotment policies• Better uses for extra tag entries• Longer/Better simulations using simpoints• Multi-core simulations using AsimBench to validate coherence
protocols• Comprehensive energy analysis
![Page 41: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/41.jpg)
Summary
For mobile workloads, reuse cache promises + Significant area reduction (~50%)+ Decent power savings (~10%)- Marginal performance loss (~6%)
![Page 42: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/42.jpg)
Questions?
Thank You!!
![Page 43: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/43.jpg)
BACKUP
![Page 44: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/44.jpg)
Configuration Exploration
0
50
100
150
200
250
1 2 3 4 5
Pow
er in
mW
Increased Power (mW)
![Page 45: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/45.jpg)
L2 => 1 MB CacheAnimate this to cross out 1 MB and say 50% of data area
L2 => 4 MB CacheCross out and say Performance of 16 MB Cache
![Page 46: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/46.jpg)
Results
• Mapping exploration -> IPC , ASIM + SPEC• 4+4 vs (4+1) spec , asimbench• Shadow vs NRR • Boot times• Bigger is better• CACti graph• Energy graph• Dead lines
![Page 47: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/47.jpg)
Baseline
• 32KB/4way L1 I$ + 32KB/4way L1 D$ • 4MB/8way L2$ (Shared LLC)• 8 wide OOO pipeline• 1.4GHZ | 32nm | Low power device
• Based on ARM A15 specifications
![Page 48: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/48.jpg)
Challenges
• ARM aarch32 incompatible with gem5’s Ruby memory model
• Simulation infrastructure for ARM in full system mode
![Page 49: Evaluating the Reuse Cache for mobile processorspages.cs.wisc.edu/~swapnilh/resources/752_presn.pdf · Evaluating the Reuse Cache for mobile processors Lokesh Jindal, Urmish Thakker,](https://reader030.vdocuments.mx/reader030/viewer/2022040615/5f1039967e708231d4480ed6/html5/thumbnails/49.jpg)
Contributions