![Page 1: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/1.jpg)
Application-to-Core Mapping Policies to Reduce Memory System Interference
Reetuparna Das* Rachata Ausavarungnirun$ Onur Mutlu$ Akhilesh Kumar§ Mani Azimi§
*University of Michigan $Carnegie Mellon University
§Intel
![Page 2: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/2.jpg)
Multi-Core to Many-Core
[Figure: a multi-core chip compared with a many-core chip]
![Page 3: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/3.jpg)
Many-Core On-Chip Communication
[Figure: many-core chip showing memory controllers, shared cache banks ($), and light and heavy applications]
![Page 4: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/4.jpg)
Task Scheduling
- Traditional
  - When to schedule a task? (Temporal)
- Many-Core
  - When to schedule a task? (Temporal)
  - Where to schedule a task? (Spatial)
- Spatial scheduling impacts performance of the memory hierarchy
  - Latency and interference in interconnect, memory, caches
![Page 5: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/5.jpg)
Problem: Spatial Task Scheduling
Applications Cores
How to map applications to cores?
![Page 6: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/6.jpg)
Challenges in Spatial Task Scheduling
Applications Cores
How to reduce destructive interference between applications?
How to reduce communication distance?
How to prioritize applications to improve throughput?
![Page 7: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/7.jpg)
Application-to-Core Mapping
- Clustering: Improve Locality, Reduce Interference
- Balancing: Improve Bandwidth Utilization
- Isolation: Reduce Interference
- Radial Mapping: Improve Bandwidth Utilization
![Page 8: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/8.jpg)
Step 1 — Clustering
Inefficient data mapping to memory and caches
Memory Controller
![Page 9: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/9.jpg)
Step 1 — Clustering
Improved Locality
Reduced Interference
Cluster 0 Cluster 2
Cluster 1 Cluster 3
![Page 10: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/10.jpg)
Step 1 — Clustering
- Clustering memory accesses
  - Locality-aware page replacement policy (cluster-CLOCK)
    - When allocating a free page, give preference to pages belonging to the cluster's memory controllers (MCs)
    - Look ahead "N" pages beyond the default replacement candidate to find a page belonging to the cluster's MC
- Clustering cache accesses
  - Private caches automatically enforce clustering
  - Shared caches can use the Dynamic Spill-Receive* mechanism
*Qureshi et al., HPCA 2009
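The look-ahead idea behind cluster-CLOCK can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Frame` record, the function names, and the choice to leave reference bits untouched during the look-ahead are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    mc: int           # memory controller that owns this physical frame (hypothetical field)
    referenced: bool  # CLOCK reference bit


def clock_victim(frames, hand):
    """Plain CLOCK: sweep from `hand`, clearing reference bits,
    and return the index of the first unreferenced frame."""
    n = len(frames)
    while True:
        if frames[hand].referenced:
            frames[hand].referenced = False  # give the page a second chance
            hand = (hand + 1) % n
        else:
            return hand


def cluster_clock_victim(frames, hand, cluster_mcs, lookahead):
    """Pick the default CLOCK victim, then look up to `lookahead` frames
    beyond it for an unreferenced frame owned by one of the cluster's MCs.
    Falls back to the default victim if none is found."""
    default = clock_victim(frames, hand)
    if frames[default].mc in cluster_mcs:
        return default
    n = len(frames)
    for step in range(1, lookahead + 1):
        idx = (default + step) % n
        frame = frames[idx]
        # Assumption: the look-ahead only inspects reference bits, it does not clear them.
        if not frame.referenced and frame.mc in cluster_mcs:
            return idx
    return default
```

A larger look-ahead distance "N" raises the chance of finding a frame on the cluster's own MC, at the cost of drifting further from the replacement order that plain CLOCK would choose.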
![Page 11: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/11.jpg)
Step 2 — Balancing
Heavy
Light
Applications Cores
Too much load in clusters with heavy applications
![Page 12: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/12.jpg)
Step 2 — Balancing
Is this the best we can do? Let’s take a look at application characteristics
Heavy
Light
Applications Cores
Better bandwidth utilization
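One way to picture balancing is a greedy pass that places the heaviest applications first, each into the currently least-loaded cluster. The slides do not state the balancing algorithm, so the sketch below is an assumption: it uses MPKI as the load measure, and the function name `balance` and its parameters are hypothetical.

```python
def balance(apps, num_clusters, cores_per_cluster):
    """Greedy load balancing sketch.

    apps: dict mapping application name -> MPKI (used here as the load measure).
    Returns (clusters, load): the application names per cluster and the
    aggregate MPKI per cluster.
    """
    if len(apps) > num_clusters * cores_per_cluster:
        raise ValueError("more applications than cores")
    clusters = [[] for _ in range(num_clusters)]
    load = [0.0] * num_clusters
    # Heaviest applications first, so they spread out before light ones fill the gaps.
    for name, mpki in sorted(apps.items(), key=lambda kv: kv[1], reverse=True):
        open_clusters = [c for c in range(num_clusters)
                         if len(clusters[c]) < cores_per_cluster]
        target = min(open_clusters, key=lambda c: load[c])
        clusters[target].append(name)
        load[target] += mpki
    return clusters, load


if __name__ == "__main__":
    # Two heavy and two light applications on two 2-core clusters.
    print(balance({"h1": 40, "h2": 38, "l1": 2, "l2": 1}, 2, 2))
```

With these inputs each cluster ends up with one heavy and one light application instead of both heavy applications sharing one cluster's memory controller.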
![Page 13: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/13.jpg)
Application Types
( c ) PHD Comics
![Page 14: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/14.jpg)
Application Types
Identify and isolate sensitive applications while ensuring load balance
Application types, with their thesis committee counterparts ((c) PHD Comics):
- Light: Low Miss Rate (Asst. Professor: nice guy, no opinions)
- Medium: Med Miss Rate, High MLP (Guru: there for cookies)
- Heavy: High Miss Rate, High MLP (Adversary: bitter rival)
- Sensitive: High Miss Rate, Low MLP (Advisor: sensitive)
![Page 15: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/15.jpg)
Step 3 — Isolation
Heavy
Light
Applications Cores
Sensitive
Medium
Isolate sensitive applications to a cluster
Balance load for remaining applications across clusters
![Page 16: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/16.jpg)
Step 3 — Isolation
- How to estimate sensitivity?
  - High Miss: high misses per kilo instruction (MPKI)
  - Low MLP: high relative stall cycles per miss (STPM)
  - Sensitive if MPKI > Threshold and relative STPM is high
- Whether or not to allocate a cluster to sensitive applications?
- How to map sensitive applications to their own cluster?
  - Knapsack algorithm
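The sensitivity test above, combined with the four application types, can be sketched as a small classifier. All threshold values below are placeholders chosen for illustration; the slides give no numbers and do not say how relative STPM is normalized, so `relative_stpm` is simply taken as an input.

```python
def classify(mpki, relative_stpm, *,
             light_mpki=5.0,      # hypothetical: below this, miss rate counts as low
             heavy_mpki=25.0,     # hypothetical: at or above this, miss rate counts as high
             sensitive_mpki=25.0, # hypothetical "Threshold" from the sensitivity rule
             stpm_high=1.5):      # hypothetical cut-off for "relative STPM is high"
    """Return 'sensitive', 'heavy', 'medium', or 'light' for one application."""
    # Sensitive: many misses AND each miss stalls the core for long (low MLP).
    if mpki > sensitive_mpki and relative_stpm > stpm_high:
        return "sensitive"
    # Heavy: many misses, but they overlap well (high MLP).
    if mpki >= heavy_mpki:
        return "heavy"
    if mpki >= light_mpki:
        return "medium"
    return "light"


if __name__ == "__main__":
    for mpki, stpm in [(1.0, 3.0), (10.0, 1.0), (30.0, 1.0), (30.0, 2.0)]:
        print(mpki, stpm, classify(mpki, stpm))
```

Checking the sensitive case first matters: a sensitive application also has a high miss rate, so it would otherwise be labeled heavy.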
![Page 17: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/17.jpg)
Step 4 — Radial Mapping
Heavy
Light
Applications Cores
Sensitive
Medium
Map applications that benefit most from being close to memory controllers close to these resources
![Page 18: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/18.jpg)
Step 4 — Radial Mapping
- What applications benefit most from being close to the memory controller?
  - High memory bandwidth demand
  - Also affected by network performance
  - Metric => Stall time per thousand instructions
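Within one cluster, radial mapping can be sketched as: rank applications by stall time per thousand instructions, rank cores by distance to the memory controller, and pair them up. The mesh coordinates, the Manhattan hop distance, and the name `radial_map` are assumptions for this example, not details taken from the slides.

```python
def radial_map(stall_per_kilo_inst, core_coords, mc_coord):
    """Assign applications to cores inside one cluster.

    stall_per_kilo_inst: dict app name -> stall time per thousand instructions.
    core_coords: list of (x, y) mesh coordinates of the cluster's cores.
    mc_coord: (x, y) coordinate of the cluster's memory controller.
    Returns a dict app name -> core coordinate.
    """
    if len(stall_per_kilo_inst) > len(core_coords):
        raise ValueError("more applications than cores in the cluster")

    def hops(core):
        # Assumption: hop count in a mesh is the Manhattan distance.
        return abs(core[0] - mc_coord[0]) + abs(core[1] - mc_coord[1])

    cores_near_first = sorted(core_coords, key=hops)
    apps_needy_first = sorted(stall_per_kilo_inst,
                              key=stall_per_kilo_inst.get, reverse=True)
    return dict(zip(apps_needy_first, cores_near_first))


if __name__ == "__main__":
    mapping = radial_map({"a": 10, "b": 500, "c": 120},
                         [(2, 2), (0, 1), (1, 1)], mc_coord=(0, 0))
    print(mapping)
```

The application with the largest stall time lands on the core closest to the memory controller, and the least affected application takes the farthest core.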
![Page 19: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/19.jpg)
Putting It All Together
Clustering, Balancing, Isolation, Radial Mapping
Inter-Cluster Mapping
Intra-Cluster Mapping
Improve Locality
Reduce Interference
Improve Shared Resource Utilization
![Page 20: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/20.jpg)
Evaluation Methodology
- 60-core system
  - x86 processor model based on Intel Pentium M
  - 2 GHz processor, 128-entry instruction window
  - 32KB private L1 and 256KB per-core private L2 caches
  - 4GB DRAM, 160-cycle access latency, 4 on-chip DRAM controllers
  - CLOCK page replacement algorithm
- Detailed Network-on-Chip model
  - 2-stage routers (with speculation and look-ahead routing)
  - Wormhole switching (4-flit data packets)
  - Virtual channel flow control (4 VCs, 4-flit buffer depth)
  - 8x8 mesh (128-bit bi-directional channels)
![Page 21: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/21.jpg)
Configurations
- Evaluated configurations
  - BASE: random core mapping
  - BASE+CLS: baseline with clustering
  - A2C
- Benchmarks
  - Scientific, server, desktop benchmarks (35 applications)
  - 128 multi-programmed workloads
  - 4 categories based on aggregate workload MPKI: MPKI500, MPKI1000, MPKI1500, MPKI2000
![Page 22: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/22.jpg)
System Performance
[Figure: normalized weighted speedup of BASE, BASE+CLS, and A2C for MPKI500, MPKI1000, MPKI1500, MPKI2000, and Avg; y-axis from 0.8 to 1.3]
System performance improves by 17%
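For readers unfamiliar with the metric on the y-axis, the sketch below computes weighted speedup using its commonly used definition (the sum over applications of IPC when sharing the system divided by IPC when running alone) and normalizes it to a baseline. The slides do not spell out the formula, so this definition and the function names are assumptions.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-application IPC ratios: shared-run IPC over alone-run IPC."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one alone-run IPC per application")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def normalized_weighted_speedup(ipc_shared, ipc_shared_base, ipc_alone):
    """Weighted speedup of a configuration relative to a baseline mapping
    (for example BASE) of the same workload."""
    return (weighted_speedup(ipc_shared, ipc_alone)
            / weighted_speedup(ipc_shared_base, ipc_alone))


if __name__ == "__main__":
    alone = [2.0, 1.0]
    base = [1.0, 0.5]    # made-up IPCs under a baseline mapping
    better = [1.2, 0.6]  # made-up IPCs under an improved mapping
    print(normalized_weighted_speedup(better, base, alone))
```

A value above 1.0 means the configuration delivers more aggregate progress than the baseline for the same workload.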
![Page 23: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/23.jpg)
Network Power
[Figure: normalized NoC power of BASE, BASE+CLS, and A2C for MPKI500, MPKI1000, MPKI1500, MPKI2000, and Avg; y-axis from 0.0 to 1.2]
Average network power consumption reduces by 52%
![Page 24: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/24.jpg)
Summary of Other Results
- A2C can reduce page fault rate
[Figure: % accesses within cluster vs. memory footprint of workload (GB, 0 to 8), for CLOCK and cluster-CLOCK]
[Figure: normalized page faults vs. memory footprint of workload (GB, 0 to 8)]
![Page 25: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/25.jpg)
Summary of Other Results
- A2C can reduce page faults
- Dynamic A2C also improves system performance
  - Continuous "Profiling" + "Enforcement" intervals
  - Retains clustering benefits
  - Migration overheads are minimal
- A2C complements application-aware packet prioritization* in NoCs
- A2C is effective for a variety of system parameters
  - Number and placement of memory controllers
  - Size and organization of last-level cache
*Das et al., MICRO 2009
![Page 26: Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani](https://reader036.vdocuments.mx/reader036/viewer/2022062318/551c44aa5503467b488b4c76/html5/thumbnails/26.jpg)
Conclusion
- Problem: Spatial scheduling for many-core processors
  - Develop fundamental insights for core mapping policies
- Solution: Application-to-Core (A2C) mapping policies
  - Clustering, Balancing, Isolation, Radial Mapping
- A2C improves system performance, system fairness and network power significantly