Sherief Reda School of Engineering Brown University Providence, RI
XPS: EXPL: Symbiotic Power Management for Integrated CPU-GPU Platforms
Supported by NSF grant 1438958: August 2014 – August 2017
Motivation
• Power and thermal bottlenecks are one of the main reasons multi-core processors became mainstream.
• Customization (domain or application specific) → heterogeneous processors.
• Power/thermal management of heterogeneous processors is key to maximizing energy efficiency.
  – Main degrees of freedom: sleep modes, dynamic voltage & frequency scaling (DVFS), and scheduling
[Figure: AMD A10-5700 floorplan. Blocks: x86 Module 1 (core0, core1), x86 Module 2 (core2, core3), L2 M1, L2 M2, SIMD1 to SIMD6, UNB, GMC, DDR3 controller, PCIe and display controllers, GPU Aux (x4)]
Understanding the implications of CPU+GPU platforms on power management
• Parallel programming languages (e.g., OpenCL) for heterogeneous platforms enable execution of threads on CPU and/or GPU → scheduling decisions are executed within the application.
• DVFS and scheduling are more tightly coupled on CPU+GPU platforms than on traditional multi-core processors.
→ Need to rethink power management for heterogeneous processors.
[Figure: AMD A10-5700 floorplan with per-block power breakdown (W) and thermal maps (C, color scale 40 to 80) for the CFD benchmark]
Power breakdown (W) by block: x86 modules / L2 caches / Memory (UNB+DDR3+GMC) / GPU (SIMD+Aux) / Others
CFD (CPU-GPU), 1.4 GHz: 3.5 / 1.7 / 4.5 / 16.2 / 3.2; PTOTAL = 29.1 W; Runtime = 40.7 s; Tmax = 53.9 C
CFD (CPU-GPU), 3.0 GHz: 10.7 / 3.1 / 5.1 / 19.1 / 3.7; PTOTAL = 41.8 W; Runtime = 35.2 s; Tmax = 76.4 C
CFD (CPU), 1.4 GHz: 5.7 / 1.9 / 3.2 / 5.2 / 2.7; PTOTAL = 18.8 W; Runtime = 369.5 s; Tmax = 39.5 C
CFD (CPU), 3.0 GHz: 22.2 / 4.9 / 4.1 / 7.5 / 3.4; PTOTAL = 42.2 W; Runtime = 187.3 s; Tmax = 88.0 C
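The power and runtime figures above can be combined into energy per run, which makes the coupling between DVFS and scheduling concrete. A minimal sketch in Python, using only the four PTOTAL and runtime values quoted on the slide; the dictionary layout and the name `energy_joules` are illustrative and not taken from the project.

```python
# Energy per run = average total power (W) x runtime (s), in joules.
# Values are the four CFD measurements quoted on the slide.
measurements = {
    ("CPU-GPU", 1.4): (29.1, 40.7),
    ("CPU-GPU", 3.0): (41.8, 35.2),
    ("CPU", 1.4): (18.8, 369.5),
    ("CPU", 3.0): (42.2, 187.3),
}


def energy_joules(power_w, runtime_s):
    """Energy consumed by one run at constant average power."""
    return power_w * runtime_s


energy = {cfg: energy_joules(p, t) for cfg, (p, t) in measurements.items()}

# Configuration with the lowest energy per run.
best = min(energy, key=energy.get)

# Print configurations from least to most energy.
for (schedule, ghz), joules in sorted(energy.items(), key=lambda kv: kv[1]):
    print(f"{schedule:8s} {ghz:.1f} GHz: {joules:8.1f} J")
```

Under these numbers the CPU-GPU schedule at 1.4 GHz uses the least energy (about 1.18 kJ). The CPU-only schedule at 1.4 GHz draws the least power but needs nearly six times that energy, so the preferred frequency depends on where the threads run.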
Project goal
Project Goal: Explore a new paradigm for symbiotic power management that exploits the structure and semantics of parallel programming languages for integrated heterogeneous CPU-GPU platforms, in particular OpenCL, to deliver improved power management capabilities.
Intel Core i5 4670K
AMD Kaveri A10-7700K
Symbiotic power management
[Figure: two block diagrams. Panel titles: "Existing power management paradigms" and "Proposed symbiotic power management paradigm". Labels: OpenCL application (x3), application 1 ... application n, power/thermal management (OS), phases, CPU DVFS, GPU DVFS, schedule, sensors, performance & power budget, CPU+GPU platform, IBM Power 8]
Symbiosis between parallel (OpenCL) application and OS-based power management.
Results from direct phase identification
[Figure: power (W) versus time (s), 0 to 25 s, for four settings: Symbiotic, 3.4 GHz, 1.4 GHz, PC-based]
• PC is typical performance-counter-based power management.
• Symbiotic is based on direct phase identification from OpenCL.
• Phase information is used to determine DVFS levels.
• 15% average improvement in power consumption compared to PC-based methods at the same runtime!
Streamcluster from Rodinia
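One way to picture how phase information could drive DVFS: the application announces each phase before it starts, and the manager looks up a frequency for it instead of inferring the phase from counters after the fact. The sketch below is hypothetical. The phase labels, the `PHASE_TO_GHZ` table, the kernel names, and `announce_phase` are assumptions for illustration, and the slides do not describe the project's actual policy. The two frequencies are the fixed settings shown in the plot legend.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from phase type to CPU frequency (GHz).
# Assumption: compute-bound phases benefit from a high clock, while
# memory-bound phases lose little runtime at a low clock.
PHASE_TO_GHZ = {
    "compute_bound": 3.4,
    "memory_bound": 1.4,
}


@dataclass
class SymbioticGovernor:
    """Toy governor that picks a frequency from an announced phase type."""

    default_ghz: float = 3.4
    log: list = field(default_factory=list)

    def announce_phase(self, kernel_name, phase_type):
        """Called by the application just before a kernel is launched."""
        ghz = PHASE_TO_GHZ.get(phase_type, self.default_ghz)
        # A real manager would program the platform's DVFS interface here;
        # this sketch only records the decision.
        self.log.append((kernel_name, ghz))
        return ghz


governor = SymbioticGovernor()
for kernel, phase in [
    ("dist_kernel", "compute_bound"),
    ("gather_kernel", "memory_bound"),
    ("reduce_kernel", "unknown"),
]:
    print(kernel, governor.announce_phase(kernel, phase), "GHz")
```

Unknown phase types fall back to the default frequency, so an application that announces nothing behaves as it would at a fixed clock.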
Work in progress
• Application-based models to predict performance and power on CPU & GPU.
• Deriving scheduling decisions based on models and target performance and power budgets.
• Aggregation of phases from multiple OpenCL applications.
• Simultaneous control of DVFS and scheduling.
• Wrappers/API for communicating power management decisions to applications.
[Figure: power management block diagram. Labels: OpenCL application (x3), phase, power/thermal management (OS), CPU DVFS, GPU DVFS, schedule, sensors, performance & power budget, CPU+GPU platform]
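The work-in-progress item on deriving scheduling decisions from models and budgets amounts to a constrained choice over (schedule, frequency) pairs. A minimal sketch, assuming a model that returns predicted power and runtime per configuration. Here the predictions are stubbed with the CFD measurements from the earlier slide, and `pick_config` and its parameters are hypothetical names, not part of the project.

```python
# Predicted (power W, runtime s) per (schedule, CPU GHz). Stubbed with the
# CFD measurements reported earlier in the slides in place of a real model.
predictions = {
    ("CPU-GPU", 1.4): (29.1, 40.7),
    ("CPU-GPU", 3.0): (41.8, 35.2),
    ("CPU", 1.4): (18.8, 369.5),
    ("CPU", 3.0): (42.2, 187.3),
}


def pick_config(predictions, power_budget_w, runtime_target_s):
    """Return the feasible configuration with the lowest energy, or None."""
    feasible = {
        cfg: power * runtime
        for cfg, (power, runtime) in predictions.items()
        if power <= power_budget_w and runtime <= runtime_target_s
    }
    return min(feasible, key=feasible.get) if feasible else None


# Tight runtime target: only the fast CPU-GPU setting qualifies.
print(pick_config(predictions, power_budget_w=45.0, runtime_target_s=38.0))
# Tighter power budget, looser runtime: the slow CPU-GPU setting qualifies.
print(pick_config(predictions, power_budget_w=35.0, runtime_target_s=60.0))
# Budget that no configuration can meet within the runtime target.
print(pick_config(predictions, power_budget_w=20.0, runtime_target_s=60.0))
```

Choosing minimum energy among feasible configurations is one possible objective; minimum runtime under the power budget would be an equally reasonable tie-breaker.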