1
Power, Energy and
Performance
Tajana Simunic Rosing
Department of Computer Science and Engineering
UCSD
2
Motivation for Power Management
Power consumption is a critical issue in system design today
Mobile systems: maximize battery life
High performance systems: minimize operational costs
1.2% of total electrical use in US devoted for powering and cooling data centers
$1 operation => $1 cooling
3
Power and Energy Relationship
dtPE
t
P
E
4
Low Power vs. Low Energy
Minimizing the power consumption is important for the design of the power supply
the design of voltage regulators
the dimensioning of interconnect
short term cooling
Minimizing the energy consumption is important due to restricted availability of energy (mobile systems)
limited battery capacities (only slowly improving)
very high costs of energy (solar panels, in space)
cooling high costs
limited space
dependability
long lifetimes, low temperatures
Intuition
System and components are: Designed to deliver peak performance, but …
Not needing peak performance most of the time
:Working
:Idle
Dynamic Power Management (DPM) Shut down components during idle times
Dynamic Voltage Frequency Scaling (DVFS) Reduce voltage and frequency of components
time
System Level Power Management Policies Manage devices with different power management capabilities
Understand tradeoff between DPM and DVFS
DPM
6
DPM is an effective way of reducing the CPU
energy consumption by shutting down (completely
or partially) when components are not in use.
Idle States
7
http://impact.asu.edu/cse591sp11/Nahelempm.pdf
8
Dynamic Voltage Scaling (DVS)
Power consumption of CMOS
circuits (ignoring leakage):
frequency clock
voltagesupply
ecapacitanc load
activity switching
with2
:
:
:
:
f
V
C
fVCP
dd
L
ddL
) (
voltagethreshhold:
with2
ddt
t
tdd
ddL
VV
V
VV
VCk
Delay for CMOS circuits:10 25
(b) Power-down
Time25
(c) Dynamic
voltage
scaling
(a) No
power-down
DeadlinePower
10 25
DVFS vs. power down
Android Frequency Governors
9
Highest Frequency
Lowest Frequency
Ondemand
Performance
Powersave
Governors
Fre
quen
cy
In-between frequency range
Time More info at:
http://docs.fedoraproject.org/en-US/Fedora/15/html/Power_Management_Guide/index.html
http://doc.opensuse.org/documentation/html/openSUSE/opensuse-tuning/cha.tuning.power.html
Energy Savings with DVFS
Reduction in CPU power
Extra system power
Effectiveness of DVFS
For energy savings
ER > EE
Factors in modern systems affecting this
equation:
Performance delay (tdelay)
Idle CPU power consumption
Power consumption of other devices (PE)
40%
50%
60%
70%
80%
90%
100%
110%
120%
130%
140%
40% 60% 80% 100%
En
fn
mem
combo
burn_loop
DVFS: Mem vs. CPU
40%
50%
60%
70%
80%
90%
100%
40% 60% 80% 100%
En
fn
mem
combo
burn_loop
Static power ratio = 30% Static power ratio = 50%
Successful Low Power S/W Techniques
1. Understand workload variations
2. Devise efficient ways to detect them
3. Utilize the detected workload variations with
available H/W
Relatively easy for real-time jobs
Hard for non real-time jobs
No timing constraints, no periodic execution
Unknown execution time
It is hard to predict the future workload!!
PAST
Looking a fixed window into the past
Assume the next window will be like the
previous one
If the past window was
mostly busy increase speed
mostly idle decrease speed
Example: PAST
timelow utilization low utilization ?
PAST FUTURE
Decreasespeed
Decreasespeed
size window
busy timenUtilizatio
FLAT
Try to smooth speed to a global average
Make the utilization of next window=<const>
Set speed fast enough to complete the
predicted new work being pushed into the
coming window
Example: FLAT
time
Increase/Decrease the speedthe next utilization to be 0.7
<Const>=0.7
?
LONG-SHORT
Look up the last 12 windows
Short-term past : 3 most recent windows
Long-term past : the remaining windows
Workload Prediction
the utilization of next window will be a
weighted average of these 12 windows’
utilizations
Example: LONG-SHORT
time0 1 2 3 4 5 6 7 8 9 10 11 12
utilization = # cycles of busy interval / window size
0 .3 .5 1 1 1 .8 .5 .3 .1 0 0
276.0)3(49
)001(.43.5.8.1115.3.0
max276.0 ffclk
currenttime
AGED-AVERAGE
Employs an exponential-smoothing
method
Workload Prediction
The utilization of next window will be a
weighted average of all previous windows’
utilizations
geometrically reduce the weight
Example: AGED-AVERAGE
time0 1 2 3 4 5 6 7 8 9 10 11 12
utilization = # cycles of busy interval / window size
0 .3 .5 1 1 1 .8 .5 .3 .1 0 0
)3.0(81
8)1.0(
27
40
9
20
3
1average
maxfaveragefclk
currenttime
CYCLE
Workload Prediction
Examine the last 16 windows
Does there exist a cycle of length X?
If so, predict by extending this cycle
Otherwise, use the FLAT algorithm
Example: CYCLE
time0 1 2 3 4 5 6 7 8
utilization = # cycles of busy interval / window size
0 .4 .8 .1 .3 .5 .7 .0
currenttime
Predict : The next utilization will be .3
PATTERN
A generalized version of CYCLE
Workload Prediction
Convert the n-most recent windows’ utilizations
into a pattern in alphabet {A, B, C, D}.
Find the same pattern in the past
Example: PATTERN
time1 2 3 4 8 9 10 11 12
0 .3 .5 1 .1 .35 .6 .9
ABCDDPattern
currenttime
5
1
ABCDPattern
Predict : The next utilization will be D
A B C D
0 0.25 0.5 7.25 1.0
26
Dynamic Power Management Components with several internal states
Corresponding to power and service levels
Abstracted as power state machines
State diagram with:
Power and service annotation on states
Power and delay annotation on edges
Example: SA-1100 RUN: operational
IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts
SLEEP: Shutdown of on-chip activity
RUN
SLEEPIDLE
400mW
160uW50mW
90us
90us10us
10us160ms
27
Example: Hard Disk Drive
Working: 2.2 W
(spinning + IO)
Sleeping: 0.13 W
(stop spinning)
Idle: 0.95 W
(spinning)
read / write
IO complete
shut down
0.36 J, 0.67 secspin up
4.4J, 1.6 sec
Fujitsu MHF 2043 AT
28
Waking Up Hard Disk
0
0.5
1
1.5
2
2.5
0 2 4 6
pow
er
(W)
.
time (sec)
sleeping working
waking up
Measurements done on Fujitsu MHF 2043 AT hard disk
29
The Applicability of DPM
State transition power (Ptr) and delay (Ttr)
If Ttr = 0, Ptr = 0 the policy is trivial
Stop a component when it is not
needed
If Ttr != 0 or Ptr != 0 (always…)
Shutdown only when idleness is long
enough to amortize the cost
What if T and P fluctuate?
Ptr Ttr
OFF
Ptr Ttr
ON
Break Even Time
Minimum idle time for amortizing the cost of component
shutdown
Tbe
Request
Active
Request
Off Active
Workload
State
Tof f
DPM Applicability
If idle period length < Tbe
Not possible to save energy by turning off
Need accurate estimation of upcoming idle periods: Underestimation: potential energy savings lost
Overestimation: perf delay + possible –ve energy savings
Challenge: Manage energy/performance tradeoff
DPM Policy Goal: Maximize energy savings by utilizing sleep states while minimizing performance delay
Classification of DPM Policies
Timeout Policies
Predictive Policies
Stochastic Policies
Hybrid Policies
Online learning DPM
Timeout Policies
Use elapsed idle time to predict the total
idle period duration
Request
Active
Request
Off Active
Workload
State
To
Tidle
Assume, if Tidle > To
P(Tidle > To + Tbe) ≈ 1
To is referred to as the timeout
Can be fixed or adaptive
Timeout Policies
Advantages
Extremely simple to implement
Safety (in terms of perf delay) can be easily improved
by increasing To
Disadvantages
Waste energy while waiting for To to expire
Heuristic in nature: no guarantee on energy savings
or performance delay
Always incur performance penalty on wakeup: no
mechanism to wake before request arrival
Predictive Policies Description:
Use statistical analysis to predict the length of the idle period
Shut down if predicted idle period is longer than Tbe
Advantages
More energy efficient than timeout policies
Disadvantages
Depend a lot on correlation between past and future events
Tend to be aggressive in shutdown and hence higher
performance latency
Heuristic with no performance guarantees
Stochastic Policy Example:
Time-indexed Semi-Markov Model
Service
Requester
(SR)
Service
Queue (SQ)
Service
Provider
(SP)
Globally optimal solution with performance constraints met
Valid for a stationary class of workloads
Implemented and measured large savings in mobile systems
Simunic et.al, TCAD’01
Hybrid: Online Learning for Power Management
Selects the best performing
expert for managing power
Selected expert manages
power
for the operative period
Evaluates performance of all
the experts
………..EXP 1
Controller
Device
EXP 2 EXP 3 EXP n
EXP y :Dormant Experts EXP y :Operational Expert
Get up to 70% energy savings per device
DPM: A state of the art DPM policy
DVFS: v-f setting
Performance converges to that of the best performing expert with successive idle periods at rate TNO /)(ln
Dhiman et.al, TCAD’09
Summary
Simple power management policies provide great
energy performance tradeoffs
Lower v-f setting offer worse e/p tradeoffs due to
high performance delay
Research topics: Multithreaded workload scheduling and power management
Speed up a core instead of slow down
Fast sleep times
Interaction with the rest of the system
39
Sources and References
Jihong Kim, ES Class, SNU
Peter Marwedel, “Embedded Systems
Design,” 2004.
Frank Vahid, Tony Givargis, “Embedded
System Design,” Wiley, 2002.
Wayne Wolf, “Computers as
Components,” Morgan Kaufmann, 2001.
Nikil Dutt @ UCI
SMALL PROJECT PART 2
40
Motivation
With limited battery power and low number of cores,
DVFS is still primary method of runtime power saving in
smartphones
Linux frequency governors
CPU: ondemand, performance, powersave
GPU: trustzone, msm
Given policies are static, or depends only on respective
utilizations (or are proprietary)
CPU vs GPU frequencies scaled independently
41
Development phones
Qualcomm mobile development platform (MDP)
Snapdragon MSM8660 / MSM8960
Android 4.0.3 / 4.1.2, rooted
Checkout procedure
CSE 2148 (2nd floor, facing south)
Friday 1/16 10-12p, 1-4:30p; Tuesday 1/20 1-1:45p
Bring phone release form (in project 2 directory)
Signature required from both team members
Return at demo day of project part 2 (or final project)
42
Runtime controls & statistics
Power and utilization statistics available through OS
CPU frequencies
CPU utilizations
GPU frequencies
GPU utilizations
Battery voltage
Battery power
Battery capacity
Use these statistics to analyze workloads and design
your DVFS policies
43
Tasks
Observe the applications’ CPU/GPU power and
performance profiles, observe the tradeoffs
SPEC benchmarks for CPU only governor
Interactive apps for CPU/GPU governor
Design two governors:
CPU-only policy that minimizes energy-delay product
of SPEC benchmarks
CPU/GPU policy that optimizes energy efficiency
while maximizing user satisfaction
44
Materials provided
Information on where to find CPU, GPU, battery
statistics
Trepn utility to get power information
C skeleton for reading statistics$ arm-linux-gnueabi-gcc -static -o m.out m.c
$ adb push m.out /data/local
$ adb shell
# ./data/local/m.out
45
Project deliverables Code
Governor for CPU-only that optimizes energy-delay product
DVFS policy for CPU/GPU
Report Governor policy descriptions and justifications
Results for both governors
In-class demo CPU-only governor: energy-delay product for our benchmark
combination (similar to SPEC)
CPU/GPU: demo with interactive apps
Reminders:
Enroll in Piazza http://piazza.com/class#winter2015/cse237a
Email TA if you cannot access the class on Ted.ucsd.edu
46