Energy Conservation and Adaptation in Mobile and Embedded Systems
Faculty: Kang G. Shin
Grad students: Babu Pillai, Hai Huang
Sabbatical leave from Real-Time Computing Laboratory, University of Michigan
http://www.eecs.umich.edu/~kgshin
My Current Thrust Areas
Wired and Wireless QoS Networking
  Internet QoS with DiffServ, MPLS, overlay networks
  Dependable real-time protocols
  Wireless QoS and security protocols
Embedded Systems
  Model-based integration of application SW
  Modeling and integration of OS and network services
  Energy-aware real-time OS and applications
  Embedded systems networks
Internet Servers and Adaptware
  Workload differentiation and protection
  Overload detection and avoidance
  DDoS attacks
Outline
Motivation
Energy-efficient real-time OSs
Real-time dynamic voltage scaling
Energy-aware QoS
Conclusions
Motivation
Handheld and mobile computing and communication devices are everywhere!
Increasingly complex SW and faster HW demand more energy
Rapid increases in HW complexity, speed, and power consumption, but battery technology is not keeping up
Need to conserve energy, improve computational efficiency via OS on power-constrained systems
Real-Time & Energy Constraints
Many power-constrained embedded or mobile systems have real-time tasks
  Time-critical computations, typically periodic
  Need to provide guarantees for meeting deadlines
Available stored energy fundamentally limits system runtime
  Need to use energy efficiently, allocate to the most critical or desirable computations, while meeting real-time requirements
Real-Time Task Characteristics
Typically well-defined task set
Canonical real-time task, Ti:
  Is periodic, with period Pi
  Has worst-case execution time (WCET), Ci
  Has relative deadline di, typically equal to Pi
Periodic model can accommodate aperiodic and sporadic tasks
Schedulability of RT systems well-studied
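The task model above can be written down directly. The following is a minimal sketch, not code from the talk: the `Task` class and both function names are hypothetical, and the two tests shown are the classic textbook results for deadlines equal to periods (total utilization at most 1 for EDF, and the Liu-Layland sufficient bound for rate-monotonic scheduling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    period: float  # P_i
    wcet: float    # C_i, measured at maximum frequency

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


def edf_schedulable(tasks) -> bool:
    """EDF with deadlines equal to periods: feasible iff total utilization <= 1."""
    return sum(t.utilization for t in tasks) <= 1.0


def rm_schedulable_sufficient(tasks) -> bool:
    """Liu-Layland bound for rate-monotonic scheduling (sufficient, not necessary)."""
    n = len(tasks)
    if n == 0:
        return True
    return sum(t.utilization for t in tasks) <= n * (2 ** (1 / n) - 1)


# Illustrative task set with utilizations 0.3, 0.2, and 0.4 (total 0.9).
tasks = [Task(period=10, wcet=3), Task(period=20, wcet=4), Task(period=5, wcet=2)]
print(edf_schedulable(tasks))            # True: 0.9 <= 1
print(rm_schedulable_sufficient(tasks))  # False: 0.9 exceeds the bound of about 0.78
```

Failing the rate-monotonic bound does not prove the set is unschedulable under RM; the bound is only sufficient.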
Energy-Efficient RTOS
Reduce overhead of OS services => lower computational overhead => lower CPU power consumption
  Optimized IPC for periodic RT tasks
  Combined Static Dynamic (CSD) scheduling
  Protocol stack layer-bypassing
  Eliminate naming services
  Reported at ACM SOSP’99, NOSSDAV’00, USENIX’02
Exploit HW mechanisms, e.g., voltage scaling of CPU, power management of memory subsystem
Memory Power Management
Goal: reduce power dissipation for memory access
Main memory consists of multiple devices, each with independently-controlled power states
Switch devices not needed for current task to low-power states
Modify page allocation to reduce number of devices in use by each task
59-94% memory power reduction with RDRAM
Will present at USENIX’03
RT-DVS
Goal: reduce per-cycle CPU energy costs
  Reducing frequency permits lower voltage
  Lower voltage on CPU to obtain $V^2$ savings per cycle
Frequency change affects execution time, disrupts RT scheduling
Have developed energy-conserving algorithms for DVS that preserve RT guarantees (ACM SOSP’01)
CPU Operating Voltage
Predominant device technology is CMOS
$E \propto V^2$ (energy per cycle scales with the square of the supply voltage)
Maximum gate delays inversely related to voltage
Can reduce unit computation energy by reducing frequency and voltage
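As a worked example of the quadratic relation, assume per-cycle energy scales exactly with the square of the supply voltage and take the 5 V, 4 V, and 3 V settings used in the simulations later in the talk. Relative to the 5 V setting, the per-cycle energy is then:

```latex
\frac{E_{4\,\mathrm{V}}}{E_{5\,\mathrm{V}}} = \left(\frac{4}{5}\right)^{2} = 0.64,
\qquad
\frac{E_{3\,\mathrm{V}}}{E_{5\,\mathrm{V}}} = \left(\frac{3}{5}\right)^{2} = 0.36
```

Under this idealized assumption, a task that needs a fixed number of cycles uses about 36% of the CPU energy at 3 V, at the cost of running longer at the lower frequency.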
Dynamic Voltage Scaling (DVS)
[Weiser+94]: busy system => increase frequency; idle system => reduce frequency
Many algorithms for non-RT applications
Software-adjustable PLL and voltage regulator often available, but intended for other purposes
  XScale, SpeedStep, PowerNow!, Crusoe
Our Approach
Development of real-time DVS (RT-DVS)
  DVS algorithms that maintain RT guarantees
  Simple enough for online scheduling
  Work closely with existing RT scheduling algorithms
In-depth simulation
Implementation in a real, working system
Static Voltage Scaling
Earliest-Deadline-First (EDF)
Worst-case utilization: ui = Ci / Pi
Frequency selection: $\sum_i u_i \le f / f_{max}$
[Figure: task utilizations u1 = 0.3, u2 = 0.2, u3 = 0.4 stacked on a utilization axis from 0.00 to 1.00; the resulting setting is f = 0.9*fmax]
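The static selection rule can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, tasks are given as (WCET, period) pairs with WCET measured at fmax, and the hardware is assumed to offer a small discrete list of relative frequencies, so the lowest setting that covers the total utilization is chosen.

```python
def select_frequency(total_utilization, frequencies):
    """Return the lowest available f/fmax that is >= total_utilization."""
    for f in sorted(frequencies):
        if total_utilization <= f + 1e-12:  # small tolerance for float round-off
            return f
    raise ValueError("task set is not EDF-schedulable even at fmax")


def static_edf_frequency(tasks, frequencies):
    """tasks: iterable of (wcet, period) pairs, wcet measured at fmax."""
    return select_frequency(sum(c / p for c, p in tasks), frequencies)


settings = [0.5, 0.75, 1.0]  # fractions of fmax
# Utilizations 0.3 + 0.2 + 0.4 = 0.9: only the full-speed setting is enough.
print(static_edf_frequency([(3, 10), (4, 20), (2, 5)], settings))  # 1.0
# Utilizations 0.1 + 0.2 + 0.4 = 0.7: the 0.75 setting suffices.
print(static_edf_frequency([(1, 10), (4, 20), (2, 5)], settings))  # 0.75
```

With only three settings, a total utilization of 0.9 cannot use the ideal 0.9*fmax and rounds up to fmax, which is one reason the available voltage and frequency settings matter so much for the savings.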
Simulation
Parameterized simulation
Synthetic random real-time task sets
  Uniform distribution of short (1-10 ms), medium (10-100 ms), long (100-1000 ms) periods
Vary computation time distributions
Vary hardware specifications
Compare different scheduling algorithms and theoretical bounds
Simulation Setup
Input: task set, system parameters, scheduling algorithm
Output: energy consumption of each algorithm
System parameters:
  List of frequencies and voltages
  Actual fraction of WCET for each task
  Idle level (idle vs. normal operation energy consumed)
Theoretical lower bounds:
  Task execution throughput only, without timing issues
Simulation Results
8 tasks in a task set
100 random task sets
Workload = total worst-case utilization
3 freq./volt. settings:
  5V, 1.0*fmax
  4V, 0.75*fmax
  3V, 0.5*fmax
[Figure: normalized energy (0 to 1) vs. workload (0.1 to 0.9); curve: Static]
Dynamic Scaling
If each task uses less than WCET, we can use lower frequency during its invocation interval
[Figure: utilization (0.00 to 1.00) vs. time over the interval from (k-1)P3 to kP3; with u1 = 0.3, u2 = 0.2, u3 = 0.4 the setting is f = 0.9*fmax; with u3 = 0.2 it drops to f = 0.7*fmax]
Cycle-Conserving EDF
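$f_{desired} = f_{max} \cdot \sum_i u_i$
  At Ti release, set ui = Ci / Pi
  At Ti finish, set ui = actual execution time / Pi
[Figure: example ccEDF schedule for Task 1 (u1 = 0.3), Task 2 (u2 = 0.2), and Task 3 (u3 = 0.4); y-axis f/fmax from 0.00 to 1.00, x-axis time with deadlines D1, D2, D3; the total utilization U is updated at each release and completion and takes values between 0.5 and 0.9]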
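The release and completion rules above map onto two event handlers. The sketch below is an assumption-laden illustration rather than the authors' implementation: the class and method names are hypothetical, execution times are expressed in fmax-equivalent time units, and the hardware is assumed to offer a discrete list of relative frequencies.

```python
class CycleConservingEDF:
    def __init__(self, tasks, frequencies):
        # tasks: dict name -> (wcet, period); wcet measured at fmax
        self.tasks = dict(tasks)
        self.frequencies = sorted(frequencies)
        # Start from the worst-case utilization of every task.
        self.util = {name: c / p for name, (c, p) in self.tasks.items()}

    def _select(self):
        """Lowest available f/fmax that covers the current total utilization."""
        total = sum(self.util.values())
        for f in self.frequencies:
            if total <= f + 1e-12:
                return f
        raise ValueError("total utilization exceeds 1.0")

    def on_release(self, name):
        c, p = self.tasks[name]
        self.util[name] = c / p            # assume the worst case again
        return self._select()

    def on_completion(self, name, actual_time):
        _, p = self.tasks[name]
        self.util[name] = actual_time / p  # reclaim the unused cycles
        return self._select()


cc = CycleConservingEDF({"T1": (3, 10), "T2": (4, 20), "T3": (2, 5)},
                        [0.5, 0.75, 1.0])
print(cc.on_release("T1"))        # 1.0  (U = 0.9)
print(cc.on_completion("T1", 1))  # 0.75 (U = 0.7)
print(cc.on_completion("T3", 1))  # 0.75 (U = 0.5 needs 0.5, but see below)
```

The last call prints 0.5, not 0.75, if T2 has also finished early; as written, T2 still counts at its worst case of 0.2, so U = 0.1 + 0.2 + 0.2 = 0.5 and the 0.5 setting is selected.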
Simulation Results
8 tasks in a set
70% of WCET used in each invocation
3 freq./volt. settings:
  5V, 1.0*fmax
  4V, 0.75*fmax
  3V, 0.5*fmax
[Figure: normalized energy (0 to 1) vs. workload (0.1 to 0.9); curves: Static, ccEDF]
Proactive Techniques
Tasks typically use much less than WCETs
Proactively reduce frequencies
  Look ahead to meet future deadlines
  Consider all tasks together
Look-Ahead EDF
Minimize current frequency
Trade current savings for potential future loss
Plan to defer work beyond next immediate deadline
Ensure future deadlines with “reservation”
[Figure: three schedule timelines for Task 1, Task 2, and Task 3, each on a time axis marked with deadlines D1, D2, D3]
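The slides describe look-ahead EDF only through the bullets above, so the following is a sketch of one way the deferral could be computed, not a transcription of the authors' algorithm. The function name, the dictionary keys, and the task parameters are all assumptions for illustration. The idea: visit tasks from latest to earliest deadline, push as much remaining work as possible past the earliest deadline while reserving capacity for the worst case of the tasks not yet visited, and run just fast enough to finish the work that cannot be deferred.

```python
def look_ahead_demand(tasks, now):
    """Fraction of fmax needed until the earliest deadline.

    tasks: list of dicts with keys
      wcet, period  -- worst-case parameters (wcet measured at fmax)
      deadline      -- absolute deadline of the current invocation
      c_left        -- worst-case work still outstanding for that invocation
    Requires the earliest deadline to be strictly later than `now`.
    """
    earliest = min(t["deadline"] for t in tasks)
    util = sum(t["wcet"] / t["period"] for t in tasks)
    must_run = 0.0
    # Latest deadline first.
    for t in sorted(tasks, key=lambda t: t["deadline"], reverse=True):
        util -= t["wcet"] / t["period"]
        window = t["deadline"] - earliest
        # (1 - util) * window is the work this task can still fit in after
        # the earliest deadline; anything beyond that must run before it.
        x = max(0.0, t["c_left"] - (1.0 - util) * window)
        if window > 0:
            util += (t["c_left"] - x) / window  # reserve the deferred work
        must_run += x
    return must_run / (earliest - now)


tasks = [
    {"wcet": 3, "period": 8, "deadline": 8, "c_left": 3},
    {"wcet": 3, "period": 10, "deadline": 10, "c_left": 3},
    {"wcet": 1, "period": 14, "deadline": 14, "c_left": 1},
]
demand = look_ahead_demand(tasks, now=0)
static = sum(t["wcet"] / t["period"] for t in tasks)
print(round(demand, 3), round(static, 3))
```

For these illustrative values the deferred plan asks for less speed now than the static worst-case utilization would. The returned fraction would then be rounded up to the nearest available frequency setting.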
Simulation Results
8 tasks in a set
70% of WCET used in each invocation
3 freq./volt. settings:
  5V, 1.0*fmax
  4V, 0.75*fmax
  3V, 0.5*fmax
[Figure: normalized energy (0 to 1) vs. workload (0.1 to 0.9); curves: Static, ccEDF, laEDF]
More Simulation Results
Number of tasks is not important
Voltage and frequency settings greatly affect performance
Look-ahead does not always perform best
Algorithms can perform close to theoretical lower bound
Implementation
PC notebook computer
AMD K6-2+ processor, 550 MHz
PowerNow! Technology:
  Frequency can be changed from 200 to 550 MHz in 50 MHz increments
  Voltage selection: 1.4 V or 2.0 V
  Empirical mapping between voltage and frequency
  Switching overheads: 0.4 ms (voltage), 41 µs (frequency)
Processing 20 W, screen backlight 7 W
Software Architecture
Real-time extension to Linux 2.2
Modular design, plug-in schedulers
[Figure: architecture diagram with components Hardware, Linux 2.2 Kernel, PowerNow module, Real-Time module, Scheduler module, and RT Task Set]
Measurements
Use oscilloscope with current probe
Synthetic RT workload with backlighting off
Similar to simulation results (20-40% savings)
[Figure: Power Measurement. Two plots comparing EDF, ccEDF, and laEDF: measured watts (0 to 30) vs. workload (0.2 to 0.95), and simulated watts (0 to 4) vs. workload (0.1 to 0.9)]
Interesting Observations
The very first invocation of a task may overrun its WCET due to ‘cold’ processor and OS state
Dynamic addition of a task may cause transient deadline misses, especially with more aggressive schemes
  Insert the task immediately, but release it after completion of current invocations of all existing tasks
Related work
Most are loosely coupled with the OS and based on average processor utilization
  Many non-RT DVS papers (esp. UCB)
Offline WCET analysis + online heuristic [Krishna+00], [Swaminathan+00]
Computation time probability heuristics [Gruian01]
Compiler-based, application-level DVS [Mosse+00]
RT-DVS Discussion
Designed and evaluated 5 DVS algorithms for real-time systems (ACM SOSP’01)
  Provide deadline guarantees while scaling frequency and voltage
  Simple enough to be used as online schedulers
  Excellent energy savings, comparable to non-RT DVS
Implemented in real system on top of Linux
  Code available at: http://kabru.eecs.umich.edu/rtos/
But, we need a larger energy framework
Limits of Low-level Techniques
DVS & processor halt conserve energy only when extra capacity available
No general guidelines on how to make best use of limited energy
Cannot provide more energy and runtime to more critical or valuable tasks
Need to adapt app workload to maximize system gains or utility of computation
Example
A remote surveillance device transmits compressed video and audio
Solar-powered, but must run overnight
3 real-time tasks:
  Radio transmitter (critical): constant bit rate
  Video codec (degradable):
    -high quality (30 fps, 640x480) MPEG4
    -low quality (10 fps, 160x120) MPEG1
  Audio codec (noncritical): mp3, either ON or OFF
Example, cont’d
Adapt task set based on power consumption of tasks, available energy, hours until daylight, and relative value of the tasks, e.g.,
  During daytime or high battery levels: radio, video at high quality, audio ON
  Low battery at night: radio, video low quality, audio ON
  Energy is critically low: radio, video low quality, audio OFF
Dynamic adaptation needed in general, as battery levels and time until daylight are variable
EQoS
Need to maximize benefits gained from energy spent, but HOW?
=> Energy-aware Quality of Service (EQoS):
  Vary per-task QoS, which directly affects the task’s utility and energy consumption
  Select a set of task QoS levels to maximize total system utility over given runtime
  Cast selection into a tractable optimization problem
EQoS Design
EQoS design goals:
  Leverage low-level halt, DVS mechanisms
  Meet system runtime goal
  Maximize benefits of task execution
Need methods of changing task QoS levels and specifying benefits and energy requirements
How to change per-task QoS?
Adopt RT fault-tolerance techniques:
  Period extension
  Imprecise computation
  Apply different algorithms or CODECs
  Omission
Degraded service requires less energy
For EQoS, need to specify set of QoS levels and their average required energy for each task
Utility
Abstract notion of value from executing tasks
Need to specify utility for each QoS level of a task:
  Increasing Rewards for Increasing Service (IRIS)
  Performance Index (PI) for control applications
  Perceived quality metrics for multimedia
Actual specification flexible to types of applications and systems designed
EQoS Algorithms
Given:
  Tasks with QoS levels defined, and the energy required and utility gained for each level
  Remaining system energy
  Desired runtime, or known time until recharge
Select a QoS level for each task to:
  Achieve desired runtime
  Maximize total utility
This can be formulated as a multiple-choice knapsack problem (MCKP)
  Each task as a category, and the set of its QoS levels as items in the category
  Knapsack size = power budget
  Item values and weights = utility rates and power consumption
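Written out, the selection problem takes the standard multiple-choice knapsack form. The symbols below are assumptions introduced for illustration, not notation from the slides: task i has m_i QoS levels, level j of task i has utility rate v_ij and average power w_ij, E is the remaining energy, T is the desired runtime, and x_ij = 1 when level j is selected for task i.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{n} \sum_{j=1}^{m_i} v_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{i=1}^{n} \sum_{j=1}^{m_i} w_{ij}\, x_{ij} \le \frac{E}{T}, \\
                       & \sum_{j=1}^{m_i} x_{ij} = 1 \qquad (i = 1, \dots, n), \\
                       & x_{ij} \in \{0, 1\}.
\end{aligned}
```

The second constraint is what makes the knapsack "multiple-choice": exactly one level is chosen per task, so a critical task cannot be dropped unless it has an explicit zero-power level.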
MCKP vs. EQoS Problem
[Figure: MCKP (categories containing items 1 to n, each with a value and a weight, subject to a weight limit) mapped onto EQoS (tasks containing levels, each with an energy requirement, subject to a runtime constraint)]
Optimal Algorithms
NP-hard: all KP can be expressed as MCKP
Exponential search: O(m^n)
Branch-and-Bound (BB)
  Need fast bound computation
  Can use LMCKP (the linear relaxation of MCKP) as upper bound
  May still require exponential time
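The exponential search simply tries every combination of levels. Below is a minimal sketch under stated assumptions: the function name and the (utility, power) tuple layout are hypothetical, and the numbers are made-up values loosely modeled on the surveillance example (a radio with one level, video with low and high quality, audio OFF or ON).

```python
from itertools import product


def exhaustive_eqos(levels, power_budget):
    """levels[i] is a list of (utility, power) pairs for task i.

    Returns (best_utility, choice), where choice[i] indexes levels[i],
    or (None, None) if no combination fits the budget.
    """
    best_utility, best_choice = None, None
    # One level index per task: m^n combinations for n tasks of m levels.
    for choice in product(*(range(len(task)) for task in levels)):
        power = sum(levels[i][j][1] for i, j in enumerate(choice))
        if power > power_budget:
            continue
        utility = sum(levels[i][j][0] for i, j in enumerate(choice))
        if best_utility is None or utility > best_utility:
            best_utility, best_choice = utility, choice
    return best_utility, best_choice


levels = [
    [(10, 4)],          # radio: single critical level
    [(2, 2), (8, 6)],   # video: low, high quality
    [(0, 0), (3, 1)],   # audio: OFF, ON
]
print(exhaustive_eqos(levels, 8))   # (15, (0, 0, 1)): video low, audio ON
print(exhaustive_eqos(levels, 11))  # (21, (0, 1, 1)): video high, audio ON
```

This is practical only for very small task sets, but it is useful as a reference when checking faster methods.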
Optimal Algorithms, cont’d
Dynamic Programming (DP)
  Pseudo-polynomial time, O(mnk)
  Partial solutions for 1, 2, …, n tasks for all possible power budgets (energy/runtime)
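The table-filling idea can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes power values and the budget have been quantized to non-negative integers (the k in O(mnk)), and the function name and data layout are hypothetical.

```python
def dp_eqos(levels, power_budget):
    """levels[i]: list of (utility, power) pairs with integer power >= 0.

    Returns (best_utility, choice) or (None, None) if nothing fits.
    """
    NEG = float("-inf")
    # best[b]: best utility of the tasks handled so far within budget b.
    best = [0.0] * (power_budget + 1)
    picks = []  # picks[i][b]: level chosen for task i at budget b
    for task in levels:
        new = [NEG] * (power_budget + 1)
        pick = [None] * (power_budget + 1)
        for b in range(power_budget + 1):
            for j, (utility, power) in enumerate(task):
                if power <= b and best[b - power] != NEG:
                    cand = best[b - power] + utility
                    if cand > new[b]:
                        new[b], pick[b] = cand, j
        best = new
        picks.append(pick)
    if best[power_budget] == NEG:
        return None, None
    # Walk back through the tables to recover the chosen levels.
    choice, b = [], power_budget
    for i in range(len(levels) - 1, -1, -1):
        j = picks[i][b]
        choice.append(j)
        b -= levels[i][j][1]
    return best[power_budget], tuple(reversed(choice))


# Made-up (utility, power) levels for radio, video, and audio.
levels = [[(10, 4)], [(2, 2), (8, 6)], [(0, 0), (3, 1)]]
print(dp_eqos(levels, 8))   # (15.0, (0, 0, 1))
print(dp_eqos(levels, 11))  # (21.0, (0, 1, 1))
```

The three nested loops run over n tasks, k + 1 budgets, and up to m levels, which gives the O(mnk) running time.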
Heuristics
Linear:
  Use LMCKP solution, as with BB bound
  Drop fractional part
Greedy:
  Start with same approach as LMCKP
  Continue selecting smaller upgrades
O(nm) overhead, excluding the sorting of upgrades
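A simplified greedy variant is sketched below. It is not the LMCKP-based procedure itself: instead of pre-sorting upgrades on the convex hull, it repeatedly rescans all tasks and takes the feasible upgrade with the best utility gain per unit of extra power, which keeps the code short at the cost of extra scans. The function name and data layout are hypothetical, and each task's levels are assumed to be listed in order of increasing power.

```python
def greedy_eqos(levels, power_budget):
    """levels[i]: list of (utility, power), lowest-power level first.

    Returns (utility, choice) or (None, None) if even the lowest levels
    exceed the budget.
    """
    choice = [0] * len(levels)
    used = sum(task[0][1] for task in levels)
    if used > power_budget:
        return None, None
    while True:
        best = None  # (efficiency, task index, level index, extra power)
        for i, task in enumerate(levels):
            cur_u, cur_p = task[choice[i]]
            for j in range(choice[i] + 1, len(task)):
                gain, extra = task[j][0] - cur_u, task[j][1] - cur_p
                if gain <= 0 or used + extra > power_budget:
                    continue  # not an improvement, or does not fit
                eff = gain / extra if extra > 0 else float("inf")
                if best is None or eff > best[0]:
                    best = (eff, i, j, extra)
        if best is None:
            break  # no further upgrade fits
        _, i, j, extra = best
        choice[i] = j
        used += extra
    utility = sum(levels[i][j][0] for i, j in enumerate(choice))
    return utility, tuple(choice)


# Made-up (utility, power) levels for radio, video, and audio.
levels = [[(10, 4)], [(2, 2), (8, 6)], [(0, 0), (3, 1)]]
print(greedy_eqos(levels, 8))   # (15, (0, 0, 1))
print(greedy_eqos(levels, 11))  # (21, (0, 1, 1))
```

Skipping an upgrade that does not fit and continuing with cheaper ones mirrors the "continue selecting smaller upgrades" step. On this small example the greedy result matches the optimum, which is not guaranteed in general.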
Simulation
Permits exploring a large, multi-dimensional task set space
Simulate various hardware configurations, RT scheduling, DVS mechanisms
  Static RM, Static EDF, ccRM, ccEDF, laEDF
Generated 1000 random task sets, each with 10 tasks, each of which has up to 5 QoS levels
  QoS degradation models: period extension, imprecise computation, algorithmic change
EQoS algorithms achieve desired runtime
DVS conserves extra energy, throws off estimated runtime
Simulation Results
[Figure: normalized runtime (0 to 2) vs. initial energy (0 to 8.00E+09) for Greedy, Linear, Max, Min, Opt]
[Figure: normalized utility (0 to 1.2) for min, linear, greedy, opti, max]
Simulation Results - DVS
DVS increases energy efficiency
Throws off adaptation: extends runtime
3 volt/freq settings: 5V, 1.0*fmax; 4V, 0.75*fmax; 3V, 0.35*fmax
[Figure: normalized runtime (0 to 3.5) vs. initial energy (0 to 8.00E+09) for Greedy, Linear, Max, Min, Opt]
DVS compensation achieves desired runtime with higher utility
Simulation Results, cont’d
Adaptation with DVS compensation
Utility comparison between DVS compensation and no compensation
[Figure: normalized runtime (0 to 2) vs. initial energy (0 to 8.00E+09) for Greedy, Linear, Max, Min, Opt]
[Figure: normalized utility (0 to 3) for min, linear, greedy, opti, max; series: DVS, DVS-comp]
Implementation
Implemented on Linux 2.2
  Periodic real-time support
  PowerNow! driver
  Real-time scheduler modules
  EQoS adaptation module
  Battery monitoring module
Currently supports Athlon, Duron, K6-2 processors that implement AMD’s PowerNow! Technology
Experiments
Measurements on a Compaq Presario 1200Z
Implement RT version of Lame MP3 encoder
  Use quality parameter to vary QoS
  Multiple concurrent instances
Results follow trend observed in simulations
Conclusions
RT-DVS provides low-level CPU voltage control
  Maintains timing guarantees for RT tasks
  Significant energy savings, comparable to non-RT DVS
EQoS provides task adaptation in energy-constrained real-time systems
  Provides guidelines to best utilize available energy among tasks
  Frame energy adaptation as a tractable problem
  Heuristics work nearly as well as optimal algorithms in practice