lcu14-410: how to build an energy model for your soc
DESCRIPTION
LCU14-410: How to build an Energy Model for your SoC
---------------------------------------------------
Speaker: Morten Rasmussen
Date: September 18, 2014
---------------------------------------------------
★ Session Summary ★
- ARM to provide a quick overview of the current energy model
- Introduce the methodology/recipe used to build the energy model
- Discuss ways in which the model is used today and intended next steps
- Key outcomes:
  - Describe the
  - Identify gaps and limitations
Summary of EAS workshop (Amit)
- Summary of hacking sessions
- Plan to integrate Qualcomm-ARM-Linaro work to send upstream
- Key outcomes:
  - List of features and responsibilities
  - Dependencies between upstreaming of features, if any
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137778
Google Event: https://plus.google.com/u/0/events/ck3ti7eurknnsq0a4e9ks5a1sbs
Video: https://www.youtube.com/watch?v=JfZt8W3NVgk&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-410
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
TRANSCRIPT
1
How to build an energy model for your SoC
Linaro Connect LCU14, Burlingame, CA.
Morten Rasmussen
2
Why do you need an energy model?
Most of the Linux kernel is blissfully unaware of SoC power management features: P-states, clock domains, C-states, power domains, ...
Only largely autonomous subsystems are aware of some of these details (cpufreq, cpuidle, …)
The plan is to change that by coordinating task scheduling, frequency scaling, and idle-state selection to improve power management.
Energy saving techniques must be applied under the right circumstances, which vary between SoCs.
The kernel must therefore have a better understanding of the power (energy)/performance trade-offs for the particular SoC to make the right decisions.
An energy model can provide that information.
As a bonus, tools may also use the energy model to make quick energy estimates based on execution traces.
3
Modelling limitations
Models are never accurate, but we only need enough detail to make the right decisions most of the time.
The model will be used by critical code paths in the kernel, so it has to be as simple as possible.
The model only considers cpus, not memory or peripherals.
4
A simplified system view
[Figure: two clusters (cpu0/cpu1 and cpu2/cpu3), each with its own shared HW and clock source, inside a power domain; clock gating (G) and power gating (G) points are marked per core and per cluster.]
5
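Energy consumption simplified
[Figure: power vs. time; busy periods at P-states Px and Py, an idle period in C-state Cz, and the transition between busy and idle. The areas under the curve are labelled busy energy, transition energy, and idle energy.]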
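The figure's decomposition can be written as a sum: busy power times busy time for each P-state, idle power times idle time, plus a fixed energy cost per transition. A minimal sketch with made-up numbers; the `Segment` type and `total_energy()` helper are hypothetical illustrations, not part of any kernel interface:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One interval of the power-vs-time plot (hypothetical illustration type)."""
    kind: str        # "busy" (running in a P-state) or "idle" (resting in a C-state)
    power: float     # average power during the interval (energy/second)
    duration: float  # seconds


def total_energy(segments, transition_energy, n_transitions):
    """Busy energy + idle energy (areas under the curve) + a fixed cost per transition."""
    busy = sum(s.power * s.duration for s in segments if s.kind == "busy")
    idle = sum(s.power * s.duration for s in segments if s.kind == "idle")
    return busy + idle + n_transitions * transition_energy


# Made-up numbers: busy at Px, then at Py, one busy->idle transition, idle in Cz.
timeline = [
    Segment("busy", power=3.0, duration=0.010),  # Px
    Segment("busy", power=2.0, duration=0.005),  # Py
    Segment("idle", power=0.1, duration=0.020),  # Cz
]
energy = total_energy(timeline, transition_energy=0.002, n_transitions=1)
```

The transition term is what makes short idle periods expensive: if the idle period is short, the wake-up cost can exceed the energy saved by idling.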
6
Scheduler Topology Hierarchy
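[Figure: simplified sched_domain hierarchy over cpus 0-3; each level consists of struct sched_group instances, and energy model tables (per-core C-states, cluster/package C-states, cluster/package P-states) are attached to the groups.]
Disclaimer: This is a simplified view of the sched_domain hierarchy.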
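One way to picture this: each group carries the energy tables for its own level, and the energy code walks from a cpu's core-level group up through its ancestors. A minimal Python sketch; the `Group` class, its fields and `levels()` are hypothetical stand-ins, not the kernel's `struct sched_group`, and the numbers are borrowed from the example table slide (assigning them to levels this way is an assumption):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """Hypothetical stand-in for one sched_group plus its attached energy tables."""
    name: str
    p_states: list = field(default_factory=list)  # (capacity, busy power) rows
    c_states: list = field(default_factory=list)  # (idle power, wake-up energy) rows
    parent: Optional["Group"] = None              # group one level up, if any


def levels(group):
    """Yield the group and its ancestors, core level first."""
    while group is not None:
        yield group
        group = group.parent


cluster = Group("cluster0", p_states=[(358, 2967), (1024, 4905)], c_states=[(10, 6)])
cores = [
    Group(f"cpu{i}", p_states=[(358, 187), (1024, 1024)], c_states=[(0, 0)], parent=cluster)
    for i in range(2)
]
path = [g.name for g in levels(cores[1])]  # ['cpu1', 'cluster0']
```

Keeping shared-hardware numbers only on the parent group means they are counted once per cluster, not once per core.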
7
Energy model data
P-states:
Compute capacity: Performance score normalized to the highest P-state of the fastest cpu in the system (1024). Choose the benchmark carefully; preferably use a suite of benchmarks.
Power: Busy power = energy/second. Normalized to any reference, but must be consistent across all cpus.
C-states:
Power: Idle power = energy/second. Normalized.
Wake-up energy: Energy consumed during P->C + C->P state transitions. The unit must be consistent with the power numbers.
Note: Power numbers should only include the power consumption associated with the group the tables are attached to, i.e. per-core P-state power should only include power consumed by the core itself; shared HW is accounted for in the table belonging to the level above.
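The capacity normalization is plain arithmetic: score each P-state of each cpu with the same benchmark, then scale so that the highest P-state of the fastest cpu maps to 1024. A sketch with made-up benchmark scores; the cpu names, frequencies and `normalize_capacities()` are hypothetical:

```python
def normalize_capacities(scores):
    """scores: {cpu_type: {freq_mhz: benchmark_score}} -> same shape, scaled so max is 1024."""
    top = max(s for per_cpu in scores.values() for s in per_cpu.values())
    return {
        cpu: {freq: round(1024 * s / top) for freq, s in per_cpu.items()}
        for cpu, per_cpu in scores.items()
    }


# Made-up scores for a hypothetical big.LITTLE system.
scores = {
    "little": {350: 100.0, 1000: 286.0},
    "big": {500: 300.0, 1200: 720.0},
}
caps = normalize_capacities(scores)
# caps == {"little": {350: 142, 1000: 407}, "big": {500: 427, 1200: 1024}}
```

If a suite of benchmarks is used, the per-benchmark scores have to be combined into one score per P-state first; how to combine them is a design choice the slides leave open.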
8
Energy model data
Example tables (figure: cpus 0 and 1 sharing one cluster)

Cluster C-states:
power   wake-up energy   (state)
10      6                (C1)
...     ...              ...

Cluster P-states:
capacity   power   (freq, MHz)
358        2967    (350)
...        ...     ...
1024       4905    (1000)

CPU C-states:
power   wake-up energy   (state)
0       0                (WFI)
...     ...              ...

CPU P-states:
capacity   power   (freq, MHz)
358        187     (350)
...        ...     ...
1024       1024    (1000)
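Given such a table, the P-state the model expects for a given load is a lookup of the lowest capacity that fits it. A sketch using only the first and last rows of the P-state table with power 187..1024 (the elided middle rows are deliberately not filled in); `find_p_state()` and `energy_per_work()` are hypothetical helpers:

```python
# (capacity, busy power, freq) rows: first and last rows only; middle rows elided on the slide.
P_STATES = [(358, 187, 350), (1024, 1024, 1000)]


def find_p_state(table, util):
    """Return the lowest-capacity row that can hold `util` (on the same 0..1024 scale)."""
    for row in sorted(table):
        if row[0] >= util:
            return row
    return max(table)  # util exceeds every row: saturate at the highest P-state


def energy_per_work(row):
    """Busy power divided by capacity, i.e. energy per unit of work at this P-state."""
    capacity, power, _freq = row
    return power / capacity
```

With these two rows, the lowest P-state costs about 0.52 units of core energy per unit of work versus 1.0 at the highest, so running at the lowest P-state that fits the load saves core energy (cluster power is not included here).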
9
Energy model algorithm
for_each_domain(cpu, sd) {
    sg = sched_group_of(cpu)
    energy_before = curr_util(sg) * busy_power(sg)
                    + (1-curr_util(sg)) * idle_power(sg)
    energy_after  = new_util(sg) * busy_power(sg)
                    + (1-new_util(sg)) * idle_power(sg)
                    + (1-new_util(sg)) * wakeups * wakeup_energy(sg)
    energy_diff += energy_before - energy_after
    if (energy_before == energy_after)
        break;
}
return energy_diff
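A runnable Python transcription of the loop above, under stated assumptions: the hierarchy is given as a list of levels from core to top, each with its utilization before and after the change, busy power, idle power, wake-up energy and an expected wake-up count. The `Level` type and all inputs are hypothetical; the formulas are copied from the pseudocode as written, including the wake-up term appearing only in `energy_after`:

```python
from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical per-level inputs for one sched_group on the cpu's path."""
    curr_util: float      # fraction of time busy before the change (0..1)
    new_util: float       # fraction of time busy after the change (0..1)
    busy_power: float
    idle_power: float
    wakeup_energy: float
    wakeups: float        # expected number of wake-ups (assumed input)


def energy_diff(levels):
    """Walk the levels from core to top; a positive result means energy saved."""
    diff = 0.0
    for sg in levels:
        before = sg.curr_util * sg.busy_power + (1 - sg.curr_util) * sg.idle_power
        after = (sg.new_util * sg.busy_power
                 + (1 - sg.new_util) * sg.idle_power
                 + (1 - sg.new_util) * sg.wakeups * sg.wakeup_energy)
        diff += before - after
        if before == after:  # this level is unaffected, so the levels above are too
            break
    return diff
```

The exact floating-point comparison mirrors the pseudocode; a real implementation would more likely use integer arithmetic or a tolerance.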
10
Backups
11
Platform performance/energy data/model in scheduler or user-space
Energy-Aware Workshop @ Kernel Summit 2014, Chicago
Morten Rasmussen
12
Sub-topics
Techniques for reducing energy consumption vary between platforms:
Race-to-idle
Task packing
P- and C-state constraints (Turbo Mode, package C-states, …)
… but they are not universally good; most likely they only help up to a certain extent.
We need to know when to apply each of the techniques for a particular platform.
Proposals:
Tunable heuristics for each technique that can be controlled by somebody else (user-space?), basically passing the problems to others.
Provide an in-kernel performance/energy model that can estimate the impact of scheduling decisions.
13
Backup/More stuff
14
Model Validation: ARM TC2, sysbench
Correlation (Pearson):
A15 = 0.93
A7 = 0.96
15
Model Validation: ARM TC2, periodic
Correlation (Pearson):
A15 = 0.17
A7 = -0.01
16
Model Validation: ARM TC2, Android audio
Correlation (Pearson):
A15 = 0.03
A7 = 0.48
17
Model Validation: ARM TC2, Android bbench
Correlation (Pearson):
A15 = 0.67
A7 = 0.80
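For reference, the Pearson coefficient quoted on these slides is the covariance of two series divided by the product of their standard deviations. A sketch, assuming the two series are per-sample model estimates and measured energy; the slides do not say how the samples were taken, and `pearson()` is a hypothetical helper:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (e.g. estimated vs. measured energy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient. A value near 1 means the model tracks the measurements well; values near 0, as in the periodic and audio cases, mean it does not.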
18
Old slides
19
Motivation
Energy cost driven task placement (load-balancing)
Focus on the actual goal of the energy-aware scheduling activities: saving energy while achieving (near) optimum performance.
The energy benefit of a scheduling decision is clear when the decision is made, assuming the energy cost estimates are fairly accurate.
Introduce a simple energy model to estimate costs and guide scheduling decisions. Requested by maintainers at the KS workshop.
Gives the right amount of packing and spreading.
May simplify balancing decision logic.
Strong focus on saving energy in load-balancing algorithms.
big.LITTLE support comes naturally and almost for free.
This is just one part of the energy efficiency work; there are several related sessions this week.
20
Energy Load Balancing
The idea (a bit simplified): let the resulting energy consumption guide all balancing decisions:

if (energy_diff(task, src_cpu, dst_cpu) > 0) {
    move_task(task, src_cpu, dst_cpu);
} else {
    /* Try some other task */
}

Ideally, we should get the optimum balance if we try all combinations of tasks and cpus.
In reality it is not that simple. We can't try all combinations, but we can get fairly close for most scenarios.
If the energy model is accurate enough, we get packing and spreading implicitly, and only when it saves energy.
Should work for any system: SMP and big.LITTLE (with a few extensions).
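The greedy rule can be sketched in Python with the energy estimate passed in as a function, so any model can be plugged in. `balance()`, the run-queue representation and the toy cost function are hypothetical; the slides only specify the `energy_diff() > 0` test:

```python
def balance(rqs, src, dst, energy_diff):
    """Move tasks from rqs[src] to rqs[dst] whenever the model says the move saves energy.

    rqs: dict cpu -> list of task loads; energy_diff(task, src, dst, rqs) > 0 means saving.
    Returns the list of moved tasks.
    """
    moved = []
    for task in list(rqs[src]):  # iterate over a copy because rqs[src] is modified
        if energy_diff(task, src, dst, rqs) > 0:
            rqs[src].remove(task)
            rqs[dst].append(task)
            moved.append(task)
        # else: try some other task
    return moved


def toy_energy(rq):
    """Made-up cost: a fixed cost for keeping a cpu awake plus the load itself."""
    return (0.5 if rq else 0.0) + sum(rq)


def toy_diff(task, src, dst, rqs):
    if sum(rqs[dst]) + task > 1.0:  # destination would be over capacity
        return float("-inf")
    before = toy_energy(rqs[src]) + toy_energy(rqs[dst])
    src_after = list(rqs[src])
    src_after.remove(task)
    after = toy_energy(src_after) + toy_energy(rqs[dst] + [task])
    return before - after
```

With `rqs = {0: [0.25], 1: [0.25, 0.25]}`, `balance(rqs, 0, 1, toy_diff)` packs the task onto cpu 1 because that lets cpu 0 go idle; if cpu 1 lacked room, the move would be refused, so packing happens only when the model says it pays off.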
21
Power and Energy
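Goal: Save energy, not power.
[Figure: power vs. time; energy is the area under the power curve.]
$$e_{cpu} = P \cdot t, \qquad t = \frac{inst}{cc}$$
$$e_{cpu} = P(cc)\,\frac{inst}{cc}$$
$$e_{cpu} = P(cc)\left(\frac{inst_{task}}{cc} + \frac{inst_{idle}}{cc}\right)$$
$$e_{cpu} = e_{task} + e_{idle}$$
cc = Compute capacity (~ freq * uarch).
P(cc)/cc = Energy/inst: This is what we try to minimize.
If we have cpuidle support we get:
$$e_{cpu} = P_{busy}(cc)\,\frac{inst_{task}}{cc} + P_{idle}\,\frac{inst_{idle}}{cc}$$
where inst_task/cc ~ utilization.
We have to add an additional leakage energy term to reflect that it is better not to wake cpus unnecessarily.
[Figure: tracked load vs. time; time in runnable state ~ utilization* (work).]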
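To see what the cpuidle formula trades off, the sketch below evaluates e_cpu for a fixed amount of work in a fixed time window at two P-states. It assumes inst_idle/cc is simply the idle time left in the window once the task is done; the power numbers are made up, and the leakage term is left out as in the formula itself. `e_cpu()` is a hypothetical helper:

```python
def e_cpu(p_busy, p_idle, inst_task, window, cc):
    """e_cpu = P_busy(cc) * inst_task/cc + P_idle * inst_idle/cc.

    inst_task/cc is the busy time; inst_idle/cc is taken to be the rest of the window.
    """
    busy_time = inst_task / cc
    if busy_time > window:
        raise ValueError("work does not fit in the window at this capacity")
    idle_time = window - busy_time
    return p_busy * busy_time + p_idle * idle_time


P_BUSY = {0.5: 1.0, 1.0: 3.0}  # made-up busy power per compute capacity
results = {cc: e_cpu(p, p_idle=0.1, inst_task=0.4, window=1.0, cc=cc)
           for cc, p in P_BUSY.items()}
```

Here the slower P-state wins (0.82 vs. 1.26) because its P(cc)/cc is lower (2.0 vs. 3.0); race-to-idle would only win if the higher P-state had the lower energy per instruction or idle power were much larger.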
22
Simple Energy Model
cpu_energy = power(cc) * util/cc
             + idle_power * (1 - util/cc)
             + leakage_energy
cluster_energy = c_active_power * c_util
                 + c_idle_power * (1 - c_util)
util = Scale invariant cpu utilization (Tracked load).
cc = Current compute capacity (depends on freq and uarch).
power(cc) = Busy power (fully loaded) at current capacity from table.
idle_power = Idle power consumption (~WFI).
leakage_energy = Constant representing the cost of waking the cpu.
c_util = Cluster utilization. Depends on max(util/cc) ratio of its cpus.
c_active_power = Cluster active power.
c_idle_power = Cluster idle power.
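The two formulas can be checked against the worked example on the Example slide. A sketch, assuming (as that example's numbers imply) that `leakage_energy` is only added for cpus with at least one task, that all cpus in the cluster run at one shared capacity, and that `c_util` is the maximum util/cc over the cluster's cpus. The function names are hypothetical:

```python
def cpu_energy(util, cc, power_cc, idle_power, leakage_energy, nr_running):
    ratio = util / cc
    energy = power_cc * ratio + idle_power * (1 - ratio)
    if nr_running > 0:
        energy += leakage_energy  # assumption: only cpus with tasks pay the wake-up cost
    return energy


def cluster_energy(utils, cc, c_active_power, c_idle_power):
    c_util = max(u / cc for u in utils)  # cluster is busy while its busiest cpu is
    return c_active_power * c_util + c_idle_power * (1 - c_util)


def total(rqs, cc, power_cc, idle_power=0.1, leakage=0.1, c_active=2.4, c_idle=0.0):
    """Energy of one little cluster (made-up numbers from the example tables)."""
    utils = [sum(rq) for rq in rqs]
    cpus = sum(cpu_energy(u, cc, power_cc, idle_power, leakage, len(rq))
               for u, rq in zip(utils, rqs))
    return cpus + cluster_energy(utils, cc, c_active, c_idle)


before = total([[0.2], [0.1], []], cc=0.2, power_cc=0.4)  # about 3.35
after = total([[0.2, 0.1], [], []], cc=0.4, power_cc=0.9)  # about 2.8
```

Packing both tasks onto cpu 0 forces a higher P-state, but the saved leakage on cpu 1 and the lower cluster utilization (0.75 instead of 1.0) more than pay for it.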
23
Compute Capacity and Power
Processor-specific table expressing power and compute capacity at each P-state. The sched domain hierarchy is in a good position to hold this type of information.
Example (entirely made up):

Little                 Big
Capacity   Power       Capacity   Power
0.2        0.4         0.4        1.6
0.4        0.9         0.8        4.4
0.6        1.5         1.2        9.0
0.8        2.2         1.6        15.0
1.0        3.2         2.0        23.0
idle       0.1         idle       0.3
leakage    0.1         leakage    0.5

Cluster    Little      Big
active     2.4         6.0
idle       0.0         0.0

Rows with equal compute capacity (0.4 and 0.8) appear in both the Little and Big tables.
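Since busy power divided by compute capacity is the energy per unit of work, the two made-up tables can be compared directly where their capacities coincide. A small sketch (cluster power, idle power and leakage are ignored for simplicity; `energy_per_work()` is a hypothetical helper):

```python
LITTLE = [(0.2, 0.4), (0.4, 0.9), (0.6, 1.5), (0.8, 2.2), (1.0, 3.2)]
BIG = [(0.4, 1.6), (0.8, 4.4), (1.2, 9.0), (1.6, 15.0), (2.0, 23.0)]


def energy_per_work(table):
    """Busy power / compute capacity for each P-state (energy per unit of work)."""
    return {cap: power / cap for cap, power in table}


little, big = energy_per_work(LITTLE), energy_per_work(BIG)
shared = sorted(set(little) & set(big))  # capacities both cpu types offer
comparison = {cap: (little[cap], big[cap]) for cap in shared}
# comparison is roughly {0.4: (2.25, 4.0), 0.8: (2.75, 5.5)}
```

In this made-up example, the little cpu is cheaper per unit of work wherever both can deliver the same capacity; the big cpu is only needed for capacities above 1.0, which is the kind of trade-off the model lets the scheduler see.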
24
energy_diff()
Balancing two cpus:

def energy_diff(tload, scpu, dcpu):
    # Estimate the next compute capacity (P-state)
    s_new_cc = find_cpu_cap(scpu, cpu_util(scpu))
    # energy model cost for task on source cpu
    s_task_energy = tload/s_new_cc * cpu_cc_power(scpu, s_new_cc)
    if nr_running(scpu) == 1:
        s_task_energy += cpu_leakage_energy[cpu_type[scpu]]
    # Estimate destination cpu cc after adding the task
    d_new_cc = find_cpu_cap(dcpu, cpu_util(dcpu)+tload)
    # energy model cost for task on destination cpu
    d_task_energy = tload/d_new_cc * cpu_cc_power(dcpu, d_new_cc)
    if nr_running(dcpu) == 0:
        d_task_energy += cpu_leakage_energy[cpu_type[dcpu]]
    return s_task_energy - d_task_energy
Balancing sched domains is slightly more complicated as it involves cluster power as well.
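A self-contained version of `energy_diff()` that runs against the made-up little-cpu table. The helper functions and run-queue data are hypothetical stand-ins for the ones the slide leaves undefined, and a single cpu type is assumed, so the leakage lookup becomes a constant. With cpu 0 running a 0.2 task and cpu 1 running a 0.1 task, moving the 0.1 task from cpu 1 to cpu 0 gives the 0.075 from the Example slide:

```python
# Made-up little-cpu table: compute capacity -> busy power.
CC_POWER = {0.2: 0.4, 0.4: 0.9, 0.6: 1.5, 0.8: 2.2, 1.0: 3.2}
LEAKAGE = 0.1  # single cpu type assumed

# Hypothetical run-queues: cpu -> list of task loads.
rqs = {0: [0.2], 1: [0.1], 2: []}


def cpu_util(cpu):
    return sum(rqs[cpu])


def nr_running(cpu):
    return len(rqs[cpu])


def find_cpu_cap(cpu, util):
    """Lowest capacity that fits util (saturates at the highest one)."""
    fitting = [cc for cc in CC_POWER if cc >= util]
    return min(fitting) if fitting else max(CC_POWER)


def energy_diff(tload, scpu, dcpu):
    s_new_cc = find_cpu_cap(scpu, cpu_util(scpu))
    s_task_energy = tload / s_new_cc * CC_POWER[s_new_cc]
    if nr_running(scpu) == 1:  # task is alone: moving it lets scpu idle
        s_task_energy += LEAKAGE
    d_new_cc = find_cpu_cap(dcpu, cpu_util(dcpu) + tload)
    d_task_energy = tload / d_new_cc * CC_POWER[d_new_cc]
    if nr_running(dcpu) == 0:  # destination has to be woken up
        d_task_energy += LEAKAGE
    return s_task_energy - d_task_energy


saving = energy_diff(0.1, 1, 0)  # about 0.075
```

Moving the same task to the empty cpu 2 instead gives roughly zero (`energy_diff(0.1, 1, 2)`), because the leakage saved on cpu 1 is spent waking cpu 2; the model only favours packing when it actually saves energy.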
25
Example
Before:
cpu       rq           util   cap   cc_power   leak   power
0         {0.2}        0.2    0.2   0.4        0.1    0.5
1         {0.1}        0.1    0.2   0.4        0.1    0.35
2         {}           0.0    0.2   0.4        0.1    0.1
cluster   -            1.0    -     2.4        -      2.4
Total                                                 3.35

energy_diff() = 0.075*

After EA load balance:
cpu       rq           util   cap   cc_power   leak   power
0         {0.2, 0.1}   0.3    0.4   0.9        0.1    0.8
1         {}           0.0    0.4   0.9        0.1    0.1
2         {}           0.0    0.4   0.9        0.1    0.1
cluster   -            0.75   -     2.4        -      1.8
Total                                                 2.8

Energy saved: 0.55

* energy_diff() ignores cluster power and other tasks to keep computations cheap and simple. Better accuracy can be added if necessary.
26
Is the energy model too simple?
It is essential that the energy model is fast and easy to use for load-balancing. The scheduler is a critical path and already complex enough.
Python model tests
Disclaimer: These numbers have not been validated in any way.
Test configuration: 3+3 big.LITTLE, 1000 random balance scenarios.
Rand/Opt: How much worse the random balance energy (starting point) is than the best possible balance energy (brute force).
EA/Opt: How much worse the energy-model-based balance energy is than the best possible balance energy.
EA == Opt: Scenarios where EA found the best possible balance.
Tasks Rand/Opt EA/Opt EA == Opt
2 7.86% 0.09% 72.60%
3 7.79% 0.15% 64.80%
4 9.39% 0.45% 62.00%
5 10.02% 1.15% 51.10%
6 11.44% 2.23% 38.30%
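The "Opt" baseline relies on brute force: try every assignment of tasks to cpus and keep the cheapest. A sketch of that search with a pluggable energy function; the toy cost model (a fixed cost per awake cpu plus load times a per-cpu energy-per-work factor, with a capacity limit) is made up and is not the Python model behind the table:

```python
import itertools


def brute_force(tasks, cpus, energy):
    """Try every placement of tasks on cpus; return (lowest energy, its assignment)."""
    best_energy, best_assignment = float("inf"), None
    for assignment in itertools.product(cpus, repeat=len(tasks)):
        e = energy(tasks, assignment)
        if e < best_energy:
            best_energy, best_assignment = e, assignment
    return best_energy, best_assignment


# Made-up toy cost model: cpu -> (capacity, energy per unit of work, cost of being awake).
CPUS = {"L0": (1.0, 1.0, 0.25), "L1": (1.0, 1.0, 0.25), "B0": (2.0, 2.0, 0.5)}


def toy_energy(tasks, assignment):
    load = {cpu: 0.0 for cpu in CPUS}
    for task, cpu in zip(tasks, assignment):
        load[cpu] += task
    if any(load[cpu] > CPUS[cpu][0] for cpu in CPUS):
        return float("inf")  # over capacity: not a valid placement
    return sum(CPUS[cpu][2] + CPUS[cpu][1] * work for cpu, work in load.items() if work > 0)


best = brute_force([0.5, 0.25], list(CPUS), toy_energy)  # (1.0, ('L0', 'L0'))
```

The search grows as cpus**tasks: a 3+3 system with 6 tasks already means 6**6 = 46656 placements per scenario. That is fine for an offline test, but it is why the in-kernel balancer has to settle for the greedy energy_diff() test.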
27
What is next?
Early prototype to validate the idea. Initial focus: getting energy_diff() working on a simple SMP system. Post on LKML very soon.
Open issues:
Exposing power/capacity tables to the kernel. Essential to make the right decisions.
Plumbing: Where do the tables come from? DT?
Next steps:
Scale invariance: A requirement for the energy model to work.
Fix cpu_power/compute capacity use in the scheduler.
Tooling and benchmarks (covered in another session)
Idle integration (covered in another session)
28
Questions?