LCU14-410: How to build an Energy Model for your SoC

Upload: linaro

Post on 18-Nov-2014

Category:

Software


DESCRIPTION

LCU14-410: How to build an Energy Model for your SoC
---------------------------------------------------
Speaker: Morten Rasmussen
Date: September 18, 2014
---------------------------------------------------
★ Session Summary ★
- ARM to provide a quick overview of the current energy model
- Introduce the methodology/recipe used to build the energy model
- Discuss ways in which the model is used today and intended next steps
- Key outcomes:
  - Describe the
  - Identify gaps and limitations
Summary of EAS workshop (Amit)
- Summary of hacking sessions - plan to integrate Qualcomm-ARM-Linaro work to send upstream
- Key outcomes:
  - List of features and responsibilities
  - Dependencies between upstreaming of features, if any
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137778
Google Event: https://plus.google.com/u/0/events/ck3ti7eurknnsq0a4e9ks5a1sbs
Video: https://www.youtube.com/watch?v=JfZt8W3NVgk&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-410
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org

TRANSCRIPT

Page 1: LCU14-410: How to build an Energy Model for your SoC

How to build an energy model for your SoC

Linaro Connect LCU14, Burlingame, CA.

Morten Rasmussen

Page 2: LCU14-410: How to build an Energy Model for your SoC

Why do you need an energy model?

Most of the Linux kernel is blissfully unaware of SoC power management features: P-states, clock domains, C-states, power domains, ...

Only largely autonomous subsystems are aware of some of these details (cpufreq, cpuidle, …)

The plan is to change that by coordinating task scheduling, frequency scaling, and idle-state selection to improve power management.

Energy saving techniques must be applied under the right circumstances which vary between SoCs.

The kernel must therefore have a better understanding of power(energy)/performance trade-offs for the particular SoC to make the right decisions.

An energy model can provide that information.

As a bonus, the energy model may also be used by tools to produce quick energy estimates based on execution traces.

Page 3: LCU14-410: How to build an Energy Model for your SoC

Modelling limitations

Models are never accurate, but we only need enough detail to make the right decisions most of the time.

The model will be used by critical code paths in the kernel, so it has to be as simple as possible.

The model only considers cpus; memory and peripherals are not modelled.

Page 4: LCU14-410: How to build an Energy Model for your SoC

A simplified system view

[Figure: two clusters, cpu0/cpu1 and cpu2/cpu3, each with its own shared HW and clock source, plus a power supply. Gates (G) are marked on the cpus and on the shared HW. Legend: clock gating, power gating, power domain.]

Page 5: LCU14-410: How to build an Energy Model for your SoC

Energy consumption simplified

[Figure: power over time. Power levels marked Px, Py and Cz. Phases: Busy, Transition, Idle. Marked areas: busy energy, transition energy, idle energy.]
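The busy, transition and idle energy on this slide can be read as power multiplied by time, summed over the phases. A minimal Python sketch of that reading follows; the `Phase` and `total_energy` names and all power and duration values are made up for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One segment of the power-over-time plot (all values are made up)."""
    name: str
    power: float     # average power during the phase, arbitrary units
    duration: float  # length of the phase, arbitrary time units


def total_energy(phases):
    """Energy is the area under the power curve: sum of power * duration."""
    return sum(p.power * p.duration for p in phases)


# Hypothetical trace: busy at some P-state, a transition, then an idle state.
trace = [
    Phase("busy", power=4.0, duration=2.0),
    Phase("transition", power=1.0, duration=0.5),
    Phase("idle", power=0.25, duration=4.0),
]
print(total_energy(trace))  # 9.5
```

The transition phase is kept as its own entry because the deck accounts for it separately (as wake-up energy) rather than folding it into busy or idle power.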

Page 6: LCU14-410: How to build an Energy Model for your SoC

Scheduler Topology Hierarchy

[Figure: sched_domain hierarchy over cpus 0 1 2 3, drawn as struct sched_group boxes. Energy model tables attached to the groups: per-core C-states, cluster/package C-states, cluster/package P-states.]

Disclaimer: This is a simplified view of the sched_domain hierarchy.

Page 7: LCU14-410: How to build an Energy Model for your SoC

Energy model data

P-states:
- Compute capacity: Performance score normalized to the highest P-state of the fastest cpu in the system (1024). Choose the benchmark carefully. Preferably use a suite of benchmarks.
- Power: Busy power = energy/second. Normalized to any reference, but must be consistent across all cpus.

C-states:
- Power: Idle power = energy/second. Normalized.
- Wake-up energy: Energy consumed during P->C + C->P state transitions. Unit must be consistent with power numbers.

Note: Power numbers should only include power consumption associated with the group where the tables are attached, i.e. per-core P-state power should only include power consumed by the core itself; shared HW is accounted for in the table belonging to the level above.
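The capacity normalization described above could be sketched as follows. This is only an illustration: the `normalize_capacities` helper, the dictionary layout and the benchmark scores are assumptions, not part of the deck.

```python
def normalize_capacities(scores, scale=1024):
    """Scale raw benchmark scores so the best score in the system maps to `scale`.

    scores: {cpu_type: {freq: raw_score}}; names and layout are illustrative.
    """
    best = max(s for per_freq in scores.values() for s in per_freq.values())
    return {
        cpu: {freq: round(score * scale / best) for freq, score in per_freq.items()}
        for cpu, per_freq in scores.items()
    }


# Made-up benchmark scores for a hypothetical two-cluster system.
raw_scores = {
    "little": {350: 35.0, 1000: 100.0},
    "big": {500: 100.0, 1200: 240.0},
}
capacities = normalize_capacities(raw_scores)
print(capacities)
```

With these made-up scores the highest P-state of the fastest cpu type ends up at 1024, and every other entry is expressed relative to it, so capacities are comparable across cpu types.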

Page 8: LCU14-410: How to build an Energy Model for your SoC

Energy model data

Example tables for one cluster containing cpus 0 and 1:

Cluster P-states:

capacity  power  (freq)
358       2967   (350)
...       ...    ...
1024      4905   (1000)

Cluster C-states:

power  wu   (state)
10     6    (C1)
...    ...  ...

CPU P-states:

capacity  power  (freq)
358       187    (350)
...       ...    ...
1024      1024   (1000)

CPU C-states:

power  wu   (state)
0      0    (WFI)
...    ...  ...

Page 9: LCU14-410: How to build an Energy Model for your SoC

Energy model algorithm

for_each_domain(cpu, sd) {
    sg = sched_group_of(cpu)

    energy_before = curr_util(sg) * busy_power(sg)
                  + (1 - curr_util(sg)) * idle_power(sg)

    energy_after = new_util(sg) * busy_power(sg)
                 + (1 - new_util(sg)) * idle_power(sg)
                 + (1 - new_util(sg)) * wakeups * wakeup_energy(sg)

    energy_diff += energy_before - energy_after

    if (energy_before == energy_after)
        break;
}

return energy_diff
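A runnable Python sketch of the same walk, under stated assumptions: the `GroupEstimate` container and its field names are hypothetical, the power and wake-up numbers reuse the highest P-state and idle-state entries from the example tables in the deck, and the utilization and wake-up counts are made up.

```python
from dataclasses import dataclass


@dataclass
class GroupEstimate:
    """Inputs for one level of the hierarchy; field names are illustrative."""
    busy_power: float
    idle_power: float
    wakeup_energy: float
    curr_util: float  # utilization in [0, 1] before the change
    new_util: float   # utilization in [0, 1] after the change
    wakeups: float    # assumed number of wake-ups after the change


def energy_diff(levels):
    """Walk the levels bottom-up and accumulate energy_before - energy_after."""
    total = 0.0
    for g in levels:
        before = g.curr_util * g.busy_power + (1 - g.curr_util) * g.idle_power
        after = (g.new_util * g.busy_power
                 + (1 - g.new_util) * g.idle_power
                 + (1 - g.new_util) * g.wakeups * g.wakeup_energy)
        total += before - after
        if before == after:
            break  # as on the slide: stop once a level shows no change
    return total


levels = [
    # cpu level: busy power 1024, WFI idle power 0, wake-up energy 0
    GroupEstimate(1024, 0, 0, curr_util=0.5, new_util=0.25, wakeups=0),
    # cluster level: busy power 4905, C1 idle power 10, wake-up energy 6
    GroupEstimate(4905, 10, 6, curr_util=0.5, new_util=0.25, wakeups=1),
]
print(energy_diff(levels))  # 1475.25
```

A positive result means the estimated energy after the change is lower than before, which is the sign convention of the pseudocode above.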

Page 10: LCU14-410: How to build an Energy Model for your SoC

Backups

Page 11: LCU14-410: How to build an Energy Model for your SoC

Platform performance/energy data/model in scheduler or user-space

Energy-Aware Workshop @ Kernel Summit 2014, Chicago

Morten Rasmussen

Page 12: LCU14-410: How to build an Energy Model for your SoC

Sub-topics

Techniques for reducing energy consumption vary between platforms:
- Race-to-idle
- Task packing
- P- and C-state constraints (Turbo Mode, package C-states, …)

… but they are not universally good. Most likely each helps only to a certain extent.

We need to know when to apply each of the techniques for a particular platform.

Proposals:
- Tunable heuristics for each technique that can be controlled by somebody else (user-space?), basically passing the problems to others.
- Provide an in-kernel performance/energy model that can estimate the impact of scheduling decisions.

Page 13: LCU14-410: How to build an Energy Model for your SoC

Backup/More stuff

Page 14: LCU14-410: How to build an Energy Model for your SoC

Model Validation: ARM TC2, sysbench

Correlation (Pearson):

A15 = 0.93

A7 = 0.96

Page 15: LCU14-410: How to build an Energy Model for your SoC

Model Validation: ARM TC2, periodic

Correlation (Pearson):

A15 = 0.17

A7 = -0.01

Page 16: LCU14-410: How to build an Energy Model for your SoC

Model Validation: ARM TC2, Android audio

Correlation (Pearson):

A15 = 0.03

A7 = 0.48

Page 17: LCU14-410: How to build an Energy Model for your SoC

Model Validation: ARM TC2, Android bbench

Correlation (Pearson):

A15 = 0.67

A7 = 0.80

Page 18: LCU14-410: How to build an Energy Model for your SoC

Old slides

Page 19: LCU14-410: How to build an Energy Model for your SoC

Motivation

Energy cost driven task placement (load-balancing)

Focus on the actual goal of the energy-aware scheduling activities:

Saving energy while achieving (near) optimum performance.

Energy benefit of scheduling decision clear when made.

Assuming energy cost estimates are fairly accurate.

Introduce a simple energy model to estimate costs and guide scheduling decisions. Requested by maintainers at the KS workshop.

Gives the right amount of packing and spreading.

May simplify balancing decision logic.

Strong focus on saving energy in load balancing algorithms.

big.LITTLE support comes naturally and almost for free.

This is just one part of the energy efficiency work. Several related sessions this week.

Page 20: LCU14-410: How to build an Energy Model for your SoC

Energy Load Balancing

The idea (a bit simplified): Let the resulting energy consumption guide all balancing decisions:

if (energy_diff(task, src_cpu, dst_cpu) > 0) {
    move_task(task, src_cpu, dst_cpu);
} else {
    /* Try some other task */
}

Ideally, we should get the optimum balance if we try all combinations of tasks and cpus.

In reality it is not that simple. We can't try all combinations, but we can get fairly close for most scenarios.

If the energy model is accurate enough, we get packing and spreading implicitly, and only when it saves energy.

Should work for any system: SMP and big.LITTLE (with a few extensions).

Page 21: LCU14-410: How to build an Energy Model for your SoC

Power and Energy

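Goal: Save energy, not power.

[Figure: power versus time, with the energy marked.]

$$e_{cpu} = P \cdot t, \qquad t = \frac{inst}{cc}$$

$$e_{cpu} = P(cc)\,\frac{inst}{cc}$$

$$e_{cpu} = P(cc)\left(\frac{inst_{task}}{cc} + \frac{inst_{idle}}{cc}\right)$$

$$e_{cpu} = e_{task} + e_{idle}$$

Here inst is the work and cc is the compute capacity (~ freq * uarch). P(cc)/cc = Energy/inst: This is what we try to minimize.

If we have cpuidle support we get:

$$e_{cpu} = P_{busy}(cc)\,\frac{inst_{task}}{cc} + P_{idle}\,\frac{inst_{idle}}{cc}$$

where inst_task/cc ~ utilization.

We have to add an additional leakage energy term to reflect that it is better not to wake cpus unnecessarily.

Tracked load = Time in runnable state / Time ~ utilization*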

Page 22: LCU14-410: How to build an Energy Model for your SoC

Simple Energy Model

cpu_energy = power(cc) * util/cc
           + idle_power * (1 - (util/cc))
           + leakage_energy

cluster_energy = c_active_power * c_util
               + c_idle_power * (1 - c_util)

util = Scale invariant cpu utilization (Tracked load).

cc = Current compute capacity (depends on freq and uarch).

power(cc) = Busy power (fully loaded) at current capacity from table.

idle_power = Idle power consumption (~WFI).

leakage_energy = Constant representing the cost of waking the cpu.

c_util = Cluster utilization. Depends on max(util/cc) ratio of its cpus.

c_active_power = Cluster active power.

c_idle_power = Cluster idle power.
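The two formulas translate directly into code. The sketch below is a plain transcription under the definitions above; the function signatures are an assumption (the slides do not define an API), and the sample numbers are made up.

```python
def cpu_energy(util, cc, busy_power, idle_power, leakage_energy):
    """cpu_energy = power(cc) * util/cc + idle_power * (1 - util/cc) + leakage_energy."""
    busy_fraction = util / cc
    return (busy_power * busy_fraction
            + idle_power * (1 - busy_fraction)
            + leakage_energy)


def cluster_energy(c_util, c_active_power, c_idle_power):
    """cluster_energy = c_active_power * c_util + c_idle_power * (1 - c_util)."""
    return c_active_power * c_util + c_idle_power * (1 - c_util)


# Made-up numbers: a cpu at capacity 0.2 running a load of 0.1.
print(round(cpu_energy(util=0.1, cc=0.2, busy_power=0.4,
                       idle_power=0.1, leakage_energy=0.1), 3))  # 0.35
print(round(cluster_energy(c_util=0.75, c_active_power=2.4,
                           c_idle_power=0.0), 3))  # 1.8
```

Passing `leakage_energy` as an argument is a design choice of this sketch: it lets a caller pass 0 for a cpu that does not need to be woken.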

Page 23: LCU14-410: How to build an Energy Model for your SoC

Compute Capacity and Power

Processor specific table expressing power and compute capacity at each P-state. The sched domain hierarchy is in a good position to hold this type of information.

Example (entirely made up):

Per-cpu P-states:

Little              Big
Capacity  Power     Capacity  Power
0.2       0.4       0.4       1.6
0.4       0.9       0.8       4.4
0.6       1.5       1.2       9.0
0.8       2.2       1.6       15.0
1.0       3.2       2.0       23.0

Annotation: Equal compute capacity (0.4 and 0.8 appear in both tables).

Per-cpu idle and leakage:

         Little  Big
idle     0.1     0.3
leakage  0.1     0.5

Cluster:

         Little  Big
active   2.4     6.0
idle     0.0     0.0

Page 24: LCU14-410: How to build an Energy Model for your SoC

energy_diff()

Balancing two cpus:

def energy_diff(tload, scpu, dcpu):
    # Estimate the next compute capacity (P-state)
    s_new_cc = find_cpu_cc(scpu, cpu_util(scpu))
    # Energy model cost for task on source cpu
    s_task_energy = tload / s_new_cc * cpu_cc_power(scpu, s_new_cc)
    if nr_running(scpu) == 1:
        s_task_energy += cpu_leakage_energy[cpu_type[scpu]]
    # Estimate destination cpu cc after adding the task
    d_new_cc = find_cpu_cc(dcpu, cpu_util(dcpu) + tload)
    # Energy model cost for task on destination cpu
    d_task_energy = tload / d_new_cc * cpu_cc_power(dcpu, d_new_cc)
    if nr_running(dcpu) == 0:
        d_task_energy += cpu_leakage_energy[cpu_type[dcpu]]
    return s_task_energy - d_task_energy

Balancing sched domains is slightly more complicated as it involves cluster power as well.
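energy_diff() depends on a helper that picks the compute capacity for a given utilization, which the slide does not show. One possible sketch, assuming the lowest P-state whose capacity covers the utilization is selected: that policy, the simplified signature (a capacity list instead of a cpu id) and the `LITTLE_CAPACITIES` name are assumptions of this sketch, while the capacity values are the deck's made-up "Little" numbers.

```python
from bisect import bisect_left

# Capacities of the made-up "Little" P-state table, sorted ascending.
LITTLE_CAPACITIES = [0.2, 0.4, 0.6, 0.8, 1.0]


def find_cpu_cc(capacities, util):
    """Return the lowest capacity >= util, or the highest one if util exceeds it."""
    i = bisect_left(capacities, util)
    return capacities[min(i, len(capacities) - 1)]


print(find_cpu_cc(LITTLE_CAPACITIES, 0.1))  # 0.2
print(find_cpu_cc(LITTLE_CAPACITIES, 0.3))  # 0.4
print(find_cpu_cc(LITTLE_CAPACITIES, 1.5))  # 1.0
```

Clamping to the highest capacity models an over-utilized cpu; a real implementation would need to decide separately how to treat that case.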

Page 25: LCU14-410: How to build an Energy Model for your SoC

Example

Before EA load balance:

cpu      rq          util  cap  cc_power  leak  power
0        {0.2}       0.2   0.2  0.4       0.1   0.5
1        {0.1}       0.1   0.2  0.4       0.1   0.35
2        {}          0.0   0.2  0.4       0.1   0.1
cluster  -           1.0   -    2.4       -     2.4
Total                                           3.35

energy_diff() = 0.075*

After EA load balance:

cpu      rq          util  cap  cc_power  leak  power
0        {0.2, 0.1}  0.3   0.4  0.9       0.1   0.8
1        {}          0.0   0.4  0.9       0.1   0.1
2        {}          0.0   0.4  0.9       0.1   0.1
cluster  -           0.75  -    2.4       -     1.8
Total                                           2.8

0.55 saved

* energy_diff() ignores cluster power and other tasks to keep computations cheap and simple. Better accuracy can be added if necessary.
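The numbers in this example can be reproduced with a short Python sketch. It uses the made-up "Little" values from the deck; the helper names and signatures are illustrative, and charging leakage only to cpus with a non-empty run queue is an assumption read off the idle rows (which show a power of 0.1).

```python
# Made-up "Little" numbers from the example tables in the deck.
IDLE_POWER = 0.1
LEAKAGE = 0.1
CLUSTER_ACTIVE = 2.4
CLUSTER_IDLE = 0.0
CC_POWER = {0.2: 0.4, 0.4: 0.9, 0.6: 1.5, 0.8: 2.2, 1.0: 3.2}


def cpu_power(rq, cc):
    """Power column for one cpu row; leakage only when tasks are queued (assumption)."""
    busy = sum(rq) / cc
    leak = LEAKAGE if rq else 0.0
    return CC_POWER[cc] * busy + IDLE_POWER * (1 - busy) + leak


def total_power(rqs, cc):
    """Sum of the cpu rows plus the cluster row, with c_util = max(util/cc)."""
    c_util = max(sum(rq) / cc for rq in rqs)
    cluster = CLUSTER_ACTIVE * c_util + CLUSTER_IDLE * (1 - c_util)
    return sum(cpu_power(rq, cc) for rq in rqs) + cluster


def energy_diff(tload, s_cc, s_nr_running, d_cc, d_nr_running):
    """Task-only estimate following the energy_diff() slide (no cluster power)."""
    s_energy = tload / s_cc * CC_POWER[s_cc]
    if s_nr_running == 1:
        s_energy += LEAKAGE
    d_energy = tload / d_cc * CC_POWER[d_cc]
    if d_nr_running == 0:
        d_energy += LEAKAGE
    return s_energy - d_energy


before = total_power([[0.2], [0.1], []], cc=0.2)
after = total_power([[0.2, 0.1], [], []], cc=0.4)
# Move the 0.1 task from cpu1 (capacity 0.2) to cpu0 (capacity 0.4 afterwards).
diff = energy_diff(0.1, s_cc=0.2, s_nr_running=1, d_cc=0.4, d_nr_running=1)
print(round(before, 3), round(after, 3), round(before - after, 3), round(diff, 3))
# 3.35 2.8 0.55 0.075
```

The gap between the 0.075 estimate and the 0.55 actually saved comes from the terms energy_diff() leaves out, mainly the cluster power.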

Page 26: LCU14-410: How to build an Energy Model for your SoC

Is the energy model too simple?

It is essential that the energy model is fast and easy to use for load-balancing. The scheduler is a critical path and already complex enough.

Python model tests

Disclaimer: These numbers have not been validated in any way.

Test configuration: 3+3 big.LITTLE, 1000 random balance scenarios.

Rand/Opt: How much worse the random balance energy (starting point) is than the best possible balance energy (brute-force).

EA/Opt: How much worse the energy model based balance energy is than the best possible balance energy.

EA == Opt: Scenarios where EA found the best possible balance.

Tasks Rand/Opt EA/Opt EA == Opt

2 7.86% 0.09% 72.60%

3 7.79% 0.15% 64.80%

4 9.39% 0.45% 62.00%

5 10.02% 1.15% 51.10%

6 11.44% 2.23% 38.30%

Page 27: LCU14-410: How to build an Energy Model for your SoC

What is next?

Early prototype to validate the idea. Initial focus: getting energy_diff() working on a simple SMP system. Post on LKML very soon.

Open issues:
- Exposing power/capacity tables to the kernel. Essential to make the right decisions.
- Plumbing: Where do the tables come from? DT?

Next steps:
- Scale invariance: Requirement for the energy model to work.
- Fix cpu_power/compute capacity use in scheduler.
- Tooling and benchmarks (covered in another session)
- Idle integration (covered in another session)

Page 28: LCU14-410: How to build an Energy Model for your SoC

Questions?