estimating the worst-case energy consumption of embedded software

Post on 09-Jan-2016


Page 1: Estimating the Worst-Case Energy Consumption of Embedded Software

Estimating the Worst-Case Energy Consumption of Embedded Software

Ramkumar Jayaseelan, Tulika Mitra, Xianfeng Li

School of Computing, National University of Singapore

Page 2: Estimating the Worst-Case Energy Consumption of Embedded Software

Motivation

Conventional scheduling techniques give timing guarantees

Processor cycles are the critical resource

The worst-case execution time (WCET) of each task is a required input

Battery life is equally important for mobile devices

Scheduling techniques have to give energy guarantees

The Worst-Case Energy Consumption (WCEC) of each task is a required input

Page 3: Estimating the Worst-Case Energy Consumption of Embedded Software

Remotely Deployed Systems

Available energy is unevenly distributed among nodes

Spatio-temporal scheduling benefits from WCEC

[Figure: a sensor network and a local station]

Page 4: Estimating the Worst-Case Energy Consumption of Embedded Software

Energy-Based Guarantees

Scheduling critical and non-critical tasks in a battery-operated system

Non-critical tasks can be run only if energy constraints for critical tasks are satisfied

Worst-case energy estimation is crucial

Page 5: Estimating the Worst-Case Energy Consumption of Embedded Software

Reward-Based Scheduling

Energy consumption increases with supply voltage, while delay is inversely proportional to it (Delay ∝ 1 / Voltage)

Reward-based scheduling attempts to satisfy constraints on energy and timing

An energy guarantee is possible only if the worst-case energy consumption of the tasks is known

Page 6: Estimating the Worst-Case Energy Consumption of Embedded Software

Outline

Background

Relation between WCET and worst-case energy consumption

Estimation technique: simplified model

Instruction cache and speculation

Experimental results

Conclusion

Page 7: Estimating the Worst-Case Energy Consumption of Embedded Software

Background

Power and energy are often used interchangeably

Power is the energy consumed per unit time

Energy consumed during program execution: E = P × t

This is an approximation, as P is also a function of time

Page 8: Estimating the Worst-Case Energy Consumption of Embedded Software

In reality, when a program executes

Energy is the area under the curve: E = ∫P(t)dt

E = P × t is an approximation

[Figure: power plotted against time]
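
As a minimal sketch of the difference between the two formulas, the snippet below integrates a sampled power trace with the trapezoidal rule and compares it with a constant-power estimate. The sample values and the choice of peak power as the constant are assumptions made only for illustration; they do not come from the slides.

```python
def energy_trapezoid(times_s, powers_w):
    """Approximate E = integral of P(t) dt with the trapezoidal rule (joules)."""
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (powers_w[i] + powers_w[i - 1]) * dt
    return energy


# Hypothetical power samples (watts), one every millisecond.
times_s = [0.000, 0.001, 0.002, 0.003, 0.004]
powers_w = [1.0, 3.0, 2.0, 4.0, 1.0]

# Area under the sampled power curve.
e_integral = energy_trapezoid(times_s, powers_w)

# E = P x t with a single constant P (here the peak power) over the whole run.
e_constant = max(powers_w) * (times_s[-1] - times_s[0])

print(e_integral, e_constant)
```

With a varying power trace the constant-power product differs from the integral, which is why the choice of P matters.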

Page 9: Estimating the Worst-Case Energy Consumption of Embedded Software

WCEC versus WCET

[Figure: total energy (nanojoules, axis from 13000 to 21000) and execution time (cycles, axis from 4500 to 5100) plotted against program inputs]

Full Input Space Expansion for a 5-element Insertion Sort program

Page 10: Estimating the Worst-Case Energy Consumption of Embedded Software

Cannot Estimate WCEC from WCET

Benchmark   WCET × avg_power (µJ)   Observed (µJ)
isort       489.92                  525.88
fft         12106.49                10260.86
fdct        138.20                  105.57
ludcmp      131.76                  119.33
matsum      972.03                  1154.31
minver      93.61                   80.80
bsearch     3.84                    3.07
des         724.05                  643.75
matmult     178.12                  166.88
qsort       54.79                   43.73
qurt        23.80                   17.65

Possible underestimation using WCEC = WCET × power
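
The underestimation can be checked mechanically. The sketch below uses the figures from this slide's table and lists the benchmarks where WCET × avg_power falls below the observed energy; the variable and function names are illustrative only.

```python
# (WCET x avg_power, observed) energy in microjoules, taken from the slide's table.
energy_uj = {
    "isort": (489.92, 525.88),
    "fft": (12106.49, 10260.86),
    "fdct": (138.20, 105.57),
    "ludcmp": (131.76, 119.33),
    "matsum": (972.03, 1154.31),
    "minver": (93.61, 80.80),
    "bsearch": (3.84, 3.07),
    "des": (724.05, 643.75),
    "matmult": (178.12, 166.88),
    "qsort": (54.79, 43.73),
    "qurt": (23.80, 17.65),
}


def underestimated(table):
    """Return benchmarks whose WCET-based estimate is below the observed energy."""
    return [name for name, (estimate, observed) in table.items() if estimate < observed]


unsafe = underestimated(energy_uj)
print(unsafe)
```

For these numbers the estimate is unsafe for isort and matsum, so WCET × avg_power cannot serve as a worst-case bound.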

Page 11: Estimating the Worst-Case Energy Consumption of Embedded Software

WCEC versus WCET

WCEC path need not be the same as the WCET path

WCEC cannot be directly estimated from the WCET value

Page 12: Estimating the Worst-Case Energy Consumption of Embedded Software

A closer look at Power

Dynamic power: power consumed due to switching of transistors

Leakage power: power consumed independent of switching activity

Dynamic power forms the bulk of power consumption in today's processors

Page 13: Estimating the Worst-Case Energy Consumption of Embedded Software

Dynamic Power

$$P = \frac{1}{2} \times A \times V^2 \times C \times f$$

V is the supply voltage

C is the capacitance of the circuit

f is the frequency

A is the activity factor

V, C, f are independent of program execution

Variation in P is due to the variation in A
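
A small sketch of the formula, with V, C and f held fixed so that only the activity factor changes. The numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(activity, voltage_v, capacitance_f, frequency_hz):
    """P = (1/2) * A * V^2 * C * f, in watts."""
    return 0.5 * activity * voltage_v ** 2 * capacitance_f * frequency_hz


# Hypothetical circuit parameters: fixed for a given processor and clock setting.
VOLTAGE_V = 1.0
CAPACITANCE_F = 1e-9
FREQUENCY_HZ = 1e8

# Only the activity factor A differs between these two cycles.
p_half = dynamic_power(0.5, VOLTAGE_V, CAPACITANCE_F, FREQUENCY_HZ)
p_full = dynamic_power(1.0, VOLTAGE_V, CAPACITANCE_F, FREQUENCY_HZ)

print(p_half, p_full)
```

Doubling A doubles P, which is the variation the static analysis has to bound.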

Page 14: Estimating the Worst-Case Energy Consumption of Embedded Software

Variation in Activity Factor (A)

Not all parts of the processor are used in every cycle, e.g., the data cache is used only for loads/stores

Clock gating disables unused components

Activity factor (A) varies during the execution of the program

Model variation in A through static analysis

Page 15: Estimating the Worst-Case Energy Consumption of Embedded Software

Switch-off Energy

An inactive component cannot be fully switched off

A certain portion of the peak energy is consumed even in idle cycles

Switch-off energy is proportional to the number of idle cycles

Page 16: Estimating the Worst-Case Energy Consumption of Embedded Software

Clock Energy and Leakage Energy

Clock power: power consumed in the clock distribution network

Leakage power: power consumed due to leakage in transistors

Clock energy and leakage energy are directly proportional to the execution time

Page 17: Estimating the Worst-Case Energy Consumption of Embedded Software

Energy Components Summary

Dynamic energy: switching of transistors during execution; independent of execution time

Switch-off energy: energy consumed in unused components; depends on idle cycles

Clock and leakage energy: directly proportional to execution time

Page 18: Estimating the Worst-Case Energy Consumption of Embedded Software

WCEC versus WCET

[Figure: total energy (nanojoules, axis from 13000 to 21000) and execution time (cycles, axis from 4500 to 5100) plotted against program inputs]

Full Input Space Expansion for a 5-element Insertion Sort program

Page 19: Estimating the Worst-Case Energy Consumption of Embedded Software

Our Analysis: Overview

Operate on the control flow graph

Estimate worst-case energy of basic blocks

Formulate estimation for whole program as an integer linear programming (ILP) problem

Page 20: Estimating the Worst-Case Energy Consumption of Embedded Software

ILP Formulation

Input: Control flow graph of the program

Objective function:

$$\text{Worst-Case Energy} = \sum_{B} WCEC_B \times count_B$$

Need to estimate the Worst-Case Energy Consumption (WCEC_B) for each basic block B

Page 21: Estimating the Worst-Case Energy Consumption of Embedded Software

Flow Constraints

[Figure: control flow graph with basic blocks B0, B1, B2 and B3]

Inflow = Basic Block Execution Count = Outflow

E0,1 = B0 = 1

E2,3 + E1,3 = B3 = 1

E0,1 + E2,1 = E1,2 + E1,3 = B1

E1,2 = E2,3 + E2,1 = B2

Bounds on maximum loop iterations

Loop bound: E2,1 <= 100
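
The flow constraints on this slide are small enough to solve by exhaustive search. The sketch below does that in place of a real ILP solver, and the per-block energy values in `WCEC` are hypothetical numbers chosen only for illustration.

```python
# Hypothetical worst-case energy per basic block (nanojoules); not from the slides.
WCEC = {"B0": 5, "B1": 3, "B2": 10, "B3": 2}
LOOP_BOUND = 100  # loop bound from the slide: E2,1 <= 100


def solve():
    """Enumerate edge counts that satisfy the flow constraints; keep the maximum energy."""
    best = None
    e01 = 1  # E0,1 = B0 = 1
    for e21 in range(LOOP_BOUND + 1):  # back edge B2 -> B1
        for e13 in (0, 1):  # exit edge B1 -> B3
            e23 = 1 - e13  # E2,3 + E1,3 = B3 = 1
            e12 = e23 + e21  # E1,2 = E2,3 + E2,1 = B2
            if e01 + e21 != e12 + e13:  # inflow of B1 must equal its outflow
                continue
            counts = {"B0": e01, "B1": e01 + e21, "B2": e12, "B3": e23 + e13}
            energy = sum(WCEC[block] * counts[block] for block in counts)
            if best is None or energy > best[0]:
                best = (energy, counts)
    return best


best_energy, best_counts = solve()
print(best_energy, best_counts)
```

Enumeration works here only because the graph has one loop and a handful of edges; for real programs the same constraints would be handed to an ILP solver.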

Page 22: Estimating the Worst-Case Energy Consumption of Embedded Software

Worst-Case Energy of a Basic Block

Processor Model

Energy Components: Instruction Specific Energy, Pipeline Specific Energy

Page 23: Estimating the Worst-Case Energy Consumption of Embedded Software

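Processor Model

[Figure: pipeline with stages IF, ID, ISSUE, EX, WB and CM, an instruction buffer (IBUF), a reorder buffer (ROB) and functional units ALU, MULT and FPU, holding instructions I-4 through I+1]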

Page 24: Estimating the Worst-Case Energy Consumption of Embedded Software

Pipelined Execution of Instructions

ADD R1,R2,R3

MUL R4,R5,R6

SUB R7,R8,R9

[Pipeline diagram over clock cycles (CC) 1 to 8]

ADD: IF ID IS EX WB CM

MUL: IF ID IS EX WB CM

SUB: IF ID IS EX WB CM

Difficult to statically predict the energy consumption in each cycle

Page 25: Estimating the Worst-Case Energy Consumption of Embedded Software

Pipelined Execution of Instructions

ADD R1,R2,R3

MUL R4,R5,R6

SUB R7,R8,R9

[Pipeline diagram over clock cycles (CC) 1 to 8, with two stall cycles]

ADD: IF ID IS EX WB CM

MUL: IF ID IS EX WB

SUB: IF ID IS EX

Difficult to statically predict the energy consumption in each cycle

Page 26: Estimating the Worst-Case Energy Consumption of Embedded Software

Our Approach

Determine the maximum energy consumed on a component by component basis

Static analysis to determine the maximum energy consumed by a component in a specified interval

Page 27: Estimating the Worst-Case Energy Consumption of Embedded Software

Execution of Instruction

[Figure: pipeline stages IF, ID, ISSUE, EX, WB, CM]

Page 28: Estimating the Worst-Case Energy Consumption of Embedded Software

Instruction Specific Energy

Energy consumed due to the sub-tasks associated with execution of an instruction, e.g., register file access, ALU usage, etc.

Depends on the type of executed instruction

No correlation with execution time

Page 29: Estimating the Worst-Case Energy Consumption of Embedded Software

Pipeline Specific Energy

During program execution energy is consumed due to:

Switch-off power (idle cycles)

Leakage power (every cycle)

Clock network power (every cycle)

Cannot be attributed to any instruction

Energy consumed even in idle cycles

Page 30: Estimating the Worst-Case Energy Consumption of Embedded Software

Energy Components

Observation: Energy consumed can be separated into

Instruction Specific energy: energy associated with the execution of a particular instruction; independent of execution time

Pipeline Specific energy: energy consumed in other components such as clock network, leakage etc.; related to execution time

Page 31: Estimating the Worst-Case Energy Consumption of Embedded Software

Worst-case Energy of a Basic Block

$$energy_{BB} = dynamic_{BB} + switchoff_{BB} + leakage_{BB} + clock_{BB}$$

dynamic_BB: Instruction-Specific Energy for BB

switchoff_BB, leakage_BB and clock_BB are the energy consumed in unused components, leakage and clock network during WCET_BB

Page 32: Estimating the Worst-Case Energy Consumption of Embedded Software

Instruction Specific Energy

Energy consumed due to switching activity generated by the instructions in BB

Sum of energy consumed by individual instructions in BB

$$dynamic_{BB} = \sum_{instr \in BB} dynamic_{instr}$$

Page 33: Estimating the Worst-Case Energy Consumption of Embedded Software

Switch-off Energy

Unused units consume 10% of peak energy

Switch-off energy for a specific component (C)

$$switchoff_{BB}(C) = Idle\_cycles(C) \times energy(C) \times 0.1$$

$$switchoff_{BB}(C) = (WCET_{BB} - uses_{BB}(C)) \times energy(C) \times 0.1$$

Switch-off energy for basic block BB

$$switchoff_{BB} = \sum_{C \in components} switchoff_{BB}(C)$$

Page 34: Estimating the Worst-Case Energy Consumption of Embedded Software

Clock Energy and Leakage Energy

Clock Energy

$$clock_{BB} = clockenergy_{cycle} \times WCET_{BB}$$

Leakage Energy

$$leakage_{BB} = leakage\_energy_{cycle} \times WCET_{BB}$$
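
The four per-block components can be combined in a few lines. This is a sketch under stated assumptions: the function name, the component names and every numeric value are hypothetical, `peak_energy` is taken to be a per-cycle value, and only the 10% switch-off fraction and the structure of the sum come from the slides.

```python
SWITCHOFF_FRACTION = 0.1  # unused units consume 10% of peak energy (from the slides)


def block_energy(wcet_cycles, instr_energies, uses, peak_energy,
                 clock_energy_cycle, leakage_energy_cycle):
    """energy_BB = dynamic_BB + switchoff_BB + leakage_BB + clock_BB."""
    # Instruction-specific energy: sum over the instructions in the block.
    dynamic = sum(instr_energies)
    # Switch-off energy: idle cycles of each component times 10% of its peak energy.
    switchoff = sum(
        (wcet_cycles - uses[component]) * peak_energy[component] * SWITCHOFF_FRACTION
        for component in peak_energy
    )
    # Clock and leakage energy are paid in every cycle of the block's WCET.
    clock = clock_energy_cycle * wcet_cycles
    leakage = leakage_energy_cycle * wcet_cycles
    return dynamic + switchoff + leakage + clock


# Hypothetical basic block: 10 cycles, three instructions, two components (nJ).
energy_bb = block_energy(
    wcet_cycles=10,
    instr_energies=[4.0, 6.0, 5.0],
    uses={"ALU": 3, "MULT": 0},
    peak_energy={"ALU": 2.0, "MULT": 5.0},
    clock_energy_cycle=1.0,
    leakage_energy_cycle=0.5,
)
print(energy_bb)
```

Only the first term is independent of the block's WCET; the other three grow with it, which is why a conservative WCET inflates the energy estimate.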

Page 35: Estimating the Worst-Case Energy Consumption of Embedded Software

Overlap among basic blocks

[Figure: timeline (t1 to t5) showing basic block BB overlapping in time with blocks B1, B2 and B3, with the interval WCET_BB marked]

Page 36: Estimating the Worst-Case Energy Consumption of Embedded Software

Switch-off Energy

Unused units consume 10% of peak energy

Switch-off energy for a specific component (C)

$$switchoff_{BB}(C) = Idle\_cycles(C) \times energy(C) \times 0.1$$

$$switchoff_{BB}(C) = (WCET_{BB} - uses_{BB}(C)) \times energy(C) \times 0.1$$

Switch-off energy for basic block BB

$$switchoff_{BB} = \sum_{C \in components} switchoff_{BB}(C)$$

Page 37: Estimating the Worst-Case Energy Consumption of Embedded Software

Instruction Cache Modeling

Context-based ILP formulation used in WCET analysis [Li et al RTSS 2004]

Basic block divided into memory blocks

A context maps each of these memory blocks to hit/miss

Estimate the worst-case energy of each context, taking into account main memory access energy

Page 38: Estimating the Worst-Case Energy Consumption of Embedded Software

Modeling Branch Misprediction

[Figure: timeline (t1 to t3) showing the blocks BB', BB and BX]

Page 39: Estimating the Worst-Case Energy Consumption of Embedded Software

Objective function

$$Energy = \sum_{i=1}^{N} \sum_{j \to i} \sum_{\omega} \left[ energy_{j \to i}(c,\omega) \times count_{j \to i}(c,\omega) + energy_{j \to i}(m,\omega) \times count_{j \to i}(m,\omega) \right]$$

count(c,ω) is the number of times the basic block Bi is executed with path from Bj and the branch is predicted correctly

count(m,ω) is similarly defined where the branch is mispredicted

energy(c,ω) and energy(m,ω) are defined in a similar manner

The ILP problem is solved to generate values for count using constraints similar to WCET analysis
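
Once the solver has fixed the counts, the objective is a plain weighted sum. The sketch below evaluates it for two hypothetical (edge, context) entries; the tuple layout and all numbers are assumptions for illustration, not values from the slides.

```python
def total_energy(terms):
    """Sum energy(c) * count(c) + energy(m) * count(m) over all entries.

    Each entry is (energy_c, count_c, energy_m, count_m) for one edge Bj -> Bi
    and one context, where c means predicted correctly and m means mispredicted.
    """
    return sum(
        energy_c * count_c + energy_m * count_m
        for energy_c, count_c, energy_m, count_m in terms
    )


# Hypothetical values (nanojoules and execution counts).
terms = [
    (10, 90, 14, 10),  # loop body: 90 correct predictions, 10 mispredictions
    (6, 1, 9, 0),      # exit block: executed once, predicted correctly
]

energy = total_energy(terms)
print(energy)
```

The mispredicted entries carry a higher per-execution energy here, which mirrors the extra work done along a wrongly predicted path.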

Page 40: Estimating the Worst-Case Energy Consumption of Embedded Software

Results

Platform: SimpleScalar toolset

Modified WCET analysis tool [Li et al RTSS 2004] to estimate worst-case energy

Energy values for processor components derived from parameterized models in Wattch

ILP problem is solved using CPLEX

Page 41: Estimating the Worst-Case Energy Consumption of Embedded Software

Results

Compare estimated WCEC against the observed values for eleven benchmarks

Observed values are obtained using the Wattch power simulator

The actual inputs that produce the WCEC are unknown

Manually select inputs that might produce WCEC

Page 42: Estimating the Worst-Case Energy Consumption of Embedded Software

Styles of Clock Gating

Simple: Peak power is consumed even if there is one access to a specific component

Ideal: Power consumed is proportional to the number of ports accessed

Realistic: Same as ideal but unused components consume switch-off power
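
One possible reading of the three styles, written as a per-cycle power function for a single component. The function name and its parameters are hypothetical, and two details are assumptions: an unaccessed component draws nothing under the simple style, and switch-off power is 10% of peak, as stated on the switch-off energy slides.

```python
SWITCHOFF_FRACTION = 0.1  # assumed: 10% of peak, as on the switch-off energy slides


def cycle_power(style, peak_power, ports_used, ports_total):
    """Power drawn by one component in one cycle under a given clock gating style."""
    if style == "simple":
        # Any access costs full peak power; no access is assumed to cost nothing.
        return peak_power if ports_used > 0 else 0.0
    # Ideal: power scales with the fraction of ports accessed.
    scaled = peak_power * ports_used / ports_total
    if style == "ideal":
        return scaled
    if style == "realistic":
        # Same as ideal, but an unused component still draws switch-off power.
        return scaled if ports_used > 0 else SWITCHOFF_FRACTION * peak_power
    raise ValueError(f"unknown clock gating style: {style}")


# Hypothetical component with 2 ports and a peak power of 4.0 units.
print(cycle_power("simple", 4.0, 1, 2))
print(cycle_power("ideal", 4.0, 1, 2))
print(cycle_power("realistic", 4.0, 0, 2))
```

The styles differ only in how partial use and idle cycles are charged, which is what separates the three result tables.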

Page 43: Estimating the Worst-Case Energy Consumption of Embedded Software

Results

Results for ideal clock gating more accurate than simple because of distribution of accesses

Benchmark   Ideal Est (µJ)   Ideal Obs (µJ)   Ideal Ratio   Simple Est (µJ)   Simple Obs (µJ)   Simple Ratio
isort       468.85           422.76           1.11          524.95            455.94            1.15
fft         9600.99          8586.49          1.12          11057.50          9185.39           1.20
fdct        89.92            83.63            1.08          99.31             88.79             1.11
ludcmp      98.75            92.77            1.06          115.39            100.32            1.15
matsum      1012.83          929.94           1.09          1227.37           994.11            1.23
minver      63.66            59.61            1.07          74.91             64.15             1.17
bsearch     2.54             2.40             1.06          3.51              3.07              1.14
des         546.41           518.22           1.05          613.16            553.74            1.10
matmult     149.70           132.08           1.13          172.39            136.93            1.26
qsort       34.90            31.16            1.12          39.50             33.84             1.17
qurt        13.98            11.91            1.17          16.36             12.97             1.26

Page 44: Estimating the Worst-Case Energy Consumption of Embedded Software

Results

Results for ideal clock gating more accurate than realistic because of conservative WCET estimation

Benchmark   Realistic Est (µJ)   Realistic Obs (µJ)   Realistic Ratio   Ideal Est (µJ)   Ideal Obs (µJ)   Ideal Ratio
isort       596.93               525.88               1.14              468.85           422.76           1.11
fft         13631.21             10260.86             1.33              9600.99          8586.49          1.12
fdct        121.65               105.57               1.15              89.92            83.63            1.08
ludcmp      139.75               119.33               1.17              98.75            92.77            1.06
matsum      1397.72              1154.31              1.21              1012.83          929.94           1.09
minver      90.95                80.80                1.13              63.66            59.61            1.07
bsearch     3.81                 3.07                 1.24              2.54             2.40             1.06
des         715.58               643.75               1.11              546.41           518.22           1.05
matmult     212.94               166.88               1.28              149.70           132.08           1.13
qsort       49.84                43.73                1.14              34.90            31.16            1.12
qurt        21.95                17.65                1.24              13.98            11.91            1.17
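
The Ratio column is the estimate divided by the observed value. The sketch below recomputes it for the realistic clock gating figures on this slide, checks that every estimate is a safe upper bound on its observation, and reports the benchmark with the largest overestimation; the variable names are illustrative only.

```python
# (estimated, observed, reported ratio) for realistic clock gating, in microjoules.
realistic = {
    "isort": (596.93, 525.88, 1.14),
    "fft": (13631.21, 10260.86, 1.33),
    "fdct": (121.65, 105.57, 1.15),
    "ludcmp": (139.75, 119.33, 1.17),
    "matsum": (1397.72, 1154.31, 1.21),
    "minver": (90.95, 80.80, 1.13),
    "bsearch": (3.81, 3.07, 1.24),
    "des": (715.58, 643.75, 1.11),
    "matmult": (212.94, 166.88, 1.28),
    "qsort": (49.84, 43.73, 1.14),
    "qurt": (21.95, 17.65, 1.24),
}

# Recompute each ratio and collect any that disagree with the reported column.
ratios = {name: est / obs for name, (est, obs, _) in realistic.items()}
mismatches = [
    name for name, (_, _, reported) in realistic.items()
    if round(ratios[name], 2) != reported
]

# A worst-case estimate is safe only if it is never below the observed value.
all_safe = all(est >= obs for est, obs, _ in realistic.values())

# Benchmark with the largest overestimation.
loosest = max(ratios, key=ratios.get)

print(mismatches, all_safe, loosest)
```

A ratio close to 1 means a tight estimate, while a ratio below 1 would mean the estimate had been exceeded by an observed run.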

Page 45: Estimating the Worst-Case Energy Consumption of Embedded Software

Conclusion

Static worst-case energy estimation technique that takes into account pipelining, instruction cache and branch prediction

Future work

Validation using commercial processors

Explore the possibility of providing thermal guarantees

Page 46: Estimating the Worst-Case Energy Consumption of Embedded Software

Execution of an Add Instruction

[Figure: an ADD instruction passing through each pipeline stage, with the components accessed in that stage]

IF: I-Cache Access

ID: Instruction Decode + Rename Logic

ISSUE: Wakeup + Selection logic

EX: Register File Read + Add unit access

WB: Result Bus

CM: ROB-retire + Register file Update

Page 47: Estimating the Worst-Case Energy Consumption of Embedded Software

Instruction Specific Energy

Each component accessed once

Selection logic may be accessed multiple times

Instruction Specific Energy is

$$dynamic_{BB} = selection\_power_{cycle} \times wcet_{BB} + \sum_{instr \in BB} dynamic_{instr}$$