Page 1: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints

Profile-based Dynamic Voltage Scheduling with Program Checkpoints

The COPPER Team:

Ana Azevedo, Ilya Issenin, Radu Cornea, Rajesh Gupta, Nikil Dutt, Alex Nicolau, Alex Veidenbaum

Page 2: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints

2Paper 327 02CCECS, UC Irvine

The COPPER Context

Compiler-controlled Power-Performance Management

• Develop efficient architectural support and compiler techniques for power management

• continuously -- as an application runs

• targeted for high performance/VLIW machines

• Coordinated management of multiple techniques

• reduction in power with little or no loss of performance.

• Develop techniques for dynamic compilation to actively trade off performance and power consumption

• Develop a retargetable, ADL-based, power-aware system simulation capability.

Page 3: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Approach

• Compiler Strategies for Power Management

• Compiler-directed architectural “configuration”

–generate embedded “configuration code”

–code “adapts” to new architectural organization at runtime

• JIT vs multi-version compilation techniques

• dynamic, on-demand optimization

• Code annotation for dynamic compilation

–trade-off compilation overhead for quality of generated code

• Power-use Estimation for Compiler Control

–static analysis to select “optimal” configuration

–profile-based selection techniques

–static or dynamic prediction methods

Page 4: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Power/Performance “Knobs”

Memory hierarchy

Instruction issue logic & issue width for VLIW machines

Dynamic Register File Reconfiguration

Frequency and Voltage scaling

Page 5: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Timing Constraints

• We consider timing constraints as bounds on operation intervals

• upper and lower bounds

• (determination of optimum interval separation possible statically)

• Time constraints specified via checkpoints

• User-defined checkpoints are inserted in the source code and time constraints between checkpoints are defined.

• The problem addressed here:

• Given a profile of power availability and constraints on specified operation intervals, minimize total processor energy consumption while meeting timing and power profile constraints.

Page 6: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Constrained Dynamic F/V Scaling

• Power-performance profiling compiler

• Estimates max energy/cycle ratio and cycle count between checkpoints

• Compiler-inserted checkpoints (frequency adjustment points) and user-inserted checkpoints (time constraints)

• Run-time scheduler

• Calculates run-time frequency limit based on available power and energy profile between the current checkpoint and all possible next checkpoints.

• Calculates optimal target frequency based on both time constraints and run-time frequency limit between the current checkpoint and all possible next checkpoints.

• Final target freq is selected so that the code runs as slow as possible within the imposed time constraints.
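
The selection step above can be sketched in C. This is a minimal illustration, not the COPPER scheduler itself: it assumes the four operating points listed later in the deck (600, 500, 400 and 300 MHz), and it assumes that power equals energy per cycle times frequency, so the frequency limit is the available power divided by the profiled maximum energy per cycle. All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Operating points from the F/V scaling slide, in MHz, ascending. */
static const double kFreqMHz[] = {300.0, 400.0, 500.0, 600.0};
static const size_t kNumFreq = sizeof kFreqMHz / sizeof kFreqMHz[0];

/* Assumption: power = energy_per_cycle * frequency, so the highest
 * frequency the available power can sustain is P / E.
 * Units: watts / nanojoules = 1e9 cycles per second = 1000 MHz. */
double freq_limit_mhz(double avail_power_w, double max_energy_per_cycle_nj) {
    assert(max_energy_per_cycle_nj > 0.0);
    return avail_power_w / max_energy_per_cycle_nj * 1000.0;
}

/* Slowest operating point that still finishes `cycles` within
 * `max_time_us` and does not exceed the frequency limit.
 * Returns 0.0 if no operating point satisfies both constraints. */
double pick_frequency_mhz(double cycles, double max_time_us, double limit_mhz) {
    assert(max_time_us > 0.0);
    /* Cycles per microsecond is the same number as MHz. */
    double required_mhz = cycles / max_time_us;
    for (size_t i = 0; i < kNumFreq; ++i) {
        if (kFreqMHz[i] >= required_mhz && kFreqMHz[i] <= limit_mhz) {
            return kFreqMHz[i];
        }
    }
    return 0.0;
}
```

For example, 35000 cycles with a 100 microsecond bound need at least 350 MHz, so the sketch picks the 400 MHz point when the limit allows it and reports failure (0.0) when the limit is 300 MHz.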

Page 7: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Program Checkpoints

Program checkpoints are generated at compile time and indicate places in the code where the processor speed/voltage should be recalculated; checkpoints also carry user-defined time constraints.

foo() {
    read(i);
    if (i > 5) {
        i = i - calc_new_i(i);
    } else {
        a++;
    }
    i = 36;
    for (j = 0; j < i; j++) {
        k = k*sin(j/100 + k/10);
    }
}

calc_new_i(int i) {
    for (k = 0; k < limit; k++) {
        i += new_i[k];
        show_value(i);
    }
}

(a) Original code.

Checkpoint Transition   Min Time (ms)   Max Time (ms)
1-2                     10              30
2-3                     20              20
3-3                     50              200
3-4                     200             200

(c) Checkpoint Database (CDB).

foo() {
    read(i);
    CHECKPOINT(1);
    if (i > 5) {
        i = i - calc_new_i(i);
    } else {
        a++;
    }
    i = 36;
    k = i + a;
    CHECKPOINT(2);
    for (j = 0; j < i; j++) {
        CHECKPOINT(3);
        k = k*sin(j/100 + k/10);
    }
    CHECKPOINT(4);
}

(b) Transformed foo code with checkpoints 1, 2, 3 and 4 carrying time constraints.
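
One way to picture the Checkpoint Database is as a small lookup table keyed by checkpoint transition. The sketch below is an assumption about a possible in-memory layout, not the format used by the authors; the struct, the function names, and the rule that an unlisted transition is unconstrained are all hypothetical. The bounds are the values from the CDB table on this slide.

```c
#include <assert.h>
#include <stddef.h>

/* One CDB row: a transition between two checkpoints with lower and
 * upper time bounds in milliseconds. */
struct cdb_entry {
    int from;
    int to;
    double min_ms;
    double max_ms;
};

/* Values copied from the CDB table on this slide. */
static const struct cdb_entry kCdb[] = {
    {1, 2, 10.0, 30.0},
    {2, 3, 20.0, 20.0},
    {3, 3, 50.0, 200.0},
    {3, 4, 200.0, 200.0},
};
static const size_t kCdbLen = sizeof kCdb / sizeof kCdb[0];

/* Find the constraint for a transition, or NULL if none is recorded. */
const struct cdb_entry *cdb_lookup(int from, int to) {
    for (size_t i = 0; i < kCdbLen; ++i) {
        if (kCdb[i].from == from && kCdb[i].to == to) {
            return &kCdb[i];
        }
    }
    return NULL;
}

/* Returns 1 if the measured time respects both bounds. Assumption:
 * a transition without a CDB entry is treated as unconstrained. */
int cdb_within_bounds(int from, int to, double elapsed_ms) {
    const struct cdb_entry *e = cdb_lookup(from, to);
    if (e == NULL) {
        return 1;
    }
    return elapsed_ms >= e->min_ms && elapsed_ms <= e->max_ms;
}
```

A run-time scheduler could consult such a table at each CHECKPOINT(n) call to find the deadlines that start or end at n.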

[Figure annotations: Task 1, Constraint 1, Constraint 2, deadline 1, deadline 2.]

Page 8: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Basic Approach

• Compiling phase: Checkpoint profiling

• Estimate max energy/cycle ratio and cycle count between checkpoints

• Set time constraints

–e.g., device response time, WCET

• Scheduling phase

• At program checkpoints and power profile change points, dynamically adjust frequency and voltage

Page 9: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Example

[Figure: frequency vs. time between Checkpoint 3 and Checkpoint 4, showing the frequency limit and the optimal frequency.]

Calculating optimal frequency: the frequency limit (determined by the available power profile) is lower than the potential optimal frequency.
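
The slide does not spell out how the schedule compensates when the limit is below the optimal frequency. The sketch below shows one plausible reading, stated as an assumption: time is split into segments that each have their own frequency limit, the processor runs at the target frequency wherever the limit allows and at the limit elsewhere, and the scheduler looks for the lowest target that still completes the required cycles. The struct, the function names, and the use of bisection are all illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

struct segment {
    double duration_us; /* length of the interval */
    double limit_mhz;   /* frequency limit during the interval */
};

/* Cycles executed when aiming for target_mhz but capped per segment. */
static double cycles_done(const struct segment *s, size_t n, double target_mhz) {
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double f = target_mhz < s[i].limit_mhz ? target_mhz : s[i].limit_mhz;
        total += f * s[i].duration_us; /* MHz * microseconds = cycles */
    }
    return total;
}

/* Lowest target frequency that completes `cycles` by the end of the
 * last segment, or -1.0 if even running at every limit is too slow. */
double lowest_target_mhz(const struct segment *s, size_t n, double cycles) {
    double hi = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (s[i].limit_mhz > hi) {
            hi = s[i].limit_mhz;
        }
    }
    if (cycles_done(s, n, hi) < cycles) {
        return -1.0;
    }
    double lo = 0.0;
    /* Bisection: cycles_done is non-decreasing in the target. */
    for (int iter = 0; iter < 60; ++iter) {
        double mid = 0.5 * (lo + hi);
        if (cycles_done(s, n, mid) >= cycles) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return hi;
}
```

With two 100 microsecond segments limited to 200 MHz and 600 MHz and 60000 cycles to execute, a uniform 300 MHz would suffice without limits; under the 200 MHz cap, the second segment has to run at about 400 MHz instead.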

Page 10: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Exploiting Runtime Slack

CHECKPOINT(0);
read(i);
CHECKPOINT(1);
if (i > 5) {
    CHECKPOINT(2);
    i = i - calc_new_i(i);
} else {
    CHECKPOINT(3);
    a++;
}
CHECKPOINT(4);
i = 36;
k = i + a;
CHECKPOINT(5);
for (j = 0; j < i; j++) {
    CHECKPOINT(6);
    k = k*sin(j/100 + k/10);
    CHECKPOINT(7);
}
CHECKPOINT(8);

(a) Transformed code with checkpoints carrying time constraints (0, 1, 3, 8, 9 and 10) and extra checkpoints for exploiting run-time slack.

Checkpoint Transition   Max Time (ms)
0-3                     50
1-8                     300
9-10                    10

(c) Checkpoint Database (CDB).

(b) Hierarchical control flow graph.

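[Figure: hierarchical control flow graph with nodes 0 to 10; node 1 is labeled "if", node 2 "func", node 5 "loop", and two nodes are marked "end".]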

calc_new_i(i) {
    CHECKPOINT(9);
    for (k = 0; k < limit; k++) {
        i += new_i[k];
        show_value(i);
    }
    CHECKPOINT(10);
}

Page 11: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Slack-based Checkpointing

• Compiling phase

• Build a hierarchical CFG (HCFG) program representation

• Insert checkpoints at function calls, loops, if-statements

• Checkpoint profiling and removal

• Estimate max energy/cycle ratio and cycle count between checkpoints, maximum iteration number for loops

• Prune the HCFG removing unnecessary checkpoints

– Nodes with low maximum execution cycle count

– Nodes with small variation in the execution cycle count

• Annotate the HCFG with the profiling information

• Scheduling

• Determine active checkpoint transitions from precomputed information

• Estimate the number of cycles from the current node to the ends of active time constraints. This is the minimum of the statically computed longest path to the time constraint and the execution delay updated from the profiling information (if available)
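
The two pruning criteria listed under the compiling phase can be written as a simple predicate. This is a sketch under stated assumptions: the profile fields and both thresholds are hypothetical names, and the slides do not say how the thresholds are chosen.

```c
#include <assert.h>

/* Profiling data for one HCFG node (field names are hypothetical). */
struct node_profile {
    long min_cycles; /* smallest observed execution cycle count */
    long max_cycles; /* largest observed execution cycle count */
};

/* A checkpoint is a pruning candidate when the node is short or when
 * its cycle count barely varies, the two criteria on the slide.
 * Both thresholds are illustrative tuning parameters. */
int should_prune(const struct node_profile *p, long min_useful_cycles,
                 long min_useful_variation) {
    assert(p->max_cycles >= p->min_cycles);
    if (p->max_cycles < min_useful_cycles) {
        return 1; /* low maximum execution cycle count */
    }
    if (p->max_cycles - p->min_cycles < min_useful_variation) {
        return 1; /* small variation in execution cycle count */
    }
    return 0;
}
```

The intuition is that a checkpoint only pays for itself when the region it guards is long enough, and variable enough, for a frequency change to recover meaningful slack.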

Page 12: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Our Approach: Slack Algorithm

• Algorithm at work

[Figure: HCFG with nodes 1 to 10 and a node marked "Current checkpoint (with I iterations left)"; path segments are annotated "X1 cycles", "X2 cycles" and "Y1 cycles".]

Calculating estimated cycles C

Method1:

C(7-10) = Y1

C(7-9) = X2+Y1+I*cycle_per_iter

Method2:

C(7-10) = cycle_per_iter – elapsed(6)

C(7-9) = X1 – elapsed(5)

Time Constraints   Max Time
1-9                T1
6-10               T2

Checkpoint Database (CDB)
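
The two estimates for C(7-9) on this slide can be evaluated side by side. The sketch below assumes that the "minimum" mentioned on the previous slide means taking the smaller of Method 1 and Method 2; that pairing is an interpretation, and the function names are hypothetical. The symbols X1, X2, Y1, I, cycle_per_iter and elapsed(5) come from the slide.

```c
#include <assert.h>

/* Method 1 from the slide: C(7-9) = X2 + Y1 + I * cycle_per_iter. */
long method1_cycles(long x2, long y1, long iters_left, long cycle_per_iter) {
    assert(iters_left >= 0);
    return x2 + y1 + iters_left * cycle_per_iter;
}

/* Method 2 from the slide: C(7-9) = X1 - elapsed(5). */
long method2_cycles(long x1, long elapsed_since_5) {
    return x1 - elapsed_since_5;
}

/* Assumption: the scheduler keeps the smaller of the two estimates. */
long estimated_cycles(long m1, long m2) {
    return m1 < m2 ? m1 : m2;
}
```

With X2 = 100, Y1 = 50, I = 3 and cycle_per_iter = 200, Method 1 gives 750 cycles; with X1 = 900 and elapsed(5) = 300, Method 2 gives 600 cycles, so the sketch uses 600.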

Page 13: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


COPPER framework

• MIPS R10K-like processor, Wattch power models

[Figure: COPPER framework block diagram. Blocks: Application, Compiler, Code Versions, Chosen Code Version, Power Profiler, Power Scheduler, Cycle-Level Performance Simulator, Power Simulator, Parameterizable Power Models, Hardware Config. Labeled signals: Performance Estimate, Power Estimate, Cycle-by-Cycle Hardware Access Counts, Available Power, Time Constraints.]

Page 14: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Results

• Power consumption highlighting time constraints for paraffins (f=600 MHz)

[Figure: power vs. time (microseconds) at 600 MHz, with points labeled 4 and 7.]

Page 15: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Results: Slack-based DVS for paraffins

• Calculated target frequencies satisfying time and power constraints using Formula 1 for paraffins

• Time constraint on checkpoint transition 4-7

[Figure "Frequency": frequency (MHz) vs. time (microseconds), with points labeled 4 and 7; series: Frequency Limit, Frequency.]

[Figure "Power": power vs. time (microseconds), with points labeled 4 and 7; series: Power Consumption, Available Power Profile.]

52% energy savings

Page 16: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Results

• Calculated target frequencies satisfying time and power constraints using Formula 2 for paraffins

• Slack-based DVS for paraffins

[Figure "Frequency": frequency (MHz) vs. time (microseconds), with points labeled 4 and 7; series: Frequency Limit, Frequency.]

[Figure "Power": power vs. time (microseconds), with points labeled 4 and 7; series: Power Consumption, Available Power Profile.]

82% energy savings

Page 17: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Summary

• While average power reduction is important, effective control of dynamic power consumption is essential

• especially for software management of power and performance

• The hard problem here is

• identification of effective architectural mechanisms and their deterministic control through software

• COPPER approach

• use architectural features common to a range of processor architectures

–memory hierarchy, register files, instruction issue.

• Coordinate with technology and OS strategies

–frequency and voltage scaling.

Page 18: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Our Approach: Base Algorithm

• Scheduling phase

• Create list of events

• Calculate frequency limit

• Calculate optimal frequency

–Case 1: One future checkpoint transition

–Case 2: Frequency limit lower than potential optimal frequency

–Case 3: Several possible future checkpoints

[Figure: power vs. time showing the available power profile, with Checkpoints 5, 6 and 7 marked.]

[Figure: frequency vs. time showing the frequency limit, with Checkpoints 5, 6 and 7 marked.]

Page 19: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Our Approach: Base Algorithm

• Calculate optimal frequency (cont’d)

[Figure: frequency vs. time between Checkpoint 3 and Checkpoint 4, showing the frequency limit and the optimal frequency.]

(a) Calculating optimal frequency, Case 1. One future checkpoint transition

[Figure: frequency vs. time between Checkpoint 3 and Checkpoint 4, showing the frequency limit and the optimal frequency.]

(b) Calculating optimal frequency, Case 2. Frequency limit lower than potential optimal frequency

[Figure: frequency vs. time with Checkpoints 1, 2 and 3, showing the frequency limit, the optimal frequency for ch1 - ch2, the optimal frequency for ch1 - ch3, and the final frequency values.]

(c) Calculating optimal frequency, Case 3. Several possible future checkpoints
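
For Case 3, a conservative rule is to satisfy every candidate transition at once. The sketch below is an assumption about how the "final frequency values" could be derived, not a statement of the authors' formula: it computes the frequency each possible next checkpoint would require, takes the largest, and clamps the result to the frequency limit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct candidate {
    double cycles;      /* estimated cycles to the candidate checkpoint */
    double max_time_us; /* time left until that checkpoint's deadline */
};

/* Highest frequency required by any candidate next checkpoint,
 * clamped to the current frequency limit. */
double final_frequency_mhz(const struct candidate *c, size_t n,
                           double limit_mhz) {
    double needed = 0.0;
    for (size_t i = 0; i < n; ++i) {
        assert(c[i].max_time_us > 0.0);
        /* Cycles per microsecond is the same number as MHz. */
        double f = c[i].cycles / c[i].max_time_us;
        if (f > needed) {
            needed = f;
        }
    }
    return needed < limit_mhz ? needed : limit_mhz;
}
```

If the clamp takes effect, the deadline may be at risk, which is the situation Case 2 describes.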

Page 20: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Baseline Architecture

• A MIPS R10K-like processor

• 4-wide issue, out-of-order (OOO) processor

–5-stage pipeline: fetch, dispatch, issue, writeback, commit

• 32b integers, 64b f.p. numbers

• register files: 32 integer and 32 FP registers

• 32K L1 instruction cache, 32K L1 data cache

–32B L1 line size

• 512K L2 unified cache

–64B L2 line size

• 2 int ALUs, 1 FP adder, 1 FP multiplier

• 512-entry BTB, 2K entry branch predictor

Page 21: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Power Management by F/V Scaling

• 4 available versions: 600 MHz at 2.2 V, 500 MHz at 2.0 V, 400 MHz at 1.8 V, 300 MHz at 1.6 V
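
To see why lowering voltage along with frequency saves energy, the sketch below compares the four operating points under the standard first-order model in which dynamic energy per cycle is proportional to the square of the supply voltage. That model is an assumption brought in for illustration (the slides rely on Wattch power models instead), and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct op_point {
    double freq_mhz;
    double volt;
};

/* The four frequency/voltage pairs from this slide, fastest first. */
static const struct op_point kPoints[] = {
    {600.0, 2.2}, {500.0, 2.0}, {400.0, 1.8}, {300.0, 1.6},
};
static const size_t kNumPoints = sizeof kPoints / sizeof kPoints[0];

/* Dynamic energy per cycle relative to the 600 MHz / 2.2 V point,
 * assuming energy per cycle is proportional to voltage squared. */
double relative_energy_per_cycle(size_t idx) {
    assert(idx < kNumPoints);
    double ratio = kPoints[idx].volt / kPoints[0].volt;
    return ratio * ratio;
}
```

Under this model, a cycle at 300 MHz and 1.6 V costs roughly 53% of the energy of a cycle at 600 MHz and 2.2 V, which is why running as slowly as the time constraints allow is attractive.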

Page 22: Profile-based Dynamic Voltage Scheduling  with Program Checkpoints


Related Work

• DVS Theoretical Studies and Simulations

• [Weiser94], [Govil95], [Yassura98], [Lee98], [Pering98], [Mosse00]

• Practical DVS Implementations

• Transmeta Crusoe, Intel XScale, lpARM

• Interval-based and inter-task DVS techniques under OS control

• [Weiser94], [Govil95], [Yao95], [Ishihara98], [Hong99], [Manzak00], [Sinha01], [Poulwelse01]

• Intra-task DVS techniques under compiler control

• [Shin01], [Hsu01], [Krshna00], [Lee00]

