profile-based dynamic voltage scheduling with program checkpoints


Post on 27-Jan-2016








  • Profile-based Dynamic Voltage Scheduling with Program Checkpoints

    The COPPER Team:

    Ana Azevedo, Ilya Issenin, Radu Cornea, Rajesh Gupta, Nikil Dutt, Alex Nicolau, Alex Veidenbaum

    *Paper 327 02CCECS, UC Irvine

    The COPPER Context

    Compiler-controlled Power-Performance Management
      • Develop efficient architectural support and compiler techniques for power management
        • continuously, as an application runs
        • targeted for high performance/VLIW machines
      • Coordinated management of multiple techniques
        • reduction in power with little or no loss of performance
      • Develop techniques for dynamic compilation to actively trade off performance and power consumption
      • Develop a retargetable, ADL-based, power-aware system simulation capability


    Approach

    Compiler Strategies for Power Management
      • Compiler-directed architectural configuration
        • generate embedded configuration code
        • code adapts to new architectural organization at runtime
      • JIT vs multi-version compilation techniques
        • dynamic, on-demand optimization
      • Code annotation for dynamic compilation
        • trade off compilation overhead for quality of generated code
      • Power-use estimation for compiler control
        • static analysis to select optimal configuration
        • profile-based selection techniques
        • static or dynamic prediction methods


    Power/Performance Knobs

    Memory hierarchy

    Instruction issue logic and issue width for VLIW machines

    Dynamic Register File Reconfiguration

    Frequency and Voltage scaling


    Timing Constraints
      • We consider timing constraints as bounds on operation intervals
        • upper and lower bounds (the optimum interval separation can be determined statically)
      • Time constraints are specified via checkpoints
        • User-defined checkpoints are inserted in the source code, and time constraints between checkpoints are defined.
      • The problem addressed here: given a profile of power availability and constraints on specified operation intervals, minimize total processor energy consumption while meeting timing and power profile constraints.


    Constrained Dynamic F/V Scaling
      • Power-performance profiling compiler
        • Estimates the max energy/cycle ratio and cycle count between checkpoints
        • Compiler-inserted checkpoints (frequency adjustment points) and user-inserted checkpoints (time constraints)
      • Run-time scheduler
        • Calculates the run-time frequency limit based on the available power and the energy profile between the current checkpoint and all possible next checkpoints
        • Calculates the optimal target frequency based on both the time constraints and the run-time frequency limit between the current checkpoint and all possible next checkpoints
        • The final target frequency is selected so that the code runs as slowly as possible within the imposed time constraints
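The run-time frequency limit described on this slide can be sketched in C. This is only an illustration under stated assumptions, not the COPPER implementation: it assumes average power equals energy per cycle times clock frequency, so the limit is the available power divided by the profiled maximum energy per cycle (mW divided by nJ/cycle gives MHz). The function names `freq_limit_mhz` and `pick_level_mhz` are hypothetical; the four frequency levels are the ones listed later in the deck.

```c
#include <assert.h>
#include <stddef.h>

/* The four frequency levels listed in the deck, slowest first (MHz). */
static const double LEVELS_MHZ[] = {300.0, 400.0, 500.0, 600.0};
static const size_t NUM_LEVELS = sizeof LEVELS_MHZ / sizeof LEVELS_MHZ[0];

/* Assumption: power = (energy per cycle) * frequency, so the power budget
 * caps the frequency at P_avail / E_max. Units: mW / (nJ/cycle) = MHz. */
double freq_limit_mhz(double avail_power_mw, double max_energy_nj_per_cycle)
{
    assert(max_energy_nj_per_cycle > 0.0);
    return avail_power_mw / max_energy_nj_per_cycle;
}

/* Hypothetical helper: pick the slowest level that meets required_mhz
 * without exceeding limit_mhz. If no level within the limit is fast enough,
 * the fastest level within the limit is returned. If even the slowest level
 * exceeds the limit, the slowest level is returned anyway. */
double pick_level_mhz(double required_mhz, double limit_mhz)
{
    double best = LEVELS_MHZ[0];
    for (size_t i = 0; i < NUM_LEVELS; i++) {
        if (LEVELS_MHZ[i] > limit_mhz)
            break;                 /* this and all faster levels break the power budget */
        best = LEVELS_MHZ[i];
        if (best >= required_mhz)
            break;                 /* slowest level that meets the deadline */
    }
    return best;
}
```

For example, with 4500 mW available and a profiled maximum of 10 nJ/cycle the limit is 450 MHz, so a region that needs 350 MHz would run at the 400 MHz level, and a region that needs 550 MHz would also be held to 400 MHz by the power budget.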


    Program Checkpoints

    Program checkpoints are generated at compile time and indicate places in the code where the processor speed/voltage should be recalculated; checkpoints also carry user-defined time constraints.


    Basic Approach

    Compiling phase: Checkpoint profiling
      • Estimate the max energy/cycle ratio and cycle count between checkpoints
      • Set time constraints
        • e.g., device response time, WCET

    Scheduling phase
      • At program checkpoints and power profile change points, dynamically adjust frequency and voltage


    Example

    Calculating optimal frequency
      • Frequency limit (determined by the available power profile) is lower than the potential optimal frequency


    Exploiting Runtime Slack

    (a) Transformed code with checkpoints carrying time constraints (0, 1, 3, 8, 9 and 10) and extra checkpoints for exploiting run-time slack.

        CHECKPOINT(0);
        read(i);
        CHECKPOINT(1);
        if (i > 5) {
            CHECKPOINT(2);
            i = i - calc_new_i(i);
        } else {
            CHECKPOINT(3);
            a++;
        }
        CHECKPOINT(4);
        i = 36;
        k = i + a;
        CHECKPOINT(5);
        for (j = 0; j < i; j++) {
            CHECKPOINT(6);
            k = k*sin(j/100 + k/10);
            CHECKPOINT(7);
        }
        CHECKPOINT(8);

        calc_new_i(i)
        {
            CHECKPOINT(9);
            for (k = 0; k < limit; k++) {
                i += new_i[k];
                show_value(i);
            }
            CHECKPOINT(10);
        }

    (c) Checkpoint Database (CDB)

        Checkpoint Transition    Max Time (ms)
        0-3                      50
        1-8                      300
        9-10                     10
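One way to picture the Checkpoint Database is as a small table of checkpoint transitions and their maximum times. The C sketch below holds the three entries from the slide; the type `cdb_entry_t` and the function `cdb_max_time_ms` are hypothetical names chosen for illustration, and the slides do not say how the CDB is actually stored or queried.

```c
#include <assert.h>
#include <stddef.h>

/* One time constraint: checkpoint `from` to checkpoint `to` must complete
 * within max_time_ms. Layout is an assumption made for this sketch. */
typedef struct {
    int from;
    int to;
    int max_time_ms;
} cdb_entry_t;

/* The three transitions shown in the slide's CDB table. */
static const cdb_entry_t CDB[] = {
    {0, 3, 50},
    {1, 8, 300},
    {9, 10, 10},
};
static const size_t CDB_SIZE = sizeof CDB / sizeof CDB[0];

/* Returns the maximum time in ms for the transition from->to,
 * or -1 if the database holds no constraint for that transition. */
int cdb_max_time_ms(int from, int to)
{
    assert(from >= 0 && to >= 0);
    for (size_t i = 0; i < CDB_SIZE; i++) {
        if (CDB[i].from == from && CDB[i].to == to)
            return CDB[i].max_time_ms;
    }
    return -1;
}
```

A scheduler reaching checkpoint 1 could call `cdb_max_time_ms(1, 8)` to learn that it has 300 ms to reach checkpoint 8, while a transition such as 2 to 4 carries no constraint and yields -1.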


    Slack-based Checkpointing

    Compiling phase
      • Build a hierarchical CFG (HCFG) program representation
      • Insert checkpoints at function calls, loops, if-statements
      • Checkpoint profiling and removal
        • Estimate the max energy/cycle ratio and cycle count between checkpoints, and the maximum iteration count for loops
        • Prune the HCFG, removing unnecessary checkpoints
          • Nodes with low maximum execution cycle count
          • Nodes with small variation in the execution cycle count
        • Annotate the HCFG with the profiling information

    Scheduling
      • Determine active checkpoint transitions from precomputed information
      • Estimate the number of cycles from the current node to the ends of active time constraints. This is the minimum of the statically computed longest path to the time constraint and the execution delay updated with the profiling information (if available)
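The cycle estimate in the scheduling step can be sketched as follows. The slide only says the estimate is the minimum of a static longest path and a profile-based figure; this sketch assumes the profile-based figure is the profiled maximum cycle count of the constrained transition minus the cycles already executed since the constraint started. That reading, the struct `slack_info_t` and the function `estimate_remaining_cycles` are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical per-constraint record kept by the run-time scheduler. */
typedef struct {
    long static_longest_path;  /* cycles from current node to constraint end (from the HCFG) */
    long profiled_max_cycles;  /* profiled max cycles for the whole constrained transition */
    long elapsed_cycles;       /* cycles executed since the constraint's start checkpoint */
    int  has_profile;          /* nonzero if profiling information is available */
} slack_info_t;

/* Estimated cycles still to run before the constraint ends: the smaller of
 * the static bound and (assumed) profiled max minus elapsed cycles. */
long estimate_remaining_cycles(const slack_info_t *s)
{
    assert(s != 0);
    long est = s->static_longest_path;
    if (s->has_profile) {
        long by_profile = s->profiled_max_cycles - s->elapsed_cycles;
        if (by_profile < 0)
            by_profile = 0;        /* already ran longer than the profile predicted */
        if (by_profile < est)
            est = by_profile;
    }
    return est;
}
```

A smaller estimate means more slack, so the scheduler can pick a lower frequency for the remaining work while still meeting the time constraint.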


    Our Approach: Slack Algorithm

    Algorithm at work
      • Current checkpoint (with I iterations left)
      • Calculating estimated cycles C
        • Method 1: C(7-10) = Y1; C(7-9) = X2 + Y1 + I*cycle_per_iter
        • Method 2: C(7-10) = cycle_per_iter elapsed(6); C(7-9) = X1 elapsed(5)

    Checkpoint Database (CDB)

        Time Constraints    Max Time
        1-9                 T1
        6-10                T2


    COPPER framework

    MIPS R10K-like processor, Wattch power models


    Results

    Power consumption highlighting time constraints for paraffins (f = 600 MHz)



    Results: Slack-based DVS for paraffins
      • Calculated target frequencies satisfying time and power constraints using Formula 1 for paraffins
      • Time constraint on checkpoint transition 4-7
      • 52% energy savings
      • Plot panels: Frequency, Power


    Results: Slack-based DVS for paraffins
      • Calculated target frequencies satisfying time and power constraints using Formula 2 for paraffins
      • 82% energy savings
      • Plot panels: Frequency, Power


    Summary
      • While average power reduction is important, effective control of dynamic power consumption is essential, especially for software management of power and performance
      • The hard problem here is
        • identification of effective architectural mechanisms and their deterministic control through software
      • COPPER approach
        • use architectural features common to a range of processor architectures
          • memory hierarchy, register files, instruction issue
        • coordinate with technology and OS strategies
          • frequency and voltage scaling


    Our Approach: Base Algorithm

    Scheduling phase
      • Create list of events
      • Calculate frequency limit
      • Calculate optimal frequency
        • Case 1: One future checkpoint transition
        • Case 2: Frequency limit lower than potential optimal frequency
        • Case 3: Several possible future checkpoints


    Our Approach: Base Algorithm

    Calculate optimal frequency (contd.)
      (a) Calculating optimal frequency, Case 1. One future checkpoint transition
      (b) Calculating optimal frequency, Case 2. Frequency limit lower than potential optimal frequency
      (c) Calculating optimal frequency, Case 3. Several possible future checkpoints
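The three cases can be folded into one small C routine. This is a sketch, not the formula from the paper (the slides do not give it). It assumes the frequency needed for one transition is its maximum cycle count divided by the time left, that with several possible next checkpoints the most demanding one decides, and that the result is clamped to the frequency limit. The names `transition_t` and `optimal_freq_mhz` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one possible future checkpoint transition. */
typedef struct {
    double max_cycles;    /* profiled max cycle count to the next checkpoint */
    double time_left_us;  /* time remaining before its constraint expires (microseconds) */
} transition_t;

/* Lowest frequency (MHz = cycles per microsecond) that meets every candidate
 * transition's deadline, clamped to the power-imposed limit. */
double optimal_freq_mhz(const transition_t *next, size_t n, double limit_mhz)
{
    assert(next != 0 && n > 0);
    double f = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(next[i].time_left_us > 0.0);
        double need = next[i].max_cycles / next[i].time_left_us;
        if (need > f)
            f = need;             /* Case 3: the most demanding candidate wins */
    }
    if (f > limit_mhz)
        f = limit_mhz;            /* Case 2: the power budget caps the frequency */
    return f;                     /* Case 1 is simply n == 1 with no clamping */
}
```

With a single transition of 4,000,000 cycles due in 10,000 us the result is 400 MHz. Adding a second candidate of 3,000,000 cycles due in 6,000 us raises it to 500 MHz, and a 450 MHz limit would then cap it at 450 MHz. How the shortfall in Case 2 is recovered afterwards is not stated in the slides.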


    Baseline Architecture
      • A MIPS R10K-like processor
      • 4-wide issue, out-of-order (OOO) processor
      • 5-stage pipeline: fetch, dispatch, issue, writeback, commit
      • 32b integers, 64b f.p. numbers
      • register files: 32 integer and 32 FP registers
      • 32K L1 instruction cache, 32K L1 data cache
      • 32B L1 line size, 512K L2 unified cache
      • 64B L2 line size
      • 2 int ALUs, 1 FP adder, 1 FP multiplier
      • 512-entry BTB, 2K entry branch predictor


    Power Management by F/V Scaling
      • 4 available versions: 600 MHz at 2.2 V, 500 MHz at 2.0 V, 400 MHz at 1.8 V, 300 MHz at 1.6 V
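To see why lowering the voltage along with the frequency saves energy, the sketch below applies the standard first-order CMOS model, in which dynamic energy per cycle scales with the square of the supply voltage. The model is an assumption here; the deck's own figures come from Wattch power models, which account for more than this. The names `op_point_t` and `relative_energy_per_cycle` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The four frequency/voltage pairs from the slide, fastest first. */
typedef struct {
    int    freq_mhz;
    double vdd;
} op_point_t;

static const op_point_t OP_POINTS[] = {
    {600, 2.2}, {500, 2.0}, {400, 1.8}, {300, 1.6},
};
static const size_t NUM_POINTS = sizeof OP_POINTS / sizeof OP_POINTS[0];

/* Dynamic energy per cycle of operating point i relative to the 600 MHz,
 * 2.2 V point, under the assumed model E_cycle proportional to Vdd^2. */
double relative_energy_per_cycle(size_t i)
{
    assert(i < NUM_POINTS);
    double r = OP_POINTS[i].vdd / OP_POINTS[0].vdd;
    return r * r;
}
```

Under this model the 300 MHz, 1.6 V point spends roughly 53% of the energy per cycle of the 600 MHz, 2.2 V point, which is why running as slowly as the time constraints allow reduces total energy.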


    Related Work
      • DVS theoretical studies and simulations
        • [Weiser94], [Govil95], [Yassura98], [Lee98], [Pering98], [Mosse00]
      • Practical DVS implementations
        • Transmeta Crusoe, Intel XScale, lpARM
      • Interval-based and inter-task DVS techniques under OS control
        • [Weiser94], [Govil95], [Yao95], [Ishihara98], [Hong99], [Manzak00], [Sinha01], [Poulwelse01]
      • Intra-task DVS techniques under compiler control
        • [Shin01], [Hsu01], [Krshna00], [Lee00]

