  • Post-compiler Software Optimization for Reducing Energy

    Eric Schulte, Jonathan Dorn, et al.

    Presented By: Abhishek Abhyankar

    MS Computer Science Virginia Tech

    08-May-15 Computer Architecture CS 5504 Spring 2015 1

  • Traditional Way of doing things

    Make a case for reduction in energy consumption.

    Traditionally, energy optimization is handled in hardware: voltage scaling, heterogeneous cores, specialized cores, and many others.

    On the software side, the focus is mainly on increasing speed and reducing the size of the compiled code, for example by extracting instruction-, thread-, and data-level parallelism.

  • Post Compile Software Optimization

    Handle the optimizations at the software level.

    Take the compiled code output by a standard compiler.

    How can this be achieved? One approach is:

    The Genetic Optimization Algorithm (GOA), which uses concepts from evolutionary computation to stochastically mutate the software in search of an optimal implementation, while preserving strict functional semantics.

  • Background Concepts

    Functional vs. non-functional requirements. An ongoing debate between:

    Functional requirements: adhering to specifications, correctness of the code.

    Non-functional requirements: memory utilization, energy consumption.

    Stochastic methods: used heavily in evolutionary computation.

    Randomly trying out different combinations.

  • Background Concepts (continued)

    Profile-guided optimization: the program is profiled by running it and gathering run-time data.

    Call graph generation.

    Enforcing a "nearest is best" policy.

    Mutational robustness: software stays robust under mutation. Random mutations of the software often preserve its semantics.

    Many implementations are possible that reach the same semantic goal.

  • Background Concepts (continued)

    Evolutionary computation: Darwinian principles.

    Generally applied as a black-box approach.

    Steady-state algorithms: after each iteration, candidates are simply inserted back into the population.

    The best among them is selected, or rather the worst is deleted.

  • Genetic Optimization Algorithm (GOA)

    GOA uses concepts from evolutionary computation to stochastically mutate the software in search of an optimal implementation, while preserving strict functional semantics.

    Takes three inputs to start:

    Benchmark applications or kernels.

    Test suites which validate the mutations.

    A fitness function.

  • High-level working of GOA

  • GOA Working (continued)

    Take the program.

    Create many random variants of the program by reordering, deleting, and editing instructions.

    Test each new variant against the supplied test suites.

    If a variant passes, use the fitness function to check whether the non-functional requirement has improved.

    If it has, emit the assembly code as the optimized program after applying the minimization technique.
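
    The loop above can be sketched as a steady-state search. This is only an illustration under stated assumptions, not the authors' implementation: the toy "program" is a list of (opcode, constant) cells rather than real assembly, the edit operators (delete, swap, copy), the two-way tournament, and the names `mutate`, `goa_search`, `run`, and `passes_tests` are all hypothetical, and program length stands in for the modelled energy.

```python
import random

def mutate(genome, rng):
    """Return a copy of `genome` with one random whole-cell edit applied."""
    child = list(genome)
    if not child:
        return child
    i, j = rng.randrange(len(child)), rng.randrange(len(child))
    op = rng.choice(("delete", "swap", "copy"))
    if op == "delete":
        del child[i]
    elif op == "swap":
        child[i], child[j] = child[j], child[i]
    else:  # "copy": duplicate cell i at position j
        child.insert(j, child[i])
    return child

def goa_search(original, passes_tests, cost, pop_size=16, evaluations=2000, seed=0):
    """Steady-state search; returns (best_cost, best_genome), lower cost is better."""
    rng = random.Random(seed)
    population = [(cost(original), list(original)) for _ in range(pop_size)]
    for _ in range(evaluations):
        # A tournament of two picks the parent.
        parent = min(rng.sample(population, 2), key=lambda entry: entry[0])[1]
        child = mutate(parent, rng)
        # Variants that fail any test get an infinite cost.
        child_cost = cost(child) if passes_tests(child) else float("inf")
        # Steady state: insert the candidate, then delete the worst member.
        population.append((child_cost, child))
        population.remove(max(population, key=lambda entry: entry[0]))
    return min(population, key=lambda entry: entry[0])

# Toy program: each cell is applied in order to an accumulator.
ORIGINAL = [("add", 3), ("add", 0), ("mul", 1), ("mul", 2), ("add", 0)]

def run(genome, x):
    acc = x
    for op, k in genome:
        acc = acc + k if op == "add" else acc * k
    return acc

def passes_tests(genome):
    # Test suite: the variant must match the original on every test input.
    return all(run(genome, x) == run(ORIGINAL, x) for x in range(-3, 4))

# Assumption: cell count is used as a stand-in for modelled energy.
best_cost, best = goa_search(ORIGINAL, passes_tests, cost=len)
print(best_cost, best)
```

    Because the worst member is deleted after every insertion, the best variant found so far is never lost, and a variant that fails the tests cannot displace a passing one.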

  • Representation of Assembly code

    A very simple strategy is adopted to represent the assembly code.

    Each line has a cell in an array.

    A single line can also be broken down into multiple cells.

    Augmented instructions are avoided, which limits the search space.
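
    A minimal sketch of the one-cell-per-line representation, assuming plain text assembly and whole-line edits only. The helper names `parse_asm`, `render_asm`, and `swap_cells`, and the four-line snippet, are hypothetical and only illustrate the idea.

```python
def parse_asm(text):
    """Split assembly text into an array with one cell per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]

def render_asm(cells):
    """Join the cells back into assembly text."""
    return "\n".join(cells) + "\n"

def swap_cells(cells, i, j):
    """Return a new array with cells i and j exchanged; cell contents are untouched."""
    out = list(cells)
    out[i], out[j] = out[j], out[i]
    return out

# Hypothetical snippet used only to exercise the helpers.
ASM = """\
main:
    movl $1, %eax
    addl $2, %eax
    ret
"""

cells = parse_asm(ASM)
variant = swap_cells(cells, 1, 2)
print(render_asm(variant))
```

    Since an edit only moves, removes, or duplicates whole cells, every variant is built from lines that already exist in the original program, which keeps the search space small.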

  • Experimental Setup and Benchmark Kernels:

    An Intel machine serves as the example desktop computer: Core i7, 4 cores, 8 GB RAM.

    An AMD machine serves as the example server-scale machine: 48 cores, 128 GB RAM.

    8 kernels from the PARSEC benchmark suite are used: blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, and x264.

    Each kernel should keep the underlying architecture running for at least 1 second and produce output.

  • Input Test Suites

    Comprehensive test suites for each kernel.

    The smallest input size of each test suite is used.

    The tests only validate the requirements specification; stress or boundary testing is not needed at this point.

  • Fitness Function

    GOA proposes a linear, scalar energy model.

    Hardware counters are captured using the Linux perf utility. They are fine-grained and tightly coupled to the underlying architecture.

    The model depends heavily on the time factor (run time).

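    $$\text{power} = C_{\text{const}} + C_{\text{ins}} \frac{\text{ins}}{\text{cycle}} + C_{\text{flops}} \frac{\text{flops}}{\text{cycle}} + C_{\text{tca}} \frac{\text{tca}}{\text{cycle}} + C_{\text{mem}} \frac{\text{mem}}{\text{cycle}}$$

    $$\text{energy} = \text{seconds} \times \text{power}$$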
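
    The linear model can be evaluated directly from counter totals. The sketch below is an assumption-laden illustration: the class name `PowerModel` and its field names are hypothetical, the second rate is read as floating-point operations per cycle, and the coefficient values are placeholders chosen for easy arithmetic, not the empirically fitted constants.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PowerModel:
    """Coefficients of the linear model; real values are machine-specific."""
    c_const: float
    c_ins: float
    c_flops: float
    c_tca: float
    c_mem: float

    def power(self, cycles, ins, flops, tca, mem):
        """Modelled power from raw counter totals of one run."""
        return (self.c_const
                + self.c_ins * ins / cycles      # instructions per cycle
                + self.c_flops * flops / cycles  # assumed: floating-point ops per cycle
                + self.c_tca * tca / cycles      # tca counter per cycle
                + self.c_mem * mem / cycles)     # mem counter per cycle

    def energy(self, seconds, cycles, ins, flops, tca, mem):
        """Modelled energy: run time multiplied by modelled power."""
        return seconds * self.power(cycles, ins, flops, tca, mem)

# Placeholder coefficients and counter totals, NOT measured values.
model = PowerModel(c_const=10.0, c_ins=2.0, c_flops=3.0, c_tca=4.0, c_mem=5.0)
print(model.energy(seconds=2.0, cycles=100, ins=50, flops=10, tca=20, mem=4))
```

    With these placeholders the power works out to 10 + 1 + 0.3 + 0.8 + 0.2 = 12.3, so two seconds of run time give roughly 24.6 units of energy. Because energy scales with seconds, a variant that simply runs faster scores better even when its per-cycle rates are unchanged.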

  • Constants Derived from Empirical Study

  • Minimization Technique

    Iterations tend to create redundant patterns of code.

    The goal is to get the best energy efficiency with the fewest changes.

    Delta debugging is used to compare and remove redundant, non-influential changes.
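
    A minimal sketch of delta-debugging-style minimization, assuming the difference between the original and the optimized program is available as a list of independent edits. The function name `ddmin` and the predicate `still_good` are hypothetical; in GOA's setting the predicate would apply the given edits to the original, rerun the tests, and confirm the fitness gain is kept. Here a toy predicate stands in for that check.

```python
def ddmin(deltas, still_good):
    """Shrink `deltas` to a sublist that still satisfies `still_good`,
    such that removing any single remaining edit would break it."""
    assert still_good(deltas)
    n = 2  # current number of chunks
    while len(deltas) >= 2:
        chunk = max(1, len(deltas) // n)
        subsets = [deltas[i:i + chunk] for i in range(0, len(deltas), chunk)]
        reduced = False
        for idx, subset in enumerate(subsets):
            complement = [d for k, s in enumerate(subsets) if k != idx for d in s]
            if still_good(subset):
                # One chunk alone is enough: keep only that chunk.
                deltas, n, reduced = subset, 2, True
                break
            if still_good(complement):
                # This chunk is redundant: drop it.
                deltas, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(deltas):
                break  # already at single-edit granularity
            n = min(len(deltas), n * 2)  # try finer chunks
    return deltas

# Toy predicate: of eight edits, only edits 2 and 5 carry the improvement.
edits = list(range(8))
essential = ddmin(edits, lambda subset: {2, 5} <= set(subset))
print(essential)
```

    The six redundant edits are discarded and only the two influential ones remain, which is the "least amount of changes" goal stated above.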

  • Code Example of GOA

  • Post processing the optimized code

    Execute the original code on the held-out test suite and obtain real wall-socket measurements.

    Execute the optimized code on the held-out test suite and obtain real wall-socket measurements.

    Compare the two results to find which patterns improved and the percentage improvement in energy consumption.
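
    The comparison reduces to a percentage over mean measurements. A small sketch, with hypothetical readings invented purely for illustration and a hypothetical helper name `percent_reduction`:

```python
from statistics import mean

def percent_reduction(original_joules, optimized_joules):
    """Energy saved by the optimized program, as a percentage of the original."""
    return 100.0 * (original_joules - optimized_joules) / original_joules

# Hypothetical wall-socket readings in joules, one per repeated run.
original_runs = [100.0, 102.0, 98.0]
optimized_runs = [80.0, 81.0, 79.0]

saving = percent_reduction(mean(original_runs), mean(optimized_runs))
print(saving)
```

    Averaging over repeated runs before comparing helps keep measurement noise from being mistaken for an improvement.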

  • Results

    In the blackscholes kernel, GOA caught the induced repetition loop and found a way around it.

    In the swaptions kernel, GOA gave 42% energy savings, although this should be taken with a pinch of salt.

    In the vips kernel, cache misses actually increased but the number of instructions decreased, and hence a 20% improvement was observed.

  • Interesting Observations

    A 7% average error is found in most prediction models, and likewise in GOA's model; the search still works fine with it.

    Empirical studies show that GOA might be better suited to finding efficient sequences of assembly instructions than efficient memory access patterns.

    The energy reduction percentage is consistently higher on the AMD machine, mainly because the bigger machine offers more opportunities.

  • QoS dependent Optimization

    Relaxed preservation of semantics and more emphasis on QoS.

    The plug-and-play test suite policy gives the developer the option of making GOA strict or loose on semantics.

    Relaxed functional requirements provide much more energy efficiency, but the developer takes on the risk of ensuring that the program semantics do not break.
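
    One way to picture the strict versus loose choice is as two interchangeable acceptance checks plugged into the test suite. This is a sketch under assumptions: the function names and the 1% relative tolerance are hypothetical, and numeric output lists stand in for whatever a real kernel produces.

```python
import math

def strict_check(output, reference):
    """Strict semantics: every value must match the reference exactly."""
    return output == reference

def relaxed_check(output, reference, rel_tol=0.01):
    """Relaxed semantics: values may deviate within a relative tolerance."""
    return len(output) == len(reference) and all(
        math.isclose(o, r, rel_tol=rel_tol) for o, r in zip(output, reference)
    )

# Hypothetical outputs of the original program and of an optimized variant.
reference = [1.000, 2.000, 3.000]
approximate = [1.001, 1.999, 3.002]

print(strict_check(approximate, reference), relaxed_check(approximate, reference))
```

    The variant is rejected under the strict check and accepted under the relaxed one, so choosing the tolerance is where the developer takes on the risk described above.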

  • Key contributions

    Genetic Optimization Algorithm (GOA) combines insights from profile-guided optimization, superoptimization, evolutionary computation and mutational robustness.

    This technique gave 20% average energy savings across all benchmarks.

    Very simple, and mostly leverages already available techniques.

  • Drawbacks of GOA

    Energy constants are derived empirically over repeated runs on specific hardware. Introducing GOA on a new architecture will take a considerable amount of work.

    The non-deterministic approach makes it almost impossible to return to an earlier code path once the software is changed even slightly; the code paths must be indexed and remembered.

    Very high-quality test suites are required. Failure to provide them might result in over-optimized code that only appears to work.

  • Proposed Future Work

    Currently only applied to x86. Should be generalized to Java bytecode and ARM.

    Indirect selection can optimize one parameter at the cost of worsening another. A matrix implementation is proposed as a solution to this problem.

    Instead of relying on a single compiler that follows a predefined path, the code should be compiled with multiple compilers using multiple paths, and the best result selected.

  • Questions / Discussion
