Post-compiler Software Optimization for Reducing Energy
Eric Schulte, Jonathan Dorn, et al.
Presented By: Abhishek Abhyankar
MS Computer Science Virginia Tech
08-May-15 Computer Architecture CS 5504 Spring 2015 1
Traditional Way of doing things
• Make a case for reduction in energy consumption.
• Traditionally, energy optimization is handled in hardware.
  • Voltage scaling, heterogeneous cores, specialized cores, and many others.
• On the software side, the main concerns are increasing speed and reducing the size of the compiled code.
  • Extracting instruction-, thread-, and data-level parallelism.
Post Compile Software Optimization
• Handle optimizations at the software level.
• Take the compiled code output from a standard compiler.
• How can this be achieved? One approach is:
“Genetic Optimization Algorithm, which uses concepts from evolutionary computation to stochastically mutate the software toward an optimal implementation, all while preserving strict functional semantics.”
Background Concepts
• Functional vs. non-functional requirements.
  • Ongoing debate between the two.
  • Functional requirements: adherence to specifications, correctness of the code.
  • Non-functional requirements: memory utilization, energy consumption.
• Stochastic methods.
  • Used heavily in evolutionary computation.
  • Randomly trying out different combinations.
Background Concepts .. continued
• Profile-guided optimizations.
  • The program is profiled by running it and gathering run-time data.
  • Call graph generation.
  • Enforcing a “nearest is the best” policy.
• Software robustness even after mutation.
  • Many random mutations of the software preserve its semantic meaning.
  • Many implementations are possible that reach the same semantic goal.
Background Concepts .. continued
• Evolutionary computation.
  • Darwinian principles.
  • Generally applied as a black-box approach.
• Steady-state algorithms.
  • After each iteration, candidates are simply inserted back into the population.
  • The best among them is selected, or rather, the worst is deleted.
Genetic Optimization Algorithm (GOA)
• “Genetic Optimization Algorithm, which uses concepts from evolutionary computation to stochastically mutate the software toward an optimal implementation, all while preserving strict functional semantics.”
• Takes three inputs to start.
  • Benchmark applications or kernels.
  • Test suites that validate each mutation.
  • Fitness function.
High-level working of GOA
GOA Working .. continued
• Take the program.
• Create many random variants of the program by reordering instructions and by deleting and editing some of them.
• Test each new variant against the submitted test suites.
• If the tests pass, check for an improvement in the fitness function, which measures the non-functional requirement.
• If there is an improvement, apply the minimization technique and emit the resulting assembly as the optimized code.
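The steps above can be sketched as a small steady-state search loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `goa_search`, its parameters, the tournament size, and the choice to simply discard variants that fail the tests are all hypothetical. A program is assumed to be a list of assembly lines, and lower fitness means lower modeled energy.

```python
import random


def goa_search(original, passes_tests, fitness, mutate,
               pop_size=32, max_evals=1000, seed=0):
    """Steady-state search sketch: lower fitness (modeled energy) is better.

    passes_tests, fitness and mutate are caller-supplied callables
    (hypothetical interfaces, not taken from the slides).
    """
    rng = random.Random(seed)
    # The population starts as copies of the original program.
    base = fitness(original)
    population = [(base, list(original)) for _ in range(pop_size)]

    def tournament(pick_best, size=2):
        # Sample a few individuals and return the index of the best or worst.
        contenders = rng.sample(range(len(population)), size)
        chooser = min if pick_best else max
        return chooser(contenders, key=lambda i: population[i][0])

    for _ in range(max_evals):
        parent = population[tournament(True)][1]
        child = mutate(parent, rng)
        # Functional check first: variants that break the tests are dropped
        # (an assumption made for this sketch).
        if not passes_tests(child):
            continue
        # Steady state: insert the new candidate, then evict a poor one.
        population.append((fitness(child), child))
        del population[tournament(False)]

    # Return the best program found; minimization would run on this result.
    return min(population, key=lambda entry: entry[0])[1]
```

Because every member of the population has passed the tests, the returned program is always functionally valid with respect to the supplied suite, which is why the quality of that suite matters so much.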
Representation of Assembly code
• A very simple strategy is adopted to represent the assembly code.
• Each line gets a cell in an array.
• A line can also be broken down into multiple cells.
• Augmented instructions are avoided.
  • This limits the search space.
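A minimal sketch of this array representation follows. The function names and the three whole-line operators (copy, delete, swap) are assumptions chosen for illustration; the slides only say that variants are produced by reordering, deleting, and editing instructions. Each cell is treated as atomic, so a mutation never rewrites the inside of a line, which is one way to keep the search space small.

```python
import random


def parse_assembly(text):
    """One array cell per non-empty line of assembly text."""
    return [line for line in text.splitlines() if line.strip()]


def mutate(program, rng):
    """Return a new variant produced by one whole-line edit.

    Hypothetical operators: copy a line to another position, delete a line,
    or swap two lines. The input program is left untouched.
    """
    variant = list(program)
    if not variant:
        return variant
    op = rng.choice(["copy", "delete", "swap"])
    i = rng.randrange(len(variant))
    j = rng.randrange(len(variant))
    if op == "copy":
        variant.insert(j, variant[i])
    elif op == "delete":
        del variant[i]
    else:
        variant[i], variant[j] = variant[j], variant[i]
    return variant
```

Since every operator only moves, duplicates, or removes existing cells, a variant never contains a line that was absent from the original program.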
Experimental Setup and Benchmark Kernels:
• An Intel machine is used as an example of a desktop computer.
  • i7, 4 cores, 8 GB RAM.
• An AMD machine is used as an example of a server-scale machine.
  • 48 cores, 128 GB RAM.
• 8 kernels from the PARSEC benchmark suite are used.
  • blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, and x264.
• Each kernel should keep the underlying architecture running for at least 1 second and produce output.
Input Test Suites
• Comprehensive test suites for each kernel.
• The smallest input size of the test suite is used.
• The suite only validates the requirements specification; stress or boundary testing is not needed at this point.
Fitness Function
• GOA proposes a linear scalar energy model.
• Hardware counters are captured using the “perf” utility in Linux.
  • Tightly coupled with the underlying architecture and fine-grained.
• Heavily dependent on the time factor.
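\[
\text{power} = C_{\text{const}} + C_{\text{ins}}\,\frac{\text{ins}}{\text{cycle}} + C_{\text{fpos}}\,\frac{\text{fpos}}{\text{cycle}} + C_{\text{tca}}\,\frac{\text{tca}}{\text{cycle}} + C_{\text{mem}}\,\frac{\text{mem}}{\text{cycle}}
\]
\[
\text{energy} = \text{seconds} \times \text{power}
\]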
Constants Derived from Empirical Study
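As a worked illustration of the linear model (a constant term plus each per-cycle counter rate weighted by its coefficient, then multiplied by run time), the sketch below evaluates it for one set of counter readings. The names `Coefficients`, `modeled_power`, and `modeled_energy` are hypothetical, and the numeric values used in the usage note are made-up placeholders, not the empirically derived constants.

```python
from dataclasses import dataclass


@dataclass
class Coefficients:
    """Per-machine model constants (placeholders; real ones are fit empirically)."""
    const: float
    ins: float
    fpos: float
    tca: float
    mem: float


def modeled_power(c, counters):
    """Linear power model over per-cycle hardware counter rates.

    counters is assumed to be a dict holding raw counts for the keys
    'cycles', 'ins', 'fpos', 'tca' and 'mem', using the slide's symbols.
    """
    cycles = counters["cycles"]
    return (c.const
            + c.ins * counters["ins"] / cycles
            + c.fpos * counters["fpos"] / cycles
            + c.tca * counters["tca"] / cycles
            + c.mem * counters["mem"] / cycles)


def modeled_energy(c, counters, seconds):
    """energy = seconds * power"""
    return seconds * modeled_power(c, counters)
```

With placeholder constants `Coefficients(10, 2, 4, 8, 16)` and counts of 100 cycles, 50 ins, 25 fpos, 10 tca, and 5 mem, the rates are 0.5, 0.25, 0.1, and 0.05, so the modeled power is 10 + 1 + 1 + 0.8 + 0.8 = 13.6, and a 2-second run gives a modeled energy of 27.2. Because energy scales with seconds, a variant that simply runs faster scores better even when its modeled power is unchanged.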
Minimization Technique
• Iterations tend to create redundant patterns of code.
• The goal is to get the best energy efficiency with the fewest changes.
• Delta Debugging is used to compare variants and remove redundant, non-influential changes.
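The idea of dropping non-influential changes can be sketched with a greedy one-at-a-time reduction. This is a deliberate simplification of Delta Debugging, which partitions the change set more cleverly, and every name below (`minimize_edits`, `apply_edits`, `tolerance`) is hypothetical. The optimized variant is assumed to be expressed as a list of edits relative to the original program.

```python
def minimize_edits(edits, apply_edits, passes_tests, fitness, tolerance=0.0):
    """Drop edits whose removal keeps the tests passing and the fitness as good.

    apply_edits(list_of_edits) must return the program obtained by applying
    those edits to the original. Lower fitness is better.
    """
    kept = list(edits)
    target = fitness(apply_edits(kept))
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            program = apply_edits(candidate)
            # Keep the smaller edit set only if nothing got worse.
            if passes_tests(program) and fitness(program) <= target + tolerance:
                kept = candidate
                changed = True
                break
    return kept
```

The result is a set of edits in which every remaining change is needed, either to keep the tests passing or to preserve the measured improvement.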
Code Example of GOA
Post processing the optimized code
• Execute the original code with the held-out test suite.
  • Obtain real wall-socket measurements.
• Execute the optimized code with the held-out test suite.
  • Obtain real wall-socket measurements.
• Compare the two results to find the patterns that improved and the percentage improvement in energy consumption.
Results
• In the blackscholes kernel, GOA caught the artificially induced repetition loop and found a way around it.
• In the swaptions kernel, GOA gave 42% energy savings.
  • This has to be taken with a pinch of salt, though.
• In the vips kernel, cache misses actually increased while the number of instructions decreased, and a 20% improvement was observed.
Interesting Observations
• An average error of 7% is found in most prediction models, and the same holds for GOA's model.
  • GOA still works fine despite it.
• Empirical studies show that GOA might be better suited to finding efficient sequences of assembly instructions than efficient memory access patterns.
• The energy reduction percentage is consistently higher on the AMD machine.
  • This is mainly because the bigger machine offers more opportunities.
QoS dependent Optimization
• “Relaxed” preservation of semantics, with more emphasis on QoS.
• The plug-and-play test suite policy lets the developer make GOA strict or loose on semantics.
• Relaxed functional requirements allow much greater energy savings, but the developer takes on the risk of ensuring that the program's semantics do not break.
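One way to picture the strict versus loose choice is as a swap of the test oracle that decides whether a variant's output is acceptable. The sketch below is purely illustrative: the function names and the 1% relative tolerance are assumptions, and numeric output is assumed only for the sake of the example.

```python
import math


def strict_oracle(expected, actual):
    """Strict semantics: the variant's output must match exactly."""
    return expected == actual


def relaxed_oracle(expected, actual, rel_tol=1e-2):
    """Relaxed semantics: numeric outputs may differ within a QoS tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol) for e, a in zip(expected, actual)
    )
```

Plugging the relaxed oracle into the test suite admits more variants into the search, which is where the extra savings and the extra risk both come from.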
Key contributions
• Genetic Optimization Algorithm (GOA) combines insights from profile-guided optimization, superoptimization, evolutionary computation and mutational robustness.
• This technique gave 20% average energy savings across all benchmarks.
• Very simple; it mostly leverages already available techniques.
Drawbacks of GOA
• Energy constants are derived empirically over repeated runs on specific hardware.
  • Introducing GOA on a new architecture will take a considerable amount of work.
• The non-deterministic approach makes it almost impossible to return to an earlier code path once the software is changed even slightly.
  • Code paths must be indexed and remembered.
• Very high-quality test suites are “required”.
  • Failing to provide them might result in over-optimized code that only appears to work.
Proposed Future Work
• Currently only applied to x86.
  • Should be generalized to Java bytecode and ARM.
• Indirect selection can optimize one parameter at the cost of worsening another.
  • A matrix implementation is proposed as a solution to this problem.
• Instead of relying on a compiler that takes a predefined “agreed” path, code should be compiled with multiple compilers using multiple paths, and the best result selected.
Questions / Discussion