HLL VM Implementation


Posted on 05-Jan-2016






HLL VM Implementation, Sections 6.5~6.7. Kim, Jung ki. October 11th, 2006. System Programming, 2006.


  • HLL VM Implementation, Sections 6.5~6.7

    Kim, Jung ki. October 11th, 2006. System Programming, 2006

  • Contents

    - Basic Emulation
    - High-performance Emulation
    - Optimization Framework
    - Optimizations
    - Case study: The Jikes Research Virtual Machine

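  • Basic Emulation

    The emulation engine in a JVM can be implemented in a number of ways:
    - interpretation
    - just-in-time (JIT) compilation

    With JIT compilation, methods are compiled at the time they are first invoked. JIT compilation is practical because all of a method's instructions in the Java ISA (bytecode) can easily be discovered before the method runs, so there is no code-discovery problem of the kind binary translators face with conventional ISAs.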
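Compiling a method at its first invocation can be sketched as a method table whose entries start out as stubs. This is a minimal sketch under stated assumptions: `LazyMethodTable`, `define`, `invoke`, and the toy "bytecode" strings such as `"+1,*2"` are hypothetical stand-ins for real bytecode, and "compiling" here only means composing Java functions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of JIT compilation on first invocation. Each method-table
// entry starts as a stub; the first call "compiles" the method's toy bytecode
// (e.g. "+1,*2") into a composed function and patches the table entry, so later
// calls go straight to the compiled code.
class LazyMethodTable {
    private final Map<String, String> bytecode = new HashMap<>();
    private final Map<String, IntUnaryOperator> table = new HashMap<>();
    int compilations = 0;

    void define(String name, String ops) {
        bytecode.put(name, ops);
        // Initially the entry points at a compile-on-first-call stub.
        table.put(name, arg -> compileAndPatch(name).applyAsInt(arg));
    }

    int invoke(String name, int arg) {
        return table.get(name).applyAsInt(arg);
    }

    private IntUnaryOperator compileAndPatch(String name) {
        compilations++;
        IntUnaryOperator code = IntUnaryOperator.identity();
        for (String op : bytecode.get(name).split(",")) {
            int k = Integer.parseInt(op.substring(1));
            IntUnaryOperator step;
            if (op.charAt(0) == '+') step = x -> x + k;
            else if (op.charAt(0) == '*') step = x -> x * k;
            else throw new IllegalArgumentException("unknown op: " + op);
            code = code.andThen(step);
        }
        table.put(name, code); // patch the entry: the stub is never used again
        return code;
    }
}
```

The stub costs an extra indirection on the first call only; a real VM would patch the method table or the call site in machine code instead.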

  • JIT vs. Conventional Compiler

    - A JIT compiler has no front end for parsing and syntax checking; it starts from bytecode rather than source code.
    - The intermediate form it builds before optimization is different from a conventional compiler's.
    - Its optimization strategy differs: multiple optimization levels guided by profiling, and optimizations applied selectively to hot spots rather than to entire methods.

    Examples:
    - starting with interpretation: Sun HotSpot, IBM DK
    - compiling from the start: Jikes RVM


  • High-Performance Emulation

    Two challenges for HLL VMs:
    - offsetting run-time optimization overhead with execution-time improvements
    - making object-oriented programs go fast despite their frequent use of addressing indirection and small methods

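  • Optimization Framework

    [Figure: optimization framework. Bytecodes are executed by the Interpreter or translated by a Simple Compiler into Compiled Code; both produce Profile Data, which drives an Optimizing Compiler that generates Optimized Code. All components run on the Host Platform.]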
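The staged framework can be sketched as a per-method state machine driven by a profile counter. This is a hedged illustration, not any particular VM's policy: the class `TieredMethod` and both thresholds are hypothetical, and a real system would use richer profile data than an invocation count.

```java
// Hypothetical sketch of the staged framework: a method starts in the
// interpreter, is translated by the simple compiler once it is warm, and is
// recompiled by the optimizing compiler once the profile shows it is hot.
// Both thresholds are illustrative assumptions, not values from any real VM.
class TieredMethod {
    enum Tier { INTERPRETED, SIMPLE_COMPILED, OPTIMIZED }

    static final int WARM_THRESHOLD = 10;
    static final int HOT_THRESHOLD = 1000;

    Tier tier = Tier.INTERPRETED;
    int invocations = 0; // the profile data gathered so far

    // Called on every invocation; returns the tier that executes this call.
    Tier invoke() {
        invocations++;
        if (tier == Tier.INTERPRETED && invocations >= WARM_THRESHOLD) {
            tier = Tier.SIMPLE_COMPILED;   // translate with the simple compiler
        } else if (tier == Tier.SIMPLE_COMPILED && invocations >= HOT_THRESHOLD) {
            tier = Tier.OPTIMIZED;         // recompile with the optimizing compiler
        }
        return tier;
    }
}
```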

  • Contents

    - Basic Emulation
    - High-performance Emulation
    - Optimization Framework
    - Optimizations
      - Code Relayout
      - Method Inlining
      - Optimizing Virtual Method Calls
      - Multiversioning and Specialization
      - On-Stack Replacement
      - Optimization of Heap-Allocated Objects
      - Low-Level Optimizations
      - Optimizing Garbage Collection
    - Case study: The Jikes Research Virtual Machine

  • Code Relayout

    The most commonly followed control-flow paths are placed in contiguous memory locations, which improves instruction locality and conditional-branch predictability.
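One simple way to derive such a layout from edge profiles is greedy chaining, sketched below. This is one possible scheme, not necessarily the one any given VM uses; `CodeRelayout.layout` and its map-of-maps edge representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of profile-guided code relayout by greedy chaining: starting at the
// entry block, repeatedly place the hottest not-yet-placed successor next, so
// the most frequently taken path becomes straight-line fall-through code.
// When a chain ends, layout resumes with the first unplaced block in the
// original order.
class CodeRelayout {
    // edgeCounts.get(b) maps each successor of block b to its profiled count.
    static List<String> layout(String entry,
                               Map<String, Map<String, Integer>> edgeCounts,
                               List<String> blocks) {
        List<String> order = new ArrayList<>();
        Set<String> placed = new HashSet<>();
        String current = entry;
        while (current != null) {
            order.add(current);
            placed.add(current);
            // Pick the hottest successor that has not been placed yet.
            String next = null;
            int best = -1;
            for (Map.Entry<String, Integer> e :
                    edgeCounts.getOrDefault(current, Map.of()).entrySet()) {
                if (!placed.contains(e.getKey()) && e.getValue() > best) {
                    best = e.getValue();
                    next = e.getKey();
                }
            }
            // Chain ended: continue with the first unplaced block.
            if (next == null) {
                for (String b : blocks) {
                    if (!placed.contains(b)) { next = b; break; }
                }
            }
            current = next;
        }
        return order;
    }
}
```

For example, with made-up counts A→B 30, A→D 70, D→E 68, D→C 2, B→E 29, B→C 1, and C→E 3, the hot path A, D, E is laid out first, followed by B and C. Branch conditions would then be inverted where needed so that the hot successor is the fall-through.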

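  • Code Relayout

    [Figure: code relayout example. Left: a control-flow graph of blocks A through G annotated with profiled edge counts. Right: the relaid-out code in the order A (br cond1 == false), D (br cond3 == true), F (br uncond), G (br cond2 == false), E (br uncond), C (br cond4 == true), followed by a final br uncond.]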

  • Method Inlining

    Benefits:
    - Calling overhead decreases, which matters especially in object-oriented programs: parameter passing, stack-frame management, and control transfer are eliminated.
    - The scope of code analysis expands, so more optimizations become applicable.

    The effect depends on method size:
    - small methods: beneficial in most cases
    - large methods: the calling sequence is a small portion of execution time, so a sophisticated cost-benefit analysis is needed; otherwise code explosion may occur, causing poor cache behavior and performance losses
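The size trade-off above can be expressed as a simple inlining policy. This is a hypothetical sketch: `InlinePolicy`, its parameters, and all thresholds are illustrative assumptions, standing in for the more sophisticated cost-benefit analysis a real optimizer would perform.

```java
// Hypothetical inlining heuristic reflecting the size trade-off: always inline
// tiny callees, inline medium-sized callees only at hot call sites, never
// inline large ones, and cap total caller growth to avoid code explosion.
// All thresholds are illustrative assumptions.
class InlinePolicy {
    static final int ALWAYS_INLINE_SIZE = 12;  // bytecodes
    static final int MAX_INLINE_SIZE = 200;    // bytecodes
    static final int HOT_CALL_COUNT = 1000;    // profiled calls at this site

    static boolean shouldInline(int calleeSize, int callSiteCount,
                                int callerSize, int maxCallerSize) {
        // Guard against code explosion in the caller as a whole.
        if (callerSize + calleeSize > maxCallerSize) return false;
        // Small callee: the call sequence dominates its cost, so inline.
        if (calleeSize <= ALWAYS_INLINE_SIZE) return true;
        // Large callee: call overhead is negligible relative to its body.
        if (calleeSize > MAX_INLINE_SIZE) return false;
        // Medium callee: pay the code growth only where the site is hot.
        return callSiteCount >= HOT_CALL_COUNT;
    }
}
```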

  • Method Inlining

    Processing sequence:
    1. profile by instrumenting the code
    2. construct a call graph at certain intervals
    3. invoke the dynamic optimization system when call counts exceed a certain threshold

    Reducing analysis overhead: a profile counter is kept in each stack frame. When a counter meets the threshold, the system walks backward through the stack to find the hot call chain, instead of building a full call graph.

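  • Method Inlining

    [Figure: finding inlining candidates. (a) With a call graph: nodes MAIN, A, X, B, C, and Y annotated with call counts (900, 100, 1500, 100, 1000, 25) compared against a threshold. (b) Via stack frames: the chain MAIN (900) → A (1500) → C, compared against the threshold.]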
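The stack-frame approach can be sketched as follows. This is a hedged illustration that assumes counts are kept per caller/callee edge. `StackWalkInliner`, `enter`, `exit`, and `hotChain` are hypothetical names. In a VM the walk would be triggered when a counter crosses the threshold.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: call counts are bumped per (caller -> callee) edge on
// every call. hotChain() walks backward from the top frame, collecting edges
// that meet the threshold and stopping at the first cold one. The collected
// edges are the inlining candidates; no full call graph is ever built.
class StackWalkInliner {
    private final int threshold;
    private final Deque<String> stack = new ArrayDeque<>();          // top = first element
    private final Map<String, Integer> edgeCounts = new HashMap<>(); // "caller->callee" -> calls

    StackWalkInliner(int threshold) { this.threshold = threshold; }

    void enter(String method) {
        String caller = stack.peek(); // null for the outermost frame
        if (caller != null) edgeCounts.merge(caller + "->" + method, 1, Integer::sum);
        stack.push(method);
    }

    void exit() { stack.pop(); }

    List<String> hotChain() {
        List<String> chain = new ArrayList<>();
        Iterator<String> frames = stack.iterator(); // iterates top to bottom
        if (!frames.hasNext()) return chain;
        String callee = frames.next();
        while (frames.hasNext()) {
            String caller = frames.next();
            String edge = caller + "->" + callee;
            if (edgeCounts.getOrDefault(edge, 0) < threshold) break;
            chain.add(edge);
            callee = caller;
        }
        return chain;
    }
}
```

With the figure's counts (MAIN calls A 900 times, A calls C 1500 times), a threshold of 800 yields the chain A->C, MAIN->A. A threshold of 1000 stops after A->C.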

  • Optimizing Virtual Method Calls

    Virtual method calls are the most common case: the code to execute is determined at run time via a dynamic method-table lookup. When one receiver type dominates, the call can be inlined behind a type test:

      invokevirtual
          becomes
      if (a instanceof Square) {
          inlined code ...
      } else {
          invokevirtual
      }
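Guarded inlining can be sketched in Java source. This assumes profiling showed that the call site almost always sees a `Square`. `Shape`, `Circle`, `perimeter`, and `GuardedCall` are illustrative names chosen to match the slides' example. One design note: the sketch tests the exact class rather than using `instanceof`, because a subclass of `Square` could override the method and make the inlined body wrong.

```java
// Sketch of guarded inlining of a virtual call (names are illustrative).
abstract class Shape { abstract double perimeter(); }

class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    double perimeter() { return 4 * side; }
}

class Circle extends Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    double perimeter() { return 2 * Math.PI * radius; }
}

class GuardedCall {
    // Original call site: always a virtual (method-table) dispatch.
    static double viaVirtual(Shape s) { return s.perimeter(); }

    // After guarded inlining: a cheap exact-type test, then Square's body
    // inlined; any other receiver falls back to the normal virtual call.
    static double viaGuard(Shape s) {
        if (s.getClass() == Square.class) {
            return 4 * ((Square) s).side;   // inlined Square.perimeter()
        }
        return s.perimeter();               // fallback: invokevirtual
    }
}
```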

  • Optimizing Virtual Method Calls

    If inlining is not useful, just removing the method-table lookup also helps. With polymorphic inline caching (PIC), the invokevirtual of perimeter is replaced by a call to a PIC stub:

      polymorphic inline cache stub:
        if type = circle, jump to circle perimeter code
        else if type = square, jump to square perimeter code
        else call lookup
      lookup:
        method-table lookup code; update PIC stub
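A polymorphic inline cache for a single call site can be sketched as follows. In this hedged sketch, `METHOD_TABLE` simulates the slow method-table lookup. The entry limit and all class names are hypothetical. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

interface Shape {}
record Square(double side) implements Shape {}
record Circle(double radius) implements Shape {}

// Hypothetical PIC for one call site. The stub checks cached receiver types in
// order and jumps directly to the matching code; on a miss it does the full
// lookup and then extends the stub.
class PolymorphicInlineCache {
    // Stand-in for the method tables: the perimeter code for each class.
    static final Map<Class<?>, ToDoubleFunction<Shape>> METHOD_TABLE =
        Map.<Class<?>, ToDoubleFunction<Shape>>of(
            Square.class, s -> 4 * ((Square) s).side(),
            Circle.class, s -> 2 * Math.PI * ((Circle) s).radius());

    private static final int MAX_ENTRIES = 4; // assumed cap before the site goes megamorphic
    private record Entry(Class<?> type, ToDoubleFunction<Shape> target) {}
    private final List<Entry> stub = new ArrayList<>();
    int misses = 0;

    double perimeter(Shape receiver) {
        Class<?> type = receiver.getClass();
        for (Entry e : stub) {                      // "if type = ... jump to ..."
            if (e.type() == type) return e.target().applyAsDouble(receiver);
        }
        misses++;                                   // "else call lookup"
        ToDoubleFunction<Shape> target = METHOD_TABLE.get(type);
        if (stub.size() < MAX_ENTRIES) stub.add(new Entry(type, target)); // update PIC stub
        return target.applyAsDouble(receiver);
    }
}
```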

  • Multiversioning and Specialization

    Multiversioning by specialization: if some variables or references are always assigned values or types that are known to be constant (or drawn from a limited range), simplified, specialized code can be used.

    for (int i=0;i

  • Multiversioning and Specialization

    Deferred compilation of the general case

    for (int i=0;i
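Multiversioning with deferred compilation of the general case can be sketched as follows. This assumes profiling found that a parameter is almost always 1. `Multiversion`, `sum`, `scale`, and the `generalCompiled` flag are hypothetical. The flag only records when the general version would have been compiled on demand.

```java
// Hypothetical sketch of multiversioning: a specialized version for the
// profiled common value (scale == 1) is selected by a guard, and the general
// version is "compiled" only the first time the guard fails.
class Multiversion {
    static boolean generalCompiled = false;

    static long sum(int[] a, int scale) {
        if (scale == 1) return sumSpecialized(a); // fast, specialized version
        if (!generalCompiled) {                   // deferred compilation of the general case
            generalCompiled = true;
        }
        return sumGeneral(a, scale);
    }

    // Specialized for scale == 1: the multiply has been folded away.
    private static long sumSpecialized(int[] a) {
        long s = 0;
        for (int x : a) s += x;
        return s;
    }

    private static long sumGeneral(int[] a, int scale) {
        long s = 0;
        for (int x : a) s += (long) x * scale;
        return s;
    }
}
```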

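  • On-Stack Replacement

    A newly optimized version of a method normally takes effect only at the method's next call, so an invocation that is already running would not benefit; OSR is needed to switch code in the middle of an invocation. Implementation: the stack must be modified on the fly. OSR is needed in these cases:
    - inlining in a long-running method
    - deferred compilation
    - debugging (users expect to observe the architected instruction sequence)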

  • On-Stack Replacement

    [Figure: on-stack replacement. Implementation frame A (method code at optimization level x) is replaced by implementation frame B (method code at optimization level y) via the architected frame, when method code is optimized or de-optimized.]

    1. extract the architected state
    2. generate a new implementation frame
    3. replace the current implementation stack frame
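The three OSR steps can be sketched at the source level, though Java code cannot rewrite its own stack frames. In this hedged sketch, handing control to the optimized continuation and never returning to the loop stands in for frame replacement. `OnStackReplacement`, `ArchState`, and the back-edge threshold are hypothetical.

```java
// Hypothetical OSR sketch for a long-running loop that sums 0..n-1.
// Baseline code runs the loop; at the back-edge OSR point it (1) extracts the
// architected state, then (2-3) continues in "optimized" code built from that
// state instead of restarting the method.
class OnStackReplacement {
    // Architected state at the OSR point: what the bytecode program can observe.
    record ArchState(long i, long sum, long n) {}

    static final long OSR_THRESHOLD = 1_000; // assumed back-edge count
    static boolean replaced = false;

    static long sumTo(long n) {
        long i = 0, sum = 0;                       // baseline implementation frame
        while (i < n) {
            sum += i;
            i++;
            if (i == OSR_THRESHOLD) {              // OSR point at the loop back-edge
                ArchState state = new ArchState(i, sum, n); // 1. extract architected state
                replaced = true;
                return optimizedFrom(state);       // 2-3. continue in the new frame
            }
        }
        return sum;
    }

    // "Optimized" code entered mid-loop: the remaining iterations i..n-1 are
    // computed in closed form, starting from the transferred state.
    private static long optimizedFrom(ArchState s) {
        long rest = s.n() * (s.n() - 1) / 2 - s.i() * (s.i() - 1) / 2;
        return s.sum() + rest;
    }
}
```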

  • On-Stack Replacement

    OSR is a complex operation. If the initial stack frame is maintained by an interpreter or a nonoptimizing compiler, extracting the architected stack state is straightforward. With optimized code, on the other hand, the compiler may define a set of program points where OSR can potentially occur and then ensure that the architected values are live at those points in the execution.

  • On-Stack Replacement

    Significance of OSR: its direct performance benefits are small, but it
    - allows the implementation of debuggers
    - reduces start-up time with deferred compilation
    - improves cache performance

  • Optimization of Heap-Allocated Objects

    Object creation and garbage collection have a high cost.
    - The code for heap allocation and object initialization can be inlined for frequently allocated objects.
    - Scalar replacement, guided by escape analysis, is effective for reducing object access delays.
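An allocation fast path that is small enough to inline can be sketched with a bump pointer. This is a hedged illustration: the heap is simulated as an `int` array. The three-word object layout, the header tag, and `BumpAllocator` are assumptions. A real VM's slow path would obtain a new region or trigger garbage collection.

```java
// Hypothetical sketch of an inlinable allocation fast path. Allocating and
// initializing a three-word "Square" (header, side, area) costs a compare, an
// add, and three stores; only when the region is exhausted does control go
// to the out-of-line slow path.
class BumpAllocator {
    static final int SQUARE_WORDS = 3;   // assumed layout: header, side, area
    static final int SQUARE_HEADER = 0x5; // assumed type tag

    final int[] heap;
    int top = 0;                         // next free word
    int slowPathCalls = 0;

    BumpAllocator(int words) { heap = new int[words]; }

    // Returns the new object's address, or -1 if the slow path was taken.
    int newSquare(int side) {
        if (top + SQUARE_WORDS > heap.length) return slowPath();
        int obj = top;
        top += SQUARE_WORDS;             // bump the pointer
        heap[obj] = SQUARE_HEADER;       // initialize header and fields inline
        heap[obj + 1] = side;
        heap[obj + 2] = 0;
        return obj;
    }

    // Out-of-line slow path: a real VM would get a new region or collect.
    private int slowPath() {
        slowPathCalls++;
        return -1;
    }
}
```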

  • Optimization of Heap-Allocated Objects

    Scalar replacement reduces access delays. If an object does not escape, its fields can be replaced by local scalars:

      class Square { int side; int area; }

      void calculate() {
          Square a = new Square();
          a.side = 3;
          a.area = a.side * a.side;
          System.out.println(a.area);
      }

    After scalar replacement:

      void calculate() {
          int t1 = 3;
          int t2 = t1 * t1;
          System.out.println(t2);
      }

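  • Optimization of Heap-Allocated Objects

    - Field ordering according to data usage patterns improves data-cache performance.
    - Redundant object accesses can be removed (redundant getfield, i.e. load, removal). Because c = a, the load c.side must return the 5 just stored through a. Because b refers to a different object, the store to b.side cannot change it.

    Before:
      a = new Square;
      b = new Square;
      c = a;
      a.side = 5;
      b.side = 10;
      z = c.side;

    After:
      a = new Square;
      b = new Square;
      c = a;
      t1 = 5;
      a.side = t1;
      b.side = 10;
      z = t1;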
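Redundant load removal can be sketched as a pass over a toy IR. This hedged sketch assumes straight-line code, assumes every `new` yields a distinct object, and assumes stored values are literals. Those conditions make forwarding safe. `LoadForwarding` and the string IR are hypothetical. Unlike the slide, the pass substitutes the constant directly rather than introducing `t1`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of redundant getfield (load) removal on a toy IR with
// forms "x = new", "x = y" (reference copy), "x.f = v" (store of a literal)
// and "z = x.f" (load). Copies create must-aliases, so a load can be replaced
// by the value last stored to the same field of the same object.
class LoadForwarding {
    static List<String> optimize(List<String> code) {
        Map<String, Integer> objectOf = new HashMap<>(); // variable -> allocation id
        Map<String, String> known = new HashMap<>();     // "id.field" -> stored value
        List<String> out = new ArrayList<>();
        int nextId = 0;
        for (String stmt : code) {
            String[] parts = stmt.split(" = ");
            String lhs = parts[0], rhs = parts[1];
            if (lhs.contains(".")) {                     // store x.f = v
                String[] xf = lhs.split("\\.");
                Integer id = objectOf.get(xf[0]);
                if (id == null) known.clear();          // unknown target: may alias anything
                else known.put(id + "." + xf[1], rhs);
                out.add(stmt);
            } else if (rhs.contains(".")) {              // load z = x.f
                String[] xf = rhs.split("\\.");
                Integer id = objectOf.get(xf[0]);
                String value = id == null ? null : known.get(id + "." + xf[1]);
                out.add(value != null ? lhs + " = " + value : stmt);
                objectOf.remove(lhs);                    // z is not a known object
            } else if (rhs.equals("new")) {
                objectOf.put(lhs, nextId++);
                out.add(stmt);
            } else {                                     // copy x = y
                Integer id = objectOf.get(rhs);
                if (id == null) objectOf.remove(lhs); else objectOf.put(lhs, id);
                out.add(stmt);
            }
        }
        return out;
    }
}
```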

  • Low-Level Optimizations

    Array range and null-reference checking is significant in object-oriented HLL VMs. Checks that may throw exceptions cause two kinds of performance loss:
    - the overhead needed to perform the check itself
    - some optimizations are inhibited, because a precise state must be maintained at every point where an exception can be thrown

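  • Low-Level Optimizations: Removing Redundant Null Checks

    Before:
      p = new Z
      q = new Z
      r = p
      p.x = ...
      ... = p.x
      q.x = ...
      r.x = ...

    After:
      p = new Z
      q = new Z
      r = p
      p.x = ...
      ... = p.x
      r.x = ...
      q.x = ...

    Because r = p, the null check for r.x is redundant once p has been checked. In the second version, the access r.x has been moved next to the accesses through p.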
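Redundant null-check elimination over straight-line code can be sketched with value numbering, so that aliases created by copies share one "known non-null" fact. In this hedged sketch, the `param` form marks a reference that may be null, and `NullCheckElimination` and its string IR are hypothetical. The sketch only removes checks; it does not reorder accesses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch over straight-line code with forms "x = new",
// "x = param" (may be null), "x = y" (copy), "x.f = v" (store) and
// "v = x.f" (load). Copies share their source's value number; the first
// dereference of a value gets an explicit "nullcheck", and later
// dereferences of the same value, through any alias, need none.
class NullCheckElimination {
    static List<String> insertChecks(List<String> code) {
        Map<String, Integer> valueOf = new HashMap<>();
        Set<Integer> nonNull = new HashSet<>();
        List<String> out = new ArrayList<>();
        int fresh = 0;
        for (String stmt : code) {
            String[] parts = stmt.split(" = ");
            String lhs = parts[0], rhs = parts[1];
            String base = lhs.contains(".") ? lhs.split("\\.")[0]
                        : rhs.contains(".") ? rhs.split("\\.")[0] : null;
            if (base != null) {
                Integer v = valueOf.get(base);
                if (v == null) { v = fresh++; valueOf.put(base, v); } // unknown origin
                if (!nonNull.contains(v)) {
                    out.add("nullcheck " + base); // first dereference of this value
                    nonNull.add(v);               // surviving the check implies non-null
                }
            }
            out.add(stmt);
            if (!lhs.contains(".")) {             // lhs is a variable being (re)defined
                if (rhs.equals("new")) {
                    valueOf.put(lhs, fresh);
                    nonNull.add(fresh++);
                } else if (!rhs.contains(".") && valueOf.containsKey(rhs)) {
                    valueOf.put(lhs, valueOf.get(rhs)); // copy shares the value number
                } else {
                    valueOf.put(lhs, fresh++);    // param, loaded field, etc.: may be null
                }
            }
        }
        return out;
    }
}
```

If p and q come from `new Z`, as on the slide, the pass emits no checks at all. If they are parameters, only the first dereferences of p and q are checked.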

  • Low-Level Optimizations: Hoisting an Invariant Check

    A check that is invariant within a loop can be hoisted outside the loop.

    for (int i=0;i
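Hoisting an invariant null check can be sketched with explicit checks that stand in for the ones a VM inserts. The `n > 0` guard matters: if the loop body never runs, the original code throws nothing, so the hoisted check must not either. `HoistCheck` and `Point` are hypothetical names.

```java
// Sketch of hoisting a loop-invariant null check. Explicit checks model the
// checks a VM would insert before each dereference of p.
class HoistCheck {
    static final class Point { int x; }

    static long before(Point p, int[] a, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (p == null) throw new NullPointerException(); // checked every iteration
            sum += (long) a[i] * p.x;
        }
        return sum;
    }

    static long after(Point p, int[] a, int n) {
        long sum = 0;
        // Hoisted check; guarded so a zero-trip loop still throws nothing.
        if (n > 0 && p == null) throw new NullPointerException();
        for (int i = 0; i < n; i++) {
            sum += (long) a[i] * p.x; // p is known non-null inside the loop
        }
        return sum;
    }
}
```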

  • Low-Level Optimizations: Loop Peeling

    Once the first iteration has been peeled off the loop, the null check is not needed for the remaining loop iterations.

    for (int i=0;i
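Loop peeling helps where simple hoisting is not legal. In this sketch, the loop body has a side effect (`out[i] = i`) before the dereference. Hoisting the check would throw before `out[0]` is written, which changes the observable state. Peeling the first iteration keeps that order. Every later iteration then runs without the check. `LoopPeeling` and `Point` are hypothetical names, and the explicit checks model VM-inserted ones.

```java
// Sketch of loop peeling to remove a per-iteration null check.
class LoopPeeling {
    static final class Point { int x; }

    static long before(Point p, int[] out, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            out[i] = i;                                      // side effect first
            if (p == null) throw new NullPointerException(); // per-iteration check
            total += p.x;
        }
        return total;
    }

    static long after(Point p, int[] out, int n) {
        long total = 0;
        if (n > 0) {                        // peeled first iteration, check kept
            out[0] = 0;
            if (p == null) throw new NullPointerException();
            total += p.x;
        }
        for (int i = 1; i < n; i++) {       // p is known non-null here
            out[i] = i;
            total += p.x;
        }
        return total;
    }
}
```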

  • Optimizing Garbage Collection

    Compiler support: the compiler provides the garbage collector with yield points at regular intervals in the code. At these points a thread can guarantee a consistent heap state, and control can be yielded to the garbage collector. The compiler can also assist specific garbage-collection algorithms.
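Compiler-inserted yield points can be sketched as a cheap flag test on a loop back-edge. To keep this hedged sketch deterministic, the "collector" runs synchronously on the same thread. The flag, the counter, and the `YieldPoints` class are hypothetical. In a real VM another thread or the allocator would set the flag and the collector would run while the thread is stopped.

```java
// Hypothetical sketch of compiler-inserted yield points. The compiler emits a
// cheap flag test at method prologues and loop back-edges; when a GC has been
// requested, the thread yields there, at a point where all live references
// are in known locations (a consistent heap state).
class YieldPoints {
    static volatile boolean gcRequested = false;
    static int collections = 0;

    static void yieldPoint() {
        if (gcRequested) {        // fast path: one load and compare
            gcRequested = false;
            collections++;        // stand-in for running the collector
        }
    }

    static long work(int n, int requestAt) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (i == requestAt) gcRequested = true; // e.g. the allocator ran out of space
            sum += i;
            yieldPoint();         // back-edge yield point inserted by the compiler
        }
        return sum;
    }
}
```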


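  • The Jikes Research Virtual Machine

    Jikes RVM is an open-source research Java virtual machine, originally developed at IBM Research (as Jalapeño) and written largely in Java. It should not be confused with Jikes, IBM's open-source Java source-to-bytecode compiler, which was much faster than Sun's own compiler at compiling small projects and is no longer actively developed.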

  • The Jikes Research Virtual Machine

    - Compilation only (no interpretation step): first, a baseline compiler translates bytecodes into native code, and the generated code simply emulates the Java stack; optimization is then applied.
    - The dynamic compiler applies optimizations according to an estimate of cost and benefit.
    - Multithreaded implementation: preemptive thread scheduling driven by a control bit (checked at yield points).
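A baseline translation that "simply emulates the Java stack" can be sketched by mapping each bytecode to a fixed template that operates on an explicit operand stack. This is a hedged illustration, not Jikes RVM's baseline compiler. `BaselineCompiler`, `Op`, `Frame`, and the textual bytecode format are hypothetical. Only a handful of straight-line `int` bytecodes are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of baseline (non-optimizing) translation: each bytecode
// becomes a fixed template that manipulates an explicit operand stack and
// local-variable array, with no register allocation or optimization.
class BaselineCompiler {
    interface Op { void run(Frame f); }

    static final class Frame {
        final int[] locals;
        final int[] stack;
        int sp = 0;
        Frame(int maxLocals, int maxStack) {
            locals = new int[maxLocals];
            stack = new int[maxStack];
        }
        void push(int v) { stack[sp++] = v; }
        int pop() { return stack[--sp]; }
    }

    static List<Op> compile(String... bytecodes) {
        List<Op> code = new ArrayList<>();
        for (String bc : bytecodes) {
            String[] t = bc.split(" ");
            switch (t[0]) {
                case "iconst" -> { int k = Integer.parseInt(t[1]); code.add(f -> f.push(k)); }
                case "iload" -> { int n = Integer.parseInt(t[1]); code.add(f -> f.push(f.locals[n])); }
                case "istore" -> { int n = Integer.parseInt(t[1]); code.add(f -> f.locals[n] = f.pop()); }
                // Operand order does not matter for these two commutative ops.
                case "iadd" -> code.add(f -> f.push(f.pop() + f.pop()));
                case "imul" -> code.add(f -> f.push(f.pop() * f.pop()));
                default -> throw new IllegalArgumentException("unsupported: " + bc);
            }
        }
        return code;
    }

    // Run straight-line translated code; the result is left on top of the stack.
    static int run(List<Op> code, int[] args, int maxLocals, int maxStack) {
        Frame f = new Frame(maxLocals, maxStack);
        System.arraycopy(args, 0, f.locals, 0, args.length);
        for (Op op : code) op.run(f);
        return f.pop();
    }
}
```

For example, `compile("iload 0", "iconst 3", "iadd", "iload 1", "imul")` computes `(a + 3) * b`. Later recompilation by the optimizing compiler would keep such values in registers instead.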

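  • The Jikes Research Virtual Machine

    Adaptive Optimization System (AOS):
    - Runtime measurement system: gathers raw performance data by sampling at yield points.
    - Recompilation system.
    - Controller: responsible for coordinating the activities of the other components and for determining the optimization level used in recompilation.

    Cost-benefit function for a given method:
      Cj = time to recompile the method at optimization level j
      Tj = expected execution time of the method after recompilation at level j
      Ti = expected execution time of the method if it is not recompiled
    The method is recompiled when Cj + Tj < Ti.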
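The controller's rule can be sketched as a small decision function. The estimates in this sketch are assumptions for illustration. The method is assumed to keep running for as long as it has run so far, so Ti equals past time. Each level is assumed to give a fixed speedup, so Tj = Ti / speedup. Compile time is assumed proportional to method size. Among levels that satisfy Cj + Tj < Ti, the sketch picks the cheapest total. `RecompilationController` and all constants are hypothetical.

```java
// Hypothetical sketch of the controller's cost-benefit decision for one method
// currently running baseline code.
class RecompilationController {
    static final double[] SPEEDUP = {1.5, 2.0, 2.5};                   // levels 0..2, assumed
    static final double[] COMPILE_MS_PER_BYTECODE = {0.01, 0.05, 0.2}; // assumed

    // Returns the optimization level to recompile at, or -1 to leave the method alone.
    static int chooseLevel(double pastTimeMs, int bytecodeSize) {
        double ti = pastTimeMs;          // Ti: predicted future time if not recompiled
        int best = -1;
        double bestTotal = ti;           // a level must beat Ti to be chosen
        for (int j = 0; j < SPEEDUP.length; j++) {
            double cj = COMPILE_MS_PER_BYTECODE[j] * bytecodeSize; // Cj
            double tj = ti / SPEEDUP[j];                           // Tj
            if (cj + tj < bestTotal) {   // Cj + Tj < Ti, and better than other levels
                bestTotal = cj + tj;
                best = j;
            }
        }
        return best;
    }
}
```

Under these assumptions, a method that has run for only a short time is left alone, and increasingly hot methods justify increasingly expensive optimization levels.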

  • The Jikes Research Virtual Machine

    [Figure: structure of the adaptive optimization system. Runtime Measurement Subsystem: method samples from the executing code, collected samples, Hot Method Organizer, event queue. Controller, using the AOS Database (profile data). Recompilation Subsystem: compilation queue, compilation thread, and optimizing compiler driven by an instrumentation/compilation plan, producing new instrumented/optimized code.]

  • The Jikes Research Virtual Machine: Optimization Levels

    - Level 0: copy/constant propagation, branch optimization, etc.; inlining of trivial methods; simple code relayout; register allocation (simple linear scan)
    - Level 1: higher-level code restructuring: more aggressive inlining and code relayout
    - Level 2: uses a static single assignment (SSA) intermediate form; SSA allows global optimizations such as loop unrolling and elimination of loop-closing branches

  • The Jikes Research Virtual Machine: start-up

  • The Jikes Research Virtual Machine: steady-state
