Transcript
Page 1: HLL VM Implementation

Chap. 6.5–6.7

김정기

Kim, Jung ki

October 11th, 2006

HLL VM Implementation

System Programming Special Lecture, 2006

Page 2: HLL VM Implementation

Contents

Basic Emulation

High-performance Emulation–Optimization Framework–Optimizations

Case study: The Jikes Research Virtual Machine

Page 3: HLL VM Implementation

Basic Emulation

The emulation engine in a JVM can be implemented in a number of ways:
– interpretation
– just-in-time (JIT) compilation

JIT
– Methods are compiled at the time they are first invoked.
– JIT compilation is feasible because the Java ISA instructions belonging to a method can easily be discovered.

Page 4: HLL VM Implementation

JIT vs conventional compiler

A JIT compiler has no frontend for parsing and syntax checking before the intermediate form; its input is already bytecode.

It also uses a different intermediate form before optimization.

Optimization strategy
– multiple optimization levels, chosen through profiling
– optimizations applied selectively to hot spots (not the entire method)

Examples
– starting with interpretation: Sun HotSpot, IBM DK
– compilation only: Jikes RVM

Page 5: HLL VM Implementation

Contents

Basic Emulation

High-performance Emulation–Optimization Framework–Optimizations

Case study: The Jikes Research Virtual Machine

Page 6: HLL VM Implementation

High-Performance Emulation

Two challenges for HLL VMs
– offsetting run-time optimization overhead with execution-time improvements
– making object-oriented programs go fast, despite their frequent use of addressing indirection and small methods

Page 7: HLL VM Implementation

Optimization Framework

[Figure: the optimization framework on the host platform – bytecodes are first handled by an interpreter or translated by a simple compiler into compiled code; profile data gathered from the translated code drives an optimizing compiler, which produces optimized code.]

Page 8: HLL VM Implementation

Contents

Basic Emulation

High-performance Emulation
– Optimization Framework
– Optimizations
  • Code Relayout
  • Method Inlining
  • Optimizing Virtual Method Calls
  • Multiversioning and Specialization
  • On-Stack Replacement
  • Optimization of Heap-Allocated Objects
  • Low-Level Optimizations
  • Optimizing Garbage Collection

Case study: The Jikes Research Virtual Machine

Page 9: HLL VM Implementation

Code Relayout

Place the most commonly followed control-flow paths in contiguous locations in memory.

This improves instruction locality and conditional branch predictability.
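The idea can be sketched as a greedy layout pass (a simplification for illustration, not the exact algorithm of any particular VM; the block names and edge counts below are illustrative): starting from the entry block, repeatedly place the hottest not-yet-placed successor next, so the hot path becomes contiguous.

```java
import java.util.*;

public class Relayout {
    static List<String> layout(String entry, Map<String, Map<String, Integer>> edges) {
        List<String> order = new ArrayList<>();
        Set<String> placed = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String b = work.pop();
            if (!placed.add(b)) continue;   // already laid out
            order.add(b);
            // push successors coldest-first so the hottest ends up on top
            edges.getOrDefault(b, Map.of()).entrySet().stream()
                 .sorted((x, y) -> x.getValue() - y.getValue())
                 .forEach(e -> work.push(e.getKey()));
        }
        return order;
    }

    public static void main(String[] args) {
        // illustrative edge execution counts for blocks A..G (entry A)
        Map<String, Map<String, Integer>> cfg = new HashMap<>();
        cfg.put("A", Map.of("B", 3, "D", 97));
        cfg.put("B", Map.of("C", 3));
        cfg.put("C", Map.of("G", 3));
        cfg.put("D", Map.of("E", 27, "F", 70));
        cfg.put("E", Map.of("G", 27));
        cfg.put("F", Map.of("G", 70));
        System.out.println(layout("A", cfg)); // prints [A, D, F, G, E, B, C]
    }
}
```

The hot chain A, D, F, G lands in consecutive positions; the cold blocks E, B, C follow.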

Page 10: HLL VM Implementation

Code Relayout

[Figure: a control-flow graph of basic blocks A–G annotated with edge profile counts (e.g. 97 of 100 executions leave A toward D), and the resulting layout – the hot path A, D, F, G is made contiguous, with conditional branches (Br cond1 == false, Br cond3 == true, Br cond2 == false, Br cond4 == true) exiting to the colder blocks E, B, C placed afterward.]

Page 11: HLL VM Implementation

Method Inlining

Benefits
– calling overhead decreases (especially important in object-oriented code)
  • passing parameters
  • managing the stack frame
  • control transfer
– the scope of code analysis expands
  • more optimizations become applicable

Effects differ with the method's size
– small methods: beneficial in most cases
– large methods: the calling sequence is a small fraction of the total work, so a sophisticated cost-benefit analysis is needed; code explosion may occur, causing poor cache behavior and performance losses

Page 12: HLL VM Implementation

Method Inlining

Processing sequence
1. profile call sites via instrumentation
2. construct a call graph at certain intervals
3. invoke the dynamic optimization system when a call count exceeds a certain threshold

Reducing analysis overhead
– a profile counter is kept in each stack frame
– when a counter meets the threshold, "walk" backward through the stack instead of building a full call graph
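The counter-and-threshold trigger in step 3 can be modeled in a few lines (a toy sketch; the `THRESHOLD` value and the string encoding of call edges are made up for illustration):

```java
import java.util.*;

public class InlineTrigger {
    static final int THRESHOLD = 1000; // hypothetical tuning value

    Map<String, Integer> callCounts = new HashMap<>();
    List<String> inlined = new ArrayList<>();

    void recordCall(String caller, String callee) {
        String edge = caller + "->" + callee;
        int n = callCounts.merge(edge, 1, Integer::sum); // instrument the call
        if (n == THRESHOLD)       // counter crossed the threshold:
            inlined.add(edge);    // ask the optimizer to inline this edge
    }

    public static void main(String[] args) {
        InlineTrigger t = new InlineTrigger();
        for (int i = 0; i < 1500; i++) t.recordCall("A", "C"); // hot edge
        for (int i = 0; i < 100; i++)  t.recordCall("A", "B"); // cold edge
        System.out.println(t.inlined); // only the hot edge A->C is selected
    }
}
```

Only call edges whose counters actually reach the threshold ever trigger the (expensive) optimizer, which is the point of the profiling step.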

Page 13: HLL VM Implementation

Method Inlining

[Figure: a call graph rooted at MAIN with call counts on the edges (e.g. MAIN calls A 900 times and X 100 times; A's call to C executes 1500 times). With a full call graph the optimizer sees every edge; walking backward via stack frames instead reveals only the currently active chain MAIN → A → C once its counters cross the threshold.]

Page 14: HLL VM Implementation

Optimizing Virtual Method Calls

– the most common case of method call in object-oriented programs
– determining which code to use is done at run time via a dynamic method table lookup

invokevirtual <perimeter>

becomes

if (a.isInstanceof(Square)) {
    inlined code…
} else
    invokevirtual <perimeter>
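In plain Java the transformation amounts to guarding an inlined copy of the expected method body with a type test (the class names, field values, and method bodies below are illustrative):

```java
abstract class Shape { abstract int perimeter(); }

class Square extends Shape {
    int side = 3;
    int perimeter() { return 4 * side; }
}

class Circle extends Shape {
    int r = 5;
    int perimeter() { return 6 * r; } // 2*pi*r with integer "pi" = 3
}

public class Devirt {
    static int perimeterOf(Shape a) {
        if (a instanceof Square) {          // guard: expected receiver type
            return 4 * ((Square) a).side;   // inlined Square.perimeter body
        }
        return a.perimeter();               // fallback: virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(perimeterOf(new Square())); // 12, via inlined path
        System.out.println(perimeterOf(new Circle())); // 30, via virtual call
    }
}
```

When the profile says almost all receivers are `Square`, the guarded inline body runs nearly every time and the method table lookup is avoided.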

Page 15: HLL VM Implementation

Optimizing Virtual Method Calls

Even when inlining is not worthwhile, just removing the method table lookup is also helpful.

Polymorphic Inline Caching

…
invokevirtual <perimeter>
…

becomes

…
call PIC stub
…

polymorphic inline cache (PIC) stub:
    if type == circle, jump to circle perimeter code
    else if type == square, jump to square perimeter code
    else call lookup    (update PIC stub; method table lookup code)
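The stub's behavior can be modeled as an ordered chain of type tests that grows on a miss (a toy model with illustrative class names; a real PIC stub is generated machine code that jumps directly to compiled method bodies rather than calling through an interface):

```java
import java.util.*;
import java.util.function.ToIntFunction;

interface Shape { int perimeter(); }

class Square implements Shape {
    int side;
    Square(int s) { side = s; }
    public int perimeter() { return 4 * side; }
}

class Circle implements Shape {
    int r;
    Circle(int r) { this.r = r; }
    public int perimeter() { return 6 * r; } // 2*pi*r with integer "pi" = 3
}

public class PicDemo {
    // the "PIC stub": parallel lists of cached types and their targets
    static final List<Class<?>> types = new ArrayList<>();
    static final List<ToIntFunction<Shape>> targets = new ArrayList<>();
    static int slowLookups = 0;

    static int dispatch(Shape s) {
        for (int i = 0; i < types.size(); i++)
            if (types.get(i) == s.getClass())
                return targets.get(i).applyAsInt(s); // PIC hit: cached target
        slowLookups++;                    // PIC miss: full method-table lookup
        types.add(s.getClass());          // update (grow) the PIC stub
        targets.add(Shape::perimeter);
        return s.perimeter();
    }

    public static void main(String[] args) {
        Shape[] mix = { new Circle(1), new Square(2), new Circle(3), new Square(4) };
        int sum = 0;
        for (Shape s : mix) sum += dispatch(s);
        System.out.println(sum + " total, " + slowLookups + " slow lookups");
        // prints: 48 total, 2 slow lookups
    }
}
```

Four dispatches over two receiver types pay for only two slow lookups; every later call of a cached type takes the fast type-test path.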

Page 16: HLL VM Implementation

Multiversioning and Specialization

Multiversioning by specialization
– If some variables or references are always assigned data values or types known to be constant (or from a limited range),
– simplified, specialized code can be used.

General code:

for (int i = 0; i < 1000; i++) {
    if (A[i] < 0) B[i] = -A[i]*C[i];
    else B[i] = A[i]*C[i];
}

Multiversioned code:

for (int i = 0; i < 1000; i++) {
    if (A[i] == 0)
        B[i] = 0;            // specialized code
    else {
        if (A[i] < 0) B[i] = -A[i]*C[i];
        else B[i] = A[i]*C[i];
    }
}
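One way to realize the two versions in runnable form (a sketch; here the guard tests the whole array once up front, rather than per element as on the slide, which is the other common multiversioning shape):

```java
public class Multiversion {
    // the general version of the slide's loop
    static void general(int[] A, int[] B, int[] C) {
        for (int i = 0; i < A.length; i++) {
            if (A[i] < 0) B[i] = -A[i] * C[i];
            else          B[i] =  A[i] * C[i];
        }
    }

    // multiversioned entry: a cheap guard selects the specialized version
    static void multiversioned(int[] A, int[] B, int[] C) {
        boolean allZero = true;
        for (int v : A) if (v != 0) { allZero = false; break; }
        if (allZero)
            for (int i = 0; i < A.length; i++) B[i] = 0; // specialized code
        else
            general(A, B, C);                            // general code
    }

    public static void main(String[] args) {
        int[] A = new int[4], B = new int[4], C = { 7, 7, 7, 7 };
        multiversioned(A, B, C);  // common case: A is all zeros
        System.out.println(java.util.Arrays.toString(B)); // [0, 0, 0, 0]
    }
}
```

The specialized version has no multiply and no sign test, so when the profiled common case holds it is strictly cheaper than the general loop.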

Page 17: HLL VM Implementation

Multiversioning and Specialization

deferred compilation of the general case

for (int i = 0; i < 1000; i++) {
    if (A[i] < 0) B[i] = -A[i]*C[i];
    else B[i] = A[i]*C[i];
}

becomes

for (int i = 0; i < 1000; i++) {
    if (A[i] == 0)
        B[i] = 0;
    else
        <jump to dynamic compiler for deferred compilation>
}

Page 18: HLL VM Implementation

On-Stack Replacement

A newly optimized version of a method gives no benefit until the method's next invocation, so on-stack replacement (OSR) is needed for methods that are already running.

The implementation stack needs to be modified on the fly.

OSR is needed for
– inlining within a long-running method
– deferred compilation
– debugging (the user expects to observe the architected instruction sequence)

Page 19: HLL VM Implementation

On-Stack Replacement

[Figure: on-stack replacement – an implementation stack frame A for method code at optimization level x is converted into an implementation frame B for method code at optimization level y]
1. extract the architected state from the current implementation frame
2. optimize/de-optimize the method code and generate a new implementation frame from the architected frame
3. replace the current implementation stack frame with the new one

Page 20: HLL VM Implementation

On-Stack Replacement

OSR is a complex operation

If the initial stack frame is maintained by an interpreter or a nonoptimizing compiler, then extracting the architected stack state is straightforward.

Otherwise, the compiler may define a set of program points where OSR can potentially occur and then ensure that the architected values are live at those points in the execution.

Page 21: HLL VM Implementation

On-Stack Replacement

Significance of OSR
– the direct performance benefits are small
– it allows the implementation of debuggers
– it reduces start-up time under deferred compilation
– it improves cache performance

Page 22: HLL VM Implementation

Optimization of Heap-Allocated Objects

Creating objects and collecting garbage have high cost.

The code for heap allocation and object initialization can be inlined for frequently allocated objects.

scalar replacement
– guided by escape analysis
– effective for reducing object access delays

Page 23: HLL VM Implementation

Optimization of Heap-Allocated Objects

Scalar Replacement
– object fields are replaced by scalar temporaries, so access delays are reduced

class square { int side; int area; }

void calculate() {
    a = new square();
    a.side = 3;
    a.area = a.side * a.side;
    System.out.println(a.area);
}

becomes

void calculate() {
    int t1 = 3;
    int t2 = t1 * t1;
    System.out.println(t2);
}

Page 24: HLL VM Implementation

Optimization of Heap-Allocated Objects

field ordering for data usage patterns
– to improve data cache performance

redundant getfield (load) removal
– to remove redundant object accesses

a = new square;
b = new square;
c = a;
…
a.side = 5;
b.side = 10;
z = c.side;

becomes

a = new square;
b = new square;
c = a;
…
t1 = 5;
a.side = t1;
b.side = 10;
z = t1;          (c aliases a, so c.side must equal t1)

Page 25: HLL VM Implementation

Low-Level Optimizations

Array range and null reference checking are significant costs in object-oriented HLL VMs.

Performing these checks so that exceptions can be thrown causes two kinds of performance loss:
– the overhead of performing the checks themselves
– some optimizations are inhibited by the need to maintain a precise state

Page 26: HLL VM Implementation

Low-Level Optimizations

Removing Redundant Null Checks

p = new Z
q = new Z
r = p
…
p.x = …    <null check p>
… = p.x    <null check p>
…
q.x = …    <null check q>
…
r.x = …    <null check r(p)>

becomes

p = new Z
q = new Z
r = p
…
p.x = …    <null check p>
… = p.x
r.x = …               (r aliases p, which is already checked)
q.x = …    <null check q>

Page 27: HLL VM Implementation

Low-Level Optimizations

Hoisting an Invariant Check
– the range check can be hoisted outside the loop

for (int i = 0; i < j; i++) {
    sum += A[i];    <range check A>
}

becomes

if (j < A.length)
    for (int i = 0; i < j; i++) {
        sum += A[i];
    }
else
    for (int i = 0; i < j; i++) {
        sum += A[i];    <range check A>
    }

Page 28: HLL VM Implementation

Low-Level Optimizations

Loop Peeling
– after the first iteration is peeled off, the null check is not needed for the remaining loop iterations

for (int i = 0; i < 100; i++) {
    r = A[i];
    B[i] = r*2;
    p.x += A[i];    <null check p>
}

becomes

r = A[0];
B[0] = r*2;
p.x += A[0];    <null check p>
for (int i = 1; i < 100; i++) {
    r = A[i];
    p.x += A[i];
    B[i] = r*2;
}

Page 29: HLL VM Implementation

Optimizing Garbage Collection

Compiler support
– The compiler provides the garbage collector with "yield points" at regular intervals in the code. At these points a thread can guarantee a consistent heap state, and control can be yielded to the garbage collector.
– The compiler also supports the specific garbage-collection algorithm being used.
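A minimal model of such a yield point (illustrative names; a real collector would scan the thread's stack and registers at this point rather than just counting, and the flag would be set by the runtime, not by the program itself):

```java
public class YieldPoints {
    static volatile boolean yieldRequested = false; // set by the runtime/GC
    static int yields = 0;

    // what the compiler inserts: a cheap flag test at a consistent point
    static void yieldPoint() {
        if (yieldRequested) {
            yields++;              // here the GC could safely take control
            yieldRequested = false;
        }
    }

    static long sum(int[] a) {
        long s = 0;
        for (int v : a) {
            s += v;
            yieldPoint();          // inserted on the loop back edge
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        yieldRequested = true;     // pretend the collector asked for control
        System.out.println(sum(a) + " / " + yields); // prints: 499500 / 1
    }
}
```

The common-case cost is one flag test per loop iteration; in exchange, the thread only ever pauses at points where its heap references are known to be consistent.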

Page 30: HLL VM Implementation

Contents

Basic Emulation

High-performance Emulation–Optimization Framework–Optimizations

Case study: The Jikes Research Virtual Machine

Page 31: HLL VM Implementation

The Jikes Research Virtual Machine

an open-source research virtual machine for executing Java programs

The original version was developed by IBM.

(The name echoes Jikes, IBM's open-source Java source compiler, which compiled small projects much faster than Sun's own compiler but is no longer actively developed.)

Page 32: HLL VM Implementation

The Jikes Research Virtual Machine

compile-only: there is no interpretation step

First, a baseline compiler translates bytecodes into native code; the generated code simply emulates the Java stack. Optimization is applied afterward.

The dynamic compiler selects optimizations based on a cost-benefit estimate.

multithreaded implementation
– preemptive thread scheduling via a control bit

Page 33: HLL VM Implementation

The Jikes Research Virtual Machine

Adaptive Optimization System
– runtime measurement subsystem: gathers raw performance data by sampling at yield points
– recompilation subsystem
– controller: coordinates the other subsystems and determines the optimization level used in recompilation

Cost-benefit function (for a given method)
– Cj = cost of recompiling, Tj = execution time after recompilation, Ti = execution time if not recompiled
– recompile when Cj + Tj < Ti
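The decision rule above fits in one line (the millisecond figures in the example are hypothetical; in the real system Tj and Ti are themselves estimates derived from samples):

```java
public class CostBenefit {
    // recompile method j only if compile time Cj plus the estimated future
    // running time Tj of the recompiled code beats Ti, the estimate for
    // keeping the current code
    static boolean shouldRecompile(double cj, double tj, double ti) {
        return cj + tj < ti;   // Cj + Tj < Ti  ->  recompilation pays off
    }

    public static void main(String[] args) {
        // long-running method: 5 ms to recompile, 40 ms vs 100 ms to run
        System.out.println(shouldRecompile(5, 40, 100)); // true: recompile
        // short-lived method: the compile cost cannot be recovered
        System.out.println(shouldRecompile(5, 4, 6));    // false
    }
}
```

The same test is applied per optimization level, so a method is only promoted to a more expensive level when the extra compile time is expected to pay for itself.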

Page 34: HLL VM Implementation

The Jikes Research Virtual Machine

[Figure: the Adaptive Optimization System – the runtime measurement subsystem collects method samples from the executing code; a hot method organizer posts the collected samples to an event queue read by the controller; the controller, consulting the AOS database (profile data), issues an instrumentation/compilation plan to a compilation queue; the recompilation subsystem's compilation thread drives the optimizing compiler, and the resulting instrumented/optimized code is installed as new code.]

Page 35: HLL VM Implementation

The Jikes Research Virtual Machine

Optimization levels

Level 0
– copy/constant propagation, branch optimization, etc.
– inlining of trivial methods
– simple code relayout; register allocation (simple linear scan)

Level 1
– higher-level code restructuring: more aggressive inlining and code relayout

Level 2
– uses a static single assignment (SSA) intermediate form
– SSA enables global optimizations
– loop unrolling, elimination of loop-closing branches

Page 36: HLL VM Implementation

The Jikes Research Virtual Machine

[Figure: start-up performance results]

Page 37: HLL VM Implementation

The Jikes Research Virtual Machine

[Figure: steady-state performance results]

Page 38: HLL VM Implementation

The Jikes Research Virtual Machine

Page 39: HLL VM Implementation

The Jikes Research Virtual Machine
