Session 3198
Client Compiler for the Java HotSpot™ Virtual Machine
Tom Rodriguez, Ken Russell
Sun Microsystems, Inc.
Technology and Application
Overall Presentation Goal
Learn about compilation in the Java HotSpot™ Virtual Machine, and the Client Compiler
Understand how the Client Compiler deals with specific Java™ programming language features
Get to know tuning and troubleshooting techniques
Learning Objectives
• As a result of this presentation, you will be able to:
  – Understand “Java HotSpot compilation”
  – Write better code in the Java programming language
  – Improve the performance of your applications
  – See what’s new in 1.4 and what’s coming
Speaker’s Qualifications
• Tom Rodriguez is the technical lead for the Java HotSpot Client Compiler
• Ken Russell has worked on the Java HotSpot runtime and compiler for over two years
Presentation Agenda
• Compilation in the Java HotSpot™ VM
• Structure of the Client Compiler
• Implications for Code written in the Java™ Programming Language
• Miscellaneous
• Summary
• Q&A
Compilation in the Java HotSpot™ VM
• VM Configurations
• Compilation Steps
• On-Stack Replacement
• Class Hierarchy Analysis
• Deoptimization
• Quick Summary
VM Configurations
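[Figure: Typical Java VM software stack, bottom to top: Hardware, OS, OS Libraries, Java VM (libjvm.so/jvm.dll). The Java VM is the Core VM combined with either the Client Compiler (java, java -client) or the Server Compiler (java -server). The Client Compiler is marked as the topic of this talk.]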
Compilation Steps
• Every method is interpreted first
• Hot methods are scheduled for compilation
  – Method invocations
  – Loops
• Compilation can be foreground/background
  – Foreground compilation default for Client VM
  – Background compilation in parallel
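
The counting idea behind "hot methods" can be sketched as a toy model. This is only an illustration, not HotSpot code: the class name, the `tick` method, and the threshold value are all assumptions made up for the example, and the slides do not state the real thresholds.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HotnessSketch {
    // Hypothetical threshold, chosen only for illustration.
    static final int COMPILE_THRESHOLD = 1500;

    private final Map<String, Integer> counters = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    /** Records one method invocation or one loop back edge for the given method. */
    void tick(String method) {
        int count = counters.merge(method, 1, Integer::sum);
        if (count >= COMPILE_THRESHOLD) {
            // Stands in for "schedule the method for compilation".
            compiled.add(method);
        }
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }

    public static void main(String[] args) {
        HotnessSketch vm = new HotnessSketch();
        vm.tick("coldMethod");              // called once: stays interpreted
        for (int i = 0; i < 2000; i++) {
            vm.tick("hotMethod");           // crosses the threshold
        }
        System.out.println("coldMethod compiled: " + vm.isCompiled("coldMethod"));
        System.out.println("hotMethod compiled: " + vm.isCompiled("hotMethod"));
    }
}
```

In this model both invocations and loop iterations feed the same counter, which mirrors the two triggers listed on the slide.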
On-Stack Replacement (1)
• Choice between interpreted/compiled execution
• Problem with long-running interpreted methods
  – Loops!
• Need to switch to compiled method in the middle of interpreted method execution
  – On-Stack Replacement (OSR)
On-Stack Replacement (2)
void m1() {
    ...
    while (i < n-1) {
        // OSR here
        ...
    }
    ...
}
[Figure: Stack before OSR: m1 interpreted frame, m0 frame. Stack after OSR: m1 compiled frame, dead m1 frame, m0 frame.]
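
A minimal sketch of the kind of method where OSR matters: it is invoked only once, so invocation counting alone would never get it compiled, yet it spends a long time in one loop. The class and method names are made up for the example.

```java
public class OsrSketch {
    /** Sums 0..n-1 in a single long-running loop. */
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            // Each iteration takes a loop back edge. A VM that counts back
            // edges can decide to compile this method while it is still
            // running and continue the loop in compiled code.
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A single call: the method never becomes "hot" by invocation count.
        System.out.println(sumTo(100_000_000));
    }
}
```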
Class Hierarchy Analysis (1)
• Dynamic pruning of receiver class set
  – Static calls instead of virtual calls
  – Inlining across virtual calls
  – Faster type tests
• Class Hierarchy Analysis (CHA)
  – Analysis of loaded classes
  – Can change over time
  – Effective
Class Hierarchy Analysis (2)
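[Figure: class hierarchy of classes A to G connected by "extends" edges, each class marked as loaded or unloaded. A, B, C and G declare virtual m; D and E declare virtual n; F declares no methods.]
A a; B b; C c; D d; E e; F f; G g;
Call    Possible targets
a.m     A.m, B.m
b.m     B.m
d.m     A.m
d.n     D.n
f.m     A.m
f.n     D.n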
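
The call resolutions above can be made concrete with a small hierarchy. The exact "extends" edges of the slide's figure are not recoverable from the transcript, so the sketch below assumes that B and D both extend A; it covers only the subset of classes needed to reproduce three rows of the table.

```java
public class ChaSketch {
    static class A { String m() { return "A.m"; } }
    static class B extends A { @Override String m() { return "B.m"; } }
    static class D extends A { String n() { return "D.n"; } }

    // a.m: both A.m and B.m are possible, so this must stay a virtual call.
    static String callM(A a) { return a.m(); }

    // d.m: no loaded subclass of D overrides m, so the only target is A.m.
    // A CHA-based compiler may bind this statically or inline it.
    static String callMOnD(D d) { return d.m(); }

    // d.n: D.n is the only loaded implementation of n.
    static String callNOnD(D d) { return d.n(); }

    public static void main(String[] args) {
        System.out.println(callM(new A()));
        System.out.println(callM(new B()));
        System.out.println(callMOnD(new D()));
        System.out.println(callNOnD(new D()));
    }
}
```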
Deoptimization (1)
• Compile-time assumptions may become invalid over time
  – Class loading
• Debugging of program desired
  – Single-stepping
• Active compiled methods become invalid
• Need to switch to interpreted method in the middle of compiled method execution
  – Deoptimization
Deoptimization (2)
T3 m3(...) {
    ...
    // Deopt. here
}
T2 m2(...) {
    m3(...);
}
T1 m1() {
    ...
    m2(...);
    ...
}
[Figure: Stack before deoptimization: one compiled m1 frame with m2 and m3 inlined, m0 frame. Stack after deoptimization: m3 interpreted frame, m2 interpreted frame, m1 interpreted frame, m0 frame.]
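
A sketch of a program whose compile-time assumption can become invalid through class loading. The class names are hypothetical, and whether a given VM actually compiles, inlines, and later deoptimizes `read` depends on its heuristics; the point is only that the program's result must stay correct either way.

```java
public class DeoptSketch {
    static class Base {
        int value() { return 1; }
    }

    // Not loaded until it is first used in main below.
    static class Late extends Base {
        @Override int value() { return 2; }
    }

    static int read(Base b) {
        // While only Base is loaded, value() has a single possible target.
        return b.value();
    }

    public static void main(String[] args) {
        Base base = new Base();
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += read(base);      // hot, and monomorphic so far
        }
        System.out.println(sum);
        // Loading Late adds a second target for value(). Compiled code that
        // assumed a single target has to be discarded or deoptimized.
        System.out.println(read(new Late()));
    }
}
```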
Quick Summary
• Client VM differs from Server VM in compiler
• Hotspots trigger compilation
  – Compiled method invocation
  – OSR
• Class loading, debugging changes compile-time assumptions
  – Deoptimization
What’s New in 1.4
• Low-level Intermediate Representation
• Deoptimization
• Inlining
• Improved Debugging and Profiling Support
Deoptimization and Inlining
• 1.3.x client compiler inlined simple accessors
• 1.4 uses deoptimization and CHA
  – More inlining possibilities
• 1.4.0 client compiler does not inline:
  – Methods with exception handlers
  – Synchronized methods
Improved Debugging and Profiling
• Full-speed debugging
  – Compiler no longer disabled when using debuggers (-Xdebug)
  – Application runs at full speed
    • Breakpoint setting causes deoptimization
  – Server compiler support coming in 1.4.1
• Profiling
  – Substantial work on JVMPI robustness
  – Existing tools work much better with 1.4
Structure of the Client Compiler
• Representation of bytecodes
• Frontend
• Backend
• Machine code generation
Representation of Code
• High Level Intermediate (HIR)
  – Closely mimics bytecode
  – Directed graph to enforce evaluation order
• Low Level Intermediate (LIR)
  – Close to machine instructions
  – Some high level instructions
    • Virtual calls
    • Type checks
    • Intrinsics
Frontend
• Parse bytecodes into HIR
  – Eliminate loads (copy propagation)
  – Local value numbering
  – Constant folding and identities
  – Inline
  – Use CHA to optimize calls
• Global Optimizations
  – Block merging
  – Eliminate null checks
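
The effect of local value numbering, constant folding, and algebraic identities can be illustrated at the source level. This is only an analogy: the compiler performs these steps on HIR, not on Java source, and the method names are made up for the example.

```java
public class FrontendSketch {
    /** Code as a programmer might write it. */
    static int original(int a, int b) {
        int x = a + b;
        int y = a + b;        // same value as x (value numbering)
        int z = 2 * 8;        // constant expression (constant folding)
        return x * y + z + 0; // "+ 0" is an identity
    }

    /** What the same computation looks like after those simplifications. */
    static int simplified(int a, int b) {
        int x = a + b;
        return x * x + 16;
    }

    public static void main(String[] args) {
        System.out.println(original(3, 4));
        System.out.println(simplified(3, 4));
    }
}
```

Both methods compute the same result for every input, including inputs that overflow, because the simplifications only remove redundant work.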
Intermediate Representation
[Figure: control flow graph (CFG) of basic blocks; each basic block holds its instructions and links to its successor(s).]
Bytecodes:
 0 aload_1
 1 bipush 46
 3 invokevirtual #139
 6 istore_2
 7 iload_2
 8 ifgt 18
11 aload_0
12 getfield #2
15 invokevirtual #93
18 aload_1
19 iconst_0
20 ...
Backend
• Generate LIR by walking HIR
  – Expression level register allocation
• Optimize LIR
  – Assign locals to registers
    • Loop focused
  – Peephole optimizer
• Emit machine code from LIR
  – Generate object maps for use by GC
  – Generate information for deoptimization
Code Generation
Machine code:
    movl  %edi,%eax
    movl  0x4,%esi
    cmpl  0x80000000,%eax
    jne   -0x252e6b7b
    xorl  %edx,%edx
    cmpl  -1,%esi
    je    -0x252e6b78
    cltd
    idivl %esi,%eax
    cmpl  0,%eax
    movl  0x0,%esi
    jne   -0x252e6b65
    movl  0x1,%esi
    movl  %esi,%eax
    movl  %ebp,%esp
    popl  %ebp
    ret
LIR:
    label  [label:0x8176838]
    move   [edi|I] [eax|I]
    move   [int:4|I] [esi|I]
    idiv   [eax|I] [esi|I] [edx|I] [eax|I] [bci:8]
    cmp    [eax|I] [int:0|I]
    move   [int:0|I] [esi|I]
    branch [NE] [label:0x8182d60]
    move   [int:1|I] [esi|I]
    label  [label:0x8182d60]
    move   [esi|I] [eax|I]
    return [eax|I]
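
The slide does not show the Java source for this listing. Reading the LIR, it divides an int argument by the constant 4 and returns 1 if the quotient is zero, otherwise 0, so a method of roughly the following shape would be consistent with it. This is an assumption, and the names are invented. The comparisons against 0x80000000 and -1 in the machine code appear to guard the one int division that overflows, which Java defines to yield `Integer.MIN_VALUE`; the `divide` helper shows that language-level behavior.

```java
public class DivSketch {
    /** Hypothetical source: true when x / 4 is zero, i.e. for x in -3..3. */
    static boolean quarterIsZero(int x) {
        return x / 4 == 0;
    }

    /** Plain int division, used to show the overflow case at run time. */
    static int divide(int a, int b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(quarterIsZero(3));
        System.out.println(quarterIsZero(4));
        // Java specifies this result instead of raising an error.
        System.out.println(divide(Integer.MIN_VALUE, -1) == Integer.MIN_VALUE);
    }
}
```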
Quick Summary
• Frontend does
  – Parse and analyze bytecodes
  – Build and optimize IR
  – Use CHA for very effective OO optimizations
  – Reorder CFG for code generation
• Backend does
  – LIR
  – Register allocation, low-level optimizations
  – Code generation
Implications for Code Written in the Java™ Programming Language
• Accessors
• Usage of final
• Object Allocation (new)
• Exception Handling
• Other Issues
• Quick Summary
Accessors
• Use accessors
– XType getX() { return _x; }
– void setX(XType x) { _x = x; }
• Abstracts from implementation
• Easier to maintain
• No performance penalty
  – Inlining!
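
A minimal sketch of the accessor style recommended here, with a hot loop that reads the field only through its getter. The `Point` class and the loop are assumptions for illustration; the `_x` field name follows the slide.

```java
public class AccessorSketch {
    static class Point {
        private int _x;

        int getX() { return _x; }
        void setX(int x) { _x = x; }
    }

    /** Reads the field through the accessor on every iteration. */
    static long sumViaAccessor(Point p, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            // A compiler that inlines getX() turns this into a plain field load.
            sum += p.getX();
        }
        return sum;
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.setX(3);
        System.out.println(sumViaAccessor(p, 1_000_000));
    }
}
```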
Usage of final
• Don’t use final for performance tuning
• CHA will do the work
  – Where CHA can’t do it, final doesn’t help either
• Keep software extensible
• No performance penalty
  – Static calls
  – Inlining
Object Allocation (new)
• Object allocation (new) inlined
  – Works in most cases
  – Extremely fast (~10–20 clock cycles)
• Do not manage memory yourself
  – GC will slow down
  – Larger memory footprint
• Keep software simple
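
In code, this advice amounts to allocating small objects with `new` wherever they are needed rather than keeping a hand-written pool or free list. A minimal sketch with a made-up immutable `Vec` class:

```java
public class AllocationSketch {
    /** Small immutable value object; a fresh one is allocated per result. */
    static final class Vec {
        final double x;
        final double y;

        Vec(double x, double y) {
            this.x = x;
            this.y = y;
        }

        Vec plus(Vec other) {
            return new Vec(x + other.x, y + other.y);
        }
    }

    /** Adds the vectors (i, 2i) for i = 1..n, allocating freely. */
    static Vec sum(int n) {
        Vec acc = new Vec(0, 0);
        for (int i = 1; i <= n; i++) {
            // Plain "new" on every iteration, no pooling or reuse.
            acc = acc.plus(new Vec(i, 2.0 * i));
        }
        return acc;
    }

    public static void main(String[] args) {
        Vec v = sum(1000);
        System.out.println(v.x + ", " + v.y);
    }
}
```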
Exception Handling
• Exception object creation is very expensive
• Exception handling is not optimized
  – Use it for exceptional situations
  – Don’t use it as programming paradigm
  – Don’t use instead of regular return
• Exception handling costs only when used
  – Safe to declare exceptions
• If you must use exceptions for control flow:
  – Preallocate exception object
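
Both options can be sketched side by side: a regular return value, and, for the case where exceptions really must carry control flow, a single preallocated exception object. The class and method names are invented, and the stack-trace-free constructor used here is a later addition to the Java platform (Java 7), not something the 1.4-era slides refer to.

```java
public class ExceptionSketch {
    /** Preferred: report "not found" through a regular return value. */
    static int indexOf(int[] data, int key) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == key) {
                return i;
            }
        }
        return -1;
    }

    /** Control-flow exception that records no stack trace. */
    static final class Found extends RuntimeException {
        int index;

        Found() {
            // message, cause, enableSuppression, writableStackTrace
            super(null, null, false, false);
        }
    }

    // Allocated once and reused. Sharing one mutable instance like this
    // is not thread-safe; it is kept simple for the sketch.
    private static final Found FOUND = new Found();

    /** Same search, but signalling the hit with the preallocated exception. */
    static int indexOfViaException(int[] data, int key) {
        try {
            for (int i = 0; i < data.length; i++) {
                if (data[i] == key) {
                    FOUND.index = i;
                    throw FOUND;
                }
            }
        } catch (Found f) {
            return f.index;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {5, 7, 9};
        System.out.println(indexOf(data, 7));
        System.out.println(indexOfViaException(data, 7));
    }
}
```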
Other Issues
• Client Compiler optimized for clean OO code
  – “Hand-tuning” often counterproductive
  – Generated code can be problematic
    • Obfuscators
• Do not optimize prematurely
  – Use profiling information
Quick Summary
• Write clean OO code
  – Use accessors
  – Use final by design only
  – Use new for object allocation
  – Use exception handling for exceptional cases
• Keep it simple, keep it clean
Miscellaneous
• Built-in Profiler
• When to use the Client Compiler
• What’s coming in 1.4.1
• Quick Summary
Built-In Profiler
• Option: -Xprof
  – E.g.: java -Xprof -jar Java2Demo.jar
• Statistical (sampling) flat profiler
  – Not hierarchical
• Per thread
  – Output when thread terminates
Sample Profiler Output
Flat profile of 27.38 secs (2574 total ticks): AWT-EventQueue-0

  Interpreted + native   Method
  7.2%     0  +    90    sun.java2d.loops.Blit.Blit
  0.7%     0  +     9    sun.awt.windows.Win32BlitLoops.Blit
  ...
 19.8%    72  +   174    Total interpreted (including elided)

     Compiled + native   Method
  9.2%   115  +     0    java.awt.GradientPaintContext.clip...
  ...
 15.0%   179  +     8    Total compiled (including elided)

  Thread-local ticks:
 51.7%  1330             Blocked (of total)
  0.2%     2             Class loader
  0.3%     4             Interpreter
 10.0%   124             Compilation
  0.5%     6             Unknown: running frame
  0.2%     2             Unknown: thread_state
When to Use the Client Compiler
• Client Compiler characteristics
  – Fast compilation
    • Quick startup time
  – Small footprint
• Use for apps with same expectations
• Recommendation
  – Try client and server, choose best
    • java -client
    • java -server
What’s New in 1.4.1
• Better peephole optimization
• Reduced footprint for compiled code
• New fast subtype check
• Inlining of jsrs
• Improved exception handling
  – Better full-speed debugging
• Reduced footprint on the Solaris™ OE
Overall Summary
• Compilation with the Java HotSpot™ VM
• Client Compiler Internals
• Programming and Tuning Hints
References
• Robert Griesemer, Srdjan Mitrovic: “A Compiler for the Java HotSpot™ Virtual Machine”, in The School of Niklaus Wirth—The Art of Simplicity, Morgan Kaufmann Publishers, 2000, ISBN 1-55860-723-4
• Java HotSpot™ Technology Documents, http://java.sun.com/products/hotspot/2.0/docs.html