jvm magic
Post on 10-May-2015
The JVM Magic
Baruch Sadogursky, Consultant & Architect, AlphaCSP
Agenda
• Introduction
• GC Magic 101
• General Optimizations
• Compiler Optimizations
• What can I do?
  • Programming tips
  • JVM configuration flags
Introduction
Introduction
• In the past, the JVM was considered by many to be Java's Achilles’ heel
  • Interpreter?!
• The JVM team improved performance by a factor of 300 to 3000
  • JDK 1.6 compared to JDK 1.0
• Java is measured to run at 50% to 100+% of the speed of C and C++
  • Jake2 vs Quake2
• How can it be?
Java Virtual Machines Zoo
• CEE-J • Excelsior JET• Hewlett-Packard• J9 (IBM)• Jbed• Jblend• Jrockit• MRJ• MicroJvm• MS JVM• OJVM• PERC• Blackdown Java• CVM• Gemstone• Golden Code Development• Intent• Novell• NSIcom CrE-ME• ChaiVM• HotSpot• AegisVM• Apache Harmony• CACAO• Dalvik• IcedTea
• IKVM.NET• Jamiga• JamVM• Jaos• JC• Jelatine JVM• JESSICA• Jikes RVM• Jnode• JOP• Juice• Jupiter• JX• Kaffe• leJOS• Mika VM• Mysaifu• NanoVM• SableVM• Squawk virtual machine• SuperWaba• TinyVM• VMkit of Low Level Virtual Machine• Wonka VM• Xam
HotSpot Virtual Machine
• Originally developed by Longview Technologies; first released by Sun in 1999
• Contains:
  • Class loader
  • Bytecode interpreter
  • 2 virtual machines
  • 7 garbage collectors
  • 2 compilers
  • Runtime libraries
HotSpot Virtual Machine
• Configured by hundreds of -XX flags
• Reminder
  • -X options are non-standard
  • -XX options have specific system requirements for correct operation
  • Both are subject to change without notice
GC Magic 101
GC Is Slow?
• GC has a bad performance reputation
  • Reduces throughput
  • Introduces pauses
  • Unpredictable
  • Uncontrolled
  • Performance degradation is proportional to object count
• Just give me the damn free() and malloc()! I’ll be just fine!
• Is it so?
Generational Collectors
• Weak generational hypothesis
  • Most objects die young (AKA infant mortality)
  • Few old-to-young references
• Generations: regions holding objects of different ages
  • GC is done separately once a generation fills
  • Different GC algorithms
• The young (nursery) generation
  • Collected by “minor garbage collection”
• The old (tenured) generation
  • Collected by “major garbage collection”
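The split into generations can be observed from inside a running JVM. The following is a minimal sketch, not part of the original slides: the class name GenerationsDemo and its helper methods are made up for illustration. It uses only the standard java.lang.management API to list the collectors the JVM exposes and the memory pools each one manages. The collector names and counts depend on the JVM and on the GC selected, and the JIT may remove some of the allocations entirely.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GenerationsDemo {

    // Hypothetical helper: allocates n short-lived arrays, which should die young.
    static long allocateGarbage(int n) {
        long checksum = 0;
        for (int i = 0; i < n; i++) {
            byte[] garbage = new byte[64];   // unreachable after this iteration
            checksum += garbage.length;
        }
        return checksum;
    }

    // Describes each collector of the running JVM: name, collections so far, managed pools.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, pools: " + String.join(", ", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("checksum = " + allocateGarbage(5_000_000));
        describeCollectors().forEach(System.out::println);
    }
}
```

On a generational collector, the young-generation collector would usually report far more collections than the old-generation one for a workload like this, though the exact output is JVM-specific.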
GC Magic 101
• Young is better than tenured
  • Let your objects die in the young generation
  • When possible and when it makes sense
• Swapping is bad
  • The application's memory footprint should not exceed the available physical memory
• Choose:
  • Throughput (client)
  • Low-pause (server)
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
Garbage First (G1)
• New in JDK 1.6 u14 (May 29th)
• All memory is divided into 1MB buckets
• Calculates object liveness in buckets
  • Drops “dead” buckets
  • If a bucket is not total garbage, it’s not dropped
• Collects the buckets with the most garbage first
• Pauses only on “mark”
  • No sweep
• User can provide pause time goals
  • Actual seconds or percentage of runtime
  • G1 records bucket collection time and can estimate how many buckets to collect during a pause
Garbage First (G1)
• Targets multi-processor machines and large heaps
• G1 will be the long-term replacement for the CMS collector
  • Unlike CMS, compacts to battle fragmentation
  • A bucket’s space is fully reclaimed
  • Better throughput
  • Predictable pauses (high probability)
• Garbage left in buckets with high live ratio
  • May be collected later
GC Is Slow? – The Answers
• Reduces throughput
  • You choose
• Introduces pauses
  • You choose
• Unpredictable
  • Not any more
• Uncontrolled
  • Configurable
• Performance degradation is proportional to object count
  • Not true
• Just give me the damn free() and malloc()! I’ll be just fine!
  • Bad idea (see more later)
General Optimizations
HotSpot Optimizations
• JIT Compilation
• Compiler Optimizations
  • Generates more performant code than you could write in native
• Adaptive Optimization
• Split Time Verification
• Class Data Sharing
Two Virtual Machines?
• Client VM
  • Reducing start-up time and memory footprint
  • -client command-line flag
• Server VM
  • Maximum program execution speed
  • -server command-line flag
• Auto-detection
  • Server: >1 CPUs & >=2GB of physical memory
  • Win32 – always detected as client
  • Many 64bit OSes don’t have client VMs
Just-In-Time Compilation
• Everyone knows about JIT!
• Hot code is compiled to native
• What is “hot”?
  • Server VM – 10000 invocations
  • Client VM – 1500 invocations
  • Use -XX:CompileThreshold=# to change
  • More invocations – better optimizations
  • Fewer invocations – shorter warmup time
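Warm-up can be made visible with a small timing loop. This is an illustrative sketch only: the class WarmupDemo and the method sumOfSquares are hypothetical names, and the timings it prints vary from run to run and from JVM to JVM. The idea is that the first rounds run mostly interpreted, and later rounds should get faster once the method has crossed the compile threshold.

```java
public class WarmupDemo {

    // A small "hot" method: sums the squares 0..n-1.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;   // consumed below so the work cannot be discarded as dead code
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            for (int call = 0; call < 20_000; call++) {
                sink += sumOfSquares(100);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us");
        }
        System.out.println("sink = " + sink);
    }
}
```

Running it as `java -XX:+PrintCompilation WarmupDemo` should list sumOfSquares among the compiled methods. Trying different `-XX:CompileThreshold=#` values shows the trade-off between warm-up time and optimization quality, although newer JVMs with tiered compilation may treat this flag differently.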
Just-In-Time Compilation
• The code is being optimized by the compiler
  • Coming soon…
Adaptive Optimization
• Allows HotSpot to uncompile (deoptimize) previously compiled code
• Much more aggressive, even speculative, optimizations may be performed
• And rolled back if something goes wrong or new data is gathered
  • E.g. classloading might invalidate inlining
Split Time Verification
• Java suffers from long boot time
• One of the reasons is bytecode verification
  • Valid flow control
  • Type safety
  • Visibility
• To ease the load on the weak KVM, J2ME started performing part of the verification at compile time
• It’s good, so now it’s in Java SE 6 too
Class Data Sharing
• Helps improve startup time
• During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at run time
• No need to reload and reverify those classes every time
Compiler Optimizations
Two Types of Optimizations
• Java has two compilers:
  • javac bytecode compiler
  • HotSpot VM JIT compiler
• Both implement similar optimizations
• Bytecode compiler is limited
  • Dynamic linking
  • Can apply only static optimizations
Warning
• Caution! Don’t try this at home!
• The source code you are about to see is not real!
  • It’s pseudo assembly code
• Don’t write such code!
  • Source code should be readable and object-oriented
  • Bytecode will become performant automagically
Optimization Rules
• Make the common case fast
  • Don't worry about the uncommon/infrequent case
• Defer optimization decisions
  • Until you have data
  • Revisit decisions if data warrants
Null check Elimination
• Java is a null-safe language
  • A pointer can’t point to a meaningless portion of memory
  • Null checks are added by the compiler; NullPointerException is thrown
• JVM’s profiler can eliminate those checks
Example – Original Source
public class Game {

    private Logger logger;
    private int totalScore;

    public void score(String player, int points) {
        logger.info(player + " scores " + points);
        totalScore += points;
    }
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.score("Bob", 12);
    }
}
Example – Null Check Elimination
With the implicit null check added by the compiler:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        if (game == null) throw new NullPointerException("null");
        game.score("Bob", 12);
    }
}

After elimination (game was just allocated, so it can never be null):

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        // eliminated: if (game == null) throw new NullPointerException("null");
        game.score("Bob", 12);
    }
}
Inlining
• Love encapsulation?
  • Getters and setters
• Love clean and simple code?
  • Small methods
• Use static code analysis?
  • Small methods
• No penalty for using those!
  • JIT brings the implementation of these methods into a containing method
  • This optimization is known as “inlining”
Inlining
• Not just about eliminating call overhead
• Provides the optimizer with bigger blocks
• Enables other optimizations
  • Hoisting, dead code elimination, code motion, strength reduction
Inlining
• But wait, all public non-final methods in Java are virtual!
  • HotSpot examines the exact case in place
  • In most cases there is only one implementation, which can be inlined
• But wait, more implementations may be loaded later!
  • In such a case HotSpot undoes the inlining
  • Speculative inlining
• By default limited to 35 bytes of bytecode
  • Use -XX:MaxInlineSize=# to change
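Whether a given small method is actually inlined can be checked on a real JVM. The sketch below is an assumption-laden illustration, not code from the talk: InliningDemo and Counter are hypothetical names. The accessor methods are only a few bytes of bytecode, which is well under the 35-byte default mentioned above, so they are natural inlining candidates.

```java
public class InliningDemo {

    // Hypothetical class with tiny accessor methods, each only a few bytes of bytecode.
    static final class Counter {
        private int value;

        int getValue() {          // tiny getter: a natural inlining candidate
            return value;
        }

        void add(int delta) {     // tiny mutator: also well under the size limit
            value += delta;
        }
    }

    // Calls the small methods in a hot loop so the JIT has a reason to compile them.
    static int run(int iterations) {
        Counter counter = new Counter();
        for (int i = 0; i < iterations; i++) {
            counter.add(1);
        }
        return counter.getValue();
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Running it with `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningDemo` should print the inlining decisions for add and getValue. The exact output format differs between JVM versions.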
Example - Inlining
public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        // replaced by the inlined body below: game.score("Bob", 12);
        game.logger.info("Bob" + " scores " + 12);
        game.totalScore += 12;
    }
}
Example – Source Code Revision
public class GameMove {

    private final String player;
    private final int points;

    public GameMove(String player, int points) {
        this.player = player;
        this.points = points;
    }

    public String getPlayer() {
        return player;
    }

    public int getPoints() {
        return points;
    }
}
Example – Source Code Revision
public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (GameMove gameMove : gameMoves) {
            game.score(gameMove.getPlayer(), gameMove.getPoints());
        }
    }
}
Code Hoisting
• Hoist = to raise or lift
• Size optimization
• Eliminate duplicate code in method bodies by hoisting expressions or statements
  • Duplicate bytecode, not necessarily source code
Example – Code Hoisting
Before:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            if (i < 0 || gameMoves.length <= i) throw new ArrayIndexOutOfBoundsException();
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}

After hoisting gameMoves.length out of the loop:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        int length = gameMoves.length;
        for (int i = 0; i < length; i++) {
            if (i < 0 || length <= i) throw new ArrayIndexOutOfBoundsException();
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}
Bounds Check Elimination
• Java promises automatic boundary checks for arrays
  • An exception is thrown
• If the code already checks the array boundaries itself, the automatic check can be turned off
Example – Bounds Check Elimination
public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        int length = gameMoves.length;
        for (int i = 0; i < length; i++) {
            // eliminated (the loop condition already guarantees 0 <= i < length):
            // if (i < 0 || length <= i) throw new ArrayIndexOutOfBoundsException();
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}
Sub-Expression Elimination
• Avoids redundant memory access
Before:

@Test
public void testScore() {
    Game game = new Game();
    GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
    for (int i = 0; i < gameMoves.length; i++) {
        game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
    }
}

After (gameMoves[i] is read once per iteration):

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            GameMove move = gameMoves[i];
            game.score(move.getPlayer(), move.getPoints());
        }
    }
}
Loop Unrolling
• Some loops shouldn’t be loops
  • In terms of performance, not code readability
• Those can be unrolled into a set of statements
• If the boundaries are dynamic, a partial unroll will occur
Example – Loop Unrolling
Before:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            GameMove move = gameMoves[i];
            game.score(move.getPlayer(), move.getPoints());
        }
    }
}

After unrolling:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        game.score(move.getPlayer(), move.getPoints());
        move = gameMoves[1];
        game.score(move.getPlayer(), move.getPoints());
    }
}
Example – Inlining
Getters inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        // replaced: game.score(move.getPlayer(), move.getPoints());
        game.score(move.player, move.points);
        move = gameMoves[1];
        // replaced: game.score(move.getPlayer(), move.getPoints());
        game.score(move.player, move.points);
    }
}

score() inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        // replaced: game.score(move.player, move.points);
        game.logger.info(move.player + " scores " + move.points);
        game.totalScore += move.points;
        move = gameMoves[1];
        // replaced: game.score(move.player, move.points);
        game.logger.info(move.player + " scores " + move.points);
        game.totalScore += move.points;
    }
}
Escape Analysis
• Escape analysis is not an optimization in itself
• It is a check that an object does not escape the local scope
  • E.g. created in a private method, assigned to a local variable and not returned
• Escape analysis opens up possibilities for lots of optimizations
Scalar Replacement
• Remember the rule “new == always new object”?
  • False!
• JVM can optimize away allocations
  • Fields are hoisted into registers
  • Object becomes unneeded
• But object creation is cheap!
  • Yep, but GC is not so cheap…
Example – Source Code Revision
public class Moves {

    GameMove nextMove() {
        return new GameMove(nextPlayer(), calcPoints());
    }

    private int calcPoints() {…}

    private String nextPlayer() {…}
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        GameMove move = moves.nextMove();
        game.score(move.getPlayer(), move.getPoints());
    }
}
Example – Scalar Replacement
nextMove() inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        // replaced: GameMove move = moves.nextMove();
        GameMove move = new GameMove(moves.nextPlayer(), moves.calcPoints());
        game.score(move.getPlayer(), move.getPoints());
    }
}

The GameMove object replaced by its fields:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        // replaced: GameMove move = new GameMove(moves.nextPlayer(), moves.calcPoints());
        String player = moves.nextPlayer();
        int points = moves.calcPoints();
        // replaced: game.score(move.getPlayer(), move.getPoints());
        game.score(player, points);
    }
}
Example – Scalar Replacement
Local variables folded into the call:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        // replaced: String player = moves.nextPlayer();
        // replaced: int points = moves.calcPoints();
        // replaced: game.score(player, points);
        game.score(moves.nextPlayer(), moves.calcPoints());
    }
}

score() inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        // replaced: game.score(moves.nextPlayer(), moves.calcPoints());
        int points = moves.calcPoints();
        game.logger.info(moves.nextPlayer() + " scores " + points);
        game.totalScore += points;
    }
}
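The pseudo code above can be mirrored by a small runnable sketch. Everything here is an assumption for illustration: EscapeDemo, Move, scoreOnce and total are hypothetical names, not code from the talk. The Move object is created, read and dropped inside one method, so it never escapes. That makes it a candidate for scalar replacement, although whether the JIT really removes the allocation depends on the JVM and its flags.

```java
public class EscapeDemo {

    // Hypothetical immutable value holder, similar in spirit to GameMove.
    static final class Move {
        final String player;
        final int points;

        Move(String player, int points) {
            this.player = player;
            this.points = points;
        }
    }

    // The Move is created, read and dropped inside this method: it never escapes.
    static int scoreOnce(String player, int points) {
        Move move = new Move(player, points);
        return move.player.length() + move.points;
    }

    // Hot loop so that the JIT gets a chance to compile and optimize scoreOnce.
    static long total(int rounds) {
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            total += scoreOnce("bob", i % 10);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(total(50_000_000));
    }
}
```

One way to look for an effect is to run the program once with `-XX:+DoEscapeAnalysis` and once with `-XX:-DoEscapeAnalysis`, with GC logging enabled, and compare the number of young collections. The result is the same either way; only the allocation behaviour should differ.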
Lock Coarsening
• HotSpot merges adjacent synchronized blocks that use the same lock
• The compiler is allowed to move statements into the merged coarse blocks
• Trades responsiveness for performance
  • Reduces instruction count
  • But locks are held longer

private void addPresentor(StringBuffer lecture, String presentor) {
    lecture
        .append(" by ")
        .append(presentor);
}
Example – Source Code Revision
public class Game {

    Logger logger;
    int totalScore;

    public void score(String player, int points) {
        logger.info(player + " scores " + points);
        totalScore += points;
    }

    public synchronized void multithreadedScore(String player, int points) {
        score(player, points);
    }
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.multithreadedScore("Bob", 5);
        game.multithreadedScore("Jane", 7);
        game.multithreadedScore("Dwane", 1);
    }
}
Example – Lock Coarsening
Each synchronized call takes and releases the lock:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        lock(game);
        game.multithreadedScore("Bob", 5);
        unlock(game);
        lock(game);
        game.multithreadedScore("Jane", 7);
        unlock(game);
        lock(game);
        game.multithreadedScore("Dwane", 1);
        unlock(game);
    }
}

After coarsening into one lock and inlining the calls:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        lock(game);
        // replaced: game.multithreadedScore("Bob", 5);
        game.logger.info("Bob" + " scores " + 5);
        game.totalScore += 5;
        // replaced: game.multithreadedScore("Jane", 7);
        game.logger.info("Jane" + " scores " + 7);
        game.totalScore += 7;
        // replaced: game.multithreadedScore("Dwane", 1);
        game.logger.info("Dwane" + " scores " + 1);
        game.totalScore += 1;
        unlock(game);
    }
}
Lock Elision
• A thread enters a lock that no other thread will synchronize on
  • Synchronization has no effect
  • Can be deduced using escape analysis
• Such locks can be elided
• Elides 4 StringBuffer synchronized calls:

private String getLecture(String topic, String presentor) {
    return new StringBuffer()
        .append(topic)
        .append(" by ")
        .append(presentor)
        .toString();
}
Example - Lock Elision
public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        // elided (game never leaves this thread): lock(game);
        game.logger.info("Bob" + " scores " + 5);
        game.totalScore += 5;
        game.logger.info("Jane" + " scores " + 7);
        game.totalScore += 7;
        game.logger.info("Dwane" + " scores " + 1);
        game.totalScore += 1;
        // elided: unlock(game);
    }
}
Constants Folding
• Trivial optimization
• How many constants are there?
  • More than you think!
  • Inlining generates constants
  • Unrolling generates constants
  • Escape analysis generates constants
• JIT determines what is constant at run time
  • Whatever doesn’t change
Constants Folding
• Literals folding
  • Before: int foo = 9*10;
  • After: int foo = 90;
• String folding or StringBuilder-ing
  • Before: String foo = "hi Joe " + (9*10);
  • After: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();
  • After: String foo = "hi Joe 90";
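The static part of this folding can be observed directly, because the Java Language Specification requires compile-time constant strings to be interned. The sketch below uses hypothetical names (FoldingDemo and its methods) and deliberately compares strings with ==, which ordinary code should not do. It contrasts a constant expression, which is folded at compile time, with a concatenation whose operand is only known at run time.

```java
public class FoldingDemo {

    // "hi Joe " + (9 * 10) is a compile-time constant expression, so it is folded
    // to the literal "hi Joe 90" and shares the interned String object with it.
    static boolean constantIsFolded() {
        String folded = "hi Joe " + (9 * 10);
        return folded == "hi Joe 90";   // identity comparison on purpose
    }

    // With a run-time operand the concatenation cannot be folded at compile time:
    // a new String object is created, so the identity comparison fails.
    static boolean runtimeConcatIsSameObject(int n) {
        String built = "hi Joe " + n;
        return built == "hi Joe 90";    // identity comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(constantIsFolded());              // true
        System.out.println(runtimeConcatIsSameObject(90));   // false
    }
}
```

The JIT can still fold the second case at run time once it proves the operand constant, but that is invisible at the language level: the program's observable behaviour stays the same.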
Example – Constants Folding
public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        // replaced: game.logger.info("Bob" + " scores " + 5);
        game.logger.info("Bob scores 5");
        game.totalScore += 5;
        // replaced: game.logger.info("Jane" + " scores " + 7);
        game.logger.info("Jane scores 7");
        game.totalScore += 7;
        // replaced: game.logger.info("Dwane" + " scores " + 1);
        game.logger.info("Dwane scores 1");
        game.totalScore += 1;
    }
}
Dead Code Elimination
• Dead code - code that has no effect on the outcome of the program execution

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);   // result is never read, so the JIT may drop the whole loop
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
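A benchmark like the one above may end up measuring nothing, because the loop result is never used. A common remedy, sketched here with hypothetical names (SqrtBenchmark, sumOfRoots), is to return the computed value and print it, so the work stays observable and cannot be treated as dead code. This is still a naive timing loop, not a rigorous benchmark.

```java
public class SqrtBenchmark {

    // Returns the sum so that callers can consume it; the work is no longer dead code.
    static double sumOfRoots(int n) {
        double result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.sqrt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double result = sumOfRoots(10 * 1000 * 1000);
        long duration = (System.nanoTime() - start) / 1_000_000;
        // Printing the result makes it part of the program's outcome.
        System.out.format("Test duration: %d (ms), result: %f%n", duration, result);
    }
}
```

Note that the accumulator is a double here, whereas the slide's version silently truncates each square root into an int.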
OSR - On Stack Replacement
• Normally code is switched from interpretation to native in heap context
  • Before entering method
• OSR - switch from interpretation to compiled code in local context
  • In the middle of a method call
  • JVM tracks code block execution count
• Fewer optimizations
  • May prevent bounds check elimination and loop unrolling
Out-Of-Order Execution
public class OutOfOrder {

    private int a;

    public void foo() {
        a++;
        a += 8;
    }
}
[Figure: stack and heap states for foo() in program order: read a to stack (0), increment (1), store a to heap (1), read a to stack (1), add 8 (9), store a to heap (9)]
[Figure: stack and heap states after reordering: read a to stack (0), increment (1), add 8 (9), store a to heap (9)]
Programming & Tuning Tips
How Can I Help?
• Just write good quality Java code
  • Object orientation
  • Polymorphism
  • Abstraction
  • Encapsulation
  • DRY
  • KISS
• Let HotSpot optimize
How Can I Help?
• final keyword
  • For fields:
    • Allows caching
    • Allows lock coarsening
  • For methods:
    • Simplifies inlining decisions
• Immutable objects die younger
JVM tuning tips
• Reminder: -XX options are non-standard
  • Added for HotSpot development purposes
  • Mostly tested on Solaris 10
  • Platform dependent
• Some options may contradict each other
• Know and experiment with these options
Monitoring & Troubleshooting
Option                        Comment
HeapDumpOnOutOfMemoryError    Since 5.0u7, HPROF
HeapDumpOnCtrlBreak           Since 5.0u14, HPROF
OnError                       Runs a list of user-defined commands on fatal error
OnOutOfMemoryError            Runs a list of user-defined commands on the 1st out of memory error
PrintClassHistogram           Prints a histogram of class instances on Ctrl-Break (jmap -histo)
PrintConcurrentLocks          Prints java.util.concurrent locks on Ctrl-Break (jstack -l)
PrintCompilation              Traces compiled methods
PrintInlining                 Traces inlining
PrintOptoAssembly             Traces the generated assembly
TraceClassLoading/Unloading   Traces class loading/unloading
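When experimenting with such options it helps to confirm which ones a JVM was actually started with. A minimal sketch, assuming only the standard java.lang.management API (FlagsDemo and nonStandardOptions are hypothetical names), prints the non-standard -X and -XX arguments passed to the running JVM:

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class FlagsDemo {

    // Keeps only the non-standard options, i.e. those starting with -X (this includes -XX).
    static List<String> nonStandardOptions(List<String> jvmArguments) {
        List<String> result = new ArrayList<>();
        for (String argument : jvmArguments) {
            if (argument.startsWith("-X")) {
                result.add(argument);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the arguments passed to the JVM itself,
        // not the arguments passed to main().
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        nonStandardOptions(jvmArguments).forEach(System.out::println);
    }
}
```

For example, `java -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintCompilation FlagsDemo` should echo both flags (after the compilation trace itself). Flags that only exist in debug builds, such as PrintOptoAssembly, may be rejected by a product JVM.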
References
• The HotSpot Home Page
• Java HotSpot VM Options
• Dynamic compilation and performance measurement
• Urban performance legends, revisited
• Synchronization optimizations in Mustang
• Robust Java benchmarking
• Garbage Collection Tuning
• JavaOne 2009 Sessions:
  • Garbage Collection Tuning in the Java HotSpot™ Virtual Machine
  • Under the Hood: Inside a High-Performance JVM™ Machine
  • Practical Lessons in Memory Analysis
  • Debugging Your Production JVM™ Machine
  • Inside Out: A Modern Virtual Machine Revealed
Thank you for your attention
Thanks to Ori Dar!