JVM bytecode - The secret language behind Java and Scala

Upload: takipi

Post on 10-May-2015


Page 1: JVM bytecode - The secret language behind Java and Scala

JVM bytecode

The secret language behind Java and Scala

Page 2: JVM bytecode - The secret language behind Java and Scala

About Me

1. Dev at IDF intl

2. Team lead, architect at IAI Space Industries

3. CEO at VisualTao

4. Director - AutoCAD Web & Mobile, GM Autodesk IL

5. CEO at Takipi

Writing code for the past ~15 years.

Page 3: JVM bytecode - The secret language behind Java and Scala

Overview

1. What is bytecode

2. The 3 biggest differences between source code and bytecode

3. 5 things you should know about bytecode

4. Practical uses

Page 4: JVM bytecode - The secret language behind Java and Scala

What is bytecode?

A set of low-level instructions to be executed by the JVM.

~200 instruction types. Each opcode is 1 byte, usually followed by 0-2 operand bytes.

Some instructions are very similar to Java, some completely different.

Page 5: JVM bytecode - The secret language behind Java and Scala

Bytecode is very similar to Assembly (that's why we avoid it…)

.cpp -> g++ -> exec file -> OS

.java -> javac -> .class file -> JVM

.scala -> scalac -> .class file -> JVM

assembly -> exec file -> OS

bytecode -> .class file -> JVM
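As a small illustration of the ".class file -> JVM" step, the sketch below reads the first four bytes of a compiled class and prints them. Every class file starts with the magic number 0xCAFEBABE. The class name ClassFileMagic and its helper are hypothetical names chosen for this example, and the sketch assumes the class file can be located through the system class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassFileMagic {
    /** Reads the 4-byte magic number at the start of the named class's .class file. */
    static int readMagic(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            // Class files are big-endian, which is what DataInputStream reads.
            return new DataInputStream(in).readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("magic = 0x%08X%n", readMagic("java.lang.Object"));
    }
}
```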

Page 6: JVM bytecode - The secret language behind Java and Scala

The 3 biggest differences between source and byte code

Page 7: JVM bytecode - The secret language behind Java and Scala

1. No variables

Bytecode employs an Assembly-like set of numbered register slots, known as the local variable array, to hold variables.

Values loaded from fields, returned from method calls, and produced by binary operations (+, -, * ..) are held on a stack known as the operand stack.
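The interplay between local slots and the operand stack can be sketched with a toy model. This is only an illustration under simplifying assumptions: it handles ints only, the class TinyFrame and its method names are hypothetical (named after the instructions they imitate), and a real JVM frame is typed and checked by the verifier.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one JVM frame: numbered local slots plus an operand stack. */
final class TinyFrame {
    final int[] locals;                                  // "register" slots, addressed by index
    final Deque<Integer> operands = new ArrayDeque<>();  // operand stack

    TinyFrame(int maxLocals) {
        this.locals = new int[maxLocals];
    }

    void iconst(int value) { operands.push(value); }          // push a constant
    void iload(int slot)   { operands.push(locals[slot]); }   // local slot -> stack
    void istore(int slot)  { locals[slot] = operands.pop(); } // stack -> local slot

    void iadd() {                                             // pop two ints, push their sum
        int right = operands.pop();
        int left = operands.pop();
        operands.push(left + right);
    }

    /** Mimics "int a = x + 1; return a;" with x in slot 0 and a in slot 1. */
    static int addOne(int x) {
        TinyFrame frame = new TinyFrame(2);
        frame.locals[0] = x;
        frame.iload(0);
        frame.iconst(1);
        frame.iadd();
        frame.istore(1);
        frame.iload(1);
        return frame.operands.pop();
    }

    public static void main(String[] args) {
        System.out.println(addOne(41));
    }
}
```

Note that the variable names x and a never appear in the instruction sequence, only slot numbers do.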

Page 8: JVM bytecode - The secret language behind Java and Scala

public class NoLocalVars {
    private int intField;
    private double doubleField;
    private boolean state;

    public double getField() {
        if (state) {
            int a = intField + 1;
            System.out.println(a);
            return a;
        } else {
            double b = doubleField + 1;
            System.out.println(b);
            return b;
        }
    }
}

Notes

Notice how the same register slot (1) is re-used between blocks for different variables.

The variable metadata table (the LOCALVARIABLE entries) describes the mapping between register slots and source code variables.

public getField() : double
  L0
    ALOAD 0: this
    GETFIELD NoLocalVars.state : boolean
    IFEQ L1
  L2
    ALOAD 0: this
    GETFIELD NoLocalVars.intField : int
    ICONST_1
    IADD
    ISTORE 1
  L3
    GETSTATIC System.out : PrintStream
    ILOAD 1: a
    INVOKEVIRTUAL PrintStream.println(int) : void
  L4
    ILOAD 1: a
    I2D
    DRETURN
  L1
    ALOAD 0: this
    GETFIELD NoLocalVars.doubleField : double
    DCONST_1
    DADD
    DSTORE 1
  L5
    GETSTATIC System.out : PrintStream
    DLOAD 1: b
    INVOKEVIRTUAL PrintStream.println(double) : void
  L6
    DLOAD 1: b
    DRETURN
  L7
    LOCALVARIABLE this NoLocalVars L0 L7 0
    LOCALVARIABLE a int L3 L1 1
    LOCALVARIABLE b double L5 L7 1
    MAXSTACK = 4
    MAXLOCALS = 3

Page 9: JVM bytecode - The secret language behind Java and Scala

2. No binary logical operators

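No built-in support for the short-circuit operators && and ||. (The bitwise operators &, | and ^ do have their own instructions: IAND, IOR and IXOR.)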

Compilers implement these using jump instructions.

Page 10: JVM bytecode - The secret language behind Java and Scala

public class NoLogicalOperators {
    public void and(boolean a, boolean b) {
        if (a && b) {
            System.out.println("its true");
        }
    }

    public void or(boolean a, boolean b) {
        if (a || b) {
            System.out.println("its true");
        }
    }
}

public and(boolean, boolean) : void
  L0
    ILOAD 1: a
    IFEQ L1
    ILOAD 2: b
    IFEQ L1
  L2
    GETSTATIC System.out : PrintStream
    LDC "its true"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L1
    RETURN
  L3

public or(boolean, boolean) : void
  L0
    ILOAD 1: a
    IFNE L1
    ILOAD 2: b
    IFEQ L2
  L1
    GETSTATIC System.out : PrintStream
    LDC "its true"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L2
    RETURN
  L3

Notes

Notice how both the && and || operators are implemented using jump instructions (IFEQ, IFNE) that test the value on top of the operand stack.
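The same branching can be written out in plain Java. This is a hand-written sketch that mirrors the and() and or() listings, not compiler output; the class ShortCircuitJumps is a hypothetical name, and each early return stands in for a jump over the println body.

```java
final class ShortCircuitJumps {
    /** a && b: leave as soon as one operand is false. */
    static boolean and(boolean a, boolean b) {
        if (!a) return false;   // ILOAD a; IFEQ L1
        if (!b) return false;   // ILOAD b; IFEQ L1
        return true;            // L2: fall through into the body
    }

    /** a || b: jump straight to the body when a is true, otherwise test b. */
    static boolean or(boolean a, boolean b) {
        if (a) return true;     // ILOAD a; IFNE L1
        if (!b) return false;   // ILOAD b; IFEQ L2
        return true;            // L1: the body
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.println(a + ", " + b + " -> and=" + and(a, b) + ", or=" + or(a, b));
            }
        }
    }
}
```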

Page 11: JVM bytecode - The secret language behind Java and Scala

public class NoLogicalOperators {
    public void orX2(boolean a, boolean b, boolean c, boolean d) {
        if ((a || b) && (c || d)) {
            System.out.println("its true");
        }
    }
}

public orX2(boolean, boolean, boolean, boolean) : void
  L0
    ILOAD 1: a
    IFNE L1
    ILOAD 2: b
    IFEQ L2
  L1
    ILOAD 3: c
    IFNE L3
    ILOAD 4: d
    IFEQ L2
  L3
    GETSTATIC System.out : PrintStream
    LDC "its true"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L2
    RETURN
  L4

Notes

For composite || and && conditions, compilers generate a combination of several jumps.
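Java has no goto, but a labeled block with break can imitate the jump structure of the orX2 listing. The following is a sketch under that assumption; the class CompositeCondition and the label name skip are hypothetical, and countMismatches() simply compares the hand-written version against the ordinary expression for all 16 inputs.

```java
final class CompositeCondition {
    /** (a || b) && (c || d), structured like the orX2 jump sequence. */
    static boolean orX2(boolean a, boolean b, boolean c, boolean d) {
        skip: {
            if (!a) {                 // IFNE L1 not taken
                if (!b) break skip;   // IFEQ L2: first group is false
            }
            // L1: first group is true
            if (!c) {                 // IFNE L3 not taken
                if (!d) break skip;   // IFEQ L2: second group is false
            }
            return true;              // L3: the println body would run here
        }
        return false;                 // L2: the body is skipped
    }

    /** Counts inputs where the jump version disagrees with the plain expression. */
    static int countMismatches() {
        int mismatches = 0;
        for (int bits = 0; bits < 16; bits++) {
            boolean a = (bits & 8) != 0;
            boolean b = (bits & 4) != 0;
            boolean c = (bits & 2) != 0;
            boolean d = (bits & 1) != 0;
            if (orX2(a, b, c, d) != ((a || b) && (c || d))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```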

Page 12: JVM bytecode - The secret language behind Java and Scala

3. No loop constructs

There’s no built-in support for while, for, for-each loops.

Compilers implement these using jump instructions.

Page 13: JVM bytecode - The secret language behind Java and Scala

public class Loops {
    public void forLoop(int n) {
        for (int i = 0; i < n; i++) {
            System.out.println(i);
        }
    }
}

public forLoop(int) : void
  L0
    ICONST_0
    ISTORE 2
  L1
    GOTO L2
  L3
    GETSTATIC System.out : PrintStream
    ILOAD 2: i
    INVOKEVIRTUAL PrintStream.println(int) : void
  L4
    IINC 2: i 1
  L2
    ILOAD 2: i
    ILOAD 1: n
    IF_ICMPLT L3
  L5
    RETURN
  L6
    LOCALVARIABLE this Loops L0 L6 0
    LOCALVARIABLE n int L0 L6 1
    LOCALVARIABLE i int L1 L5 2

Notes

A for loop is implemented using a conditional jump instruction (IF_ICMPLT) comparing i and n. An unconditional GOTO enters the loop at the condition check.

Page 14: JVM bytecode - The secret language behind Java and Scala

public class Loops {
    public void whileLoop(int n) {
        int i = 0;

        while (i < n) {
            System.out.println(i);
            i++;
        }
    }
}

public whileLoop(int) : void
  L0
    ICONST_0
    ISTORE 2
  L1
    GOTO L2
  L3
    GETSTATIC System.out : PrintStream
    ILOAD 2: i
    INVOKEVIRTUAL PrintStream.println(int) : void
  L4
    IINC 2: i 1
  L2
    ILOAD 2: i
    ILOAD 1: n
    IF_ICMPLT L3
  L5
    RETURN
  L6
    LOCALVARIABLE this Loops L0 L6 0
    LOCALVARIABLE n int L0 L6 1
    LOCALVARIABLE i int L1 L6 2

Notes

This while loop is also implemented using a conditional jump instruction comparing i and n. The bytecode is nearly identical to the previous loop.
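The shared shape of the two listings can be written back in Java as a guarded do-while. This is a sketch for illustration only: the class LoopShapes is a hypothetical name, and each method collects the loop values into a list instead of printing them so the three variants can be compared.

```java
import java.util.ArrayList;
import java.util.List;

final class LoopShapes {
    static List<Integer> withFor(int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(i);
        }
        return out;
    }

    static List<Integer> withWhile(int n) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < n) {
            out.add(i);
            i++;
        }
        return out;
    }

    /** Same loop arranged like the bytecode: condition check at the bottom. */
    static List<Integer> jumpShaped(int n) {
        List<Integer> out = new ArrayList<>();
        int i = 0;                // ICONST_0; ISTORE 2
        if (i < n) {              // GOTO L2: first pass through the check
            do {
                out.add(i);       // L3: loop body
                i++;              // L4: IINC
            } while (i < n);      // L2: IF_ICMPLT L3
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withFor(3) + " " + withWhile(3) + " " + jumpShaped(3));
    }
}
```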

Page 15: JVM bytecode - The secret language behind Java and Scala

public class Loops {
    public void forEachLoop(List<String> strings) {
        for (String s : strings) {
            System.out.println(s);
        }
    }
}

public forEachLoop(List) : void
  L0
    ALOAD 1: strings
    INVOKEINTERFACE List.iterator() : Iterator
    ASTORE 3
    GOTO L1
  L2
    ALOAD 3
    INVOKEINTERFACE Iterator.next() : Object
    CHECKCAST String
    ASTORE 2
  L3
    GETSTATIC System.out : PrintStream
    ALOAD 2: s
    INVOKEVIRTUAL PrintStream.println(String) : void
  L1
    ALOAD 3
    INVOKEINTERFACE Iterator.hasNext() : boolean
    IFNE L2
  L4
    RETURN
  L5
    LOCALVARIABLE this Loops L0 L5 0
    LOCALVARIABLE strings List L0 L5 1
    // declaration: java.util.List<java.lang.String>
    LOCALVARIABLE s String L3 L1 2

Notes

javac compiles a for-each loop into an iterator loop that jumps on the result of the hasNext() method. The resulting bytecode is unaware of the for-each construct.

Also notice how register 3 is used to hold the iterator, even though it has no entry in the variable table.
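Written by hand, the iterator loop in the listing corresponds roughly to the second method below. This is a sketch rather than decompiled output; the class ForEachDesugared is a hypothetical name, and both methods return the visited elements so they can be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ForEachDesugared {
    static List<String> withForEach(List<String> strings) {
        List<String> seen = new ArrayList<>();
        for (String s : strings) {
            seen.add(s);
        }
        return seen;
    }

    static List<String> withIterator(List<String> strings) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = strings.iterator();   // INVOKEINTERFACE List.iterator()
        while (it.hasNext()) {                      // Iterator.hasNext(); IFNE L2
            String s = it.next();                   // Iterator.next(); CHECKCAST String
            seen.add(s);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(withForEach(input).equals(withIterator(input)));
    }
}
```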

Page 16: JVM bytecode - The secret language behind Java and Scala

5 Things you should know about bytecode that affect everyday programming

Page 17: JVM bytecode - The secret language behind Java and Scala

1. No String support

Much like in C, bytecode has no built-in string operations such as concatenation. A String is an ordinary object wrapping a character array.

Compilers usually use StringBuilder to compensate.

No penalty for concatenating different data types

Page 18: JVM bytecode - The secret language behind Java and Scala

public class ImplicitStrings {
    public String toString(int a, int b) {
        String c = "Hello " + a + "World" + b;
        return c;
    }
}

// access flags 0x1
public toString(int, int) : String
  L0
    NEW StringBuilder
    DUP
    LDC "Hello "
    INVOKESPECIAL StringBuilder.<init>(String) : void
    ILOAD 1: a
    INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
    LDC "World"
    INVOKEVIRTUAL StringBuilder.append(String) : StringBuilder
    ILOAD 2: b
    INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
    INVOKEVIRTUAL StringBuilder.toString() : String
    ASTORE 3
  L1
    ALOAD 3: c
    ARETURN
  L2
    LOCALVARIABLE this ImplicitStrings L0 L2 0
    LOCALVARIABLE a int L0 L2 1
    LOCALVARIABLE b int L0 L2 2
    LOCALVARIABLE c String L1 L2 3

Notes

javac uses java.lang.StringBuilder to implement string concatenation (+). Different overloads of the append() method are used to concatenate different data types.

Page 19: JVM bytecode - The secret language behind Java and Scala

public class ImplicitStrings {
    public String toString1(int a, int b) {
        String c;

        c = "Hello" + a;
        c += "World" + b;

        return c;
    }
}

public toString1(int, int) : String
  L0
    NEW StringBuilder
    DUP
    LDC "Hello"
    INVOKESPECIAL StringBuilder.<init>(String) : void
    ILOAD 1: a
    INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
    INVOKEVIRTUAL StringBuilder.toString() : String
    ASTORE 3
  L1
    NEW StringBuilder
    DUP
    ALOAD 3: c
    INVOKESTATIC String.valueOf(Object) : String
    INVOKESPECIAL StringBuilder.<init>(String) : void
    LDC "World"
    INVOKEVIRTUAL StringBuilder.append(String) : StringBuilder
    ILOAD 2: b
    INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
    INVOKEVIRTUAL StringBuilder.toString() : String
    ASTORE 3: c
  L2
    ALOAD 3: c
    ARETURN
  L3

Notes

While this code does the same job as the previous example, it carries a performance penalty: two StringBuilders are constructed instead of one.
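The difference between the two listings can be spelled out with explicit StringBuilder calls. This is a hand-written sketch of what the listings do, using the hypothetical class ConcatShapes; it mirrors the StringBuilder-based output shown here, whereas javac on JDK 9 and later typically compiles + through an invokedynamic call instead.

```java
final class ConcatShapes {
    /** One expression: a single builder, as in toString(). */
    static String oneBuilder(int a, int b) {
        return new StringBuilder("Hello ")
                .append(a)
                .append("World")
                .append(b)
                .toString();
    }

    /** Two statements: a second builder is seeded from the intermediate String, as in toString1(). */
    static String twoBuilders(int a, int b) {
        String c = new StringBuilder("Hello ").append(a).toString();
        c = new StringBuilder(String.valueOf(c))
                .append("World")
                .append(b)
                .toString();
        return c;
    }

    public static void main(String[] args) {
        System.out.println(oneBuilder(1, 2));
        System.out.println(twoBuilders(1, 2));
    }
}
```

Both methods return the same text; the second allocates an extra builder and an extra intermediate String along the way.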

Page 20: JVM bytecode - The secret language behind Java and Scala

2. Only 4 primitive types

Bytecode arithmetic only operates on 4 primitive types (int, float, double, long) vs. the 8 Java primitives.

It doesn't operate on char, boolean, byte or short directly; these are treated as ints.

Page 21: JVM bytecode - The secret language behind Java and Scala

public class BytecodePrimitives {
    public void mulByeShort(byte b, short c) {
        System.out.println(b * c);
    }

    public void mulInts(int b, int c) {
        System.out.println(b * c);
    }
}

public mulByeShort(byte, short) : void
  L0
    GETSTATIC System.out : PrintStream
    ILOAD 1: b
    ILOAD 2: c
    IMUL
    INVOKEVIRTUAL PrintStream.println(int) : void
  L1
    RETURN
  L2
    LOCALVARIABLE this B_BytecodePrimitives L0 L2 0
    LOCALVARIABLE b byte L0 L2 1
    LOCALVARIABLE c short L0 L2 2

public mulInts(int, int) : void
  L0
    GETSTATIC System.out : PrintStream
    ILOAD 1: b
    ILOAD 2: c
    IMUL
    INVOKEVIRTUAL PrintStream.println(int) : void
  L1
    RETURN
  L2
    LOCALVARIABLE this B_BytecodePrimitives L0 L2 0
    LOCALVARIABLE b int L0 L2 1
    LOCALVARIABLE c int L0 L2 2

Notes

Notice how the bytecode for these 2 methods is identical, regardless of the difference in variable types.
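The int-based arithmetic is visible from Java itself: multiplying a byte by a short yields an int, and storing a result back into a byte needs an explicit narrowing cast. The class IntArithmetic below is a hypothetical name used for this short sketch.

```java
final class IntArithmetic {
    /** byte * short is computed as an int (IMUL), so the product may exceed the short range. */
    static int mul(byte b, short c) {
        return b * c;
    }

    /** The addition happens on ints; the cast narrows the result back to a byte (I2B). */
    static byte incrementAsByte(byte b) {
        return (byte) (b + 1);
    }

    public static void main(String[] args) {
        System.out.println(mul((byte) 100, (short) 1000));
        System.out.println(incrementAsByte(Byte.MAX_VALUE));
    }
}
```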

Page 22: JVM bytecode - The secret language behind Java and Scala

Notes

The same observation holds when evaluating conditions: a boolean test and an int comparison against zero compile to the same instructions.

public class BytecodePrimitives {
    public void printIfTrue(boolean b) {
        if (b) {
            System.out.println("Hi");
        }
    }

    public void printIfN0(int i) {
        if (i != 0) {
            System.out.println("Hi");
        }
    }
}

public printIfTrue(boolean) : void
  L0
    ILOAD 1: b
    IFEQ L1
  L2
    GETSTATIC System.out : PrintStream
    LDC "Hi"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L1
    RETURN
  L3
    LOCALVARIABLE this B_BytecodePrimitives L0 L3 0
    LOCALVARIABLE b boolean L0 L3 1

public printIfN0(int) : void
  L0
    ILOAD 1: i
    IFEQ L1
  L2
    GETSTATIC System.out : PrintStream
    LDC "Hi"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L1
    RETURN
  L3
    LOCALVARIABLE this B_BytecodePrimitives L0 L3 0
    LOCALVARIABLE i int L0 L3 1

Page 23: JVM bytecode - The secret language behind Java and Scala

3. Using nested classes?

Compilers will add a synthetic this$0 field that references the outer instance.

If the nested class doesn't access its outer class, don't forget to add a static modifier.

Page 24: JVM bytecode - The secret language behind Java and Scala

public class NestedClasses {
    public class NestedClass {
    }

    public static class StaticNestedClass {
    }
}

public class C_NestedClasses {

  public class C_NestedClasses$NestedClass {
    final C_NestedClasses this$0

    public <init>(C_NestedClasses) : void
      L0
        ALOAD 0: this
        ALOAD 1
        PUTFIELD C_NestedClasses$NestedClass.this$0 : C_NestedClasses
        ALOAD 0: this
        INVOKESPECIAL Object.<init>() : void
        RETURN
      L1
        LOCALVARIABLE this C_NestedClasses$NestedClass L0 L1 0
  }

  public class C_NestedClasses$StaticNestedClass {
    public <init>() : void
      L0
        ALOAD 0: this
        INVOKESPECIAL Object.<init>() : void
        RETURN
      L1
  }
}

Notes

The NestedClass inner class has an implicit this$0 field created for it, which is assigned in the constructor.
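The hidden outer reference can also be observed through reflection: the constructor of a non-static inner class takes the outer instance as an extra parameter, while a static nested class does not. This is a sketch using hypothetical names (NestedShapes, Inner, StaticNested, constructorParams); it assumes each class declares exactly one constructor, which holds for the implicit default constructors here.

```java
import java.lang.reflect.Constructor;

final class NestedShapes {
    private int counter = 7;

    class Inner {                        // non-static: tied to a NestedShapes instance
        int read() { return counter; }   // reaches the outer instance
    }

    static class StaticNested { }        // no link to an outer instance

    /** Number of parameters on the single constructor the class declares. */
    static int constructorParams(Class<?> type) {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        return ctor.getParameterCount();
    }

    public static void main(String[] args) {
        System.out.println("Inner: " + constructorParams(Inner.class));
        System.out.println("StaticNested: " + constructorParams(StaticNested.class));
    }
}
```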

Page 25: JVM bytecode - The secret language behind Java and Scala

Using nested classes (2)?

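Try to avoid implicitly creating synthetic bridge (accessor) methods by accessing private members of the outer class from a nested class. Give those members package-private or protected access instead.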

Page 26: JVM bytecode - The secret language behind Java and Scala

public class BridgeMethods {
    private int member;

    public class BridgeMethodClass {
        public void printBridge() {
            System.out.println(member);
        }
    }
}

static access$0(D_BridgeMethods) : int
  L0
    ALOAD 0
    GETFIELD D_BridgeMethods.member : int
    IRETURN
    MAXSTACK = 1
    MAXLOCALS = 1

public printBridge() : void
  L0
    GETSTATIC System.out : PrintStream
    ALOAD 0: this
    GETFIELD D_BridgeMethods$BridgeMethodClass.this$0 : D_BridgeMethods
    INVOKESTATIC D_BridgeMethods.access$0(D_BridgeMethods) : int
    INVOKEVIRTUAL PrintStream.println(int) : void
  L1
    RETURN
  L2
    LOCALVARIABLE this D_BridgeMethods$BridgeMethodClass L0 L2 0

Notes

When a nested class accesses a private field or method of its outer class, javac adds synthetic bridge methods (such as access$0) so that the nested class can reach those private members.
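Whether a given compiler generated such methods can be checked with reflection. The sketch below lists the synthetic methods declared on a class; AccessorProbe, Reader and syntheticMethods are hypothetical names. The result depends on the compiler: older javac versions are expected to report an access$ method here, while javac targeting JDK 11 or later uses nest-based access control and typically reports none.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class AccessorProbe {
    private int member = 7;

    class Reader {
        int read() { return member; }   // private member of the outer class
    }

    /** Names of compiler-generated (synthetic) methods declared on the given class. */
    static List<String> syntheticMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("read() = " + new AccessorProbe().new Reader().read());
        System.out.println("synthetic methods: " + syntheticMethods(AccessorProbe.class));
    }
}
```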

Page 27: JVM bytecode - The secret language behind Java and Scala

4. Boxing and unboxing

Boxing is added by the Java/Scala compiler.

There’s no such concept in bytecode or in the JVM.

Watch out for NullPointerExceptions when a null boxed value is unboxed implicitly.

Page 28: JVM bytecode - The secret language behind Java and Scala

public class Boxing {
    public void printSqr(int a) {
        int a1 = a;

        Integer a2 = a;

        System.out.println(a1 * a2);
    }

    public void check(Integer i) {
        if (i == 0) {
            System.out.println("zero");
        }
    }
}

public printSqr(int) : void
  L0
    ILOAD 1: a
    ISTORE 2
  L1
    ILOAD 1: a
    INVOKESTATIC Integer.valueOf(int) : Integer
    ASTORE 3
  L2
    GETSTATIC System.out : PrintStream
    ILOAD 2: a1
    ALOAD 3: a2
    INVOKEVIRTUAL Integer.intValue() : int
    IMUL
    INVOKEVIRTUAL PrintStream.println(int) : void
  L3
    RETURN
  L4
    LOCALVARIABLE this E_Boxing L0 L4 0
    LOCALVARIABLE a int L0 L4 1
    LOCALVARIABLE a1 int L1 L4 2
    LOCALVARIABLE a2 Integer L2 L4 3

public check(Integer) : void
  L0
    ALOAD 1: i
    INVOKEVIRTUAL Integer.intValue() : int
    IFNE L1
  L2
    GETSTATIC System.out : PrintStream
    LDC "zero"
    INVOKEVIRTUAL PrintStream.println(String) : void
  L1
    RETURN
  L3
    LOCALVARIABLE this E_Boxing L0 L3 0
    LOCALVARIABLE i Integer L0 L3 1

Notes

Notice how javac implicitly invokes java.lang.Integer methods: Integer.valueOf() to box and Integer.intValue() to unbox.

Other compilers, such as scalac, use their own boxing helpers and wrapper types.
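Writing the implicit calls out by hand shows where the NullPointerException comes from. This is a sketch with hypothetical names (BoxingPitfall and its methods) that spells out the same Integer.valueOf() and intValue() calls seen in the listings.

```java
final class BoxingPitfall {
    /** What "Integer a2 = a; a1 * a2" amounts to after compilation. */
    static int square(int a) {
        Integer boxed = Integer.valueOf(a);   // boxing
        return a * boxed.intValue();          // unboxing before IMUL
    }

    /** What "i == 0" means for an Integer: unbox first, then compare. */
    static boolean isZero(Integer i) {
        return i.intValue() == 0;             // throws NullPointerException when i is null
    }

    /** Returns true if checking a null Integer fails with a NullPointerException. */
    static boolean nullCheckThrows() {
        try {
            isZero(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(square(4));
        System.out.println(isZero(0));
        System.out.println(nullCheckThrows());
    }
}
```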

Page 29: JVM bytecode - The secret language behind Java and Scala

5. 3 bytecode myths

1. Bytecode supports multiple inheritance

2. Illegal bytecode can crash the JVM (it’s blocked by the JVM verifier)

3. Low-level bytecode can operate outside the JVM sandbox

Page 30: JVM bytecode - The secret language behind Java and Scala

3 Main bytecode uses

1. Building a compiler

2. Static analysis

3. JVM bytecode instrumentation
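For the instrumentation use case, the standard entry point is the java.lang.instrument API. The sketch below is a minimal, hypothetical agent (LoggingAgent is an invented name) that only records which classes pass through it and leaves their bytecode untouched. A real agent would rewrite classfileBuffer, usually with a bytecode library, and would be packaged in a jar whose manifest names it in the Premain-Class attribute.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical agent that only records which classes pass through it. */
final class LoggingAgent implements ClassFileTransformer {
    final List<String> seen = new CopyOnWriteArrayList<>();

    /** Called by the JVM before main() when started with -javaagent:<jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingAgent());
    }

    @Override
    public byte[] transform(ClassLoader loader,
                            String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        seen.add(className + " (" + classfileBuffer.length + " bytes)");
        return null;   // null tells the JVM to keep the original bytecode
    }
}
```

Returning a modified byte array instead of null is how an agent would inject extra instructions, for example to log variable values.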

Page 31: JVM bytecode - The secret language behind Java and Scala

Building a “better Java” through Scala and scalac.

A new OO/functional language which compiles into standard JVM bytecode.

Transparent from the JVM’s perspective.

Page 32: JVM bytecode - The secret language behind Java and Scala

Takipi - Overview

Explain the cause of server exceptions, latency and unexpected code behavior at scale.

Help R&D teams solve errors and downtime in production systems, without having to re-deploy code or sift through log files.

Page 33: JVM bytecode - The secret language behind Java and Scala

Takipi & bytecode

1. Index bytecode in the cloud.

2. When an exception occurs, query the DB to understand which variables, fields and conditions are causing it.

3. Instrument new bytecode to log the values causing the exception.

4. Present a “story” of the exception to the developer.

Page 34: JVM bytecode - The secret language behind Java and Scala

Thanks!


@takipid (tweeting about Java, Scala, DevOps and Cloud)

Join our private beta - takipi.com/signup