JVM bytecode - the secret language behind Java and Scala

Post on 10-May-2015


JVM bytecode

The secret language behind Java and Scala

About Me

1. Dev at IDF intl

2. Team lead, architect at IAI Space Industries

3. CEO at VisualTao

4. Director - AutoCAD Web & Mobile, GM Autodesk IL

5. CEO at Takipi

Writing code for the past ~15 years.

Overview

1. What is bytecode

2. The 3 biggest differences between source code and bytecode

3. 5 things you should know about bytecode

4. Practical uses

What is bytecode?

A set of low-level instructions to be executed by the JVM.

~200 instruction types. Each has a one-byte opcode, and most take zero to two operand bytes.

Some instructions are very similar to Java, some completely different.

Bytecode is very similar to Assembly

(That’s why we avoid it…)

.cpp -> g++ -> exec file -> OS

.java -> javac -> .class file -> JVM

.scala -> scalac -> .class file -> JVM

assembly -> exec file -> OS

bytecode -> .class file -> JVM
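To make the ".class file" step of the pipeline concrete, here is a minimal sketch that decodes the fixed header every class file starts with: a 4-byte magic number (0xCAFEBABE) followed by the minor and major version. The class name ClassFileHeader and the method describe are made up for this illustration and are not from the talk.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not code from the talk.
final class ClassFileHeader {

    // Decodes the first 8 bytes of a class file: u4 magic, u2 minor, u2 major.
    static String describe(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like class files
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return "major=" + major + " minor=" + minor;
    }

    public static void main(String[] args) throws IOException {
        // Reads the header of the class file that defines java.lang.Object.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IOException("class file resource not found");
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            System.out.println(describe(header));
        }
    }
}
```

The printed major version depends on the JDK that runs the sketch (52 corresponds to Java 8).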

The 3 biggest differences between source and byte code

1. No variables

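Bytecode has no named variables. Local variables live in numbered slots of a per-method local variable array, which plays the role of Assembly registers.

Field values, method arguments and results, and the operands of arithmetic operations (+, -, * ..) pass through a second structure known as the operand stack.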

    public class LocalVars {
        private int intField;
        private double doubleField;
        private boolean state;

        public double getField() {
            if (state) {
                int a = intField + 1;
                return a;
            } else {
                double b = doubleField + 1;
                return b;
            }
        }
    }

Notes

Notice how the same local variable slot (1) is reused in the two branches for different variables (a, then b).

The local variable table (the LOCALVARIABLE entries) maps slots back to source code variables.

    public getField() : double
      L0
        ALOAD 0: this
        GETFIELD NoLocalVars.state : boolean
        IFEQ L1
      L2
        ALOAD 0: this
        GETFIELD NoLocalVars.intField : int
        ICONST_1
        IADD
        ISTORE 1
      L3
        GETSTATIC System.out : PrintStream
        ILOAD 1: a
        INVOKEVIRTUAL PrintStream.println(int) : void
      L4
        ILOAD 1: a
        I2D
        DRETURN
      L1
        ALOAD 0: this
        GETFIELD NoLocalVars.doubleField : double
        DCONST_1
        DADD
        DSTORE 1
      L5
        GETSTATIC System.out : PrintStream
        DLOAD 1: b
        INVOKEVIRTUAL PrintStream.println(double) : void
      L6
        DLOAD 1: b
        DRETURN
      L7
        LOCALVARIABLE this NoLocalVars L0 L7 0
        LOCALVARIABLE a int L3 L1 1
        LOCALVARIABLE b double L5 L7 1
        MAXSTACK = 4
        MAXLOCALS = 3
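The interplay between local variable slots and the operand stack can be modelled in a few lines. This is a toy model only, under loud assumptions: the class TinyFrame and its methods are invented for this sketch, it handles ints only, and the field read (ALOAD 0, GETFIELD) is simplified to a value pre-loaded into slot 0. A real JVM frame is typed and checked by the verifier.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a method frame; names are hypothetical, ints only.
final class TinyFrame {
    final int[] locals;                               // numbered local variable slots
    final Deque<Integer> stack = new ArrayDeque<>();  // the operand stack

    TinyFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    void iload(int slot)  { stack.push(locals[slot]); }   // slot -> stack
    void iconst(int v)    { stack.push(v); }              // constant -> stack
    void istore(int slot) { locals[slot] = stack.pop(); } // stack -> slot

    void iadd() {                                         // pops two ints, pushes their sum
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    // Mirrors "int a = intField + 1" from the listing above.
    // Simplification: intField is placed in slot 0 instead of being read with GETFIELD.
    static int addOne(int intField) {
        TinyFrame f = new TinyFrame(2);
        f.locals[0] = intField;
        f.iload(0);    // push intField
        f.iconst(1);   // ICONST_1
        f.iadd();      // IADD
        f.istore(1);   // ISTORE 1 (variable a)
        return f.locals[1];
    }
}
```

Note that nothing in the frame knows the name "a": the slot number is all that survives, which is why the LOCALVARIABLE table is needed to map slots back to names.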

2. No binary logical operators

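No built-in support for the short-circuit operators && and ||. (The non-short-circuit &, | and ^ do have instructions of their own: IAND, IOR, IXOR.)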

Compilers implement these using jump instructions.

    public class NoLogicalOperators {
        public void and(boolean a, boolean b) {
            if (a && b) {
                System.out.println("its true");
            }
        }

        public void or(boolean a, boolean b) {
            if (a || b) {
                System.out.println("its true");
            }
        }
    }

    public and(boolean, boolean) : void
      L0
        ILOAD 1: a
        IFEQ L1
        ILOAD 2: b
        IFEQ L1
      L2
        GETSTATIC System.out : PrintStream
        LDC "its true"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L1
        RETURN
      L3

    public or(boolean, boolean) : void
      L0
        ILOAD 1: a
        IFNE L1
        ILOAD 2: b
        IFEQ L2
      L1
        GETSTATIC System.out : PrintStream
        LDC "its true"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L2
        RETURN
      L3

Notes

Notice how both the && and || operators are implemented using conditional jump instructions (IFEQ, IFNE) that test the value on top of the operand stack.

    public class NoLogicalOperators {
        public void orX2(boolean a, boolean b, boolean c, boolean d) {
            if ((a || b) && (c || d)) {
                System.out.println("its true");
            }
        }
    }

    public orX2(boolean, boolean, boolean, boolean) : void
      L0
        ILOAD 1: a
        IFNE L1
        ILOAD 2: b
        IFEQ L2
      L1
        ILOAD 3: c
        IFNE L3
        ILOAD 4: d
        IFEQ L2
      L3
        GETSTATIC System.out : PrintStream
        LDC "its true"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L2
        RETURN
      L4

Notes

For composite || and && conditions, compilers chain several conditional jumps, one per operand.
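The jump structure of orX2 can be written back in Java as nested tests, one per conditional jump. The sketch below is only an illustration of that control flow (the class and method names are made up); the comments map each test to the instruction it stands for in the listing above.

```java
// Hypothetical illustration; not code from the talk.
final class ShortCircuitSketch {

    // What the source says.
    static boolean withOperators(boolean a, boolean b, boolean c, boolean d) {
        return (a || b) && (c || d);
    }

    // The same decision, written the way the jumps evaluate it.
    static boolean withJumps(boolean a, boolean b, boolean c, boolean d) {
        if (!a) {              // ILOAD a; IFNE L1 (a is true: skip the test of b)
            if (!b) {          // ILOAD b; IFEQ L2 (a and b both false: skip the body)
                return false;
            }
        }
        // L1
        if (!c) {              // ILOAD c; IFNE L3 (c is true: skip the test of d)
            if (!d) {          // ILOAD d; IFEQ L2 (c and d both false: skip the body)
                return false;
            }
        }
        // L3: the body of the if statement runs
        return true;
    }
}
```

Both methods agree on all 16 input combinations, and in both b is never examined when a is true, which is exactly the short-circuit behaviour.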

3. No loop constructs

There’s no built-in support for while, for, for-each loops.

Compilers implement these using jump instructions.

    public class Loops {
        public void forLoop(int n) {
            for (int i = 0; i < n; i++) {
                System.out.println(i);
            }
        }
    }

    public forLoop(int) : void
      L0
        ICONST_0
        ISTORE 2
      L1
        GOTO L2
      L3
        GETSTATIC System.out : PrintStream
        ILOAD 2: i
        INVOKEVIRTUAL PrintStream.println(int) : void
      L4
        IINC 2: i 1
      L2
        ILOAD 2: i
        ILOAD 1: n
        IF_ICMPLT L3
      L5
        RETURN
      L6
        LOCALVARIABLE this Loops L0 L6 0
        LOCALVARIABLE n int L0 L6 1
        LOCALVARIABLE i int L1 L5 2

Notes

A for loop is implemented using a conditional jump instruction (IF_ICMPLT) that compares i and n and jumps back to the loop body.

    public class Loops {
        public void whileLoop(int n) {
            int i = 0;

            while (i < n) {
                System.out.println(i);
                i++;
            }
        }
    }

    public whileLoop(int) : void
      L0
        ICONST_0
        ISTORE 2
      L1
        GOTO L2
      L3
        GETSTATIC System.out : PrintStream
        ILOAD 2: i
        INVOKEVIRTUAL PrintStream.println(int) : void
      L4
        IINC 2: i 1
      L2
        ILOAD 2: i
        ILOAD 1: n
        IF_ICMPLT L3
      L5
        RETURN
      L6
        LOCALVARIABLE this Loops L0 L6 0
        LOCALVARIABLE n int L0 L6 1
        LOCALVARIABLE i int L1 L6 2

Notes

This while loop is also implemented using a conditional jump instruction comparing i and n. The bytecode is nearly identical to the previous loop.
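The shape shared by the two listings above is: initialise, jump to the test, and keep the test at the bottom of the loop. Java has no goto, so the sketch below only approximates that layout with a flag; the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not code from the talk.
final class LoopShapeSketch {

    // The loop as written in source.
    static List<Integer> withFor(int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(i);
        }
        return out;
    }

    // The same loop laid out like the bytecode: the test sits after the body.
    static List<Integer> withJumps(int n) {
        List<Integer> out = new ArrayList<>();
        int i = 0;                    // ICONST_0; ISTORE 2
        boolean enterBody = i < n;    // GOTO L2, then the first IF_ICMPLT
        while (enterBody) {
            out.add(i);               // L3: loop body
            i++;                      // L4: IINC 2 1
            enterBody = i < n;        // L2: ILOAD i; ILOAD n; IF_ICMPLT L3
        }
        return out;
    }
}
```

Because for and while both reduce to this layout, choosing one over the other in source is purely a readability decision.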

    public class Loops {
        public void forEachLoop(List<String> strings) {
            for (String s : strings) {
                System.out.println(s);
            }
        }
    }

    public forEachLoop(List) : void
      L0
        ALOAD 1: strings
        INVOKEINTERFACE List.iterator() : Iterator
        ASTORE 3
        GOTO L1
      L2
        ALOAD 3
        INVOKEINTERFACE Iterator.next() : Object
        CHECKCAST String
        ASTORE 2
      L3
        GETSTATIC System.out : PrintStream
        ALOAD 2: s
        INVOKEVIRTUAL PrintStream.println(String) : void
      L1
        ALOAD 3
        INVOKEINTERFACE Iterator.hasNext() : boolean
        IFNE L2
      L4
        RETURN
      L5
        LOCALVARIABLE this Loops L0 L5 0
        LOCALVARIABLE strings List L0 L5 1
        // declaration: java.util.List<java.lang.String>
        LOCALVARIABLE s String L3 L1 2

Notes

javac compiles a for-each loop into a conditional jump on the result of Iterator.hasNext(). The resulting bytecode is unaware of the for-each construct.

Also notice how slot 3 holds the iterator, a variable that has no name in the source.
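Written back in Java, the for-each listing above corresponds roughly to an explicit iterator loop. The sketch below is an illustration with made-up names; it uses a raw Iterator on purpose, to mirror the erased types and the CHECKCAST in the bytecode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; not code from the talk.
final class ForEachSketch {

    // The loop as written in source.
    static List<String> withForEach(List<String> strings) {
        List<String> out = new ArrayList<>();
        for (String s : strings) {
            out.add(s);
        }
        return out;
    }

    // Roughly what the compiler generates.
    @SuppressWarnings("rawtypes")
    static List<String> desugared(List<String> strings) {
        List<String> out = new ArrayList<>();
        Iterator it = strings.iterator();     // the unnamed variable in slot 3
        while (it.hasNext()) {                // INVOKEINTERFACE Iterator.hasNext(); IFNE L2
            String s = (String) it.next();    // Iterator.next() returns Object; CHECKCAST String
            out.add(s);
        }
        return out;
    }
}
```

One practical consequence: a for-each over a List always allocates an Iterator, which an indexed for loop does not.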

5 Things you should know about bytecode that affect everyday programming

1. No String support

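As in C, bytecode has no built-in string operations such as concatenation. Strings are ordinary java.lang.String objects; only string constants can be loaded directly (LDC).

Compilers usually use StringBuilder to compensate.

Concatenating different data types carries no extra penalty: each type has its own append() overload.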

    public class ImplicitStrings {
        public String toString(int a, int b) {
            String c = "Hello " + a + "World" + b;
            return c;
        }
    }

    // access flags 0x1
    public toString(int, int) : String
      L0
        NEW StringBuilder
        DUP
        LDC "Hello "
        INVOKESPECIAL StringBuilder.<init>(String) : void
        ILOAD 1: a
        INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
        LDC "World"
        INVOKEVIRTUAL StringBuilder.append(String) : StringBuilder
        ILOAD 2: b
        INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
        INVOKEVIRTUAL StringBuilder.toString() : String
        ASTORE 3
      L1
        ALOAD 3: c
        ARETURN
      L2
        LOCALVARIABLE this ImplicitStrings L0 L2 0
        LOCALVARIABLE a int L0 L2 1
        LOCALVARIABLE b int L0 L2 2
        LOCALVARIABLE c String L1 L2 3

Notes

javac uses java.lang.StringBuilder to implement string concatenation (+). Different overloads of the append() method are used to concatenate different data types.

    public class ImplicitStrings {
        public String toString1(int a, int b) {
            String c;

            c = "Hello" + a;
            c += "World" + b;

            return c;
        }
    }

    public toString1(int, int) : String
      L0
        NEW StringBuilder
        DUP
        LDC "Hello"
        INVOKESPECIAL StringBuilder.<init>(String) : void
        ILOAD 1: a
        INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
        INVOKEVIRTUAL StringBuilder.toString() : String
        ASTORE 3
      L1
        NEW StringBuilder
        DUP
        ALOAD 3: c
        INVOKESTATIC String.valueOf(Object) : String
        INVOKESPECIAL StringBuilder.<init>(String) : void
        LDC "World"
        INVOKEVIRTUAL StringBuilder.append(String) : StringBuilder
        ILOAD 2: b
        INVOKEVIRTUAL StringBuilder.append(int) : StringBuilder
        INVOKEVIRTUAL StringBuilder.toString() : String
        ASTORE 3: c
      L2
        ALOAD 3: c
        ARETURN
      L3

Notes

While this code is functionally identical to the previous example, it carries a performance penalty: two StringBuilders are constructed instead of one.
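The StringBuilder calls in the listings above can be written by hand, which is the usual way to avoid the extra builders when concatenating across statements or inside a loop. The sketch below uses made-up names. One caveat that goes beyond the talk: the listings reflect the javac of that era, and since Java 9 javac compiles + on strings to an invokedynamic call instead, so a current compiler will not produce the same bytecode for withPlus.

```java
// Hypothetical illustration; not code from the talk.
final class ConcatSketch {

    // Concatenation as written in source.
    static String withPlus(int a, int b) {
        return "Hello " + a + "World" + b;
    }

    // The explicit form that matches the first listing: one builder, one toString().
    static String withBuilder(int a, int b) {
        return new StringBuilder("Hello ")
                .append(a)        // append(int)
                .append("World")  // append(String)
                .append(b)        // append(int)
                .toString();
    }

    // In a loop, a single builder avoids creating a new one per iteration.
    static String joinAll(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            sb.append(v).append(',');
        }
        return sb.toString();
    }
}
```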

2. Only 4 primitive types

Bytecode arithmetic operates on only 4 primitive types (int, float, double, long) vs. the 8 Java primitives.

It has no arithmetic instructions for char, boolean, byte or short; on the operand stack they are treated as ints.

    public mulByeShort(byte, short) : void
      L0
        GETSTATIC System.out : PrintStream
        ILOAD 1: b
        ILOAD 2: c
        IMUL
        INVOKEVIRTUAL PrintStream.println(int) : void
      L1
        RETURN
      L2
        LOCALVARIABLE this B_BytecodePrimitives L0 L2 0
        LOCALVARIABLE b byte L0 L2 1
        LOCALVARIABLE c short L0 L2 2

    public mulInts(int, int) : void
      L0
        GETSTATIC System.out : PrintStream
        ILOAD 1: b
        ILOAD 2: c
        IMUL
        INVOKEVIRTUAL PrintStream.println(int) : void
      L1
        RETURN
      L2
        LOCALVARIABLE this B_BytecodePrimitives L0 L2 0
        LOCALVARIABLE b int L0 L2 1
        LOCALVARIABLE c int L0 L2 2

Notes

Notice how the bytecode for these 2 methods is identical, regardless of the difference in var types

    public class BytecodePrimitives {
        public void mulByeShort(byte b, short c) {
            System.out.println(b * c);
        }

        public void mulInts(int b, int c) {
            System.out.println(b * c);
        }
    }
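The int-only arithmetic is visible from plain Java as well: multiplying a byte by a short yields an int, and getting back to a byte requires an explicit cast (I2B in bytecode), which can silently drop bits. A minimal sketch, with made-up names:

```java
// Hypothetical illustration; not code from the talk.
final class PromotionSketch {

    // byte * short is computed with IMUL, so the result is an int (boxed here as Integer).
    static Object product(byte b, short c) {
        return b * c;
    }

    // Narrowing back to byte needs an explicit cast and keeps only the low 8 bits.
    static byte narrowed(byte b, short c) {
        return (byte) (b * c);
    }
}
```

For example, product((byte) 100, (short) 1000) is the Integer 100000, while narrowed with the same arguments is -96, because only the low byte (0xA0) of 100000 survives the cast.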

Notes

The same holds when evaluating conditions: the boolean test and the int test below compile to the same instructions.

    public class BytecodePrimitives {
        public void printIfTrue(boolean b) {
            if (b) {
                System.out.println("Hi");
            }
        }

        public void printIfN0(int i) {
            if (i != 0) {
                System.out.println("Hi");
            }
        }
    }

    public printIfTrue(boolean) : void
      L0
        ILOAD 1: b
        IFEQ L1
      L2
        GETSTATIC System.out : PrintStream
        LDC "Hi"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L1
        RETURN
      L3
        LOCALVARIABLE this B_BytecodePrimitives L0 L3 0
        LOCALVARIABLE b boolean L0 L3 1

    public printIfN0(int) : void
      L0
        ILOAD 1: i
        IFEQ L1
      L2
        GETSTATIC System.out : PrintStream
        LDC "Hi"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L1
        RETURN
      L3
        LOCALVARIABLE this B_BytecodePrimitives L0 L3 0
        LOCALVARIABLE i int L0 L3 1

3. Using nested classes?

Compilers will add a synthetic this$0 field to non-static nested (inner) classes. It holds a reference to the enclosing instance.

If the nested class never uses its outer instance, don’t forget to add a static modifier.

    public class NestedClasses {
        public class NestedClass {
        }

        public static class StaticNestedClass {
        }
    }

    public class C_NestedClasses {

      public class C_NestedClasses$NestedClass {
        final C_NestedClasses this$0

        public <init>(C_NestedClasses) : void
          L0
            ALOAD 0: this
            ALOAD 1
            PUTFIELD C_NestedClasses$NestedClass.this$0 : C_NestedClasses
            ALOAD 0: this
            INVOKESPECIAL Object.<init>() : void
            RETURN
          L1
            LOCALVARIABLE this C_NestedClasses$NestedClass L0 L1 0
      }

      public class C_NestedClasses$StaticNestedClass {
        public <init>() : void
          L0
            ALOAD 0: this
            INVOKESPECIAL Object.<init>() : void
            RETURN
          L1
      }
    }

Notes

The inner class NestedClass has an implicit this$0 field created for it, which is assigned in the constructor. StaticNestedClass has no such field.
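The hidden outer reference can also be observed at runtime through reflection. The sketch below uses made-up names and checks the constructor signature rather than the this$0 field itself, on the assumption that some newer javac versions may drop the field when the inner class never uses it, while the extra constructor parameter remains.

```java
// Hypothetical illustration; not code from the talk.
final class OuterSketch {

    class Inner { }                 // non-static: tied to an OuterSketch instance

    static class StaticNested { }   // static: no link to an outer instance

    // Number of parameters of the (single, compiler-generated) constructor.
    static int ctorParamCount(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterCount();
    }

    // First constructor parameter type, or null when the constructor takes none.
    static Class<?> firstCtorParam(Class<?> type) {
        Class<?>[] params = type.getDeclaredConstructors()[0].getParameterTypes();
        return params.length == 0 ? null : params[0];
    }
}
```

Here ctorParamCount(OuterSketch.Inner.class) is 1 and the parameter type is OuterSketch, matching the `<init>(C_NestedClasses)` signature in the listing, while the static nested class has a zero-argument constructor.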

Using nested classes (2)?

Try to avoid implicitly creating bridge methods by accessing private members of the outer class from a nested class (use protected or package-private access instead).

    public class BridgeMethods {
        private int member;

        public class BridgeMethodClass {
            public void printBridge() {
                System.out.println(member);
            }
        }
    }

    static access$0(D_BridgeMethods) : int
      L0
        ALOAD 0
        GETFIELD D_BridgeMethods.member : int
        IRETURN
        MAXSTACK = 1
        MAXLOCALS = 1

    public printBridge() : void
      L0
        GETSTATIC System.out : PrintStream
        ALOAD 0: this
        GETFIELD D_BridgeMethods$BridgeMethodClass.this$0 : D_BridgeMethods
        INVOKESTATIC D_BridgeMethods.access$0(D_BridgeMethods) : int
        INVOKEVIRTUAL PrintStream.println(int) : void
      L1
        RETURN
      L2
        LOCALVARIABLE this D_BridgeMethods$BridgeMethodClass L0 L2 0

Notes

When a nested class accesses a private field or method of its outer class, javac adds a synthetic static bridge method (access$0 here) to the outer class, so that the nested class can reach the private member.

4. Boxing and unboxing

Boxing is added by the Java/Scala compiler.

There’s no such concept in bytecode or in the JVM.

Watch out for NullPointerExceptions

    public class Boxing {
        public void printSqr(int a) {
            int a1 = a;

            Integer a2 = a;

            System.out.println(a1 * a2);
        }

        public void check(Integer i) {
            if (i == 0) {
                System.out.println("zero");
            }
        }
    }

    public printSqr(int) : void
      L0
        ILOAD 1: a
        ISTORE 2
      L1
        ILOAD 1: a
        INVOKESTATIC Integer.valueOf(int) : Integer
        ASTORE 3
      L2
        GETSTATIC System.out : PrintStream
        ILOAD 2: a1
        ALOAD 3: a2
        INVOKEVIRTUAL Integer.intValue() : int
        IMUL
        INVOKEVIRTUAL PrintStream.println(int) : void
      L3
        RETURN
      L4
        LOCALVARIABLE this E_Boxing L0 L4 0
        LOCALVARIABLE a int L0 L4 1
        LOCALVARIABLE a1 int L1 L4 2
        LOCALVARIABLE a2 Integer L2 L4 3

    public check(Integer) : void
      L0
        ALOAD 1: i
        INVOKEVIRTUAL Integer.intValue() : int
        IFNE L1
      L2
        GETSTATIC System.out : PrintStream
        LDC "zero"
        INVOKEVIRTUAL PrintStream.println(String) : void
      L1
        RETURN
      L3
        LOCALVARIABLE this E_Boxing L0 L3 0
        LOCALVARIABLE i Integer L0 L3 1

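Notes

Notice how javac implicitly inserts calls to java.lang.Integer methods (valueOf, intValue). In check(), intValue() is invoked on i without any null test, so passing null throws a NullPointerException.

Other compilers, such as scalac, use their own boxed types.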
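The NullPointerException risk follows directly from the hidden intValue() call. A minimal sketch with made-up names, showing the failing comparison and a null-safe variant:

```java
// Hypothetical illustration; not code from the talk.
final class UnboxingSketch {

    // Compiles to i.intValue() == 0, so a null argument throws NullPointerException.
    static boolean isZero(Integer i) {
        return i == 0;
    }

    // The null test short-circuits before the implicit intValue() call.
    static boolean isZeroSafe(Integer i) {
        return i != null && i == 0;
    }

    // Returns true if isZero(null) fails the way the bytecode suggests.
    static boolean throwsOnNull() {
        try {
            isZero(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

Nothing in the source of isZero mentions a method call, which is why this kind of exception tends to surprise people reading only the Java code.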

5. 3 bytecode myths

1. Bytecode supports multiple inheritance

2. Illegal bytecode can crash the JVM (it’s blocked by the JVM verifier)

3. Low-level bytecode can operate outside the JVM sandbox

3 Main bytecode uses

1. Building a compiler

2. Static analysis

3. JVM bytecode instrumentation

Building a “better Java” through Scala and scalac.

A new OO/functional language which compiles into standard JVM bytecode.

Transparent from the JVM’s perspective.

Takipi - Overview

Explain the cause of server exceptions, latency and unexpected code behavior at scale.

Help R&D teams solve errors and downtime in production systems, without having to re-deploy code or sift through log files.

Takipi & bytecode

1. Index bytecode in the cloud.

2. When an exception occurs, query the DB to understand which variables, fields and conditions are causing it.

3. Instrument new bytecode to log the values causing the exception.

4. Present a “story” of the exception to the developer.
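Step 3 relies on bytecode instrumentation. The standard JDK hook for it is the java.lang.instrument API, where an agent registers a ClassFileTransformer that sees the bytes of each class as it is loaded. The sketch below is not Takipi's implementation; it is a minimal, assumed example (the class name LoggingAgentSketch is made up) that only records class names and leaves the bytecode unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent for illustration; it observes classes but does not rewrite them.
final class LoggingAgentSketch implements ClassFileTransformer {

    // Internal names (for example "com/example/Foo") of the classes seen so far.
    final List<String> seen = new CopyOnWriteArrayList<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        seen.add(className);
        // A real agent would return rewritten bytes here (usually built with a bytecode library).
        return null; // null tells the JVM to keep the class unchanged
    }

    // Entry point the JVM calls when started with -javaagent. A deployable agent would
    // also need a public class and a Premain-Class entry in the jar manifest.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingAgentSketch());
    }
}
```

The transformer can also be exercised directly, without starting an agent, by calling transform with a class name and a byte array.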

Thanks!

tal.weiss@takipi.com

@takipid (tweeting about Java, Scala, DevOps and Cloud)

Join our private beta - takipi.com/signup
