one compiler - oracle · jdk . reachable methods, all java classes from . fields, and...

38

Upload: nguyentruc

Post on 06-Sep-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

One Compiler

Christian Wimmer VM Research Group, Oracle Labs

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

3

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Typical Stack of Java HotSpot VM Running Nashorn

4

Bytcode interpreter

Client compiler

Server compiler

Java code (bytecode from .java file)

JavaScript (dynamically generated bytecode)

VM startup code

Startup Java code

Native code

Java HotSpot VM

JDK native code

JNI native code

Startup JavaScript code

Hot Java code

Hot JavaScript code

Deoptimized JavaScript code

VM runtime code

Stack frame layout: Source of code:

How do you find all the GC root pointers?

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Duplication: Everything Implement Three Times Bytecode interpreter Compiled bytecode Native code (C/C++)

Stack frame layout Close to JVM spec Spill slots Unspecified

Stack frame size variable fixed per method unknown

Root pointers for GC Bytecode liveness (expensive to compute)

Pointer map from compiler Explicit handles (error prone)

Exception handling Interpret metadata Compiled in (mostly) Explicit checks (error prone)

Porting to new architecture Write assembly code Write client compiler and server compiler backends

Write gcc backend

Debugging Java debugger Java debugger gdb

5

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Truffle System Structure

Low-footprint VM, also suitable for embedding

Common API separates language implementation, optimization system, and tools (debugger)

Language agnostic dynamic compiler

AST Interpreter for every language

Integrate with Java applications

Substrate VM

Graal

JavaScript Ruby LLVM R

Graal VM

Truffle

6

Your language should be here!

Tools

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Graal and Truffle Tutorials

7

https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Speculate and Optimize …

8

U

U U

U

U I

I I

G

G I

I I

G

G

Node Specializationfor Profiling Feedback

AST InterpreterSpecialized Nodes

AST InterpreterUninitialized Nodes

Compilation usingPartial Evaluation

Compiled Code

Node Transitions

S

UI

D

G

Uninitialized Integer

Generic

DoubleString

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

I

I I

G

G I

I I

G

G

Transfer back to AST Interpreter

D

I D

G

G D

I D

G

G

Node Specialization to Update Profiling Feedback

Recompilation usingPartial Evaluation

… and Transfer to Interpreter and Reoptimize!

9

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Performance: Graal VM

10

1.02 1.2

4.1 4.5

0.85 0.9

0

1

2

3

4

5

Java Scala Ruby R Native JavaScript

Speedup, higher is better

Performance relative to: HotSpot/Server, HotSpot/Server running JRuby, GNU R, LLVM AOT compiled, V8

Graal Best Specialized Competition

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Possible Stack of Java HotSpot VM Running Truffle

11

Bytcode interpreter

Graal compiler

Java code (bytecode from .java file)

VM startup code

Startup Java code

Native code

Java HotSpot VM

JDK native code

JNI native code

Startup JavaScript code

Hot Java code

Hot JavaScript code

Deoptimized JavaScript code

VM runtime code

Stack frame layout: Source of code:

Our default configuration of Truffle still uses Client and Server compiler for Java code

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

The Substrate VM is …

an embeddable VM

for, and written in, a subset of Java

optimized to execute Truffle languages

ahead-of-time compiled using Graal

integrating with native development tools.

12

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Typical Stack of Substrate VM Running Truffle

13

Graal compiler Java code (bytecode from .java file)

VM startup code

Startup Java code

SystemJava code

Startup JavaScript code

Hot Java code

Hot JavaScript code

Deoptimized JavaScript code

VM runtime code

Stack frame layout: Source of code:

Same compiler for ahead-of-time compiled Java code and dynamically compiled Truffle AST

Substrate VM runtime is written in Java

Transfer to AST interpreter (deoptimization) to Graal compiled code with extra deoptimization entry points

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Substrate VM: Execution Model

14

Ahead-of-Time Compilation

Points-To Analysis

Substrate VM

Truffle Language

JDK

Reachable methods, fields, and classes

Machine Code

Initial Heap

All Java classes from Truffle language

(or any application), JDK, and Substrate VM

Application running without dependency on JDK and without Java class loading

DWARF Info

ELF / MachO Binary

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Substrate VM Building Blocks • Reduced runtime system, all written in Java

– Stack walking, exception handling, garbage collector, deoptimization – Graal for ahead-of-time compilation and dynamic compilation

• Points-to analysis – Closed-world assumption: no dynamic class loading, no reflection – Using Graal for bytecode parsing – Fixed-point iteration: propagate type states through methods

• SystemJava for integration with C code – Machine-word sized value, represented as Java interface, but unboxed by compiler – Import of C functions and C structs to Java

• Substitutions for JDK methods that use unsupported features – JNI code replaced with SystemJava code that directly calls to C library

15

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Key Features of Graal • Designed for speculative optimizations and deoptimization

– Metadata for deoptimization is propagated through all optimization phases

• Designed for exact garbage collection – Read/write barriers, pointer maps for garbage collector

• Aggressive high-level optimizations – Example: partial escape analysis

• Modular architecture – Configurable compiler phases – Compiler-VM separation: snippets, provider interfaces

• Written in Java to lower the entry barrier – Graal compiling and optimizing itself is also a good optimization opportunity

16

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Deoptimization

17

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Deoptimization • Transfer from optimized machine code back to unoptimized code

• Enables speculative optimizations – Optimized code does not need to deal with corner cases

• No control flow merges from slow-path code back into the fast path • More potential for optimizations

– Optimized code does not need to check assumptions • Instead, it gets invalidated externally when assumption is no longer valid

• Speculative optimizations are essential for optimizing dynamic languages – Speculate on JavaScript type stability – Speculate that Ruby operators for primitive types are not changed by program – Polymorphic inline caches for function calls, property accesses, ...

18

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 19

Mapping from optimized to bytecode interpreter frames

Deoptimization on HotSpot VM

f1()

f2()

f3()

f1 inlined f2 inlined f3

physical stack

Java bytecode

frames

Method to Deoptimize

physical stack

Target Methods

match fixed layout

f1() bci 42

f2() bci 7

f3() bci 11

42

7

11

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 20

Mapping from optimized to unoptimized stack frames

Deoptimization on Substrate VM

f1()

f2()

f3()

f1 inlined f2 inlined f3

physical stack

Java bytecode

frames

Method to Deoptimize

physical stack

Target Methods

match match

f1() bci 42

f2() bci 7

f3() bci 11

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Deoptimization on Substrate VM • Source and target are Graal compiled frames

– Both have metadata that describes the layout with respect to JVM specification – Stack frame location of all used local variables and expression stack elements – Source and target describe the same bytecode index (bci), i.e., a matching state

• Source is a fully optimized Graal frame – Method inlining: multiple target frames for one source frame – Escape analysis: virtual objects that are re-allocated during deoptimization – Global value numbering: elimination of duplicate computations

• Targets are Graal frames with limited optimizations – No method inlining: multiple target frames restored when source frame has inlined methods – No escape analysis: all objects are re-allocated during deoptimization – Limited value numbering: only values in Java frame state can be live across a deoptimization entry

point 21

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 22

Example: Graal IR for Deoptimization

public class BasicDeoptTest { static int field; static int proxyNeeded(int x) { field = x * 2; return x * 2; } }

Java source code:

Graal IR for optimized compilation:

Graal IR for compilation with deoptimization entry points:

Two explicit DeoptEntry points

No elimination of second multiplication

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

SystemJava

23

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

SystemJava

• Legacy C code integration – Need a convenient way to access preexisting C functions and structures – Example: libc, database

• Legacy Java code integration – Leverage preexisting Java libraries – "Patch" violations of our reduced Java rules – Example: JDK class library

• Call Java from C code – Entry points into our Java code

New System Java

Code

Preexisting C Code

Preexisting Java Code

Call Java from C

Legacy C Code Integration

Legacy Java Code Integration

24

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

SystemJava vs. JNI • Java Native Interface (JNI)

– Write custom C code to integrate existing C code with Java – C code knows about Java types – Java objects passed to C code using handles

• SystemJava – Write custom Java code to integrate existing C code with Java – Java code knows about C types – No need to pass Java objects to C code

25

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Word type for low-level memory access • Requirements

– Support raw memory access and pointer arithmetic – No extension of the Java programming language – Pointer type modeled as a class to prevent mixing with, e.g., long – Transparent bit width (32 bit or 64 bit) in code using it

• Base interface Word – Looks like an object to the Java IDE, but is a primitive value at run time – Graal does the transformation

• Subclasses for type safety – Pointer: C equivalent void* – Unsigned: C equivalent size_t – Signed: C equivalent ssize_t

26

public static Unsigned strlen(CharPointer str) { Unsigned n = Word.zero(); while (str.read(n) != 0) { n = n.add(1); } return n; }

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Java Annotations to Import C Elements

#include <time.h> @CContext(PosixDirectives.class)

#define CLOCK_MONOTONIC 1

struct timespec { __time_t tv_sec; __syscall_slong_t tv_nsec; };

int* pint;

int** ppint;

@CConstant static native int CLOCK_MONOTONIC();

@CPointerTo(nameOfCType="int") interface CIntPointer extends PointerBase { int read(); void write(int value); }

@CPointerTo(CIntPointer.class) interface CIntPointerPointer ...

-lrt @CLibrary("rt")

@CStruct interface timespec extends PointerBase { @CField long tv_sec(); @CField long tv_nsec(); }

int clock_gettime(clockid_t __clock_id, struct timespec *__tp) @CFunction static native int clock_gettime(int clock_id, timespec tp);

27

static long nanoTime() { timespec tp = StackValue.get(SizeOf.get(timespec.class)); clock_gettime(CLOCK_MONOTONIC(), tp); return tp.tv_sec() * 1_000_000_000L + tp.tv_nsec(); }

Implementation of System.nanoTime() using SystemJava:

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Points-To Analysis

28

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Graal as a Static Analysis Framework • Graal and the hosting Java VM provide

– Class loading (parse the class file) – Access the bytecodes of a method – Access to the Java type hierarchy, type checks – Build a high-level IR graph in SSA form – Linking / method resolution of method calls

• Static points-to analysis and compilation use same intermediate representation – Simplifies applying the analysis results for optimizations

• Goals of points-to analysis – Identify all methods reachable from a root method – Identify the types assigned to each field – Identify all instantiated types

• Fixed point iteration of type flows: Types are propagated from sources (allocations) to usages

29

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

bar

Example Type Flow Graph Object f; void foo() { allocate(); bar(); } Object allocate() { f = new Point() } int bar() { return f.hashCode(); }

putField f

new Point

getField f

obj vcall hashCode

this

allocate

Point.hashCode

[Point]

[Point]

[Point]

f

[Point]

[Point]

Analysis is context insensitive: One type state per field

30

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

bar

Example Type Flow Graph Object f; void foo() { allocate(); bar(); } Object allocate() { f = new Point() } int bar() { return f.hashCode(); }

putField f

new Point

getField f

obj vcall hashCode

this

allocate

Point.hashCode

[Point]

[Point]

[Point, String]

f

[String]

[Point, String]

[Point, String]

this

String.hashCode

Analysis is context insensitive: One type state per field

f = "abc";

31

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Results

32

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 33

Microbenchmark for Startup and Peak Performance (1) function benchmark(n) { var obj = {i: 0, result: 0}; while (obj.i <= n) { obj.result = obj.result + obj.i; obj.i = obj.i + 1; } return obj.result; }

Function benchmark is invoked in a loop by harness (0 to 40000 iterations)

n fixed to 50000 for all iterations

JavaScript VM Version Command Line Flags

Google V8 Version 4.2.27 [none]

Mozilla Spidermonkey Version JavaScript-C45.0a1 [none]

Nashorn JDK 8 update 60 build 1.8.0_60-b27 -J-Xmx256M

Truffle on HotSpot VM graal-js changeset a8947301fd1e from Nov 30, 2015 graal-enterprise changeset f47fff503e49 from Nov 30, 2015

-J-Xmx256M

Truffle on Substrate VM substratevm changeset 45c61d192d43 from Dec 1, 2015 graal-enterprise changeset d8ee392c83e3 from Nov 21, 2015

[none]

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

-101030507090

110130150

0 2 4 6 8 10Mem

ory

Foot

prin

t [M

Byte

]

00.10.20.30.40.50.60.70.80.9

1

0 2 4 6 8 10

Exec

utio

n Ti

me

[Sec

onds

]

Iterations

0123456789

10

0 10000 20000 30000 40000Iterations

0

50

100

150

200

250

0 50 100 150 200

0

0.5

1

1.5

2

2.5

0 50 100 150 200Iterations

34

Microbenchmark for Startup and Peak Performance (2)

Background compilation

Background compilation finished

00.10.20.30.40.50.60.70.8

0

Google V8Mozilla SpidermonkeyNashorn JDK 8u60Truffle on HotSpot VMTruffle on Substrate VM

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Summary • Substrate VM uses a "One Compiler" approach

– For ahead-of-time compilation and dynamic compilation – For all levels: Java, SystemJava, JavaScript, all other Truffle languages – For deoptimization entry points – For static points-to analysis

• Graal is flexible enough to support all these use cases – Snippets for compiler-VM separation – Configuration of phases

35

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Acknowledgements

36

Oracle Danilo Ansaloni Stefan Anzinger Cosmin Basca Daniele Bonetta Matthias Brantner Petr Chalupa Jürgen Christ Laurent Daynès Gilles Duboscq Martin Entlicher Bastian Hossbach Christian Humer Mick Jordan Vojin Jovanovic Peter Kessler David Leopoldseder Kevin Menard Jakub Podlešák Aleksandar Prokopec Tom Rodriguez

Oracle (continued) Roland Schatz Chris Seaton Doug Simon Štěpán Šindelář Zbyněk Šlajchrt Lukas Stadler Codrut Stancu Jan Štola Jaroslav Tulach Michael Van De Vanter Adam Welc Christian Wimmer Christian Wirth Paul Wögerer Mario Wolczko Andreas Wöß Thomas Würthinger

JKU Linz Prof. Hanspeter Mössenböck Benoit Daloze Josef Eisl Thomas Feichtinger Matthias Grimmer Christian Häubl Josef Haider Christian Huber Stefan Marr Manuel Rigger Stefan Rumzucker Bernhard Urban University of Edinburgh Christophe Dubach Juan José Fumero Alfonso Ranjeet Singh Toomas Remmelg LaBRI Floréal Morandat

University of California, Irvine Prof. Michael Franz Gulfem Savrun Yeniceri Wei Zhang Purdue University Prof. Jan Vitek Tomas Kalibera Petr Maj Lei Zhao T. U. Dortmund Prof. Peter Marwedel Helena Kotthaus Ingo Korb University of California, Davis Prof. Duncan Temple Lang Nicholas Ulle University of Lugano, Switzerland Prof. Walter Binder Sun Haiyang Yudi Zheng

Oracle Interns Brian Belleville Miguel Garcia Shams Imam Alexey Karyakin Stephen Kell Andreas Kunft Volker Lanting Gero Leinemann Julian Lettner Joe Nash David Piorkowski Gregor Richards Robert Seilbeck Rifat Shariyar Alumni Erik Eckstein Michael Haupt Christos Kotselidis Hyunjin Lee David Leibs Chris Thalinger Till Westmann

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 37