Scala Exchange 2013: Haoyi Li on Metascala, a tiny DIY JVM

Metascala: A tiny DIY JVM (https://github.com/lihaoyi/Metascala). Li Haoyi, Scala Exchange, 2nd Dec 2013.

Upload: skills-matter

Post on 13-May-2015

DESCRIPTION

Metascala is a tiny metacircular Java Virtual Machine (JVM) written in the Scala programming language. Metascala is barely 3000 lines of Scala, and is complete enough that it is able to interpret itself metacircularly. Being written in Scala and compiled to Java bytecode, the Metascala JVM requires a host JVM in order to run.

The goal of Metascala is to create a platform to experiment with the JVM: a 3000 line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ which make up HotSpot, the standard implementation, and more amenable to implementing fun features like continuations, isolates or value classes.

The 3000 lines of code gives you:

● The bytecode interpreter, together with all the run-time data structures

● A stack-machine to SSA register-machine bytecode translator

● A custom heap, complete with a stop-the-world, copying garbage collector

● Implementations of parts of the JVM's native interface

Although it is far from a complete implementation, Metascala already provides the ability to run untrusted bytecode securely (albeit slowly), since every operation which could potentially cause harm (including memory allocations and CPU usage) is virtualized and can be controlled. Ongoing work includes tightening of the security guarantees, improving compatibility and increasing performance.

TRANSCRIPT

Page 1

Metascala: A tiny DIY JVM

https://github.com/lihaoyi/Metascala

Li Haoyi

Scala Exchange, 2nd Dec 2013

Page 2

Who am I?

Li Haoyi

Write Python during the day

Write Scala at night

Page 3

What is Metascala?

● A JVM

● in 3000 lines of Scala

● Which can load & interpret Java programs

● And can interpret itself!

Page 4

Size comparison

● Metascala: ~3,000 lines

● Avian JVM: ~80,000 lines

● OpenJDK: ~1,000,000 lines

Page 5

Basic Usage

Closure’s class file is given to VM to load/parse/execute

Result is extracted from VM into host environment

Captured variables are serialized into VM’s environment

Create a new metascala VM

Plain Old Java Object

Any other classes necessary to evaluate the closure are loaded from the current Classpath

No global state

Page 6

It’s Metacircular!

Need to give the outer VM more than the 1 MB default heap

Simpler program avoids initializing the Scala/Java std libraries, which takes forever under double-interpretation.

Takes a while (~10s) to produce result

VM inside a VM!

Page 7

Limitations

● Single-threaded

● Limited IO

● Slowww

Page 8

Performance Comparison (run time relative to OpenJDK)

● OpenJDK: 1.0x

● Metascala: ~100x

● Meta-Metascala: ~10000x

Page 9

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 10

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 11

Quick Tour

Immutable Defs: ~380 loc

Native Bindings: ~650 loc

Bytecode SSA transform: ~650 loc

Runtime data structures: 820 loc

Binary heap & Copying GC: 132 loc

DIY scala-pickling: 132 loc

“This is a VM”: 243 loc
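To make the "Bytecode SSA transform" item concrete, here is a minimal sketch of the general idea, not Metascala's actual code. It assumes a hypothetical five-instruction stack machine (`Push`, `Load`, `Store`, `Add`, `Mul`) and handles straight-line code only. The operand stack is simulated at translation time, every computed value gets a fresh register, and so each register is assigned exactly once. Real bytecode also needs handling for branches and merge points, which this sketch leaves out.

```scala
object StackToSsa {
  // Hypothetical mini stack-machine instruction set (not real JVM opcodes).
  sealed trait StackOp
  case class Push(value: Int) extends StackOp  // push a constant
  case class Load(local: Int) extends StackOp  // push a local variable
  case class Store(local: Int) extends StackOp // pop into a local variable
  case object Add extends StackOp              // pop two values, push their sum
  case object Mul extends StackOp              // pop two values, push their product

  // Register-machine output: each instruction names its destination and operands.
  sealed trait RegOp
  case class Const(dest: Int, value: Int) extends RegOp
  case class BinOp(dest: Int, symbol: String, lhs: Int, rhs: Int) extends RegOp

  /** Translates straight-line stack code. Registers 0 until numLocals hold the
    * initial values of the locals. Returns the register code plus a map from
    * each local to the register holding its final value. */
  def translate(code: Seq[StackOp], numLocals: Int): (Vector[RegOp], Map[Int, Int]) = {
    var nextReg = numLocals
    var stack = List.empty[Int]                          // registers, not values
    var locals = (0 until numLocals).map(i => i -> i).toMap
    val out = Vector.newBuilder[RegOp]

    def fresh(): Int = { val r = nextReg; nextReg += 1; r }
    // Throws on malformed input that pops from an empty stack.
    def pop(): Int = { val r = stack.head; stack = stack.tail; r }
    def binary(symbol: String): Unit = {
      val rhs = pop()
      val lhs = pop()
      val dest = fresh()
      out += BinOp(dest, symbol, lhs, rhs)
      stack = dest :: stack
    }

    code.foreach {
      case Push(v) =>
        val dest = fresh()
        out += Const(dest, v)
        stack = dest :: stack
      case Load(i)  => stack = locals(i) :: stack        // no instruction emitted
      case Store(i) => locals = locals.updated(i, pop()) // just rename the local
      case Add      => binary("+")
      case Mul      => binary("*")
    }
    (out.result(), locals)
  }
}
```

For example, translating `x = a * b + 2` with locals `a`, `b`, `x` in slots 0, 1, 2 (the sequence `Load(0), Load(1), Mul, Push(2), Add, Store(2)`) yields three register instructions, and the loads and the store disappear into register renaming.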

Page 12

Quick Tour: Tests

Tests for basic Java features

GC Fuzz-tests

Test Metacircularity!

Scala std lib usage

Page 13

What’s a Heap?

Fig 1. A Heap

Page 14

What’s a Garbage Collector?

Blit (copy) all roots to new heap

Scan the already copied things for more things and copy them too

Stop when you’ve scanned everything

Not pseudocode
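The slide's own code is not preserved in this transcript, so the following is a stand-in sketch of the three steps above, not Metascala's collector. It assumes a hypothetical object layout: the heap is an `Array[Int]`, each object is a header word holding its field count followed by that many reference fields, and `-1` means null. A negative header in the old heap marks an object that has already been moved and encodes its new address.

```scala
object CopyingGc {
  val Null = -1

  /** Copies everything reachable from `roots` into a fresh heap.
    * Returns (new heap, relocated roots, words in use).
    * Note: overwrites headers in `fromSpace` with forwarding pointers. */
  def collect(fromSpace: Array[Int], roots: Seq[Int]): (Array[Int], Seq[Int], Int) = {
    val toSpace = new Array[Int](fromSpace.length)
    var free = 0 // next unused word in the new heap

    // Copies one object unless it was already copied; returns its new address.
    def copy(addr: Int): Int =
      if (addr == Null) Null
      else if (fromSpace(addr) < 0) -fromSpace(addr) - 1 // follow forwarding pointer
      else {
        val size = fromSpace(addr) + 1                   // header plus fields
        Array.copy(fromSpace, addr, toSpace, free, size)
        val newAddr = free
        free += size
        fromSpace(addr) = -newAddr - 1                   // leave forwarding pointer
        newAddr
      }

    // 1. Blit (copy) all roots to the new heap.
    val newRoots = roots.toVector.map(copy)

    // 2. Scan the already copied objects and copy whatever they point to.
    var scan = 0
    while (scan < free) {                                // 3. Stop when everything is scanned.
      val fields = toSpace(scan)
      for (i <- 1 to fields) toSpace(scan + i) = copy(toSpace(scan + i))
      scan += fields + 1
    }
    (toSpace, newRoots, free)
  }
}
```

Because the scan pointer chases the allocation pointer through the new heap, no separate work list or recursion is needed, and cycles are handled by the forwarding pointers. Unreachable objects are simply never copied.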

Page 15

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 16

Limited Instruction Count
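One way an interpreter can enforce such a limit is to count every instruction it dispatches and abort once a budget is spent. The sketch below illustrates that idea on a hypothetical two-instruction machine; the names `Inc`, `Jump` and `BudgetExceeded` are made up for this example and are not taken from Metascala.

```scala
object BoundedInterpreter {
  // Hypothetical toy instruction set.
  sealed trait Op
  case class Inc(amount: Int) extends Op  // accumulator += amount
  case class Jump(target: Int) extends Op // unconditional jump

  case class BudgetExceeded(limit: Long)
    extends RuntimeException(s"instruction limit of $limit exceeded")

  /** Runs `program` until it falls off the end, executing at most `limit` instructions. */
  def run(program: Vector[Op], limit: Long): Int = {
    var pc = 0
    var acc = 0
    var executed = 0L
    while (pc < program.length) {
      // The check sits in the dispatch loop, so guest code cannot skip it.
      if (executed >= limit) throw BudgetExceeded(limit)
      executed += 1
      program(pc) match {
        case Inc(n) =>
          acc += n
          pc += 1
        case Jump(target) =>
          pc = target
      }
    }
    acc
  }
}
```

With this in place, a guest program such as `Vector(Inc(1), Jump(0))`, which would otherwise loop forever, ends in a `BudgetExceeded` exception that the host can catch.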

Page 17

And Limited Memory!

Not an OOM Error! We throw this ourselves
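The point of the slide is that the guest's heap is an ordinary data structure owned by the VM, so running out of it is an event the VM detects and reports itself. A minimal sketch of that idea, assuming a fixed-size bump allocator and a made-up `GuestOutOfMemory` exception (a real VM would first try a garbage collection before giving up):

```scala
object BoundedHeap {
  /** Raised by the guest VM's own allocator; the host JVM's heap is not exhausted. */
  case class GuestOutOfMemory(requested: Int, available: Int)
    extends RuntimeException(s"guest heap exhausted: wanted $requested words, $available free")

  class Heap(capacityWords: Int) {
    private val words = new Array[Int](capacityWords)
    private var free = 0 // index of the next unused word

    /** Reserves `size` words and returns the address of the first one. */
    def allocate(size: Int): Int = {
      require(size > 0, "allocation size must be positive")
      val available = capacityWords - free
      if (size > available) throw GuestOutOfMemory(size, available)
      val addr = free
      free += size
      addr
    }

    def read(addr: Int): Int = words(addr)
    def write(addr: Int, value: Int): Unit = words(addr) = value
    def used: Int = free
  }
}
```

The host stays healthy when the limit is hit: the exception is a normal Scala exception thrown from `allocate`, not a `java.lang.OutOfMemoryError` from the underlying JVM.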

Page 18

Explicitly defined capabilities

Every external call has to be explicitly defined and enabled
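A common way to get this property is a table of native bindings: the VM can only reach the outside world through entries in the table, and anything absent is rejected. The sketch below shows the shape of such a table. The `Bindings` class, the `NotPermitted` exception and the signature-string key format are assumptions for illustration, not Metascala's actual interface.

```scala
object Capabilities {
  // A native implementation takes the guest's arguments and returns a result.
  type Native = Seq[Any] => Any

  case class NotPermitted(signature: String)
    extends RuntimeException(s"native call not enabled: $signature")

  /** The only route from guest code to the host: a fixed table of enabled calls. */
  class Bindings(enabled: Map[String, Native]) {
    def call(signature: String, args: Seq[Any]): Any =
      enabled.get(signature) match {
        case Some(impl) => impl(args)
        case None       => throw NotPermitted(signature) // deny by default
      }
  }

  // Assumed example policy: the guest may read a clock and nothing else.
  def clockOnly(clock: () => Long): Bindings = new Bindings(Map(
    "java/lang/System.nanoTime()J" -> ((_: Seq[Any]) => clock())
  ))
}
```

Because the table is deny-by-default, adding a capability is an explicit act by whoever constructs the VM, and the clock passed to `clockOnly` can itself be a fake, which keeps even the permitted calls under the host's control.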

Page 19

Security Characteristics

● Finite instruction count

● Finite memory

● Well-defined interface to outside world

● Doesn’t rely on Java Security Model at all!

● Still some holes…

Page 20

Security Holes

● Classloader can read from anywhere

● Time spent classloading not accounted

● Memory spent on classes not accounted

● GC time not accounted

● “native” methods’ time/memory not accounted

Page 21

Basic Problem

User code resource consumption is bounded

VM’s runtime resource usage can be made to grow arbitrarily large

[Diagram labels: User Code, Classes, Runtime Data Structures, Native method calls, Garbage Collector, Outside World]

Page 22

Possible Solution

Put a VM Inside a VM!

Works, ... but 10000x slowdown

[Diagram labels: User Code, Classes, Runtime Data Structures, Native method calls, Garbage Collector, Outside World (shown twice), Fixed Unaccounted Costs]

Page 23

Another Possible Solution

Move more components into virtual runtime

Difficult to bootstrap correctly

WIP

[Diagram labels: Outside World, User Code, Classes, Runtime Data Structures, Native method calls, Garbage Collector]

Page 24

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 25

Live Demo

Page 26

Ugliness

● Long compile times

● Nasty JVM Interface

● Impossible Debugging

Page 27

Long compile times

● 100 lines/s

● Twice as slow (50 lines/s) on my older machine!

Page 28

Nasty JVM Interface

Real World: Nasty Language VM Interface. Lazy-Initialization means repeated dives back into lib/native code.

Ideal World: Clean Interfaces. Linear Initialization.

[Diagram labels in both panels: User Code, Std Library, VM, Initialized]

Page 29

Java’s dirty little secret

WTF! I’d never use these things!

The Verbosity of Java with the Safety of C

Page 30

You probably do

Almost every Java program ever uses these things.

What happens if you don’t have them

Page 31

Next Steps

● Maximize correctness
  ○ Implement Threads & IO
  ○ Fix bugs (GC, native calls, etc.)

● Solidify security characteristics
  ○ Still plenty of unaccounted-for memory/processing
  ○ Some can be hosted “within” VM itself

● Simplify Std-Lib/VM interface
  ○ Try using Android Std Lib?

Page 32

Possible Experiments

● Native codegen instead of an interpreter?
  ○ Generate/exec native code through JNI
  ○ Heap is already a binary blob that can be easily passed to native code

● Bytecode transforms and optimizations?
  ○ Already in SSA form

● Continuations, Isolates, Value Classes?

● Port the whole thing to Scala.js?

Page 33

Metascala: a tiny DIY JVM

Ask me about:

● Static Single Assignment form

● Copying Garbage Collection

● sun.misc.Unsafe

● Warts of the .class file format

● Capabilities-based security

● Abandoned approaches