Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Post on 01-Jul-2015

DESCRIPTION

Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to data, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, soundness, and portability. Performance can be critical, as these systems are often intended to be deployed to production environments, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be sound and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real-world use, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, soundness, precision, and portability. Moreover, to our knowledge, it is the first portable general-purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two successive revisions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its runtime overhead to be low: as little as 3% (53% on average; 220% at worst) on the DaCapo macro benchmark suite.

TRANSCRIPT

@_jon_bell_, October 22, 2014, OOPSLA 2014

Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Dynamic Data Flow Analysis: Taint Tracking

Inputs -> Application -> Outputs

Flagged (“tainted”) input -> output that is derived from tainted input

Taint Tracking: Applications

• End-user privacy testing: Does this application send my personal data to remote servers?

• SQL injection attack avoidance: Do SQL queries contain raw user input?

• Debugging: Which inputs are relevant to the current (crashed) application state?

Qualities of a Successful Analysis

• Soundness: No data leakage!

• Precision: Data is tracked with the right tag

• Performance: Minimal slowdowns

• Portability: No special hardware, OS, or JVM

“Normal” Taint Tracking

• Associate tags with data, then propagate the tags

• Approaches:

• Operating System modifications [Vandebogart ’07], [Zeldovich ’06]

• Language interpreter modifications [Chandra ’07], [Enck ’10], [Nair ’07], [Son ’13]

• Source code modifications [Lam ’06], [Xu ’06]

• Binary instrumentation of applications [Clause ’07], [Cheng ’06], [Kemerlis ’12]

Hard to be sound, precise, and performant

Not portable

Phosphor

• Leverages benefits of interpreter-based approaches (information about variables) but fully portably

• Instruments all byte code that runs in the JVM (including the JRE API) to track taint tags

• Adds a shadow variable for each variable, to hold its taint tag

• Adds propagation logic
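
The last two bullets can be pictured at source level with a minimal sketch. It assumes taint tags are int bit masks that are combined with bitwise OR, which the slide does not state; Phosphor itself rewrites bytecode rather than source, and every name below is hypothetical.

```java
// Hand-written illustration of what instrumentation conceptually adds.
// Phosphor works on bytecode; all names here are hypothetical.
final class ShadowVariableSketch {
    // Hypothetical tag values: one bit per taint source.
    static final int UNTAINTED = 0;
    static final int TAG_USER_INPUT = 1;

    // Original code: int c = a + b;
    static int add(int a, int b) {
        return a + b;
    }

    // Propagation logic for the addition above: the tag of the result
    // combines the tags of the inputs.
    static int addTag(int aTag, int bTag) {
        return aTag | bTag; // assumption: tags are bit masks combined with OR
    }

    public static void main(String[] args) {
        int a = 40; int aTag = TAG_USER_INPUT; // shadow variable for a
        int b = 2;  int bTag = UNTAINTED;      // shadow variable for b
        int c = add(a, b);
        int cTag = addTag(aTag, bTag);         // shadow variable for c
        System.out.println(c + " tainted=" + (cTag != UNTAINTED));
    }
}
```

Because `c` is computed from `a`, its shadow variable ends up carrying the tag of `a` even though `b` was untainted.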

Key contribution: How do we efficiently store meta-data for every variable without modifying the JVM itself?

JVM Type Organization

• Primitive Types

• int, long, char, byte, etc.

• Reference Types

• Arrays, instances of classes

• All reference types are assignable to java.lang.Object

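Phosphor’s taint tag storage

Where the tag for each kind of value lives, by location (local variable, method argument, return value, operand stack, field):

• Object: stored as a field of the object (in every location)

• Object array: stored as a field of each object (in every location)

• Primitive: local variable -> shadow variable; method argument -> shadow argument; return value -> "boxed"; operand stack -> below the value on stack; field -> shadow field

• Primitive array: local variable -> shadow array variable; method argument -> shadow array argument; return value -> "boxed"; operand stack -> array below value on stack; field -> shadow array field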
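
As a rough source-level picture of the primitive and primitive-array rows, the sketch below shows a shadow field, a shadow array field, a shadow argument, and a boxed return value. The class `Account`, its members, and the OR-based tag combination are assumptions made for illustration; `TaintedInt` follows the (tag, value) shape that appears later in the talk.

```java
// Hypothetical source-level view of the storage scheme; Phosphor applies
// the equivalent changes to bytecode. All names are illustrative.
final class TaintedInt {          // "boxed" return value: tag plus primitive
    final int tag;
    final int val;
    TaintedInt(int tag, int val) { this.tag = tag; this.val = val; }
}

final class Account {
    int balance;                  // original primitive field
    int balanceTag;               // shadow field holding its tag
    byte[] pin = new byte[4];     // original primitive array field
    int[] pinTag = new int[4];    // shadow array field: one tag per element

    // Original signature: int deposit(int amount)
    // Instrumented: a shadow argument carries the tag of amount,
    // and the primitive return value is boxed together with its tag.
    TaintedInt deposit(int amountTag, int amount) {
        balance += amount;
        balanceTag |= amountTag;  // assumption: tags combine with bitwise OR
        return new TaintedInt(balanceTag, balance);
    }
}
```

Calling `new Account().deposit(1, 10)` would return a `TaintedInt` whose `val` is 10 and whose `tag` is 1, and would leave the same tag in the shadow field `balanceTag`.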

Taint Propagation

• Modify all byte code instructions to be taint-aware by adding extra instructions

• Examples:

• Arithmetic -> combine tags of inputs

• Load variable to stack -> Also load taint tag to stack

• Modify method calls to pass taint tags
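
The propagation rules listed above might look like the following when written out by hand. This is only a sketch under the assumption that tags are ints combined with bitwise OR; the class and method names are hypothetical, and the real system inserts bytecode instructions rather than source statements.

```java
// Illustrative only: every name below is hypothetical.
final class PropagationSketch {
    /** A primitive value paired with its tag. */
    static final class Tagged {
        final int tag;
        final int val;
        Tagged(int tag, int val) { this.tag = tag; this.val = val; }
    }

    // Original: static int square(int x) { return x * x; }
    // Taint-aware: the caller passes the tag of x alongside x.
    static Tagged square(int xTag, int x) {
        int resultTag = xTag | xTag; // arithmetic: combine tags of both operands
        return new Tagged(resultTag, x * x);
    }

    // Original: int y = square(secret) + offset;
    static Tagged compute(int secretTag, int secret, int offsetTag, int offset) {
        Tagged sq = square(secretTag, secret); // method call passes the tag
        int yTag = sq.tag | offsetTag;         // addition combines tags
        int y = sq.val + offset;
        return new Tagged(yTag, y);
    }

    // A sink could refuse any value whose tag is non-zero.
    static boolean isTainted(Tagged t) {
        return t.tag != 0;
    }
}
```

With a tainted `secret` and an untainted `offset`, the result of `compute` is still tainted, because the tag travels through both the call and the addition.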

Two big problems

Challenge 1: Type Mayhem

• java.lang.Object: sometimes has extra variable!

• Primitive types: always has extra variable!

• Instances of classes (Objects): instanceof java.lang.Object; never has extra variable!

• Primitive arrays: instanceof java.lang.Object; always has extra variable

Solution 1: Box taint tag with array when we lose type information

Original:

byte[] array = new byte[5];
Object ret = array;
return ret;

Instrumented:

int[] array_tag = new int[5];
byte[] array = new byte[5];
Object ret = new TaintedByteArray(array_tag, array);
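
A possible shape for the boxing container, and for recovering the shadow array once the static type is known again, is sketched below. Only the class name `TaintedByteArray` and its (tag array, value array) constructor order come from the slide; the field name `taint`, the helper class, and its methods are assumptions.

```java
// Hypothetical shape of the boxing container named on the slide.
final class TaintedByteArray {
    final int[] taint; // one tag per array element
    final byte[] val;  // the original array
    TaintedByteArray(int[] taint, byte[] val) {
        this.taint = taint;
        this.val = val;
    }
}

final class BoxingSketch {
    // Original: byte[] array = new byte[5]; Object ret = array; return ret;
    static Object makeErased() {
        int[] arrayTag = new int[5];
        byte[] array = new byte[5];
        arrayTag[2] = 1; // pretend element 2 came from a tainted source
        // Type information is about to be lost, so box array and tags together.
        return new TaintedByteArray(arrayTag, array);
    }

    // Original: byte[] back = (byte[]) obj;
    // Once the value is used as a byte[] again, unbox it into the array
    // and its shadow array.
    static int tagOfElement(Object obj, int index) {
        if (obj instanceof TaintedByteArray) {
            TaintedByteArray boxed = (TaintedByteArray) obj;
            byte[] back = boxed.val;
            int[] backTag = boxed.taint;
            return index < back.length ? backTag[index] : 0;
        }
        return 0; // not a boxed primitive array
    }
}
```

The box keeps the array and its per-element tags together while the reference is typed as `Object`, so no tag is lost across the assignment.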

Challenge 2: Native Code

We can’t instrument everything!

public int hashCode() {
    return super.hashCode() * field.hashCode();
}

public native int hashCode();

public TaintedInt hashCode$$wrapper() {
    return new TaintedInt(0, hashCode());
}

Solution: Wrappers. Rename every method, and leave a wrapper behind

Wrappers work both ways: native code can still call a method with the old signature

public int[] someMethod(byte in) {
    return someMethod$$wrapper(0, in).val;
}

public TaintedIntArray someMethod$$wrapper(int in_tag, byte in) {
    // The original method "someMethod", but with taint tracking
}
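
To make the two-way wrapper idea concrete, here is a compilable sketch built around the slide's method names. The method body, the `taint` field name, and the `WrapperSketch` class are invented for illustration; only `someMethod`, `someMethod$$wrapper`, `TaintedIntArray`, and the `.val` field appear in the talk.

```java
// Hypothetical container for a primitive array return value and its tags.
final class TaintedIntArray {
    final int[] taint; // one tag per element
    final int[] val;
    TaintedIntArray(int[] taint, int[] val) {
        this.taint = taint;
        this.val = val;
    }
}

final class WrapperSketch {
    // Old signature, kept so that uninstrumented callers (such as native
    // code) can still invoke the method. Such callers have no tag to pass,
    // so the argument is treated as untainted (tag 0).
    public int[] someMethod(byte in) {
        return someMethod$$wrapper(0, in).val;
    }

    // Renamed, taint-aware version: takes the tag of the argument and
    // returns the result boxed with its tags. The body is a stand-in.
    public TaintedIntArray someMethod$$wrapper(int in_tag, byte in) {
        int[] val = { in, in * 2 };
        int[] taint = { in_tag, in_tag }; // every element derives from in
        return new TaintedIntArray(taint, val);
    }
}
```

Instrumented callers would use `someMethod$$wrapper` and keep the tags, while a caller that only knows the old signature still gets a plain `int[]` back.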

Design Limitations

• Tracking through native code

• Return value’s tag becomes combination of all parameters (heuristic); not found to be a problem in our evaluation

• Tracks explicit data flow only (not through control flow)
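
The native-code heuristic in the first limitation could be sketched as follows. The method `opaqueNative` is an ordinary Java stand-in for a native method, the OR-based combination is an assumption, and all names are hypothetical.

```java
// Illustrative sketch of the heuristic for calls into native code.
final class NativeHeuristicSketch {
    static final class TaintedInt {
        final int tag;
        final int val;
        TaintedInt(int tag, int val) { this.tag = tag; this.val = val; }
    }

    // Stand-in for a native method: its body cannot be instrumented,
    // so how the result depends on each parameter is unknown.
    static int opaqueNative(int a, int b) {
        return a ^ b;
    }

    // Heuristic: the return value's tag is the combination of the tags
    // of all parameters, whether or not each one influenced the result.
    static TaintedInt callNative(int aTag, int a, int bTag, int b) {
        int result = opaqueNative(a, b);
        int resultTag = aTag | bTag; // assumption: tags combine with OR
        return new TaintedInt(resultTag, result);
    }
}
```

The heuristic errs on the side of over-tainting: the result carries every parameter's tag even if the native code ignored some of its arguments.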

Evaluation

• Soundness & Precision

• Performance

• Portability

Soundness & Precision

• DroidBench - series of unit tests for Java taint tracking

• Passed all except for implicit flows (intended behavior)

Performance

• Macrobenchmarks (DaCapo, Scalabench)

• Microbenchmarks

• Versus TaintDroid [Enck, 2010] on CaffeineMark

• Versus Trishul [Nair, 2008] on JavaGrande

Macrobenchmarks: Phosphor Relative Runtime Overhead (HotSpot 7)

[Figure: bar chart of relative runtime overhead (axis 0% to 250%) for the DaCapo benchmarks (avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, tradesoap, xalan) and the Scalabench benchmarks (actors, apparat, factorie, kiama, scaladoc, scalap, scalariform, scalatest, scalaxb, specs, tmt).]

Average: 53.3%

Macrobenchmarks: Phosphor Relative Memory Overhead (HotSpot 7)

[Figure: bar chart of relative memory overhead (axis 0% to 900%) for the same DaCapo and Scalabench benchmarks: avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, tradesoap, xalan, actors, apparat, factorie, kiama, scaladoc, scalap, scalariform, scalatest, scalaxb, specs, tmt.]

Average: 270.9%

Microbenchmarks: Phosphor (HotSpot 7) and Trishul Relative Overhead

[Figure: bar chart of relative runtime overhead (axis 0% to 120%) with series Phosphor, Trishul, and Kaffe, for the microbenchmark groups Arithmetic, Assign, Cast, Create, Exception, Loop, Math, Method, and Serial.]

Microbenchmarks: TaintDroid

• TaintDroid: Taint tracking for Android’s Dalvik VM [Enck, 2010]

• Not very precise: one tag per array (not per array element!)

• Applied Phosphor to Android!

Microbenchmarks: Phosphor and TaintDroid Relative Overhead

[Figure: bar chart of relative runtime overhead (axis 0% to 200%) with series TaintDroid 4.3 and Phosphor (Android 4.3), for the benchmarks Float, Logic, Loop, Method, Sieve, and String Buffer. A callout marks the array-heavy benchmarks.]

Portability

JVM | Version(s) | Success?
Oracle (HotSpot) | 1.7.0_45, 1.8.0_0 | Yes
OpenJDK | 1.7.0_45, 1.8.0_0 | Yes
Android Dalvik | 4.3.1 | Yes
Apache Harmony | 6.0M3 | Yes
Kaffe VM | 1.1.9 | Yes
Jikes RVM | 3.1.3 | No, but may be possible with more work

Future Work & Extension

• This is a general approach for tracking metadata with variables in unmodified JVMs

• Could track any sort of data in principle

• We have already extended this approach to track path constraints on inputs

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

https://github.com/Programming-Systems-Lab/Phosphor

Jonathan Bell and Gail Kaiser Columbia University

jbell@cs.columbia.edu @_jon_bell_

OOPSLA Artifact, AEC Evaluated: Consistent, Complete, Well Documented, Easy to Reuse
