shadow execution frameworks for llvm ir and javascriptkubitron/courses/... · shadow execution...

Berkeley Parlab

1. Motivation

2. Overview

6. Evaluation for JavaScript 4. Evaluation for LLVM IR

Liang Gong & Cuong Nguyen* * Name are in alphabetical order. Each author makes equal contributions to the project.

5. Techniques and Challenges for JavaScript

Original Javascript: var a = b; Simplified Transformed Javascript: var a = J$.W(9, 'a', J$.R(5, 'b', b), a);

J$.W = function( … ) { … } J$.R = function( … ) { … }

Code Transformation:

Jalangi Framework Runtime Code:

Different places that may import JavaScript code •  Inline, between <script> and </script> •  External file: src a2ribute of a <script> tag •  HTML event handler a2ribute, such as onclick or onmouseover •  URL: javascript: protocol •  Ajax: jQuery.getScript() or importScript() •  Generated by Script: src_elem.innerHTML = “function(){}”

More complex code execution model/environment: •  JS Code may be triggered at different Kme (page loading, specific event, asynchronous call) •  Different JS Context

We add the following hooks: •  Binary OperaKon •  Unary OperaKon •  Variable Read/Write •  Field Get/Put •  FuncKon/Method Call/Enter/Return •  Script Enter/Return •  Object/FuncKon/Regexp/Array Literal •  CondiKon/Switch

With hooks you can do: •  Debugging •  Performance analysis •  Monitoring dynamic behaviors •  Record and replay •  Almost any dynamic analysis

Detect a NaN in AWer diagnosing, confirm that is a bug.

Check NaN Bug [ < 100 Loc]

[object Object].d <= undefined c <= undefined [object Object].refcount <= NaN

Found interesting operations in the following websites’ homepages

Simply loading the Facebook homepage, the analysis detects hundreds of this kind of interesKng operaKons.

Operating uninitialized variable

0

10

20

30

40

50

60

70

Vector Chart Bitmap Game Vector Chart Mobile HTML5 CharKng

Mobile HTML5 Gaming

AWer TransformaKon Before TransformaKon

Slow down is not obvious

Demo Available

Shadow Execution Frameworks for LLVM IR and JavaScript

Motivation •  Lesson learnt: code instrumentation/transformation and

computation of extra information are foundation to dynamic analysis.

•  Goal: minimize the efforts to build dynamic analysis by generalizing this approach with high scalability.

Contributions •  Design and implement an LLVM Shadow Execution System.

•  Based on instrumentation, concolic execution and shadow execution. •  Perform instrumentation at the instruction level. The analysis is

heavyweight but powerful. •  Support multiple languages: C/C++, Java, OpenCL, Fortran, etc.

•  Design and implement a JavaScript Dynamic Analysis Framework. •  JavaScript is the most popular programming language for client-side web programming. •  Facilitate analyzing any real-world webpages. •  Analysis module plug in and remove on the fly. •  Powerful analysis interaction based on web console. •  Detect real-world bugs using the framework.

Shadow Execution •  Associate a shadow value with each concrete value. •  Carry useful information: error propagation, array

bound checking, symbolic value, etc. •  Perform side-by-side with the concrete execution.

Concrete execution assists shadow execution.

Challenges •  Modeling dynamic allocation and pointer

arithmetic in shadow execution. •  Handle all tricky C pointer arithmetic

(castings, pointer arithmetic, function pointers, etc.).

•  Array/struct and their combinations. •  Handling uninterpreted calls. •  With/without side effects + return

arbitrary types (pointer, array, struct). •  Model malloc.

•  Achieving scalability: fast variable lookup and lazy construction of objects.

Evaluation of the Core Interpreter •  Robustness: interpret 50 commonly used Linux core utilities

(rm/ls/mkdir …). •  Check for each memory write that the interpretation and

concrete execution writes the same value. •  Check at each branch that two executions do not diverge.

•  Performance:

Evaluation of the Usability •  Implement an array-out-of-bound analysis (< 300 LoC). •  Can trace back to the root causes of the problem. •  Differentiate harmful and benign bugs. •  Demo is available.

•  Implement a NaN detection analysis (in progress).

Portable to:

[object Object].now = end – start; NaN “30% 0”

Core Interpreter (Resemble a compiler) •  Front end: instrument all 57 LLVM instructions + compute stack frame

size for local/global variables + create fast lookup indices for variables. •  Back end: re-interpret all 57 LLVM instructions with shadow execution

+ model heap/stack and runtime environment.

2x in average 4x in the worst case

Input: target application under analysis. Output: instrumented/transformed codes with hooks. Extensibility: override hooks to perform additional analysis.

%a = alloca double %b = alloca double %1 = load %a %2 = load %b %3 = add %1 %2

…

LLVM bitcode

Instrumenter (LLVM Pass)

Callback Interpreter (Dynamically loaded/

Perform analysis)

my_alloca(double) %a = alloca double

… my_load(%my_a) %1 = load %a

… my_add(%my_1, %my_2) %3 = add %1 %2

…

LLVM bitcode

Analysis Result

3. Techniques and Challenges for LLVM IR

LLVM Shadow Execution Framework Interesting Analysis Applications

Check NaN:• Find a bug in jQuery-‐1.8.3• Find meaningless(maybe) operations

Hack Frontend Code:• Modify frontend dynamic effects• For program understanding purposes

• Graphviz online• AttackMap

Generate Runtime Call Graph

Analyze Performance Issue:• Function or operation counterCheck Polymorphic getField Statements:• Function or operation counter

shadow execution frameworks for llvm ir and javascriptkubitron/courses/... · shadow execution...

Documents