Shadow Execution Frameworks for LLVM IR and JavaScript



Post on 27-May-2020


Berkeley Parlab

1. Motivation

2. Overview

6. Evaluation for JavaScript

4. Evaluation for LLVM IR

Liang Gong & Cuong Nguyen*
* Names are in alphabetical order. Each author contributed equally to the project.

5. Techniques and Challenges for JavaScript

Code Transformation:
Original JavaScript: var a = b;
Simplified Transformed JavaScript: var a = J$.W(9, 'a', J$.R(5, 'b', b), a);

Jalangi Framework Runtime Code:
J$.W = function( … ) { … }
J$.R = function( … ) { … }
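The transformation above wraps every variable read and write in a call into the runtime. The sketch below shows one way such read and write hooks could behave. The `J$` object here is a hypothetical stand-in written for this example, and the parameter meanings (instruction id, variable name, value, previous value) are assumptions inferred from the transformed line, not the actual Jalangi signatures.

```javascript
// Hypothetical stand-in for the runtime object; not the real Jalangi API.
const J$ = {
  log: [],
  // Read hook: records the access and returns the value unchanged,
  // so the transformed program computes the same result as the original.
  R(iid, name, val) {
    this.log.push({ op: 'read', iid, name, val });
    return val;
  },
  // Write hook: records the new and previous value, then returns the
  // value that the assignment will store.
  W(iid, name, val, oldVal) {
    this.log.push({ op: 'write', iid, name, val, oldVal });
    return val;
  },
};

var b = 42;
// Transformed form of `var a = b;` as shown on the poster.
var a = J$.W(9, 'a', J$.R(5, 'b', b), a);

console.log(a, J$.log.map((entry) => entry.op));
```

Because both hooks return the value they receive, the instrumented assignment behaves like the original one while the log captures each access.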

Different places that may import JavaScript code:
•  Inline, between <script> and </script>
•  External file: src attribute of a <script> tag
•  HTML event handler attribute, such as onclick or onmouseover
•  URL: javascript: protocol
•  Ajax: jQuery.getScript() or importScript()
•  Generated by script: src_elem.innerHTML = “function(){}”

More complex code execution model/environment:
•  JS code may be triggered at different times (page loading, specific event, asynchronous call)
•  Different JS contexts

We add the following hooks:
•  Binary Operation
•  Unary Operation
•  Variable Read/Write
•  Field Get/Put
•  Function/Method Call/Enter/Return
•  Script Enter/Return
•  Object/Function/Regexp/Array Literal
•  Condition/Switch
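To illustrate how an analysis can attach to hooks like those listed above, here is a minimal sketch of a dispatcher with a default (no-op) hook set that an analysis module can replace and later remove. The names `B`, `G`, `defaultHooks`, and `analysis` are hypothetical and chosen for this example only; the framework's real hook names and signatures may differ.

```javascript
// Default hooks do nothing except pass the result through.
const defaultHooks = {
  binary(iid, op, left, right, result) { return result; },
  getField(iid, base, offset, val) { return val; },
};
let analysis = { ...defaultHooks };

// Instrumented code calls B(...) in place of a binary expression.
function B(iid, op, left, right) {
  let result;
  switch (op) {
    case '+': result = left + right; break;
    case '-': result = left - right; break;
    case '*': result = left * right; break;
    case '/': result = left / right; break;
    default: throw new Error('unsupported operator: ' + op);
  }
  return analysis.binary(iid, op, left, right, result);
}

// Instrumented code calls G(...) in place of a field read.
function G(iid, base, offset) {
  return analysis.getField(iid, base, offset, base[offset]);
}

// Plug in an analysis that counts binary operations per operator.
const counts = {};
analysis = {
  ...defaultHooks,
  binary(iid, op, left, right, result) {
    counts[op] = (counts[op] || 0) + 1;
    return result;
  },
};

const x = B(1, '+', 2, B(2, '*', 3, 4)); // instrumented form of 2 + 3 * 4
const y = G(3, { k: 7 }, 'k');           // instrumented form of ({ k: 7 }).k

// Remove the analysis again; the program keeps running unchanged.
analysis = { ...defaultHooks };
```

Keeping the hooks behind one mutable reference is a simple way to model plugging analyses in and out while the page is running.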

With hooks you can do:
•  Debugging
•  Performance analysis
•  Monitoring dynamic behaviors
•  Record and replay
•  Almost any dynamic analysis

Check NaN Bug [< 100 LoC]

The analysis detects a NaN; after diagnosis, it is confirmed to be a bug.

[object Object].d <= undefined
c <= undefined
[object Object].refcount <= NaN
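A NaN check of this kind can be expressed with two hooks: one that fires after each binary operation and one that fires on each field write. The sketch below is an assumption about how such a check might look, not the poster's implementation; `checkBinary`, `checkPutField`, and the message format are hypothetical.

```javascript
const reports = [];

// Called after every binary operation. Reports only the operation that
// first introduces a NaN, not operations that merely propagate one.
function checkBinary(iid, op, left, right, result) {
  if (Number.isNaN(result) && !Number.isNaN(left) && !Number.isNaN(right)) {
    reports.push(`iid ${iid}: ${String(left)} ${op} ${String(right)} produced NaN`);
  }
  return result;
}

// Called on every field write. Reports a NaN being stored into an object.
function checkPutField(iid, base, offset, val) {
  if (Number.isNaN(val)) {
    reports.push(`iid ${iid}: ${String(base)}.${offset} <= NaN`);
  }
  base[offset] = val;
  return val;
}

const obj = {};
let refcount; // never initialized, so its value is undefined

// Instrumented form of `obj.refcount = refcount + 1;`
const next = checkBinary(1, '+', refcount, 1, refcount + 1);
checkPutField(2, obj, 'refcount', next);

console.log(reports.join('\n'));
```

`Number.isNaN` is used instead of the global `isNaN` because the latter coerces its argument and would also flag `undefined`.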

Found interesting operations on the homepages of the following websites

Simply loading the Facebook homepage causes the analysis to detect hundreds of these interesting operations.

Operating on uninitialized variables

Figure: chart comparing each benchmark before and after transformation (Vector Chart, Bitmap Game, Vector Chart, Mobile HTML5 Charting, Mobile HTML5 Gaming; axis from 0 to 70).

The slowdown is not obvious.

Demo Available

Shadow Execution Frameworks for LLVM IR and JavaScript

Motivation
•  Lesson learned: code instrumentation/transformation and computation of extra information are the foundation of dynamic analysis.
•  Goal: minimize the effort needed to build dynamic analyses by generalizing this approach with high scalability.

Contributions
•  Design and implement an LLVM Shadow Execution System.
    •  Based on instrumentation, concolic execution, and shadow execution.
    •  Performs instrumentation at the instruction level. The analysis is heavyweight but powerful.
    •  Supports multiple languages: C/C++, Java, OpenCL, Fortran, etc.
•  Design and implement a JavaScript Dynamic Analysis Framework.
    •  JavaScript is the most popular programming language for client-side web programming.
    •  Facilitates analyzing any real-world web page.
    •  Analysis modules can be plugged in and removed on the fly.
    •  Powerful analysis interaction through the web console.
    •  Detects real-world bugs using the framework.

Shadow Execution
•  Associate a shadow value with each concrete value.
•  The shadow value carries useful information: error propagation, array bounds checking, symbolic values, etc.
•  Shadow execution runs side by side with the concrete execution. Concrete execution assists shadow execution.
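As an illustration of the idea, the sketch below pairs every concrete number with a shadow value that holds a symbolic expression. All names (`mk`, `input`, `constant`, `add`, `mul`, `branchLt`) are hypothetical and exist only for this example. The branch helper shows one sense in which concrete execution can assist shadow execution: the concrete values decide which way the branch goes, and the shadow side only records the corresponding condition.

```javascript
// A value is a pair: the concrete number and its shadow (a symbolic string).
function mk(concrete, shadow) { return { concrete, shadow }; }
function input(name, concrete) { return mk(concrete, name); }
function constant(c) { return mk(c, String(c)); }

// Each operation updates the concrete and the shadow component together.
function add(x, y) {
  return mk(x.concrete + y.concrete, `(${x.shadow} + ${y.shadow})`);
}
function mul(x, y) {
  return mk(x.concrete * y.concrete, `(${x.shadow} * ${y.shadow})`);
}

// The concrete values choose the branch; the shadow side records the
// symbolic condition that held on this path.
const pathCondition = [];
function branchLt(x, y) {
  const taken = x.concrete < y.concrete;
  const cond = `${x.shadow} < ${y.shadow}`;
  pathCondition.push(taken ? cond : `!(${cond})`);
  return taken;
}

const a = input('a', 3);
const b = input('b', 4);
const s = add(a, mul(b, constant(2))); // a + b * 2
const small = branchLt(s, constant(10));

console.log(s.concrete, s.shadow, small, pathCondition);
```

Other analyses would swap the symbolic string for different metadata, such as bounds information or an error estimate, while keeping the same pairing.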

Challenges
•  Modeling dynamic allocation and pointer arithmetic in shadow execution.
    •  Handle all tricky C pointer usage (casts, pointer arithmetic, function pointers, etc.).
    •  Arrays, structs, and their combinations.
•  Handling uninterpreted calls.
    •  With or without side effects; may return arbitrary types (pointer, array, struct).
    •  Model malloc.
•  Achieving scalability: fast variable lookup and lazy construction of objects.
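The scalability point can be illustrated with a small sketch. It assumes that names are resolved to dense integer indices once, ahead of execution, and that shadow objects are built only when a slot is first touched. `buildIndex`, `Frame`, and the shape of the shadow object are hypothetical; the poster does not describe the actual data structures.

```javascript
// Resolve each variable name to an integer index once, ahead of execution.
function buildIndex(names) {
  const index = new Map();
  names.forEach((name, i) => index.set(name, i));
  return index;
}

// A frame stores shadow objects in a flat array and creates each one
// lazily, the first time its slot is accessed.
class Frame {
  constructor(size, makeShadow) {
    this.slots = new Array(size);
    this.makeShadow = makeShadow;
    this.constructed = 0;
  }
  get(i) {
    if (this.slots[i] === undefined) {
      this.slots[i] = this.makeShadow(i);
      this.constructed += 1;
    }
    return this.slots[i];
  }
}

const index = buildIndex(['%a', '%b', '%1', '%2', '%3']);
const A = index.get('%a');     // looked up once, then reused as a plain index
const THREE = index.get('%3');

const frame = new Frame(index.size, (slot) => ({ slot, notes: [] }));
frame.get(A).notes.push('allocated');
frame.get(A);                  // second access reuses the same object
frame.get(THREE);

console.log(frame.constructed); // slots for %b, %1, %2 were never built
```

At run time the interpreter then pays an array access per variable rather than a name lookup, and untouched variables cost nothing.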

Evaluation of the Core Interpreter
•  Robustness: interpret 50 commonly used Linux core utilities (rm/ls/mkdir …).
    •  Check at each memory write that the interpretation and the concrete execution write the same value.
    •  Check at each branch that the two executions do not diverge.
•  Performance:

Evaluation of the Usability
•  Implement an array-out-of-bounds analysis (< 300 LoC).
    •  Can trace back to the root cause of the problem.
    •  Differentiates harmful bugs from benign ones.
    •  Demo is available.
•  Implement a NaN detection analysis (in progress).
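An out-of-bounds analysis of this kind can be sketched with shadow metadata attached to each pointer. The example below is written in JavaScript for illustration only and is not the poster's LLVM implementation. It assumes a simplified memory model with one cell per element; `shadowMalloc`, `ptrAdd`, `load`, `store`, and the `site` labels are hypothetical names.

```javascript
let nextAddr = 0x1000;
const memory = new Map(); // modeled heap: address -> value
const violations = [];

// Allocation returns a pointer whose shadow remembers base, size, and
// the allocation site, which later serves as the root cause in reports.
function shadowMalloc(count, site) {
  const base = nextAddr;
  nextAddr += count;
  return { addr: base, shadow: { base, size: count, site } };
}

// Pointer arithmetic changes the address but keeps the shadow metadata.
function ptrAdd(p, k) {
  return { addr: p.addr + k, shadow: p.shadow };
}

function checkAccess(p, kind, site) {
  const offset = p.addr - p.shadow.base;
  if (offset >= 0 && offset < p.shadow.size) return true;
  violations.push({ kind, site, offset, size: p.shadow.size, allocSite: p.shadow.site });
  return false;
}

function store(p, value, site) {
  if (checkAccess(p, 'store', site)) memory.set(p.addr, value);
}

function load(p, site) {
  return checkAccess(p, 'load', site) ? memory.get(p.addr) : undefined;
}

const arr = shadowMalloc(4, 'main:1');
store(ptrAdd(arr, 3), 7, 'main:2');
const last = load(ptrAdd(arr, 3), 'main:3');

const end = ptrAdd(arr, 4);            // one past the end, never dereferenced so far
const benignCount = violations.length; // still no violation recorded

load(end, 'main:5');                   // dereference: recorded as a violation
console.log(violations);
```

Checking only at dereference time is one possible way to separate a benign out-of-range pointer from a harmful access, and the stored allocation site gives a starting point for tracing the root cause.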

Portable  to:

[object Object].now = end – start; NaN “30% 0”

Core Interpreter (resembles a compiler)
•  Front end: instrument all 57 LLVM instructions + compute stack frame size for local/global variables + create fast lookup indices for variables.
•  Back end: re-interpret all 57 LLVM instructions with shadow execution + model heap/stack and runtime environment.

2x on average, 4x in the worst case

Input: target application under analysis.
Output: instrumented/transformed code with hooks.
Extensibility: override hooks to perform additional analysis.

LLVM bitcode (input):
%a = alloca double
%b = alloca double
%1 = load %a
%2 = load %b
%3 = add %1 %2

Instrumenter (LLVM Pass)

LLVM bitcode (instrumented):
my_alloca(double)
%a = alloca double
…
my_load(%my_a)
%1 = load %a
…
my_add(%my_1, %my_2)
%3 = add %1 %2

Callback Interpreter (dynamically loaded; performs analysis)

Analysis Result
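The pipeline above can be approximated in a few lines to show what the callbacks might do. The sketch below is a JavaScript model, not the poster's interpreter: the callback names follow the diagram, but their parameters, the extra `my_store` callback, and the choice of shadow information (an "initialized" flag) are assumptions made for this example. Each callback also receives the value the concrete execution produced, so the interpreter can check that the two executions agree.

```javascript
class CallbackInterpreter {
  constructor() {
    this.cells = [];       // modeled stack memory: one cell per alloca
    this.regs = new Map(); // register name -> { value, init }
    this.warnings = [];
  }

  // Fail loudly if the re-interpretation disagrees with the concrete run.
  check(dest, interpreted, concrete) {
    if (interpreted !== concrete) {
      throw new Error(`divergence at ${dest}: ${interpreted} vs ${concrete}`);
    }
  }

  my_alloca(dest) {
    this.cells.push({ value: 0, init: false });
    this.regs.set(dest, { value: this.cells.length - 1, init: true });
  }

  my_store(ptr, value) {
    const cell = this.cells[this.regs.get(ptr).value];
    cell.value = value;
    cell.init = true;
  }

  my_load(dest, ptr, concrete) {
    const cell = this.cells[this.regs.get(ptr).value];
    if (cell.init) this.check(dest, cell.value, concrete);
    // For uninitialized memory, adopt the concrete value as-is.
    const value = cell.init ? cell.value : concrete;
    this.regs.set(dest, { value, init: cell.init });
  }

  my_add(dest, left, right, concrete) {
    const a = this.regs.get(left);
    const b = this.regs.get(right);
    const init = a.init && b.init;
    const value = a.value + b.value;
    if (init) this.check(dest, value, concrete);
    else this.warnings.push(`${dest}: add uses an uninitialized value`);
    this.regs.set(dest, { value, init });
  }
}

// Callback sequence for the example program, with %a stored and %b not.
const interp = new CallbackInterpreter();
interp.my_alloca('%a');
interp.my_alloca('%b');
interp.my_store('%a', 1.5);
interp.my_load('%1', '%a', 1.5);
interp.my_load('%2', '%b', 0); // concrete run happened to read 0
interp.my_add('%3', '%1', '%2', 1.5);

console.log(interp.regs.get('%3'), interp.warnings);
```

A different analysis would keep the same callback structure and replace the `init` flag with its own shadow information.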

3. Techniques and Challenges for LLVM IR

LLVM Shadow Execution Framework

Interesting Analysis Applications

Check NaN:
•  Find a bug in jQuery-1.8.3
•  Find meaningless (maybe) operations

Hack Frontend Code:
•  Modify frontend dynamic effects
•  For program understanding purposes

Generate Runtime Call Graph:
•  Graphviz online
•  AttackMap

Analyze Performance Issue:
•  Function or operation counter

Check Polymorphic getField Statements:
•  Function or operation counter
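The last two applications can be sketched together: a counter that tallies hook invocations, and a check that flags field reads whose base object takes more than one shape at the same code location. The sketch approximates an object's shape by its sorted list of own property names, which is an assumption made for this example and not how a JavaScript engine or the poster's analysis defines it. The names `getField`, `shapeOf`, and `polymorphicSites` are hypothetical.

```javascript
const opCounts = new Map();     // operation kind -> number of executions
const shapesBySite = new Map(); // iid -> set of observed base-object shapes

function count(kind) {
  opCounts.set(kind, (opCounts.get(kind) || 0) + 1);
}

// Approximate the shape of an object by its sorted own property names.
function shapeOf(obj) {
  return Object.keys(obj).sort().join(',');
}

// Hook for field reads: counts the operation and records the shape seen
// at this code location (identified by iid).
function getField(iid, base, offset) {
  count('getField');
  if (!shapesBySite.has(iid)) shapesBySite.set(iid, new Set());
  shapesBySite.get(iid).add(shapeOf(base));
  return base[offset];
}

// A site is reported as polymorphic if it saw more than one shape.
function polymorphicSites() {
  return [...shapesBySite]
    .filter(([, shapes]) => shapes.size > 1)
    .map(([iid]) => iid);
}

// Instrumented form of `function area(s) { return s.w * s.h; }`
function area(s) {
  return getField(1, s, 'w') * getField(2, s, 'h');
}

const areas = [
  area({ w: 2, h: 3 }),
  area({ w: 4, h: 5 }),
  area({ w: 1, h: 1, label: 'x' }), // different shape at the same sites
];

console.log(opCounts.get('getField'), polymorphicSites());
```

The same counting map can be extended to function calls or other hooks to locate hot operations.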